Author Topic: RISC-V assembly language programming tutorial on YouTube (Read 63017 times)

langwadt · « **Reply #275 on:** January 21, 2019, 08:32:21 pm »

Quote from: legacy on January 20, 2019, 12:56:49 pm

So what I find really irritating is the assumption that, because someone wants to release his/her projects for free then we should do the same, and, worse still, I do find even more irritating the assumption that, because Atmel can release something for free, then we should do the same.

FSK, really

that's the free market, you can set your price to whatever you want and people decide if they think it is worth paying that price

Nominal Animal · « **Reply #276 on:** January 21, 2019, 09:04:45 pm »

Incorrect assumptions do tend to be infuriating.

But.. how do you Frequency Shift Key assumptions?

KL27x · « **Reply #277 on:** January 22, 2019, 12:48:01 am »

Quote

So what I find really irritating is the assumption that, because someone wants to release his/her projects for free then we should do the same, and, worse still, I do find even more irritating the assumption that, because Atmel can release something for free, then we should do the same.

This is even more annoying when the person saying it makes $$ from YT videos, and the only creations/inventions they have so generously shared might have been good for some views and hits but ultimately belong only in a dumpster.

cepwin · « **Reply #278 on:** January 23, 2019, 02:21:44 am »

I finally got to watch the videos...very interesting. Cool board too. I also thought his presentation was very clear. My only complaint was that the red for the instruction codes was very hard to read.

David Hess · « **Reply #279 on:** January 23, 2019, 07:18:23 pm »

Quote from: brucehoult on December 31, 2018, 11:50:49 pm

The minor thing wrong with Alpha was the lack of 8 and 16 bit loads and stores. That's -- again -- more of a code size problem than a speed problem, but anyway they fixed it in the 2nd (21164) generation.

A major thing, and this will sound familiar for Itanium, Mill, and perhaps RISC-V, was that Alpha was designed assuming an alternative to out-of-order execution which so far has not been found.

Another serious problem was the weak memory ordering; it seems great in theory but sure makes things difficult. PowerPC and ARM suffer from this as well. But everybody loves to recompile for every new implementation. Right? RIGHT? Hello?

PS - I really hate doing searches on subjects like these and finding posts from 13 years ago (!) where I discussed them.

legacy · « **Reply #280 on:** January 23, 2019, 07:50:48 pm »

the worst effects of memory reordering can only be observed when lock-free programming techniques are used.

brucehoult · « **Reply #281 on:** January 23, 2019, 10:02:22 pm »

Quote from: David Hess on January 23, 2019, 07:18:23 pm

A major thing, and this will sound familiar for Itanium, Mill, and perhaps RISC-V, was that Alpha was designed assuming an alternative to out-of-order execution which so far has not been found.

I don't understand what you mean by this.

Certainly, Itanium and Mill are both something like VLIW with "instructions" containing a number of operations that have been proven to be safe to execute at the same time.

The semantics of RISC-V are that the program must appear to other code on the same core as if the instructions were executed sequentially. I'd have thought Alpha was the same.

Quote

Another serious problem was the weak memory ordering; it seems great in theory but sure makes things difficult. PowerPC and ARM suffer from this as well. But everybody loves to recompile for every new implementation. Right? RIGHT? Hello?

You don't have to recompile for a new implementation if you followed the published memory consistency rules.

If you just hacked things until your program seemed to work then you might need to.

The RISC-V memory model has been developed by a group of industry and academic experts who are very familiar with the problems with the ARM and Alpha models. There is some almost two year old information here: https://riscv.org/2017/04/risc-v-memory-consistency-model/ The experts have since finished their work and the RISC-V memory model has been ratified -- in fact it's the first thing to be ratified, even before the base instruction set.

I think this is a great example of the strength of the RISC-V governance approach. Things take a bit longer than just having half a dozen people in a room at some company have a meeting and make a decision, but it's also much more likely to be correct.

Other good examples are the work of the "Fast Interrupts" working group and the Vector Extension working group.

legacy · « **Reply #282 on:** January 25, 2019, 01:09:06 pm »

Quote from: David Hess on January 23, 2019, 07:18:23 pm

PowerPC and ARM suffer from this as well. But everybody loves to recompile for every new implementation. Right? RIGHT? Hello?

OT:
I have just wasted two weeks of my time updating { HPPA{2.0BE}, PPC{32BE}, MIPS{32LE, 32BE, 3BE } } Linux stage4's because new packages and libraries need to be of the same C++ ABI.

Code:


 [4] gcc-4.1.2 (needed for legacy reasons)
 [5] gcc-4.3.6 (needed for legacy reasons)
 [6] gcc-4.4.7 (needed for legacy reasons)
 [7] gcc-4.5.4 (needed for legacy reasons)
 [8] gcc-6.4.0 <----- compiled with this
 [9] gcc-7.3.0 <----- now I am with this

Things compiled by GCC-v6.4.0 are completely screwed up against things compiled by GCC-v7.3.0.
It basically means stage1-3,4 need to be completely wiped and rebuilt from scratch

Electricity, effort, time, and money wasted

Digging deeper, I see changes on the C++ ABI, so things compiled by different versions of the compiler cannot work together.

There is a similar reason for being obliged to recompile the "memory barrier" support offered by C++11/$version, since $version_A is different from $version_B, and this is a very serious problem when you switch from a multi-CPUs SMP machine to a multi-core SMP machine

edit: when you update the C++ compiler ... you might see mismatches like this

Code:

Mismatch between the program and library build versions detected.
The library used 3.0 (wchar_t,compiler with C++ ABI 1010,wx containers,compatible with 2.8),
and your program used 3.0 (wchar_t,compiler with C++ ABI 1011,wx containers,compatible with 2.8).

legacy · « **Reply #283 on:** January 25, 2019, 01:44:41 pm »

Quote from: brucehoult on January 23, 2019, 10:02:22 pm

You don't have to recompile for a new implementation if you followed the published memory consistency rules.

sure, you have to!

powerpc440 and powerpc460 have different instructions for handling memory ordering. Besides, when your code uses POSIX semaphores to coordinate the beginning and end of each loop, sometimes the code uses asm volatile("" ::: "memory") to prevent compiler reordering (which for sure will make a mess otherwise). Sometimes that's enough for the job, but sometimes (usually in multicore SMP environments) it is not, so you explicitly need a StoreLoad barrier instruction, *IF* and only *IF* one is available, e.g. asm volatile("memfence" ::: "memory"), to prevent memory reordering by the CPU as well.

This one has a different implementation on PowerPC vs PowerPC-embedded: e.g. vs ppc-e500
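
To make the compiler-barrier versus hardware-barrier distinction concrete, here is a minimal C11 sketch that is not taken from the post. It assumes a compiler with <stdatomic.h>; the names payload, ready, publish and try_read are made up for illustration. atomic_signal_fence only constrains the compiler, much like asm volatile("" ::: "memory"), while atomic_thread_fence also makes the compiler emit whatever fence instruction the target CPU needs, so the source does not have to name a CPU-specific instruction.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;            /* plain data handed from writer to reader */
static atomic_bool ready;      /* flag announcing that payload is valid   */

/* Compiler-only barrier: stops the compiler from moving memory accesses
 * across this point, but emits no fence instruction for the CPU. */
static inline void compiler_barrier(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Writer: fill in the data, then raise the flag. The release fence keeps
 * the payload store ordered before the flag store as seen by other cores. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: returns true and copies the payload once the flag is seen. */
bool try_read(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

One thread would call publish() and another would poll try_read(). Replacing the two atomic_thread_fence calls with compiler_barrier() would still compile, but on a weakly ordered CPU the reader could then see the flag before the payload.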

Nominal Animal · « **Reply #284 on:** January 25, 2019, 05:07:21 pm »

(An aside: the STT_GNU_IFUNC extension to the ELF standard is very useful if you have those kinds of variants -- functions that need to be implemented slightly differently, depending on small variations on the processor or architecture. They're pretty easy to use, too; just implement each variant and one resolver function that returns the pointer to the variant to be used. The dynamic linker calls that resolver function at runtime, so there is no extra indirection cost either, just a tiny startup delay cost of running those resolver functions once.)
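
A rough sketch of what that looks like with GCC's ifunc function attribute, assuming an ELF target with a dynamic linker that supports it (for example Linux with glibc). The function names are hypothetical, and the pointer-size test in the resolver is only a placeholder for a real CPU feature probe.

```c
#include <stddef.h>

typedef int sum_fn(const int *values, size_t count);

/* Variant 1: straightforward loop. */
static int sum_plain(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Variant 2: two-way unrolled loop, standing in for a CPU-specific version. */
static int sum_unrolled(const int *values, size_t count)
{
    int even = 0, odd = 0;
    size_t i = 0;
    for (; i + 1 < count; i += 2) {
        even += values[i];
        odd  += values[i + 1];
    }
    if (i < count)
        even += values[i];
    return even + odd;
}

/* Resolver: run by the dynamic linker when the symbol is bound. It has to
 * stay simple, because it may run before the C library is fully set up.
 * The pointer-size test is a placeholder, not a real feature probe. */
static sum_fn *resolve_sum(void)
{
    return (sizeof(void *) >= 8) ? sum_unrolled : sum_plain;
}

/* Callers just see an ordinary function named sum(). */
int sum(const int *values, size_t count) __attribute__((ifunc("resolve_sum")));
```

Code elsewhere simply calls sum(values, count) and never sees the resolver.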

brucehoult · « **Reply #285 on:** January 26, 2019, 05:35:45 am »

Quote from: legacy on January 25, 2019, 01:44:41 pm

Quote from: brucehoult on January 23, 2019, 10:02:22 pm
You don't have to recompile for a new implementation if you followed the published memory consistency rules.

sure, you have to!

powerpc440 and powerpc460 have different instructions for handling memory ordering. Besides, when your code uses POSIX semaphores to coordinate the beginning and end of each loop, sometimes the code uses asm volatile("" ::: "memory") to prevent compiler reordering (which for sure will make a mess otherwise). Sometimes that's enough for the job, but sometimes (usually in multicore SMP environments) it is not, so you explicitly need a StoreLoad barrier instruction, *IF* and only *IF* one is available, e.g. asm volatile("memfence" ::: "memory"), to prevent memory reordering by the CPU as well.

This one has a different implementation on PowerPC vs PowerPC-embedded: e.g. vs ppc-e500

The subject was RISC-V, not PowerPC. I fully accept that other architectures have handled these things badly in the past, and we've tried to learn from that. And hopefully done a good job. Time will tell.

If you follow the published RISC-V memory consistency rules then you will never have to recompile software for a new processor. The fence instruction is part of the base instruction set, will be accepted by every CPU, and is a zero or one cycle no-op on CPUs where no action is necessary.

legacy · « **Reply #286 on:** January 26, 2019, 06:20:11 am »

Quote from: brucehoult on January 26, 2019, 05:35:45 am

The subject was RISC-V, not PowerPC. I fully accept that other architectures have handled these things badly in the past

With all due respect, IBM is IBM: a company that has been productive in computer science since the beginning of computer science itself. They made, and are still making, the history of the field, whereas RISC-V is ... nothing similar, with just a fraction of IBM's experience and competence.

so I trust more what IBM recommends about POWER9: don't expect that all the spec of a family will be written in stone. Certain things do change, so you'd best assume that things always need to be recompiled (e.g. AIX needs a specific patch media to be installed for operating on POWER9), otherwise be prepared to suffer with your hex editor.

so my point is: I don't expect that in the loooong term (10 years?) RISC-V will do better than what IBM has done in twenty years of PowerPC experience (from the PPC601 to the PPC970, including several embedded 4xx cores). Certain things do change, so I am already prepared to accept that I will have to spend time recompiling things, and I expect to get (to get = to pay for) the sources of everything instead of just the binaries (which cost one-tenth as much if you don't request the sources).

This is the big mistake I made with PowerPC when I purchased only the binaries.

legacy · « **Reply #287 on:** January 26, 2019, 07:17:07 am »

I am just now finalizing the stage4 for HPPA2. This architecture, made by HP, is the most stable in terms of changes, but the last PA-RISC CPU, the PA8900, adds a few changes compared to the PA8700. They are related to the multi-core nature of the PA8900, which causes problems in Linux and in HPUX11 (which needs specific patches to be installed) exactly because it wasn't expected: previous CPUs were capable of multi-CPU SMP but they were not multi-core.

What have I learned from these experiences? I have learned that every processor family has different habits when it comes to memory reordering, and those habits can only be observed in multicore or multiprocessor configurations. Given that multicore is now mainstream, it's worth assuming that the market has some familiarity with them, so new products are now developed with multiple cores that offer a certain compatibility with their predecessors. But don't assume that all the processors in a family behave the same way in SMP, because they do not: there are many types of memory reordering, and not all types of reordering occur equally often.

It all depends on the processor you're targeting, on its implementation, and/or on the toolchain you're using for development (e.g. Java uses a different approach vs C++11).

This is what a "memory model" is for: it tells you, for a given processor && toolchain, exactly what types of memory reordering to expect at runtime relative to a given source code listing. Keep in mind that the effects (and differences) of memory reordering can only be observed when lock-free programming techniques are used.

What I mean is that we have three kinds of memory models:

kind-A: you have CPUs that are ONLY sequentially consistent, and this is the ONLY way they can operate in SMP
kind-B: you might have CPUs that are usually strong, implementing explicit acquire and release, TSO. This usually works in multi-core SMP at the cost of degraded performance, but .. sometimes it might not work correctly on multi-cores, while it for sure always works in multi-CPU SMP
kind-C: you might also have multi-cores that are weak with data dependency reordering. This is assumed to work in multi-CPU/multi-core SMP

All three of these are hardware memory models that tell you what kind of memory ordering to expect at runtime relative to an assembly listing (and here, you can expect other problems with the C compiler ... C is not thread-safe, so you have to correctly tell the compiler what it has to do. C++11 helps a lot with this, C doesn't).

Now, talking about hardware, in both the HPPA and the PowerPC families you find certain members that are kind-B and certain that are kind-C, but in the embedded PowerPC you also find members that are kind-A (because they are based on the oldest/simplest/safest/most conservative CPU model, e.g. military PPCs need redundancy, which needs to be kind-A).

Even x86/64 should be both kind-A (in i386 emulation mode) and kind-B.

Besides, on the software side, Java is only kind-A oriented, the C++11 default atomic is kind-A, but the new C++11/20xx low-level atomics tend to be kind-C.
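
As a small illustration of the difference between the default and the low-level atomics, here is a sketch in C11, whose <stdatomic.h> mirrors the C++11 model. It is an assumption-laden example rather than anything from the post, and the counter and function names are invented. The first function uses the default ordering (sequentially consistent, roughly the kind-A view), the second asks explicitly for relaxed ordering, which is only safe when nothing else depends on the order of the update.

```c
#include <stdatomic.h>

static atomic_int hits_seq_cst;   /* updated with the default ordering */
static atomic_int hits_relaxed;   /* updated with an explicit weak ordering */

/* Default form: sequentially consistent, the strongest ordering. */
int count_hit_default(void)
{
    return atomic_fetch_add(&hits_seq_cst, 1) + 1;
}

/* Explicit form: still atomic, but it orders nothing else around it. */
int count_hit_relaxed(void)
{
    return atomic_fetch_add_explicit(&hits_relaxed, 1,
                                     memory_order_relaxed) + 1;
}
```

Both functions return the new count. On a strongly ordered CPU they may compile to the same instructions, while on a weakly ordered one the default form typically costs extra fences.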

You cannot say what will happen in the future: new kind-D? new kind-E?

Be prepared

brucehoult · « **Reply #288 on:** January 26, 2019, 11:39:19 pm »

Quote from: legacy on January 26, 2019, 07:17:07 am

What I mean is that we have three kinds of memory models:
kind-A: you have CPUs that are ONLY sequentially consistent, and this is the ONLY way they can operate in SMP
kind-B: you might have CPUs that are usually strong, implementing explicit acquire and release, TSO. This usually works in multi-core SMP at the cost of degraded performance, but .. sometimes it might not work correctly on multi-cores, while it for sure always works in multi-CPU SMP
kind-C: you might also have multi-cores that are weak with data dependency reordering. This is assumed to work in multi-CPU/multi-core SMP

All three of these are hardware memory models that tell you what kind of memory ordering to expect at runtime relative to an assembly listing (and here, you can expect other problems with the C compiler ... C is not thread-safe, so you have to correctly tell the compiler what it has to do. C++11 helps a lot with this, C doesn't).

RISC-V has been designed from the start for kind-C. Every RISC-V CPU has the ordering instructions ("fence") needed by kind-C for C++11 (including acquire and release semantics, and distinguishing memory from I/O) and Java and C# and other currently-known languages. On low end CPUs that will never have multiple cores "fence" is still recognised as an instruction but is a no-op.

Note that programs written correctly for kind-C systems are guaranteed to run fine on kind-B and kind-A systems, and programs written correctly for kind-B systems are guaranteed to run fine on kind-A systems. As long as the instructions are recognised as valid instructions. Which they are.

The two year old FE310 32 bit RISC-V microcontroller test chip (in the HiFive1) implements the fence instruction, as well as a full set of AMO (Atomic Memory Operation) instructions designed to natively implement C++11 semantics.
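
To show how the language-level semantics line up with such instructions, here is a minimal spinlock sketch in C11 atomics. It is only an illustration, with a made-up spinlock type and function names. On a RISC-V target with the atomic extension, a compiler would typically turn the acquire exchange into a single amoswap with the acquire bit set, though the exact code generated depends on the toolchain.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int held;   /* 0 = free, 1 = taken */
} spinlock;

/* One attempt: atomically swap in 1 and see what was there before. */
bool spin_trylock(spinlock *lock)
{
    return atomic_exchange_explicit(&lock->held, 1, memory_order_acquire) == 0;
}

void spin_lock(spinlock *lock)
{
    while (!spin_trylock(lock)) {
        /* Spin on plain loads until the lock looks free again. */
        while (atomic_load_explicit(&lock->held, memory_order_relaxed) != 0)
            ;
    }
}

/* Release store: everything written while holding the lock becomes
 * visible to the next thread that acquires it. */
void spin_unlock(spinlock *lock)
{
    atomic_store_explicit(&lock->held, 0, memory_order_release);
}
```

The same source is valid on a TSO or sequentially consistent machine. The compiler simply has less work to do there.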

RISC-V is by default kind-C, but you can as an extension build a system as TSO (kind-B). That makes programs a little simpler to write correctly, but you can then *only* run those programs on CPUs that implement the TSO extension. Normal programs run fine on TSO too.

Quote

You cannot say what will happen in the future: new kind-D? new kind-E?

Be prepared

New things are always a possibility.

But there is a *lot* of experience of old things that people have done wrong that can be learned from, from microcontrollers up to supercomputers.

What is sad is when people have a few decades of experience available to study and FAIL to learn from it.

legacy · « **Reply #289 on:** January 27, 2019, 02:57:51 pm »

I have read many, and I find these two books useful concerning multiprocessor programming.

The Art of Multiprocessor Programming, by Maurice Herlihy and Nir Shavit
C++ Concurrency in Action: Practical Multithreading, by Anthony Williams

Nominal Animal · « **Reply #290 on:** January 27, 2019, 04:20:37 pm »

Quote from: brucehoult on January 26, 2019, 11:39:19 pm

a full set of AMO (Atomic Memory Operation) instructions designed to natively implement C++11 semantics

Do you know/remember the maximum data size the LL/SC ops support?

For a C programmer (say, low-level libraries and such), the built-ins (that GCC, clang, and Intel CC all support) are extremely useful, but the variation in the maximum size supported is a bit of a pickle.
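
For readers who have not used them, this is roughly how the size question shows up in code. The sketch assumes GCC or clang, which both provide the __atomic_always_lock_free and __atomic_compare_exchange_n built-ins; the helper names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compile-time check: is an atomic object of this size always lock-free
 * on the target being compiled for? */
#define ALWAYS_LOCK_FREE(bytes) __atomic_always_lock_free((bytes), 0)

/* Widest always-lock-free size among the usual candidates, in bytes. */
static inline unsigned widest_lock_free_bytes(void)
{
    if (ALWAYS_LOCK_FREE(16)) return 16;
    if (ALWAYS_LOCK_FREE(8))  return 8;
    if (ALWAYS_LOCK_FREE(4))  return 4;
    if (ALWAYS_LOCK_FREE(2))  return 2;
    return ALWAYS_LOCK_FREE(1) ? 1 : 0;
}

/* Word-sized compare-and-swap using the built-in directly. */
bool cas_u32(uint32_t *target, uint32_t expected, uint32_t desired)
{
    return __atomic_compare_exchange_n(target, &expected, desired,
                                       false,             /* strong CAS */
                                       __ATOMIC_SEQ_CST,  /* on success */
                                       __ATOMIC_SEQ_CST); /* on failure */
}
```

A library can branch on the result at compile time and fall back to a lock for anything wider than the target supports.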

Quote from: brucehoult on January 26, 2019, 11:39:19 pm

What is sad is when people have a few decades of experience available to study and FAIL to learn from it.

Yes. It is one thing to not know and stumble, but refusing to learn from others is just baffling.

(I'm not referring to anything in this thread. I only mean I see that way too often in real life, and cannot wrap my head around it. I can see why people repeat mistakes they didn't know about, but scientists and engineers whose entire job description is to build on top of existing knowledge? Weird.)

NorthGuy · « **Reply #291 on:** January 27, 2019, 05:06:36 pm »

Quote from: brucehoult on January 26, 2019, 11:39:19 pm

But there is a *lot* of experience of old things that people have done wrong that can be learned from, from microcontrollers up to supercomputers.

brucehoult · « **Reply #292 on:** January 28, 2019, 02:57:11 pm »

Quote from: Nominal Animal on January 27, 2019, 04:20:37 pm

Quote from: brucehoult on January 26, 2019, 11:39:19 pm
a full set of AMO (Atomic Memory Operation) instructions designed to natively implement C++11 semantics
Do you know/remember the maximum data size the LL/SC ops support?

XLEN.

i.e. 32 bits on a 32 bit machine, and 64 bits on a 64 bit machine.

In particular there is no native double width CAS AMO and no way to make one using LL/SC -- that needs to use an actual lock.
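
In C terms the split looks something like the sketch below, which is an illustration and not code from any RISC-V project. A word-sized update can be a compare-and-swap retry loop, which a compiler would typically build from an LL/SC pair on such a machine. A two-word update falls back to a mutex. The struct pair layout and both function names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Word-sized case: a CAS retry loop, which can be lock-free. */
void atomic_store_max(atomic_uintptr_t *target, uintptr_t candidate)
{
    uintptr_t seen = atomic_load_explicit(target, memory_order_relaxed);
    while (seen < candidate &&
           !atomic_compare_exchange_weak_explicit(target, &seen, candidate,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
        ;  /* on failure, 'seen' now holds the current value; retry */
}

/* Double-width case: two words that must change together. */
struct pair {
    pthread_mutex_t lock;
    uintptr_t pointer;
    uintptr_t version;
};

/* Returns 1 and installs the new pointer if both fields still match. */
int pair_compare_and_swap(struct pair *p, uintptr_t old_pointer,
                          uintptr_t old_version, uintptr_t new_pointer)
{
    int swapped = 0;
    pthread_mutex_lock(&p->lock);
    if (p->pointer == old_pointer && p->version == old_version) {
        p->pointer = new_pointer;
        p->version = old_version + 1;
        swapped = 1;
    }
    pthread_mutex_unlock(&p->lock);
    return swapped;
}
```

The version counter is the usual reason for wanting the second word: it lets a reader notice that the pointer was changed and changed back.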

There is significant interest in more powerful lock-free programming primitives, and something will probably happen in the next year or two. I think that's most likely to be an extension to allow something between nested LL/SC and a very restricted STM. It's notable that others have stumbled over STM, so it will be good to learn from that.

Nominal Animal · « **Reply #293 on:** January 28, 2019, 08:42:42 pm »

Quote from: brucehoult on January 28, 2019, 02:57:11 pm

In particular there is no native double width CAS AMO and no way to make one using LL/SC -- that needs to use an actual lock.

That is a minor pain with signal handlers in C, because the only async-signal safe locking primitive is sem_post(). I end up having to work around it by using a dedicated thread to receive signals via sigwaitinfo(), rather than using signal handlers.
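
A minimal sketch of that workaround, assuming a POSIX system that provides sigwaitinfo() (Linux does). The choice of SIGUSR1 and SIGTERM and all function names are placeholders for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Block the signals of interest. Call this before creating any threads,
 * so every thread inherits the mask and none takes them asynchronously. */
int block_handled_signals(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGUSR1);
    sigaddset(set, SIGTERM);
    return pthread_sigmask(SIG_BLOCK, set, NULL);
}

/* Wait synchronously for one of the blocked signals. Returns the signal
 * number, or -1 on error. */
int wait_for_signal(const sigset_t *set)
{
    siginfo_t info;
    int signo;
    do {
        signo = sigwaitinfo(set, &info);
    } while (signo == -1 && errno == EINTR);
    return signo;
}

/* Thread body for pthread_create(); arg points to the blocked sigset_t. */
void *signal_thread(void *arg)
{
    const sigset_t *set = arg;
    for (;;) {
        int signo = wait_for_signal(set);
        if (signo == -1 || signo == SIGTERM)
            break;
        /* Ordinary thread context: mutexes and the like are fine here. */
    }
    return NULL;
}
```

The main program would call block_handled_signals() first and then start signal_thread with pthread_create(), passing a pointer to the same set. Because the signals are received in a normal thread instead of a handler, the async-signal-safety restrictions no longer apply to the code that reacts to them.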

I wonder if anyone is experimenting with cacheline-wide CAS. That is, instead of registers, entire cache lines are compared and atomically swapped. Since partial address tags have already proven to be a security risk, it seems to me that swapping just the cacheline address tags, might work. Even without CAS, an atomic cacheline swap would be useful for atomic structure updates.

legacy · « **Reply #294 on:** January 29, 2019, 12:22:38 am »

I was googling for microdrive (micro harddrive), and I found risc-v, LOL

(by westerndigital)

brucehoult · « **Reply #295 on:** January 29, 2019, 01:09:43 am »

Quote from: legacy on January 29, 2019, 12:22:38 am

I was googling for microdrive (micro harddrive), and I found risc-v, LOL
(by westerndigital)

What exactly is it you find amusing there?

westfw · « **Reply #296 on:** January 29, 2019, 07:08:09 am »

Quote

I finally got to watch the videos...very interesting. I also thought his presentation was very clear.

Yes, me too! (finally actually watched the videos.) It was a really well-done intro to setting up an embedded development environment and writing your first simple program. But, based on the discussion that's popped up here, I was expecting a lot more detail about the RISC-V instruction set itself. The videos were pretty "generic."

legacy · « **Reply #297 on:** January 29, 2019, 09:58:27 am »

Quote from: brucehoult on January 29, 2019, 01:09:43 am

What exactly is it you find amusing there?

It's nice to see that even Western Digital has an interest in RISC-V.

brucehoult · « **Reply #298 on:** January 29, 2019, 11:45:37 am »

Quote from: westfw on January 29, 2019, 07:08:09 am

Quote
I finally got to watch the videos...very interesting. I also thought his presentation was very clear.
Yes, me too! (finally actually watched the videos.) It was a really well-done intro to setting up an embedded development environment and writing your first simple program. But, based on the discussion that's popped up here, I was expecting a lot more detail about the RISC-V instruction set itself. The videos were pretty "generic."

You're absolutely right, but I think it's probably the best thing to do. The instruction set can be easily learned from a book or even reference card. It's getting an environment set up and blinky running that is the stumbling block for most people.

He showed enough of the instruction set to get started, even though there are a couple of bugs in his code.

brucehoult · « **Reply #299 on:** January 29, 2019, 11:51:51 am »

Quote from: legacy on January 29, 2019, 09:58:27 am

Quote from: brucehoult on January 29, 2019, 01:09:43 am
What exactly is it you find amusing there?

It's nice to see that even Western Digital has an interest in RISC-V.

It's 14 months since WD announced they will be converting all of their 1+ billion cores a year to RISC-V. In April last year WD was announced as one of the major investors in a $50.6m Series C round raised by SiFive, and at the same time it was announced that WD signed a multi-year license for SiFive's "Freedom Platform".

And of course the videos that are the subject of this thread are narrated by WD's Chief Technical Officer.

