I am just now finalizing the stage4 for HPPA2. This architecture, made by HP, is the most stable in terms of changes, but the last PA-RISC CPU, the PA8900, adds a few changes relative to the PA8700. They are related to the multi-core nature of the PA8900, which causes problems in Linux and in HPUX11 (which needs specific patches installed) precisely because it is not expected: previous CPUs were capable of multi-CPU SMP, but they were not multi-core.
What have I learned from these experiences? Every processor family has different habits when it comes to memory reordering, and those habits can only be observed in multi-core or multiprocessor configurations. Multi-core is now mainstream, so it is fair to assume the market has some familiarity with it, and new products are now developed as multi-cores that offer a certain compatibility with their predecessors. But don't assume that all the processors in a family behave the same way in SMP, because they do not: there are many types of memory reordering, and not all types occur equally often.
It all depends on the processor you're targeting, on its implementation, and on the toolchain you're using for development (e.g. Java takes a different approach from C++11).
This is what a "memory model" describes: it tells you, for a given processor and toolchain, exactly what types of memory reordering to expect at runtime relative to a given source code listing. Keep in mind that the effects (and differences) of memory reordering can only be observed when lock-free programming techniques are used.
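To make that concrete, here is a minimal sketch of the kind of lock-free code where reordering matters: one thread writes plain data and then raises a flag, another waits for the flag and reads the data. The names (pass_message, payload, ready) are made up for illustration and are not from any real project. With the release/acquire pair the reader is guaranteed to see 42; if both flag operations were relaxed, the C++ memory model would no longer guarantee that.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: pass_message, payload and ready are illustrative names.
int pass_message() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // (1) write the data
        ready.store(true, std::memory_order_release);  // (2) publish: (1) may not sink below this store
    });
    std::thread consumer([&] {
        // (3) spin until the flag is visible
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // (4) may not be hoisted above (3), so it reads 42
    });

    producer.join();
    consumer.join();
    assert(seen == 42);
    return seen;
}
```

Calling pass_message() from main is enough to try it; compile with thread support enabled (for example -pthread with GCC or Clang).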
What I mean is that we have three kinds of memory models:
- kind-A: CPUs that are ONLY sequentially consistent; this is the ONLY way they can operate in SMP
- kind-B: CPUs that are usually strong, implementing explicit acquire and release, or TSO. This usually works in multi-core SMP at the cost of some performance, but sometimes it might not work correctly on multi-cores, while it always works in multi-CPU SMP
- kind-C: multi-cores that are weak, with data dependency reordering. This is assumed to work in both multi-CPU and multi-core SMP
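As a rough illustration of why the kind-C case is the nasty one, consider publishing a pointer and then reading through it. This is only a sketch with invented names (Node, publish_and_read, shared). On hardware that can reorder even dependent loads, the acquire on the pointer load is what stops the reader from seeing a stale value behind a fresh pointer; on stronger hardware the same source may compile to plain loads and stores.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: Node, shared and publish_and_read are illustrative names.
struct Node {
    int value;
};

int publish_and_read() {
    Node node{0};
    std::atomic<Node*> shared{nullptr};  // nullptr means "not published yet"
    int seen = -1;

    std::thread writer([&] {
        node.value = 7;                                  // initialise the object
        shared.store(&node, std::memory_order_release);  // then publish its address
    });
    std::thread reader([&] {
        Node* p = nullptr;
        // Wait until the pointer is published.
        while ((p = shared.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        seen = p->value;  // dependent load: ordered after the pointer load by the acquire
    });

    writer.join();
    reader.join();
    return seen;
}
```

The point of writing it this way is that the same source is correct on all three kinds; only the instructions the compiler has to emit differ.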
All three are hardware memory models: each tells you what kind of memory ordering to expect at runtime relative to an assembly listing. (And here you can expect further problems with the C compiler: plain C is not thread-aware, so you have to tell the compiler explicitly what it must do. C++11 helps a lot with this; C had nothing comparable before C11.)
Now, talking about hardware: in both the HPPA and the PowerPC families, certain members are kind-B and certain are kind-C, but among embedded PowerPCs you also find members that are kind-A (because they are based on the oldest, simplest, safest, most conservative CPU model; e.g. military PPCs need redundancy, which needs to be kind-A).
Even x86/64 should be both kind-A (in i386 emulation mode) and kind-B.
Besides, on the software side, Java is only kind-A oriented, and the C++11 default atomic is kind-A, but the new C++11/20xx low-level atomics tend towards kind-C.
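The difference between the default atomics and the low-level ones can be shown with the classic store-buffering test, sketched below under my own naming (count_both_zero and the variable names are not from any library). Each thread stores 1 to its own variable and then loads the other one. With the default std::memory_order_seq_cst, both threads reading 0 in the same run is forbidden. With std::memory_order_relaxed it is allowed, and whether you actually see it depends on the CPU and the compiler.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `runs` times and
// counts how often both threads read 0 in the same run.
int count_both_zero(int runs, std::memory_order store_order, std::memory_order load_order) {
    int both_zero = 0;
    for (int i = 0; i < runs; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            x.store(1, store_order);
            r1 = y.load(load_order);
        });
        std::thread t2([&] {
            y.store(1, store_order);
            r2 = x.load(load_order);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;  // only possible when the ordering is weaker than seq_cst
        }
    }
    return both_zero;
}
```

A call such as count_both_zero(1000, std::memory_order_seq_cst, std::memory_order_seq_cst) must return 0. The same call with std::memory_order_relaxed for both arguments may return a nonzero count, and that count is exactly the part that changes from one machine to another.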
You cannot say what will happen in the future: a new kind-D? A new kind-E? Be prepared.