For the last twenty years, I've written a
lot of C code, the vast majority compiled with GCC, some with Clang, Intel CC, Pathscale, or Portland Group compilers. Always with -O2. My bug rate is lower than average, too; I do claim to
know this stuff.
(I'm also proficient in several other programming languages from Fortran to Python, and have a pretty wide-ranging background in development, so I'm not "stuck" in C, or in imperative programming languages, either. By this I mean I have years of experience in software development in very different fields, from web development to microcontrollers; I'm not a one-niche guy who assumes his experience in one niche extends everywhere else. I've just found that in all the niches where I've used C,
-O2 has been the proper optimization choice.)
On x86 and x86-64, I do a lot of parallel and distributed processing. For efficiency, I often use lockless structures via compiler-provided atomic built-ins. A decade ago, I used to write a lot of extended inline assembly for SIMD operations, but nowadays the x86 intrinsics are so well integrated into the compilers that it is no longer necessary. So, I do claim I know quite a bit about the complex interactions between threads, atomicity, and manipulating volatile data (pun intended).
The first MCU I started developing on was a Teensy 2.0++, an Atmel AT90USB1286, on top of a set of header files, avr-libc, and avr-gcc.
I have a few dev boards using ATtinys (digispark clones), ATmega32u4 (pro micro clones), and ATmega328 (pro mini clones) that I've programmed the same way, on bare metal. For more interesting stuff, I now use ARM microcontrollers, in Arduino or PlatformIO environments. (I particularly like the Teensy LC, 3.2, 4.0, and 4.1, of which I have at least one each. I do have about a dozen others, from various manufacturers, some still in their original packaging.)
On the electronics side, I'm an utter ham-handed hobbyist. I do have a physics background, with theory courses covering electronics (up to op-amps and digital logic), as my "core field" is computational materials physics, specifically simulator software development, but I have only used this "in anger" in the last few years, mostly using EasyEDA and JLCPCB (because it's so darned easy there). So, I'm still learning myself, and not "stuck" believing I know everything I need to know; I know I don't know enough, and am
very interested in learning, and not at all afraid of admitting publicly when I'm wrong. I'm deliberately very blunt that way. It's cathartic, too.
GCC
atomic built-ins are available for ARM architectures, and are also provided by LLVM Clang (i.e. if you use Clang to compile for ARM targets). Do not let the C++11 reference mislead you; all it means is that the six
__ATOMIC_ memory order constraints use the memory model definitions from the C++11 standard, that's all. It is perfectly acceptable to use these in plain C code, or in embedded C++. I typically end up using
__ATOMIC_SEQ_CST anyway. The "trick" is to always use the atomic built-in when accessing a variable, and not mix non-atomic accesses with atomic ones, unless the non-atomic accesses are allowed to occasionally be garbled (like in a compare-and-swap loop).
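For example, a counter shared between an interrupt handler and the main loop might be accessed like this. This is only a sketch: the names are mine, and it assumes a target where such 32-bit atomic operations are available.

```c
#include <stdint.h>

/* Shared between an interrupt handler and the main loop.
   The name is illustrative.  All accesses go through the
   __atomic built-ins, except the seed read in consume_events(),
   which may be stale; the compare-and-swap loop tolerates that. */
static volatile uint32_t event_count = 0;

/* From the interrupt handler (or another thread): count an event. */
void event_occurred(void)
{
    __atomic_add_fetch(&event_count, 1, __ATOMIC_SEQ_CST);
}

/* From the main loop: how many events so far? */
uint32_t events_so_far(void)
{
    return __atomic_load_n(&event_count, __ATOMIC_SEQ_CST);
}

/* From the main loop: atomically fetch and reset the count. */
uint32_t consume_events(void)
{
    uint32_t old = event_count;   /* plain read; stale is fine here */
    while (!__atomic_compare_exchange_n(&event_count, &old, 0,
                                        0, /* strong variant */
                                        __ATOMIC_SEQ_CST,
                                        __ATOMIC_SEQ_CST))
        ;   /* on failure, 'old' was updated to the current value */
    return old;
}
```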
The C standard is defined in terms of an abstract machine. Mostly, C compilers strictly follow this standard. GCC (and other compilers) have options that diverge or relax some rules. Generally,
-O2 does not include any of those. You can check by examining the output of
gcc -c -Q -O2 --help=optimizers.
The C standard does leave many things "implementation defined". (When you use freestanding C++, for example in the Arduino environment, almost everything is "implementation defined" per the C++ standard; only when you know a feature is available can you reasonably consult the C++ standard to see how it is supposed to behave. The corresponding freestanding C environment is much more tightly "defined", so the environment normally used for microcontroller development is really quite a complicated subset of C and C++.)
The most relevant terms here are
immutable,
constant,
const,
volatile, and
atomic.
Atomic is the most complicated one, because it really refers to several things that attempt to achieve the same result. On some hardware architectures, basic accesses to basic types are inherently atomic. Later C and C++ standards define
atomic types corresponding to these. GCC and LLVM Clang provide the aforementioned built-ins, which implement such atomic accesses where the hardware makes it possible; if the target type is inherently atomic on that hardware, these built-ins compile down to plain accesses, and are thus very efficient ways to implement atomic accesses. In both cases, the problem is that not all hardware architectures implement the full complement of atomic operations (in particular, most provide either compare-and-exchange or load-linked/store-conditional primitives), so one kinda-sorta needs to check the compiler-generated assembly or machine code to see whether the constructs you need generate sane-looking code. If you see things like "disable interrupts", you know that operation isn't really atomic on that architecture, and the compiler is trying to work around the hardware; a different code pattern is then needed on that hardware.
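One way to do that check is to compile a tiny test function to assembly and read the result; the file name and flags below are just an example of the approach, substitute the cross-compiler and target options you actually use.

```c
/* atomic_probe.c: compile to assembly and inspect it, for example with
 *     arm-none-eabi-gcc -mcpu=cortex-m4 -O2 -S atomic_probe.c
 * On a Cortex-M4 you would expect an LDREX/STREX retry loop here;
 * a call into a helper library, or code that masks interrupts,
 * tells you the hardware cannot do this operation atomically. */
#include <stdint.h>

uint32_t probe_fetch_add(volatile uint32_t *counter)
{
    return __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
}
```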
Constant !=
const. The C and C++ standards use specific definitions for terms like "literal constant" and "constant"; they are not necessarily what one might think they mean. So, for
constant, be careful to check the context in which it is used. (Also, if you find your compiler does not do something that the standard says it should, it means either that the compiler has a bug, or that the compiler developers and you read that passage in the standard differently. I always say that
reality trumps theory, because it does. Instead of railing against it, it is more effective to report it (but accept that it likely will be ignored) and work around it, because what matters is that the generated code works as required in all situations in real life; whether it is exactly according to rules drawn up by a committee is always a secondary concern, something for the business and people staff to discuss in their endless meetings.)
Immutable is used in the sense of "this is not allowed to be modified". From the C programmer's view, an immutable object or variable resides in read-only memory, and an attempt to modify it causes "undefined behaviour", something that depends on the hardware and the environment used. In userspace code running under a full operating system, it usually leads to a segmentation violation and a crash of that process. On a microcontroller, the attempt may be silently ignored, the MCU may reset, or an interrupt may fire. It varies.
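As a hypothetical example of what that means in practice:

```c
/* On ARM Cortex-M toolchains a const array like this typically stays
   in flash; on AVR it is copied to RAM unless PROGMEM is used.  Either
   way, writing to it through a cast is undefined behaviour. */
static const char greeting[6] = "hello";

void corrupt_greeting(void)
{
    char *p = (char *)greeting;   /* casting away const */
    p[0] = 'H';                   /* undefined behaviour: may fault,
                                     be silently ignored, or worse */
}
```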
This leaves the two C keywords,
const and
volatile.
const is a promise from the programmer to the compiler that the code does not try to modify the object or variable so denoted.
volatile is the inverse: it means the compiler is not allowed to make any assumptions whatsoever about the object or variable so denoted.
This means that constructs such as
const volatile int foo; are perfectly valid and useful. The
const keyword is a promise to the compiler that the code in this scope will not try to modify
foo, and the
volatile keyword tells the compiler that whenever
foo is used, it must read its value from memory, because it may be modified by something unknown; even by hardware, another thread, whatever.
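The classic use is a read-only, hardware-updated register. The macro name and address below are made up for illustration; real ones come from your device headers.

```c
#include <stdint.h>

/* Hypothetical read-only, hardware-updated status register.
   'const': this code never writes it.  'volatile': every use is a
   real load from that address, never a cached or assumed value. */
#define STATUS_REG (*(const volatile uint32_t *)0x40001000u)

int device_ready(void)
{
    return (STATUS_REG & 0x1u) != 0;   /* fresh read on every call */
}
```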
Trick is,
const and
volatile work exactly that way in an expression as well. Even if you have an object or variable not declared
volatile, you can take its address, cast that to a pointer to a volatile-qualified version of its type, and dereference the cast; such an access is then equivalent to one where the variable or object was declared
volatile in the first place. I do not recommend this as a general pattern, because it means the type of that variable must be duplicated in every such cast.
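As a sketch, such a cast-and-dereference might look like this (the variable name is mine; as said, I don't recommend it as a general pattern):

```c
#include <stdint.h>

static uint32_t shared_flag;   /* deliberately not declared volatile */

uint32_t read_flag_as_volatile(void)
{
    /* Take the address, cast it to pointer-to-volatile of the same
       type, dereference: this one access behaves as if 'shared_flag'
       had been declared volatile.  Note the type repeated in the cast,
       which is exactly why this is error-prone as a general pattern. */
    return *(volatile uint32_t *)&shared_flag;
}
```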
A much better pattern is to have the variable or object declared
volatile, but in any scope where a snapshot of that suffices, just copy its value into a local
const one (non-volatile).
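The snapshot pattern might look like this sketch. The names are illustrative, and it assumes a target where a 32-bit read is a single access; otherwise you need the atomic built-ins or a brief interrupt guard around the read.

```c
#include <stdint.h>

static volatile uint32_t tick_count;   /* updated by a timer interrupt */

void slow_work(void)
{
    /* One volatile read, then a plain const snapshot.  Within this
       scope the value cannot change under us, and the compiler may
       optimize uses of 'now' freely.  (On an 8-bit MCU a 32-bit read
       is not a single access, so a torn read is possible there.) */
    const uint32_t now = tick_count;

    if ((now % 1000u) == 0u) {
        /* ... do once-per-1000-ticks work using 'now' ... */
    }
}
```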
There are a couple of additional details in C that are useful when dealing with numerical expressions (important when you get into stuff like Kahan summation), without going into the details of how that abstract machine works and what
side effects and
sequence points are. First is that in C99 and later, casts of numeric types, both reals and integers, limit the range and precision to that of the cast type. The second is that unsigned integer arithmetic is modulo arithmetic (wraps around), and since C99, there are exact-width unsigned binary types
uintN_t, and binary two's complement types
intN_t (and the corresponding minimum-width and fastest types) provided by the compiler in
<stdint.h> even in freestanding environments (i.e.,
always). As of 2021-08-16, the fixed-point support in GCC is not good enough to really use in my opinion. Using the
integer overflow built-ins you can do multi-limb (multi-byte/word) counters trivially. (Making one atomic really needs two generation counters and a retry loop, though, for both reading and modifying.)
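For example, here is a sketch of such a multi-limb counter built from the overflow built-ins; this is the single-context version only, and the generation-counter scheme mentioned above would be needed to make it safely readable from elsewhere.

```c
#include <stdint.h>

/* A 64-bit event counter made of two 32-bit limbs.  Illustrative
   only, and only safe when read and incremented from one context. */
static uint32_t count_lo, count_hi;

void count_event(void)
{
    /* __builtin_add_overflow() returns nonzero when the addition
       wrapped around; on wrap, carry into the high limb. */
    if (__builtin_add_overflow(count_lo, 1u, &count_lo))
        count_hi++;
}
```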
Finally,
compiler barriers, in particular
__asm__ __volatile__("": : :"memory"); , can be used to ensure all memory accesses in preceding code are done prior to this barrier, and all memory accesses done in succeeding code are done after this barrier. Basically, it makes sure the compiler does not move memory accesses across this barrier. (It does this by basically telling the compiler that everything it knows about memory contents at this exact point becomes invalid.)
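Wrapped in a macro, and used to make sure a data buffer is fully written before a "ready" flag is raised, it looks like the sketch below. The names are mine; note that this constrains only the compiler, it emits no instructions and is not a hardware memory fence.

```c
/* Compiler barrier: the compiler may not move memory accesses across
   this point.  It emits no instructions and is not a hardware fence. */
#define barrier()  __asm__ __volatile__("" : : : "memory")

static unsigned char buffer[64];
static volatile unsigned char buffer_ready;

void publish(const unsigned char *data, unsigned int len)
{
    unsigned int i;

    for (i = 0; i < len && i < sizeof buffer; i++)
        buffer[i] = data[i];

    barrier();          /* stores to 'buffer' are emitted before this */

    buffer_ready = 1;   /* only now raise the flag */
}
```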
I really, really do not understand why you'd find
-O0 necessary. I suspect it is because you haven't really yet
grokked how C compilers and the C language work, deep inside the nitty-gritty details. This is not an insult; not all C programmers need that kind of deep understanding to effectively wield C in anger, but since you found you need to disable optimizations to get the code you want, I suspect you do need to know. I warmly recommend reading the standard; specifically, starting with the C99 version, because it is the most widely supported one (except by the Microsoft C++ compiler, as Microsoft still refuses to fully support C99, even after contributing significantly to the later C11 version). The final draft, with the three corrigenda included, is publicly available as
n1256.pdf at open-std.org. (For C11, the final draft is
n1570.pdf, and for C18, archived as
n2176.pdf.) The actual standards can be bought from ISO, but I haven't bothered; too expensive for what they are.