Author Topic: passing argument 1 of 'memcpy' discards 'volatile' qualifier from pointer target  (Read 3791 times)


Online peter-h

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
  • Doing electronics since the 1960s...
Quote
I do understand the idea of maximizing profit while minimizing development cost

It isn't that. It is simply that one should not change anything for "quite a while" (during which a lot of feature testing etc takes place) before something is released.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6811
  • Country: fi
    • My home page and email address
Quote
I do understand the idea of maximizing profit while minimizing development cost

It isn't that. It is simply that one should not change anything for "quite a while" (during which a lot of feature testing etc takes place) before something is released.
I'm not sure I understand what you mean.  I don't see testing as separate from development, and the way I write code, I don't require specific machine code to be generated, only that it performs as expected.  I don't mind if a new compiler version might optimize the code differently.

Obviously, there are specific sectors like medical, or anything certified at the machine code level, that need a different approach.  However, my suggestion on those would be to use a more appropriate programming language, one with static and dynamic analysis tooling and a reputation for verifiability; say SPARK 2014 (based on Ada).
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3239
  • Country: ca
I'm not sure I understand what you mean.  I don't see testing as separate from development, and the way I write code, I don't require specific machine code to be generated, only that it performs as expected.  I don't mind if a new compiler version might optimize the code differently.

No matter how hard you try, there may be bugs in your code, which may or may not manifest themselves depending on compiler settings and other circumstances. Therefore, before you ship anything in binary form, it should be tested. Typically, so-called regression tests are used, which exercise the majority of the product's functions on the exact binary being distributed. Some companies do very thorough tests that may take days to complete. Apparently, CrowdStrike didn't, which caused a little bit of trouble for the company.
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 3876
  • Country: us
Is there another way to stop the compiler from optimizing out variables? It seems to me that volatile is having to serve two purposes. On the one hand it is saying that this variable will be externally changed by something not in this program so don't optimize it out, but on the other hand I think the compiler makes other assumptions about it?

I'd suggest trying to reframe your question.  What you want is "how do I make sure that behavior I want is part of the observable behavior of the program."  Variables and expressions are not themselves observable behavior, they are just names we assign to describe the program.  They don't really have any meaning after compilation.

Basically, the two relevant ways to give a memory access well-defined behavior are volatile and C11 atomics.  They work somewhat differently.  A volatile access is directly a defined side effect of the program: the compiler must respect the order and contents of volatile accesses.  There is some nuance there to do with sequence points, and while the C standard doesn't require volatile accesses to be atomic, nor does it directly prevent hardware reordering or optimizations, individual platforms and ABIs do make the guarantees needed to operate.  For instance, memory-mapped registers are generally configured by the platform to be non-cacheable and non-reorderable by the CPU.

C11 atomic operations are not strictly observable behavior; however, they are how C11 defines its memory model.  Specifically, C11 guarantees "sequential consistency for race-free programs".  That means that if you use C11 atomic operations correctly to denote critical sections, the program will behave as if every operation happened strictly in the order written, even in the face of multiple different views of "memory".

Normally we think of volatile as used for IO and atomics as used for multi-core / multi-threaded communication, but it's possible to use atomics to enforce consistent memory view between peripherals and a processor core as well.  This may actually be a good fit for what it sounds like you are trying to do.  The nice thing is that the compiler is allowed to optimize the inside of a critical section however it likes, as long as memory is consistent at the end.  It can also optimize by moving accesses into the critical section, but it can't move them out. 

There are definitely pitfalls to using C11 atomics as well, and for true IO, where the read or write has specific side effects, volatile is the correct tool.  However for a buffer that just needs to be written to before an operation starts, atomics might work well.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6811
  • Country: fi
    • My home page and email address
I'm not sure I understand what you mean.  I don't see testing as separate from development, and the way I write code, I don't require specific machine code to be generated, only that it performs as expected.  I don't mind if a new compiler version might optimize the code differently.
No matter how hard you try, there may be bugs in your code, which may or may not manifest themselves depending on compiler settings and other circumstances. Therefore, before you ship anything in binary form, it should be tested.
Obviously; so obviously in fact, that I consider that testing to be a natural part of the development.  It is not something done "after the project is complete", but at all phases of the project, too.

For example, seeing how users actually use the software, I typically realize new types of tests that can verify the system works as designed.  I expect this, because no specification is truly complete (or even correct) up front, no matter how well written, and consider this too part of the development process.

In practice, I look at development activity as a curve rather similar to typical product lifetimes.  There is a big hump up front, then it tapers down slowly.  Adding new features, new versions, refactoring, adds their own humps, and rewriting yields very much a new curve.

It is because of this, because of how closely intertwined testing should be with development –– they should feed each other, continuously –– that I don't understand the idea of freezing the code base for a while.  I understand freezing in the sense of not adding new features, and working on fixing bugs found in testing, but not leaving the code as is during testing.

The nice thing is that the compiler is allowed to optimize the inside of a critical section however it likes, as long as memory is consistent at the end.  It can also optimize by moving accesses into the critical section, but it can't move them out.
Another option is a compiler memory barrier, using asm volatile (""::"r"(buf):"memory"), where buf is the buffer modified.  It works with both GCC and Clang; the compiler will not move any memory accesses to buf across such a barrier, nor eliminate stores to it before the barrier.  (See the related LLVM bug (#15495) discussion.)  It is used in the Linux kernel, via the barrier_data macro, for example to ensure that when calling memzero_explicit(s,n), the buffer is truly cleared via memset(s,0,n); and the call is not eliminated just because the buffer is not accessed afterwards.

I do suspect that such a compiler memory barrier would have worked for peter-h's use cases, but I'd need to know the precise details of the use case pattern to be sure.  If the data involved is all in SRAM that does not get modified by hardware (other than possibly DMA initiated after the barrier), then it would suffice.

Of course, while it conforms to the C11 memory model, it is not standard C at all, so whether it is a suitable solution worth considering depends on the situation.  Assuming the target is a 32-bit ARM Cortex-M, then I would personally use a compiler memory barrier instead of volatile, when clearing or copying firmware-related stuff from the SRAM.
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
  • Doing electronics since the 1960s...
I don't think we actually disagree on anything, but

Quote
It is not something done "after the project is complete", but at all phases of the project, too.

means you should be testing the version to be shipped "at all phases of the project" :)
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8725
  • Country: fi
Quote
I do understand the idea of maximizing profit while minimizing development cost

It isn't that. It is simply that one should not change anything for "quite a while" (during which a lot of feature testing etc takes place) before something is released.

Being careful with any change during critical parts of operations is only natural and right. But we should be careful not to use it as an excuse to produce crappy code. I haven't had any large problems switching compiler versions, optimization settings etc. in a long, long time. Of course I would not do that and then ship binaries without testing them. But then again, if adjusting compiler versions or optimization flags is something everybody fears will become a "big project", something's seriously wrong with how the program is written.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3239
  • Country: ca
Basically the two relevant ways to make a memory access have well defined behavior are volatile and C11 atomics. 

I thought atomics were implemented with LDREX/STREX. In that case they cannot be used to synchronize memory for DMA.

Anyway, "volatile" is only for the C compiler - it guarantees that the compiler emits the access instructions; it doesn't guarantee that those instructions do what you want them to do.

If the CPU uses caches, care must be taken to make sure the data makes it to memory (not just sitting in the cache) by the time you enable the DMA.

With out-of-order CPUs, you may need a memory barrier before you activate the DMA, because the desired writes may not yet have completed.
 
The following users thanked this post: glenenglish

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 3876
  • Country: us
Basically the two relevant ways to make a memory access have well defined behavior are volatile and C11 atomics. 

I thought atomics were implemented with LDREX/STREX. In that case they cannot be used to synchronize memory for DMA.

It depends on the platform and environment (which is a big pitfall with atomics), but generally LDREX/STREX and similar instructions are used only for compare-exchange or other read/modify/write operations.  atomic_load and atomic_store are just ordinary loads/stores with memory barriers, either separate barrier instructions or dedicated load-acquire / store-release instructions.  I think only 64-bit ARM has the latter; on 32-bit, an atomic load/store uses dmb.

Quote
If CPU uses caches, care must be taken to make sure the data make it to the memory (not just sitting in cache) by the time when you enable DMA.

With out-of-order CPUs, you may need a memory barrier before you activate DMA because the desired writes may not yet been complete.

Yep, that's what atomic operations do.  Before an atomic store is visible (including to the IO peripheral), all of the preceding memory writes must be globally visible, and neither the compiler nor the CPU is allowed to delay them past that point.

Here is a simple example for arm32: https://godbolt.org/z/f9avoGeh6

But whether this works for a given architecture and peripheral is not always clear, and even if it would work, platform headers are not usually set up for this.
 

Online peter-h

  • Super Contributor
  • ***
  • Posts: 4046
  • Country: gb
  • Doing electronics since the 1960s...


The dmb flushes the caches but that register load should be atomic anyway, surely (except for DMA writing to that address)?

We did that here
https://www.eevblog.com/forum/microcontrollers/32f417-arm32-which-variables-are-atomic/msg4781123/#msg4781123

Interesting.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11705
  • Country: us
    • Personal site
DMB does not flush the caches.

LDR is atomic just for loads. LDREX/STREX are a pair of instructions that permit an atomic exchange: a load and a store, where the store fails if some other context has touched the same location before the store could complete.
Alex
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 3876
  • Country: us
Yes. The main point of atomic variables (despite the name) is that they create memory-ordering constraints that apply both to the compiler and the processor.  It's important that the accesses themselves be actually atomic, but as you say, machine-word-sized accesses are already atomic on every platform worth considering.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15215
  • Country: fr
Yes, that's why only memory barriers are required to be added.

Of course, for the more involved atomic operations (like atomic add/sub), you either need specific instructions or have to resort to disabling interrupts (which isn't very pretty, but works).
 

