Author Topic: GCC compiler optimisation  (Read 45827 times)


Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11780
  • Country: us
    • Personal site
Re: GCC compiler optimisation
« Reply #50 on: August 09, 2021, 04:19:09 pm »
Just spent hours on this one.
Why? When something is not clear, run "arm-none-eabi-objdump -d file.elf". It would be trivially obvious if code were missing.

Also, recent versions of GCC will, in some cases where they can detect undefined behaviour, replace the whole chunk of code that relies on the UB with a single UDF instruction.

I ran into it when one control path accidentally used a local pointer that was not initialized. The whole function got replaced with "UDF" and it was awesome because it caused an immediate exception rather than some random exception in the future.
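A tiny illustrative sketch of that kind of situation (the function, names and address below are made up, not the actual code I had):

Code: [Select]
#include <stdint.h>

// One control path leaves 'p' uninitialized; dereferencing it is
// undefined behaviour. With optimization enabled, a recent GCC may
// compile the whole body down to a trap (UDF on ARM) instead of the
// store shown here.
void set_flag(int which)
{
    uint32_t *p;                       // not initialized on every path
    if (which == 1)
        p = (uint32_t *)0x20000000;    // some RAM location
    *p = 1;                            // UB when which != 1
}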
Alex
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15446
  • Country: fr
Re: GCC compiler optimisation
« Reply #51 on: August 09, 2021, 04:58:06 pm »
Would something like this get optimised (a read of CPU FLASH)

uint8_t* p = (uint8_t*) 0x08000000;
ch = *p++;

on the assumption that the compiler has not seen any code writing to that location? That would be bizarre, surely?

The above piece of code *alone* won't be optimized out as long as the given compiler defines integer-to-pointer conversion. Indeed, according to the std, this is implementation-defined. While it is "defined" on most implementations around (and in particular on compilers targeting embedded platforms), it is not in itself portable (but common sense will tell you this anyway), and may not even yield any output code on some particular implementation.

That set aside, since in your case the conversion is surely defined, let's take a look at what happens if you omit the volatile qualifier.
If you read the same location several times, the compiler may (most compilers with optimizations enabled will) only access the location ONCE, and then re-use the same read value at each later occurrence (as long as it can be statically inferred). This is particularly problematic when reading typical peripheral "registers": for instance, reading a certain register in a loop waiting for some flag to toggle will NOT do what you intend if the pointer was not qualified volatile.
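A minimal sketch of that polling pitfall (the register address and flag bit here are made up for illustration):

Code: [Select]
#include <stdint.h>

#define STATUS_REG  (*(volatile uint32_t *)0x40000010)  // hypothetical status register
#define READY_FLAG  (1u << 0)

// Because of the volatile cast above, every iteration re-reads the
// register. Without volatile, an optimizing compiler may read it once
// and spin forever on the cached value.
void wait_until_ready(void)
{
    while ((STATUS_REG & READY_FLAG) == 0)
        ;  // busy-wait
}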

Again for the small piece of code you posted, there is possibly missing context for explaining how it will effectively be compiled. In isolation, it can't be optimized out. A pointer dereference will be honored. At least the first time it appears in code flow. It's successive accesses that may get optimized out.

volatile ensures that all accesses to a given variable will be honored in the order they appear. That doesn't just happen with pointer dereference either. It happens with any object qualified volatile.
A small example:

Code: [Select]
int Test1(volatile int n)
{
        return n * n;
}

int Test2(int n)
{
        return n * n;
}

Latest GCC, with -O3:
* In Test1, n will be copied on the stack and accessed twice. (Which admittedly is pretty weird.)
* In Test2, it's not the case.

x86_64 code:
Code: [Select]
Test1:
.LFB1:
.cfi_startproc
movl %edi, -4(%rsp)
movl -4(%rsp), %eax
movl -4(%rsp), %edx
imull %edx, %eax
ret
.cfi_endproc
.LFE1:
.size Test1, .-Test1
.p2align 4
.globl Test2
.type Test2, @function
Test2:
.LFB2:
.cfi_startproc
movl %edi, %eax
imull %edi, %eax
ret
.cfi_endproc
.LFE2:
.size Test2, .-Test2

Use "volatile" when a given object may be modified *outside* the scope of the current compilation unit (so, without the compiler being able to know it can be modified.)
When you know that's not the case, don't use volatile, as it can yield pretty inefficient code, as illustrated above.
« Last Edit: August 09, 2021, 05:01:18 pm by SiliconWizard »
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8908
  • Country: fi
Re: GCC compiler optimisation
« Reply #52 on: August 09, 2021, 07:05:38 pm »
You usually set up the registers, then you set the enable bit.
So you need to read, modify, and write back.

But obviously you don't need RMW for this, two writes are enough. I do that all the time, and ST libraries do that as well. The first write is the flags without the enable bit; the second write is the same value but with the enable bit set as well. Two compile-time constant values, usually! Easy for the compiler to optimize... except that the register is qualified volatile, preventing this optimization if you do |= or &= on the register directly.


This pattern is best, and actually somewhat used in ST's library code, surprisingly:

Code: [Select]
tmp = flags;
(volatile) peripheral = tmp;
tmp |= enable;
(volatile) peripheral = tmp;

Just two writes to the peripheral. The compiler is free to optimize tmp since it's not volatile, using two compile-time constant loads for example.
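For illustration, here is one way that pattern can look in real C (the register and bit names here are hypothetical, not from any actual ST header):

Code: [Select]
#include <stdint.h>

#define TIMX_CR1      (*(volatile uint32_t *)0x40010000)  // hypothetical control register
#define TIMX_CR1_CEN  (1u << 0)                           // enable bit
#define TIMX_CONFIG   0x0030u                             // whatever configuration bits are needed

void timer_start(void)
{
    uint32_t tmp = TIMX_CONFIG;  // tmp is not volatile, so the compiler is free
    TIMX_CR1 = tmp;              // to reduce this to two constant stores
    tmp |= TIMX_CR1_CEN;
    TIMX_CR1 = tmp;
}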
« Last Edit: August 09, 2021, 07:09:40 pm by Siwastaja »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1353
  • Country: de
Re: GCC compiler optimisation
« Reply #53 on: August 09, 2021, 07:35:52 pm »
A pointer dereference will be honored. At least the first time it appears in code flow. It's successive accesses that may get optimized out.

A pointer dereference must be honored of course, but only in the sense of "as if". The resulting code does not need to do any memory fetch. See lines 7 and 16 here.

It is rather about values, and common sub-expression elimination. If the value of a sub-expression *p is already known at a particular point in the code flow, then it does not need to be re-evaluated (unless *p is volatile, or unless the memory location might have been invalidated in the meantime, according to the aliasing rules).
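A small sketch of that (the function names are just for illustration):

Code: [Select]
#include <stdint.h>

// The two uses of *p form a common sub-expression: the compiler may
// load the value once and square the cached copy.
int square_plain(const uint32_t *p)
{
    return (int)(*p * *p);
}

// Here both dereferences must be performed, because the pointed-to
// object is qualified volatile.
int square_volatile(const volatile uint32_t *p)
{
    return (int)(*p * *p);
}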
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #54 on: August 09, 2021, 09:45:59 pm »
Interesting...

Should I use

volatile uint8_t* p = (uint8_t*) 0x08000000;

or

uint8_t* p = (volatile uint8_t*) 0x08000000;

The code does run BTW, with the compiler's default setting (not sure what it is in Cube out of the box), and it runs correctly, because it correctly verifies the FLASH content.

Elsewhere I am using e.g. this code to check that a section of FLASH has been erased

Code: [Select]
// Check it is all FFs
for (address = 0x080e0000; address < 0x080fffff; address += 4)
{
    data = *(volatile uint32_t*)address;
    if (data != 0xffffffff)
        error++;
}

This is probably wrong too (but it works):

[screenshot: a loop calling AT45dbxx_WritePage() with addr as the first argument, incrementing addr by 512 each time]

Where should the "volatile" go? Is it enough to put it in the initial declaration of addr i.e.

volatile uint32_t addr=0x08000000;

I can't understand how a compiler could optimise out "addr" however, given that the address being read is "obviously" continually changing (incremented by 512 each time).
« Last Edit: August 09, 2021, 10:10:23 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15446
  • Country: fr
Re: GCC compiler optimisation
« Reply #55 on: August 09, 2021, 10:02:27 pm »
Should I use

volatile uint8_t* p = (uint8_t*) 0x08000000;

or

uint8_t* p = (volatile uint8_t*) 0x08000000;

The second one is not correct. The compiler should give you a warning about it. You're assigning a pointer to volatile to a pointer to non-volatile. As the compiler will tell you, 'p' will just lose the volatile qualification.

The first one is correct.
I tend to do this myself instead, because to me it looks more consistent:

volatile uint8_t* p = (volatile uint8_t*) 0x08000000;

But your version (1) is correct. 'p' should be qualified volatile. The constant on the right-hand side doesn't itself need to be volatile, so my version is probably unnecessarily verbose; the conversion will be implicit.
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1353
  • Country: de
Re: GCC compiler optimisation
« Reply #56 on: August 09, 2021, 10:32:43 pm »
But your version (1) is correct. 'p' should be qualified volatile.

More precisely, it's not the variable p itself that is qualified volatile here; rather,
Code: [Select]
volatile uint8_t *p;
declares p as a (non-volatile) pointer variable, pointing to a volatile memory location. But yes, this is what peter-h wants for this particular case.

If the pointer variable itself should be volatile, too, then the declaration should look like
Code: [Select]
volatile uint8_t* volatile p;
=> a volatile pointer variable, pointing to a volatile memory location.
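For reference, the four combinations side by side (reading each declaration right to left; this is just a sketch, not code from the thread):

Code: [Select]
#include <stdint.h>

uint8_t          *          a;  // non-volatile pointer to non-volatile data
volatile uint8_t *          b;  // non-volatile pointer to volatile data (peter-h's case)
uint8_t          * volatile c;  // volatile pointer to non-volatile data
volatile uint8_t * volatile d;  // volatile pointer to volatile data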
« Last Edit: August 09, 2021, 10:37:31 pm by gf »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1353
  • Country: de
Re: GCC compiler optimisation
« Reply #57 on: August 09, 2021, 10:53:14 pm »
I can't understand how a compiler could optimise out "addr" however, given that the address being read is "obviously" continually changing (incremented by 512 each time).

The compiler could certainly call the function AT45dbxx_WritePage() with the constant (uint8_t*)0x08000000 as the first argument, and not allocate any register or stack frame slot for the variable "addr".


Sorry, too late in the night :=\ Overlooked the increment.
« Last Edit: August 09, 2021, 11:13:42 pm by gf »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15446
  • Country: fr
Re: GCC compiler optimisation
« Reply #58 on: August 10, 2021, 01:37:32 am »
Where should the "volatile" go? Is it enough to put it in the initial declaration of addr i.e.

volatile uint32_t addr=0x08000000;

I can't understand how a compiler could optimise out "addr" however, given that the address being read is "obviously" continually changing (incremented by 512 each time).

Yeah. Your question shows that understanding "volatile" is much trickier than it looks for many people.
Possibly you're confusing the "address" and the read operation itself. Possibly same with the pointer.

For the code in your screenshot, there's still one potential pitfall. But it's not in the piece of code you're showing us. It's with the AT45dbxx_WritePage() function you're calling, and it's possibly yet another related problem that we haven't quite touched on yet.

Assuming this function was defined within the same compilation unit (which typically means either within the same source file, or in a source file included by this source file), the compiler could decide, depending on how the function is written, that it has NO effect whatsoever when called with a pointer which points to no known object, and thus decide to optimize the call out entirely. The possible side effect would be that 'addr' itself would never get incremented, unless you use the 'addr' value after the 'for' loop. If you don't, the optimizer may not even generate any code for addr at all. It would for page though, because page is used in an 'if' condition which itself has effects. Assuming the KDE_LED_xx() functions themselves are not optimized out. Are you starting to like it?

An example of a function, used in place of AT45dbxx_WritePage(), that could trigger this behavior is memset() or memcpy(). If you called memset() with a pointer converted from an integer (some 'address' to something that's not known to the compiler), the call to memset() could be optimized out. The reason is that the compiler may assume that this call has no effect.

One way to circumvent this is to write your own memset() or memcpy() function, with a pointer to volatile as the parameter. The std ones don't have the volatile qualifier in their parameters. This is something that can bite your ass here.

So for instance, for an "always has an effect" version of memset(), regardless of what you call it with, you would need to write your own.
The original std one has the following prototype: "void *memset(void *str, int c, size_t n)".
Yours should have this one: "void *my_memset(volatile void *str, int c, size_t n)".

A common trap is believing that merely casting the pointer passed to memset() with a volatile qualifier will do the trick, such as: "memset((volatile void *) p, ..., ...)". It won't, for the reason explained above about your pointer casts. This is actually an example I remember being discussed in another thread on this forum.
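A minimal sketch of such a function (the name and the byte-wise loop are just one possible way to write it):

Code: [Select]
#include <stddef.h>
#include <stdint.h>

// Every store goes through a pointer to volatile, so the compiler
// cannot elide the writes or replace the loop with a library call it
// might later optimize out.
void *my_memset(volatile void *str, int c, size_t n)
{
    volatile uint8_t *p = (volatile uint8_t *)str;
    while (n--)
        *p++ = (uint8_t)c;
    return (void *)str;   // explicit cast drops the qualifier, matching the std return type
}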

 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 3937
  • Country: us
Re: GCC compiler optimisation
« Reply #59 on: August 10, 2021, 03:18:45 am »
Interesting...

It does make me wonder whether these optimisations have any impact whatsoever on system performance. I have decades' experience of assembler, and all the tricks people used to do (including self-modifying code, which I avoided), so I understand this stuff at the machine level. And in most systems some 1% of the code is speed critical, and one generally gains far more there by sitting down and thinking about doing that job differently than by rewriting it in the slickest assembler possible.

But that's not how optimizing compilers work.  They don't try to figure out that certain areas are hot and then deliberately produce unoptimized code for everything else because it isn't performance critical in the application.  They just try to produce high performance code everywhere.  To answer your question, yes, all those optimizations make a difference, but obviously the difference is only large when the code in question is heavily used.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #60 on: August 10, 2021, 06:20:00 am »
This stuff, especially SiliconWizard's post above, is unbelievable. If "addr" is a problem, reading incrementing CPU FLASH addresses, it seems almost impossible to write working C code unless you are a total expert at what could go mysteriously wrong.

I think I will stick to this compiler version and its compiler options for ever, because right now everything is working :)

My actuarial life expectancy is about 20 years so this is a viable strategy :) I still run some software from ~1995 (in a winXP VM) so I have a solution...

I have often looked at the assembler generated, when single stepping, and it looks perfectly reasonable. There will not be any perf gain obtained by removing little bits of it. To make code run faster, one needs to think about critical portions, which are usually tiny.

BTW, I don't actually understand the syntax of
volatile uint8_t* p = (uint8_t*) 0x08000000;
I got it out of the ST libs, and it seems to work :) I don't use pointers in my own code.
« Last Edit: August 10, 2021, 06:23:26 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11780
  • Country: us
    • Personal site
Re: GCC compiler optimisation
« Reply #61 on: August 10, 2021, 06:38:56 am »
It is a good idea to understand the syntax of the language you are using.

The syntax of this is easy. "p" is a pointer to a volatile location. The address of that location is 0x08000000. The expression on the right does not need to be volatile because it is never directly used to access the memory; it is only used to initialize the pointer.

A situation where volatile is necessary:

int a = ((volatile uint8_t*)0x08000000)[100]; // read the byte at offset 100 from the start of the flash. In this case no explicit pointer variable is created, so the value of the expression is used directly.
Alex
 
The following users thanked this post: newbrain

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #62 on: August 10, 2021, 07:11:38 am »
I usually understand this when it is explained, but it escapes soon :)

In asm one did this all the time but never thought about it in terms of a formal syntax.

Anyway, I looked around Cube and found that currently I am running with no optimisation:

[screenshot: CubeIDE tool settings showing the optimisation level set to none (-O0)]

Would it be correct that with no optimisation in GCC, "volatile" is not needed?
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1773
  • Country: se
Re: GCC compiler optimisation
« Reply #63 on: August 10, 2021, 09:41:58 am »
Would it be correct that with no optimisation in GCC, "volatile" is not needed?
If whether some code works or not (not taking performance into account) depends on optimizations being enabled or not, that code is 99% wrong.

Whether volatile semantics (and atomics etc.) are needed or not cannot depend on optimization.
A compiler is perfectly entitled to optimize (or not optimize) your code regardless of its command-line options.
That -O0 happens to do what you would expect of the abstract machine defined in the standard is just an implementation accident.

It is a good idea to understand the syntax of the language you are using.

This, QFT.
And the semantics too.
« Last Edit: August 10, 2021, 09:44:42 am by newbrain »
Nandemo wa shiranai wa yo, shitteru koto dake.
 
The following users thanked this post: lucazader

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #64 on: August 10, 2021, 12:54:03 pm »
"Volatile semantic (and atomic etc.) needed or not needed cannot depend on optimization."

I know one cannot dispose of a complex topic briefly, but I find it incredible that C is so full of traps. I have seen enough of it over many years to think that, if things were really that bad, basically no program would ever run.

Can someone demonstrate GCC optimising out a straight read of 0x08000000, using the simplest syntax which I understand of e.g.

 uint32_t address = 0x08000000;
 uint8_t buffer[1000];
 memcpy(buffer,(char*)address,1000);  // memcpy(buffer,address,1000); also works but you get a compiler warning

especially if further down there is

 address+=512;

As for writing to that address, that looks more dodgy to me, but why, really? It is not a memory variable which you could discard if you don't see it read later. I do get compiler warnings if I do say

 address=fred+3;

and if address is never accessed later, and if the compiler warns, then it can safely discard that assignment (because it did warn about it).

If you have another RTOS thread picking up address then you have to either make it volatile, or assign something (harmless) from it so it appears to be used.

I have been using global variables for simple inter-thread comms and they never produced a warning in the code writing them, for some reason. I suspect there is a "scope" involved, and perhaps if you have

{
 uint32_t address=36;
 ...
 ...
}

that will warn, because in any case address is not valid outside of the {} block and if it doesn't get picked up within that block, that is a candidate for a warning and probably removal. But if you have declared a static at the start of a .c file

 uint32_t address;

and write to it inside some block then I see no warnings and the code does work. Warnings occur only if the variable is never referenced in the file. Hence I think there is a "scope" involved, outside which it doesn't care. If this wasn't the case then IMHO a lot of programs would never work. And from what I can see, at least with -O0, that scope is just the current block, for variables defined in that block. I have never seen warnings on statics (declared, but never referenced), and the algorithm for picking up statics which were "written to but not read afterwards" would be pretty interesting.

Another thing is that removing such code, without a warning, is unlikely to produce any performance or size gain, because after all the coder intended it to do "something", so all you have achieved is a broken program.
« Last Edit: August 10, 2021, 01:44:13 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1353
  • Country: de
Re: GCC compiler optimisation
« Reply #65 on: August 10, 2021, 02:46:57 pm »
Can someone demonstrate GCC optimising out a straight read of 0x08000000, using the simplest syntax which I understand of e.g.

Yes: https://godbolt.org/z/he6avac4E

Edit: And clang even eliminates a memcpy() from a local buffer[] array to 0x08000000: https://godbolt.org/z/86KbWo93z
(-> copying undefined values from buffer[] to 0x08000000 is obviously not considered better than not copying anything)

And if buffer[] is static (i.e. zero-initialized), then clang even replaces memcpy() by memset(), and eliminates buffer[]: https://godbolt.org/z/8Es7xhsr3
« Last Edit: August 10, 2021, 03:14:44 pm by gf »
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1773
  • Country: se
Re: GCC compiler optimisation
« Reply #66 on: August 10, 2021, 02:57:25 pm »
Can someone demonstrate GCC optimising out a straight read of 0x08000000, using the simplest syntax which I understand of e.g.

 uint32_t address = 0x08000000;
 uint8_t buffer[1000];
 memcpy(buffer,(char*)address,1000);  // memcpy(buffer,address,1000); also works but you get a compiler warning

especially if further down there is

 address+=512;
Easy! (and ninjaed by gf...)
In this example on godbolt (always be praised!) the whole shebang is thrown away starting with -O2, and an empty loop is produced with -O1.
How is this not expected? There are, given the declarations, no observable side effects in that code.

Then, some other random notes:
Quote
If you have another RTOS thread picking up address then you have to either make it volatile, or assign something (harmless) from it so it appears to be used.
No. Do things in the right way - if you rely on things that are not guaranteed by the standard, you will be bitten.
In this case volatile (though, in general, global variables are not very good inter-thread communication primitives...).
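If the toolchain supports C11 <stdatomic.h>, here is a sketch of a more robust way to pass a value between threads (the names are made up for illustration):

Code: [Select]
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t shared_address;   // written by one thread, read by another

void producer_set(uint32_t addr)
{
    atomic_store_explicit(&shared_address, addr, memory_order_release);
}

uint32_t consumer_get(void)
{
    return atomic_load_explicit(&shared_address, memory_order_acquire);
}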

Quote
But if you have declared a static at the start of a .c file

 uint32_t address;

and write to it inside some block then I see no warnings and the code does work.
Be careful not to conflate scope, duration and linkage properties of an object.
That variable has file scope and static duration, but by default it has external linkage, meaning the compiler cannot know whether some other translation unit will reference the same object.
Were it declared static (making its linkage internal), more optimizations would be possible.
Yes, it is confusing: the "static" storage class specifier affects both linkage and duration.
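A small sketch of the distinction (the identifiers are made up):

Code: [Select]
#include <stdint.h>

uint32_t address;          // file scope, static duration, external linkage:
                           // another translation unit may reference it, so
                           // stores to it generally cannot be discarded.

static uint32_t counter;   // file scope, static duration, internal linkage:
                           // visible only in this file, so the compiler can
                           // reason about (and optimize) all its uses.

void tick(void)
{
    uint32_t tmp = counter + 1;   // block scope, automatic duration, no linkage
    counter = tmp;
    address = tmp;
}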

Quote
Another thing is that removing such code, without a warning, is unlikely to produce any performance or size gain, because after all the coder intended it to do "something", so all you have achieved is a broken program.
The compilers (and the standard) care not what a programmer intends.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11780
  • Country: us
    • Personal site
Re: GCC compiler optimisation
« Reply #67 on: August 10, 2021, 03:45:32 pm »
uint32_t address = 0x08000000;
 uint8_t buffer[1000];
 memcpy(buffer,(char*)address,1000);  // memcpy(buffer,address,1000); also works but you get a compiler warning

This specific example is perfectly fine, memcpy() will discard volatile anyway. And if memcpy() is an actual library function, not a compiler intrinsic, then it is 100% guaranteed to work.

The examples of it being optimized out posted earlier happen because buffer is never used for anything. As soon as you use the data in the buffer, complete code will be generated.
Alex
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8908
  • Country: fi
Re: GCC compiler optimisation
« Reply #68 on: August 10, 2021, 03:46:51 pm »
For someone who understands how the CPU works and has written assembly for decades, C can be surprisingly difficult. In assembly, you directly enter the correct instruction, which is almost always obvious to you as the programmer. In C, there are two cases: the simpler "high-level" case where the access pattern doesn't matter, and the case where the right memory access pattern does matter, where you need to "guide" C through its type system, using pointers, casts, and possibly the volatile qualifier, which is quite different from just writing the right instruction directly.

But it's not impossible to learn, even at old age I'd guess. The rules are simple, it's just a matter of changing the perspective.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #69 on: August 10, 2021, 04:13:21 pm »
ataradov - yes, so those are not great examples because that code fragment does nothing. This

Code: [Select]
int f()
{
    char buffer[512];
    memcpy(buffer, (char*)0x08000000, 512);
    char fred=buffer[0];
    return (int) fred;
}

compiles into real code in all cases.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11780
  • Country: us
    • Personal site
Re: GCC compiler optimisation
« Reply #70 on: August 10, 2021, 04:23:34 pm »
ataradov - yes, so those are not great examples because that code fragment does nothing. This
....
compiles into real code in all cases.
And it is expected that it would work. Compilers are not that smart with optimizations. But there is a difference between passing pointers to external functions and actions of the compiler itself. The functions are opaque to the compiler in most cases. memcpy() is not the best example, as it is a compiler intrinsic in a lot of cases, not an actual function, so more optimizations are possible.
Alex
 

Offline langwadt

  • Super Contributor
  • ***
  • Posts: 4778
  • Country: dk
Re: GCC compiler optimisation
« Reply #71 on: August 10, 2021, 04:28:21 pm »
ataradov - yes, so those are not great examples because that code fragment does nothing. This


sometimes the compiler is smart enough to see that the code does nothing (or something much simpler) even though you can't see it.

the optimizer makes the code do efficiently what you tell it to do, not necessarily how you tell it to do it
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1353
  • Country: de
Re: GCC compiler optimisation
« Reply #72 on: August 10, 2021, 04:29:12 pm »
Code: [Select]
int f()
{
    char buffer[512];
    memcpy(buffer, (char*)0x08000000, 512);
    char fred=buffer[0];
    return (int) fred;
}

compiles into real code in all cases.

Clang still does not copy all 512 bytes to buffer[], but fetches only the first byte from 0x08000000, which is actually used at the end: https://godbolt.org/z/jGrv1vYj7
« Last Edit: August 10, 2021, 04:34:20 pm by gf »
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #73 on: August 10, 2021, 04:42:06 pm »
"sometimes the compiler is smart enough to see that the code does nothing (or something much simpler) even though you can't see it."

OK, but one would rarely write code which actually does nothing. Sometimes... one sets up a variable which is unused but that's rare.

"Clang still does not copy all 512 bytes to buffer[], but fetches only the first byte from 0x08000000, which is actually used at the end: https://godbolt.org/z/jGrv1vYj7"

That is hilarious!

In this case

Code: [Select]
int f()
{
    char buffer[512];
    memcpy(buffer, (char*)0x08000000, 512);
    char fred=buffer[0];
    fred+=buffer[511];
    return (int) fred;
}

it does just two reads; it is basically optimising out memcpy(), and doing it correctly, so the code would run correctly.

The time it would break is if the "thing" at 0x08000000 was something which was expecting to see 512 read cycles. But I would not use memcpy() to achieve 512 contiguous read cycles, because it is known that these functions do a load of optimisations, e.g. on a 32F they would copy 32 bits at a time and then fix up the ends with byte reads (or some such). So yes, this is a good example!
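If one really did need exactly 512 byte-sized read cycles in order, here is a sketch of how to force them (the function name is made up):

Code: [Select]
#include <stddef.h>
#include <stdint.h>

// Reads exactly 'n' bytes from 'src', one byte-sized access per
// iteration, because the source is read through a pointer to volatile.
void copy_bytes_exact(uint8_t *dst, const volatile uint8_t *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}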

« Last Edit: August 10, 2021, 04:45:40 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1353
  • Country: de
Re: GCC compiler optimisation
« Reply #74 on: August 10, 2021, 05:04:38 pm »
A (non-volatile) memory fetch or memory store is not considered a visible side effect, therefore the exact memory access pattern does not need to be preserved by the optimizer.
OTOH, a volatile memory fetch or memory store is by definition considered a visible side effect.
 
The following users thanked this post: newbrain

