Author Topic: GCC compiler optimisation  (Read 45812 times)

Offline Doc Daneeka

  • Contributor
  • Posts: 36
  • Country: it
Re: GCC compiler optimisation
« Reply #200 on: August 17, 2021, 02:14:35 am »
Quote
As to volatile, it's - unfortunately - more subtle than what you said.
1. To begin with: you talk about "observable". The only sentence actually using this term (in C99 at least) in the std is in this part:

An object that is accessed through a restrict-qualified pointer has a special association with that pointer. This association, defined in 6.7.3.1 below, requires that all accesses to that object use, directly or indirectly, the value of that particular pointer. The intended use of the restrict qualifier (like the register storage class) is to promote optimization, and deleting all instances of the qualifier from all preprocessing translation units composing a conforming program does not change its meaning (i.e., observable behavior).

C11 onward explicitly adds that access to volatile objects is part of the observable behavior - so this is clearly what is intended here in C99 even though it is not spelled out - otherwise some core semantics of C would be changing between the standards - unlikely.

Quote
From what I understand here, this defines an "observable behavior" as the *meaning* of a program. Problem here is: what is the meaning of a program? The way I get this is the same as what I meant by the "functional POV", so anything volatile-related, when it may have unknown side-effects, but no analyzable effect, is NOT observable behavior. I may be wrong here and I admit we are really nitpicking on terms. I could not find the definition of the "meaning of a program" in the std.

As I said, C11 explicitly adds that access to volatile objects is observable behavior - assuming that is the intent in C99, any volatile access - even one with no side effects 'within' the C program, is observable.

Quote
For instance, taking the typical "delay loop" example, is a "delay loop", doing absolutely nothing apart from taking CPU cycles, part of the meaning of the program? If you can answer this one by a resounding "yes", without a blink, and backing it up with solid arguments, you are better than I am.

Without a definition of the 'meaning' of a program - who knows? There are no CPU cycles in abstract C. But one thing is certain: it is not observable behavior as far as abstract C is concerned (assuming something like an empty loop with no volatile accesses, library calls, etc.). There is probably a reason the standard is framed in terms of observable behavior and not 'meaning'.

Quote
2. More importantly, about the volatile qualifier: it's unfortunately more subtle than it looks. Let's again quote C99 for the relevant parts:

Quote
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3.

So far so good. Looks like "volatile" will guarantee that such a qualified object is evaluated in all cases, right?
But we need to refer to the "rules of the abstract machine" it mentions. So, again, relevant parts:

Quote
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)

Still looks, at this point, like the volatile object will be evaluated no matter what. But the following paragraph kind of ruins it all:

Quote
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

So, it looks a bit like what I said earlier. Doesn't it? (See the part in bold.)

No, it doesn't. That paragraph does not ruin it at all: it says that if something is not a needed side effect (a term I don't think is defined anywhere, but it does not matter) it does not need to be evaluated. It absolutely does not say it must *not* be evaluated. You can still have other conditions on an expression which require that it be evaluated, which is what all the other paragraphs above do:

- "An implementation does not need to evaluate every expression"
- "An implementation must evaluate an expression referring to a volatile qualified object"

It's pretty clear

In any case scope is irrelevant - a 'local' volatile variable for example might be implemented 'on the stack' but might be modified asynchronously - maybe another thread or an OS or task manager or something goes in and fiddles with it - there is even a footnote:

Quote
A volatile declaration may be used to describe an object corresponding to a memory-mapped input/output port or an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be "optimized out" by an implementation or reordered except as permitted by the rules for evaluating expressions.

The standard (the definition of the language - at least in modern terms) cannot have anything to say about where locally scoped variables are kept. That does not matter to the semantics; all that matters is that the C program cannot access them outside of their scope. Inside their scope all the other semantics still apply.
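
A minimal sketch of the distinction, written for illustration only (the function names spin_plain and spin_volatile are made up, and the comments describe what a typical optimizing build of GCC or Clang is allowed to do, not what any particular version is guaranteed to do). Both functions return the same value; the only difference is whether the loop counter is a volatile object:

```c
#include <stdint.h>

/* Plain counter: nothing observable happens inside the loop, so an
 * optimizing compiler may replace the whole loop with "return n". */
uint32_t spin_plain(uint32_t n)
{
    uint32_t i;

    for (i = 0; i < n; i++) {
        /* empty */
    }
    return i;
}

/* volatile counter: every read and write of i is an access to a volatile
 * object, so the accesses have to be performed as written, even though i
 * is a local variable that nothing else in the program can name. */
uint32_t spin_volatile(uint32_t n)
{
    volatile uint32_t i;

    for (i = 0; i < n; i++) {
        /* empty */
    }
    return i;
}
```

Comparing the assembly for the two functions (for example from gcc -O2 -S) is a quick way to see whether a given compiler keeps the second loop and drops the first.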
« Last Edit: August 17, 2021, 02:34:00 am by Doc Daneeka »
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 3937
  • Country: us
Re: GCC compiler optimisation
« Reply #201 on: August 17, 2021, 02:15:53 am »
"Yes, because a compiler is allowed to turn code into a (standard) function call, when that standard function call is available, and -O3 enables all sorts of "unsafe" optimizations and assumptions the compiler may do to get stuff "run faster".  Usually it fails at it, though, so -O3 is rarely used."

That isn't exactly what those above have been telling me (that any failure due to optimisation means my code is crap) :) I can see this "optimisation" is legitimate, but that's not the same Q.
There is optimization, and then there is unsafe optimization.
-Os optimizes for size.  Often the code is fast as well.
-Og optimizes for debugging.
-O and -O1 enables optimizations.
-O2 optimizes even more.  This is the setting that vast majority of projects use, and some people mean when they tell you to compile with optimizations enabled.
-O3 optimizes yet more.  These optimizations usually cause code bloat, and often includes optimization features that have been relatively recently implemented, and are still being tested.  If they were always useful, they'd be included in -O2.  It should not enable any features that relax strict standards conformance, so if the compiler developers were perfect programmers, it would be safe to use -O3.  Unfortunately, in reality, -O3 tends to enable features that programmer-users and compiler-programmers disagree wrt. the standard, or are not sufficiently integrated or debugged in the compiler.  So, while -O3 is safe in theory, it is unsafe in reality.
-Ofast enables all (-O3) optimizations, plus some that are not strictly standards compliant.  That makes it unsafe.

When I advise new programmers, I always recommend starting with -Wall -O2.

I would consider this advice on -O3 obsolete when targeting x86_64 and probably Armv8-A. On these types of architectures -O3 can show considerable performance improvements and the bugs and standard interpretation disagreements have been mostly worked out.  On a microcontroller it is probably true that -O3 does more harm than good.  On a general purpose CPU you should probably try both if you care a lot about performance.
« Last Edit: August 17, 2021, 02:25:43 am by ejeffrey »
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #202 on: August 17, 2021, 06:17:08 am »
Re the questions on what I am doing, see these:
https://www.eevblog.com/forum/microcontrollers/how-to-create-elf-file-which-contains-the-normal-prog-plus-a-relocatable-block/
https://www.eevblog.com/forum/microcontrollers/elf-to-binary-for-boot-loader/
https://www.eevblog.com/forum/microcontrollers/32f417-best-way-to-program-the-flash-from-ram-based-code/

-O0 produces 230k
-Og produces 160k
-O2 produces 160k
-O3 produces 180k
-Os produces 146k

I will try to incorporate Nominal Animal's script in the post build batch file. Can anyone recommend a good set of win32 command line utils (sed awk grep etc)? I have an old win16 set but they obviously don't run anymore. Even just getting a list of functions called from the boot block .c files would be sufficient; stuff like memcpy would be obvious. I already got caught with printf debugs but that should have been obvious, so wrote my own puts() :)

Incidentally, using the auto-CS on SPI feature would have saved a whole load of trouble, especially as my SPI2 is dedicated to the serial FLASH. And even with multiple SPI devices on one SPI channel, using auto-CS and a demux driven from another couple of pins would work for multiple devices.
« Last Edit: August 17, 2021, 09:25:53 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6278
  • Country: es
Re: GCC compiler optimisation
« Reply #203 on: August 17, 2021, 07:07:35 am »
Maybe cygwin/mingw?
I use cygwin, comes really handy, but will never be the same as a Linux box.
For these commands, bash scripts, dd, sed... It's perfect.
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

Offline harerod

  • Frequent Contributor
  • **
  • Posts: 471
  • Country: de
  • ee - digital & analog
    • My services:
Re: GCC compiler optimisation
« Reply #204 on: August 17, 2021, 04:36:54 pm »
...
There is optimization, and then there is unsafe optimization.
-Os optimizes for size.  Often the code is fast as well.
-Og optimizes for debugging.
-O and -O1 enables optimizations.
-O2 optimizes even more.  This is the setting that vast majority of projects use, and some people mean when they tell you to compile with optimizations enabled.
-O3 optimizes yet more.  ...

When I advise new programmers, I always recommend starting with -Wall -O2.
...

Nominal Animal, thank you for this list. I have been a huge fan of "-Wall -Og", ever since the option became available. I prefer this setting as standard over the higher optimization levels, a way to make sure that the production code will fit the target. I only switch back to "-Wall -O0" when debugging gets nasty.
In your understanding - what would be the drawbacks of using -Og in production code? Size- and performance-wise I see not much difference to -O2.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6984
  • Country: fi
    • My home page and email address
Re: GCC compiler optimisation
« Reply #205 on: August 17, 2021, 08:47:18 pm »
In your understanding - what would be the drawbacks of using -Og in production code? Size- and performance-wise I see not much difference to -O2.
For open source, none really.  For proprietary code, the impact of leaving the debugging information in the binaries is something to consider.  (Sometimes it is a positive, sometimes PHBs don't like the idea at all.  It varies.)

On embedded targets (appliances and microcontrollers) where size is an issue and you do not usually do any debugging on release hardware, optimizing for size and stripping unneeded symbols from ELF binaries (at release, or before converting to a hex file to be uploaded) may be the difference between things fitting and working, or not.  In such situations, I usually make it easy to rebuild the binaries, and test the compiler version used for the actual effects.  Once again, reality wins over theory.

If someone asked me why I am not recommending -Og instead, I'd have to admit, "old habits".  Clang supports -Og as well with similar semantics (enable basic optimizations but with the focus on debuggability), so perhaps -Wall -Og would be a superior suggestion to new programmers; I am not certain yet.



In Linux distributions, there are usually separate versions of library binaries that have debugging information enabled.  It is also possible to provide the debugging symbols for a dynamic library, for example the standard C library (libc6), in a separate ELF dynamic library that only contains the debugging information (libc6-dbg in Debian derivatives).

I don't usually need debugging information in my binaries.  This is not to say I don't do it, only that I have a lot of tools I can use instead, and gdb just isn't one of the fastest/most efficient ones for most cases for me.  When I do use e.g. gdb, I do like to make things easy for myself, for example via python pretty-printing gdb extensions.

As an example, consider abstract data types, like trees, graphs, heaps and such.  When I implement a new (to me) one, I always create a test program that includes graph/tree/heap traversal that does not get confused by loops (i.e., stores the pointer of each visited node, or marks each node visited if there is room in the data structure), and have it emit a nice Graphviz Dot format description of the data structure.  Dot is a very simple, but powerful and expressive text format, so it is very easy to emit from one's code.  I then test it with random, typical, and pathological data sets, and examine the trees thus generated.  Not only is it obvious if there is an unwanted cycle or similar problem, but adding information (like recursion depth, level, or distance from initial node) to the Graphviz graph can help pinpoint the root cause in a fraction of the time that e.g. single-stepping through the code would take.
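
A minimal sketch of that kind of traversal, assuming a plain binary tree with room for a visited flag in each node. The names struct node and tree_to_dot are hypothetical, chosen only for this illustration:

```c
#include <stdio.h>
#include <stddef.h>

struct node {
    int          key;
    int          visited;   /* set during traversal, so cycles cannot loop forever */
    struct node *left;
    struct node *right;
};

static size_t emit_node(FILE *out, struct node *n);

/* Emit the edge from -> to (if the child exists), then descend into it. */
static size_t emit_edge(FILE *out, struct node *from, struct node *to,
                        const char *label)
{
    if (!to)
        return 0;
    fprintf(out, "    \"%p\" -> \"%p\" [ label=\"%s\" ];\n",
            (void *)from, (void *)to, label);
    return emit_node(out, to);
}

/* Emit one node and its subtree; returns the number of new nodes emitted. */
static size_t emit_node(FILE *out, struct node *n)
{
    size_t count = 1;

    if (n->visited)
        return 0;   /* shared child or cycle: the edge above already shows it */
    n->visited = 1;

    fprintf(out, "    \"%p\" [ label=\"%d\" ];\n", (void *)n, n->key);
    count += emit_edge(out, n, n->left, "L");
    count += emit_edge(out, n, n->right, "R");
    return count;
}

/* Write the whole structure as a Dot digraph; returns the node count. */
size_t tree_to_dot(FILE *out, struct node *root)
{
    size_t count = 0;

    fprintf(out, "digraph tree {\n");
    if (root)
        count = emit_node(out, root);
    fprintf(out, "}\n");
    return count;
}
```

Calling tree_to_dot(stdout, root) from a test program and piping the result through dot -Tsvg gives the picture; an accidental back-pointer shows up as an edge pointing up the tree instead of hanging the traversal. The visited flags need to be cleared before the same structure is emitted a second time.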

Even when writing say recursive code, emitting the call graph as a Graphviz Dot directed graph, can be much more informative than gdb debugging, even single-stepping through the recursive code.  See this and this for example graphs (in SVG form, drawn using graphviz dot -Tsvg then cleaned up in Inkscape and minimized by hand for online use), describing how the very common recursive Fibonacci exercise call graph can be visualized.  If you are familiar with the sequence and the exercise, I don't think I even need to describe the graphs, really; it is pretty darned obvious...  just imagine if you had a bug, how obvious that bug would be in the graph.
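
As a rough sketch of how such a call graph can be emitted (not the code behind the linked graphs; fib_dot and fib_call_graph are names invented for this example), each call takes a fresh id, draws an edge from its caller, and labels its own node with the result:

```c
#include <stdio.h>

/* Recursive Fibonacci that logs its own call graph in Graphviz Dot format.
 * Each call takes a fresh id from *next_id; parent is the caller's id,
 * or -1 for the initial call. */
unsigned long fib_dot(FILE *out, int n, long parent, long *next_id)
{
    long id = (*next_id)++;
    unsigned long result;

    if (parent >= 0)
        fprintf(out, "    c%ld -> c%ld;\n", parent, id);

    if (n < 2) {
        result = (n == 1) ? 1UL : 0UL;
    } else {
        /* Two separate statements, so the ids are assigned in a fixed order. */
        unsigned long a = fib_dot(out, n - 1, id, next_id);
        unsigned long b = fib_dot(out, n - 2, id, next_id);
        result = a + b;
    }

    fprintf(out, "    c%ld [ label=\"fib(%d) = %lu\" ];\n", id, n, result);
    return result;
}

/* Wrap the recursion in a digraph; returns the number of calls made. */
long fib_call_graph(FILE *out, int n)
{
    long next_id = 0;

    fprintf(out, "digraph fib {\n");
    fib_dot(out, n, -1, &next_id);
    fprintf(out, "}\n");
    return next_id;
}
```

The number of nodes alone already tells a story: the plain recursion makes 15 calls for fib(5), and the repeated subtrees are immediately visible in the rendered graph.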

I believe the visual methods also help new programmers come up with their own visualization/modeling methods, when dealing with code or data structures and flow.  (My own tend to be chaotic, as if I were using paper as a cache for my mind.  Because of that, after I work a problem out, I write a simple text file, perhaps with a descriptive image or two, to document the solution, saving them and the code in a dedicated directory.  After a month or a year, the stuff is as foreign to me as it would be to anyone else.  I don't like trying to memorize anything, so that documentation is useful even if I were the only one ever to access them.)

All this said, I would not be surprised if someone had a different experience and therefore different opinion on this.  This is just one of the patterns I've found to work.
You could say a core reason why I place much less weight on debugging tools than others is that I very much believe understanding (or "grokking") the intent/purpose/design of the code correctly, is much more important than getting the code to have the effects you want.  Debugging tools help you with the latter, but I want to do the former, and emphasize the need to do the former, for example via visual tools like Graphviz.
« Last Edit: August 17, 2021, 08:49:14 pm by Nominal Animal »
 
The following users thanked this post: harerod

Offline ttt

  • Regular Contributor
  • *
  • Posts: 87
  • Country: us
Re: GCC compiler optimisation
« Reply #206 on: August 17, 2021, 09:13:56 pm »
I take a slightly different perspective on optimizations with gcc, it's not all about -Ox. I am usually actively trying to fit the code into the lowest cost MCU possible, which can mean the smaller flash size variant.

So -Og for debugging and -Os for release builds by default. And then decorate my performance critical function with optimization attributes as such:

    __attribute__ ((hot, flatten, optimize("O3"), optimize("unroll-loops")))
    void Strip::ws2812_alike_convert(const size_t start, const size_t end) {
...

In addition, and that has not been mentioned in this thread yet, link time optimization (-flto) makes a _huge_ difference code size and performance wise. Though it can be tricky to make a code base LTO safe, link order and symbol visibility issues can be frustrating.
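
A plain C sketch of the same per-function pattern, for anyone who wants to try it outside a C++ class. FAST_PATH and expand_bits are hypothetical names; the attributes are GCC function attributes, and as far as I know Clang does not implement optimize(), hence the fallback:

```c
#include <stddef.h>
#include <stdint.h>

/* GCC accepts per-function optimize() attributes; fall back to plain
 * "hot" on Clang and to nothing on other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define FAST_PATH __attribute__((hot, optimize("O3")))
#elif defined(__clang__)
#define FAST_PATH __attribute__((hot))
#else
#define FAST_PATH
#endif

/* Hypothetical hot function: expand each input byte into eight output
 * bytes, one per bit, most significant bit first. */
FAST_PATH
void expand_bits(const uint8_t *src, size_t len, uint8_t *dst)
{
    for (size_t i = 0; i < len; i++)
        for (int bit = 7; bit >= 0; bit--)
            *dst++ = (uint8_t)((src[i] >> bit) & 1u);
}
```

The rest of the file is still built with whatever -Og or -Os was given on the command line. Note that the GCC manual describes the optimize attribute as intended for debugging rather than production use, so it is worth checking the generated code for the compiler version actually used.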
 
The following users thanked this post: lucazader

Offline cfbsoftware

  • Regular Contributor
  • *
  • Posts: 124
  • Country: au
    • Astrobe: Oberon IDE for Cortex-M and FPGA Development
Re: GCC compiler optimisation
« Reply #207 on: August 17, 2021, 09:36:32 pm »
So -Og for debugging and -Os for release builds by default.
Do you do both your unit testing and integration testing on both builds?
Chris Burrows
CFB Software
https://www.astrobe.com
 

Offline ttt

  • Regular Contributor
  • *
  • Posts: 87
  • Country: us
Re: GCC compiler optimisation
« Reply #208 on: August 17, 2021, 10:04:45 pm »
So -Og for debugging and -Os for release builds by default.
Do you do both your unit testing and integration testing on both builds?

Yes, if it fits :-) Behavior of the code changes depending on optimization flags. You would think that it is a sign of badly written code but it is all too common to run into race conditions if anything is slightly timing critical.
« Last Edit: August 17, 2021, 10:10:41 pm by ttt »
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #209 on: August 17, 2021, 11:58:44 pm »
Can anyone explain what is meant by "debugging information" included in the code?

I am able to step through code compiled with -O0 all the way through the -O3 etc. With the higher levels it doesn't make much sense but the source code is still visible because it is available locally so the debugger can refer to it.

" it is all to common to run into race conditions if anything is slightly timing critical."

It isn't just race conditions; it is all kinds of stuff which break. In addition to the example already posted (replacement of a loop with memcpy() which broke a program which was supposed to live in the bottom 32k) I have just spent a few more hours chasing down a much more subtle issue where an Adesto serial FLASH seems to have an undocumented sensitivity to minimum CS=1 time when reading out parts of the manufacturer ID etc.

I have cygwin (use rsync a lot) so will try that for the above mentioned test.
« Last Edit: August 18, 2021, 12:17:25 am by peter-h »
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11780
  • Country: us
    • Personal site
Re: GCC compiler optimisation
« Reply #210 on: August 18, 2021, 12:03:28 am »
There is no debugging information in the code itself. You should not confuse the -Og option, which is just an optimization setting that generates code that is better for debugging (less rearranging), with the -g option, which generates the actual debug information (correspondence of the assembly instructions to the C source, variable names and locations, etc).

In any case you can strip the debug information from any ELF file regardless of how it was created.

And if you only distribute binary files (BIN, HEX), then there is no debug information, of course. It only applies to ELF files.
« Last Edit: August 18, 2021, 12:04:59 am by ataradov »
Alex
 
The following users thanked this post: newbrain, Nominal Animal

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15444
  • Country: fr
Re: GCC compiler optimisation
« Reply #211 on: August 18, 2021, 12:10:08 am »
Yes, debug information is generated with the '-g' option. That can be combined with any optimization options.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #212 on: August 18, 2021, 12:52:09 am »
Having done 40 years of assembler I remain unimpressed with that loop replacement with memcpy.

If the loop is short then the replacement cannot make sense. The only thing which would help, and only with a prefetch queue, would be to unroll the loop and inline it. The memcpy function is a lot of code because it will do it 4 bytes at a time and then (or beforehand) tidy up any unaligned ends. And if the compiler was trying to evaluate the loop length, it got it wrong because it was at most 6 bytes (within a 512 byte buffer).

If the loop is long then ok. But the designer would have probably used memcpy anyway if doing hundreds of bytes or more.

I see FatFS do their own versions of these functions, probably because they had problems too. But the compiler will try replacing these too.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11780
  • Country: us
    • Personal site
Re: GCC compiler optimisation
« Reply #213 on: August 18, 2021, 01:20:41 am »
If the loop is short then the replacement cannot make sense.
But if you use that memcpy in many cases, then it will be replaced with calls. If you only have one, then it makes no sense to place it as a function and then make a call.

And doing aligned transfers is good for performance.

I see FatFS do their own versions of these functions, probably because they had problems too. But the compiler will try replacing these too.
Their functions would be recognized as copy functions and replaced with memcpy(). They did it because you can't guarantee that memcpy() will be present on the target platform. I often build things with "-nostdlib" flag, making sure no standard library functions are included.
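
For the boot block case discussed above, a sketch of one way to keep a short copy loop as a loop. It assumes GCC, where as far as I know this transformation is controlled by -ftree-loop-distribute-patterns; the macro NO_LOOP_TO_LIBCALL and the function boot_copy are made-up names:

```c
#include <stddef.h>

/* GCC only: ask the optimizer not to turn loops in this one function into
 * library calls. glibc uses the same attribute for its own string functions. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* Hypothetical boot-block helper: copy a handful of bytes with a plain loop. */
NO_LOOP_TO_LIBCALL
void boot_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}
```

Passing -fno-tree-loop-distribute-patterns on the command line for the boot block sources should have the same effect file-wide. Either way, listing the undefined symbols of the resulting object file is the only real proof that no memcpy() call sneaked in.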

Alex
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1773
  • Country: se
Re: GCC compiler optimisation
« Reply #214 on: August 18, 2021, 09:06:36 am »
Slightly off-topic, as we are talking about gcc here, but here is an interesting explanation on how clang + LLVM perform optimizations.

The example is this nifty test for evenness:
Code: [Select]
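#include <stdbool.h>   /* bool, true (needed when compiled as C) */

/* Deliberately naive: flips 'even' once per step while counting up to number. */
bool isEven(int number)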
{
    int numberCompare = 0;
    bool even = true;

    while (number != numberCompare)
    {
        even = !even;
        numberCompare++;
    }
    return even;
}

Do not miss the corresponding Hacker News discussion.

The takeaway is, again: a conforming C compiler is allowed to do more or less whatever it fancies, as long as the observable behaviour (in this case: calling the isEven function, under a number of assumptions...) is guaranteed.
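
To make the result tangible, a small sketch comparing the loop with the single bit test an optimizer can reduce it to. This is my reading of the linked explanation rather than actual compiler output, and the names is_even_loop and is_even_bit are invented here:

```c
#include <stdbool.h>

/* The loop from above, unchanged apart from the name. */
bool is_even_loop(int number)
{
    int numberCompare = 0;
    bool even = true;

    while (number != numberCompare) {
        even = !even;
        numberCompare++;
    }
    return even;
}

/* What the loop boils down to: test the lowest bit. */
bool is_even_bit(int number)
{
    return (number & 1) == 0;
}
```

For non-negative arguments the two agree by construction. For negative arguments the loop would have to count past INT_MAX, which is signed overflow and therefore undefined behaviour, so the compiler is free to assume it never happens; that is one of the assumptions mentioned above.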
« Last Edit: August 18, 2021, 09:08:52 am by newbrain »
Nandemo wa shiranai wa yo, shitteru koto dake.
 
The following users thanked this post: DiTBho

Offline harerod

  • Frequent Contributor
  • **
  • Posts: 471
  • Country: de
  • ee - digital & analog
    • My services:
Re: GCC compiler optimisation
« Reply #215 on: August 18, 2021, 09:39:48 am »
Quote from: harerod on Yesterday at 17:36:54

    In your understanding - what would be the drawbacks of using -Og in production code? Size- and performance-wise I see not much difference to -O2.

...
If someone asked me why I am not recommending -Og instead, I'd have to admit, "old habits".  Clang supports -Og as well with similar semantics (enable basic optimizations but with the focus on debuggability), so perhaps -Wall -Og would be a superior suggestion to new programmers; I am not certain yet.
...

Nominal Animal, I appreciate your input, because of our different views. You seem to be a highly trained software expert who happens to write code for embedded systems. I am a hardware designer who also happens to write code. :)

+ + +

Several posts ago somebody asked for a tutorial. Why not have a look at the available documentation? Basic information for this thread spans STM32, CubeIDE and ARM-GCC:

First of all the MCU involved:
https://www.st.com/en/microcontrollers-microprocessors/stm32f407-417.html#documentation

Maybe the release note for the CubeIDE:
https://www.st.com/content/ccc/resource/technical/document/release_note/group0/9a/72/48/16/ec/bd/44/5a/DM00603738/files/DM00603738.pdf/jcr:content/translations/en.DM00603738.pdf <- RN0114 CubeIDE release note

ARM-GCC:
https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm
https://gcc.gnu.org/onlinedocs/9.3.0/ <- GCC used in CubeIDE 1.7.0, if I read RN0114 correctly
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6278
  • Country: es
Re: GCC compiler optimisation
« Reply #216 on: August 18, 2021, 10:50:20 am »
Not that kind of documentation. A proper guide/manual for the HAL.
You have a simple one, barely describing what each function does. But you still have to guess a lot of things.
 

Offline harerod

  • Frequent Contributor
  • **
  • Posts: 471
  • Country: de
  • ee - digital & analog
    • My services:
Re: GCC compiler optimisation
« Reply #217 on: August 18, 2021, 11:21:13 am »
DavidAlfa, let's not drift too far off-topic. Kindly follow me to the CubeIDE thread:
https://www.eevblog.com/forum/microcontrollers/is-st-cube-ide-a-piece-of-buggy-crap/msg3632875/#msg3632875


 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6984
  • Country: fi
    • My home page and email address
Re: GCC compiler optimisation
« Reply #218 on: August 18, 2021, 02:22:02 pm »
I would consider this advice on -O3 obsolete when targeting x86_64 and probably Armv8-A.
Perhaps, but it very much depends on the compiler and especially compiler version.

For example, I do not use anything newer than GCC 9 for ARM targets, because of the unfixed issues in later versions.  I'm seriously considering switching to Clang for arm, anyway.

I take a slightly different perspective on optimizations with gcc, it's not all about -Ox.
Very true; I too mentioned specific optimization flags that I end up using; for example, -ffinite-math-only can make a big difference and be very useful when you have e.g. explicit checks so that you never do division by values very close to zero and such.

However, I like to keep such things in separate compilation units (files).  For optimized routines, I often have alternates with the exact same interface but wildly different implementations, and choose the implementation simply by selecting which C source file (among the alternates) is used: either via Makefile options, or via a common .c source file that #includes the appropriate .c source file based on preprocessor macros.  (Note that some people do have an irrational dislike of #include used with source files though; it seems that it jars some people's sensitivities somehow.)
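
A compressed sketch of the macro-selected variant. To keep it compilable on its own, both alternates are written inline here; in the layout described above each branch would instead be a single #include of its own .c file. SUM_IMPL_UNROLLED and sum_u8 are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Build with -DSUM_IMPL_UNROLLED to select the alternate implementation.
 * In a real project each branch would be e.g. #include "sum_unrolled.c". */
#if defined(SUM_IMPL_UNROLLED)

/* Alternate: four independent accumulators, then a tail loop. */
uint32_t sum_u8(const uint8_t *data, size_t len)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= len; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < len; i++)
        s0 += data[i];
    return s0 + s1 + s2 + s3;
}

#else

/* Default: the straightforward reference implementation. */
uint32_t sum_u8(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

#endif
```

Because both variants share one prototype, the same unit test can be built once per variant, which makes it easy to check that they really agree.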

Can anyone explain what is meant by "debugging information" included in the code?
As ataradov and SiliconWizard already mentioned, there is no debugging information per se in the code.

Some optimization flags do affect the debuggability of the code, though; in particular, -fomit-frame-pointer.  In many architectures, the address of the current stack frame is kept in a separate register.  This option disables that (so that the stack frame is then implicit, and local variables on stack are accessed via the stack pointer).  On some architectures, this can make debugging much harder; according to documentation, impossible on some, but I'm not sure on which arches that is.  Stack frames can still be described for each function via separate debugging data, for example when using DWARF formats (for the debugging data).

Object files and especially final ELF binaries will contain a lot of extra information when debugging information is enabled.  If your build facilities are such that the ELF files are stripped before being uploaded to the target device (say, like in the Arduino environment, or most environments targeting microcontrollers), it does not matter whether the compilation included debugging information in the object files or not (whether -g was used or not); but the optimization options used do.

One nice thing about -Og is that the compiled code should be the same regardless of whether debugging information is included or not via -g, in the object files and final binaries.  If one uses -O2 or -Os , and then switches to say -O0 -g for debugging, the compiled code is usually different, making debugging problems more difficult than necessary.  Also, both -O2 and -Os enable -fomit-frame-pointer, affecting debuggability.  I believe it was the programmers' need for an optimization level that generates reasonably optimized code without affecting debuggability that caused GCC to grow support for -Og in GCC 4.8 in 2013; but I'm not sure if Clang actually implemented it first and GCC users found how useful it is, or vice versa.

Since we're using the ELF file format for object files (and final binaries before converting to hex), knowing the structure of ELF files can be very useful, since ELF files can contain all sorts of information, not just "code" and "data".

I often build things with "-nostdlib" flag, making sure no standard library functions are included.
Me too, but with GCC, it is not enough to avoid a dependency on memcpy(), memmove(), memset(), and memcmp(), because GCC expects these to be provided by even a freestanding environment; see the second-to-last paragraph in section 2.1, GCC C-Language Standards in GCC documentation.

It is not too common for GCC (across its versions and compiler options) to turn loops into a call of one of the above, I think.

A bit of glue logic (even preprocessor macros detecting compile-time type or alignment, so that optimized native-word-sized operations can be used) is usually enough to ensure it does not happen for a particular function implementation.  When one does need these four functions anyway, or duplicates of them in the same project, the library-provided ones are weak, and one can override those simply by implementing one's own, using the same function signature (including name).

If one defines them in the same compilation unit (file or files compiled in the same gcc/clang command), the pattern shown by ttt in post #206 can be used to control the optimization flags; and the scriptlet I showed earlier can be used to verify the compiled object file contains no external dependencies.

Sometimes it can be worth the effort to implement these (separate variants for loop direction and access size for memcpy()/memmove(), separate access sizes for memset(), and only a byte-by-byte memcmp()) in extended inline assembly (asm volatile ("code" : outputs : inputs : clobbers); as the function body).  If in the same compilation unit, I recommend using an #include "memfuncs.c" so that the implementation is easy to change/select at build time, for example based on the hardware architecture.  It also makes unit testing them (with a separate program) much easier.

Nominal Animal, I appreciate your input, because of our different views.
I too appreciate different views, especially when people describe the reasons for their different views (like you did, and members like ataradov and SiliconWizard and many others do), because that way I can learn.  I know I don't know really that much; but I can and am willing to learn.  Never hesitate to correct me, if you believe I am in error; I very much appreciate that.

My communications style is far from optimal (verbose, sometimes looks like I'm trying to be more authoritative than I actually am, me occasionally fail English, and so on); but my attempt is always to describe my reasons, with my current opinion (based on those reasons) more like a side note than the focus, because my opinions change as I learn.  But that sort of describing-the-reasons can sometimes appear as The List Of Facts, which they aren't; they're just the stuff I currently am aware of.
« Last Edit: August 18, 2021, 02:24:55 pm by Nominal Animal »
 
The following users thanked this post: peter-h, lucazader, harerod

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #219 on: August 18, 2021, 04:58:27 pm »
"My communications style is far from optimal "

Your communication style is excellent :)
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6984
  • Country: fi
    • My home page and email address
Re: GCC compiler optimisation
« Reply #220 on: August 19, 2021, 06:55:24 pm »
No.  If I were more concise and less blunt and confrontational, I'd be more effective.

As it is, I'm in the ignore list (Profile, Modify Profile > Buddies/Ignore list... > Edit ignore list) of quite a few members and at least one admin.

I've managed to anger several members whose knowledge and expertise I value, inadvertently, due to my communications style – ataradov and Bruce Hoult I know for sure, but how many others I haven't even realized? :-//

I should be more aware, because I use my pseudonym for the express purpose of being able to interact with others: I'm much too sensitive to perceived slights to my person, and using a pseudonym helps me remember any negativity is caused by my own output, and is not about my immutable characteristics: about what I said, not about who I am.  My output I can affect, at least to some degree; my person, not so much.



As to GCC compiler optimization:

GCC has version-specific documentation, as well as the latest version of the documentation available online.  Don't be afraid or think you need to know these, because you don't.  You don't memorize IC datasheets, so why the heck would you memorize GCC or Make manuals either?  If you use them constantly, some details may stick in your memory, and that's fine; but it isn't necessary at all.  My own memory does not do that much (or rather, it can get the smallest details wrong), so I personally don't trust my memory for the details, and instead have concentrated on my searching and lookup skills (including fast reading/glancing to find the appropriate contexts that I need to actually read, in the sense that I am conscious of the sentences).  I usually have a browser window (in a separate workspace/virtual desktop) with tabs open to relevant manuals only.  (I can warmly recommend the Linux man-pages project for up-to-date pages on POSIX C interfaces.  It is not complete wrt. non-Linux interfaces (BSD, Solaris), but for those it does cover, it even mentions the standards/sources from which those interfaces are derived.)

In this thread, the GCC documentation page on optimize options should be extremely useful.  Not just the basic -O options, but also wrt. the individual options one might define for specific files or functions.

I started with systems integration stuff ("making my own Linux distro") before the turn of the century, in the Linux from Scratch community.  Around the turn of the century, I also started packaging some of my output as RPM and DEB packages.  Even if one does not intend to create such packages, browsing through the Debian packaging tutorial (and say RPM packaging guide) is an excellent way to get an overview of how software packages have been delivered on a number of Linux systems very efficiently.  Aside from human causes (incorrect/bad package dependencies and such), these are very robust formats, and include pre-install, post-install, pre-remove, and post-remove shell scripts triggered by the corresponding action: this is where I discovered the utility of such, and started incorporating scripts into my Makefiles, so that I could more easily fully automate my builds.

Now, if one goes to some package page in Debian or a Debian derivative like Ubuntu, say Inkscape for Ubuntu 20.04 LTS (focal), you can find the source archives for the package.  If you download the .debian.tar.gz/.debian.tar.bz2/.debian.tar.xz one, you get the Debian packaging additions (that are extracted into the debian/ subdirectory of the original source tree).  The interesting file in source archives is the rules file, since it defines the variables (DEB_BUILD_MAINT_OPTIONS, DEB_CFLAGS_MAINT_APPEND, DEB_LDFLAGS_MAINT_APPEND) describing the options (including optimization options) on how the sources were compiled to obtain the particular binaries.  If not defined, the defaults set in the project source tree (Makefile, CMake, etc.) are used.  Similarly, the spec file in a source RPM file (.src.rpm) contains the optimization options etc. used to compile the binaries; see e.g. Fedora build flags documentation for details.

There are three methods to modify the optimization options within a single source file: the optimize function attribute, #pragma, and _Pragma().  GCC documentation states that these may not support all options, and should only be used for debugging, not production code.  Because of this, I do recommend compiling functions that may need special compile options in separate compilation units (separate .c source files) instead.
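A minimal sketch of the three forms, using GCC-specific syntax; the function names and the OPTIMIZE_FOR_SIZE macro are hypothetical, and which optimization levels are actually honored this way depends on the GCC version:

```c
/* Method 1: the optimize function attribute, applied to one function. */
__attribute__((optimize("O0")))
int sum_o0(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Method 2: #pragma; affects functions defined until pop_options. */
#pragma GCC push_options
#pragma GCC optimize ("O3")
int sum_o3(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}
#pragma GCC pop_options

/* Method 3: _Pragma(), the operator form, which can come from a macro. */
#define OPTIMIZE_FOR_SIZE _Pragma("GCC optimize (\"Os\")")

_Pragma("GCC push_options")
OPTIMIZE_FOR_SIZE
int sum_os(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}
_Pragma("GCC pop_options")
```

All three functions compute the same result; only the options they are compiled with differ, which can be checked by comparing the generated assembly (gcc -S) for each.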

Since makefiles support both generic recipes and recipes used for specific files (even though they match the generic recipe rule), one only needs to add a new compilation recipe to handle each specific file or files that need separate compiler options to be used.

The GNU make manual is something any Makefile user should at least browse through.  Not to memorize anything, or even understand the purpose, but to get an overview of its capabilities.  Like with source control tools, you don't need to be a PhD to use make effectively; the important thing is to understand the overview, and have the manual handy for referencing any details.

The linker and linker scripts are another thing that initially seem very complex, but are actually rather straightforward.  The terminology (section vs. segment, and so on) used in ELF linkers can be confusing, so I recommend writing a short crib sheet for the terms, using one's own words.  With GCC and Clang, I do not recommend directly executing the linker at the link phase; it is better to let the compiler internally call the (correct) linker for the target.  Both compilers do provide command-line options on how to pass parameters to the linker (-Wl,param as a single command-line argument), so this is not a restriction, really.  This also means that if a Makefile contains $(LD), you know it is an "old-style" one that executes the linker directly, instead of through the compiler.  In particular, the compiler knows the target and options needed to supply to the linker to generate code for that target, so there may be some parameters not passed to the linker if you execute the linker directly, or you might even execute the incorrect linker.  Better let the compiler handle it.
« Last Edit: August 19, 2021, 10:38:26 pm by Nominal Animal »
 
The following users thanked this post: harerod

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8908
  • Country: fi
Re: GCC compiler optimisation
« Reply #221 on: August 20, 2021, 06:46:49 am »
Nominal, this is OT but just a quick note: when people discuss, sometimes things get a bit heated and people even get angry. This usually means no hard feelings afterwards, and when kept to modest amounts, is felt rewarding or cathartic. It's normal, it's life, and something which makes life more worth living; I would not prefer a dull world where emotions are all but suppressed. I'm 99% sure you haven't angered ataradov or Bruce Hoult "in the wrong way" at all.

Those who can't grok that you have an emotional side as well, it's their problem. If you are on their ignore lists, good riddance, at least they won't pick unnecessary battles.

Not all long posts are created the same. Your verbose style is OK because I find it easy to just skim through the posts and still find the key points whenever I don't have time to read it all. This is because despite verbosity (and occasional addition of OT), the posts are properly organized and paragraphed. And when I do have time, I may enjoy reading it all slowly. The choice is left to the reader. No issue here.
« Last Edit: August 20, 2021, 06:48:44 am by Siwastaja »
 
The following users thanked this post: thm_w, Nominal Animal, harerod

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #222 on: August 22, 2021, 08:52:42 pm »
Back on the topic of compiler optimisation:

I've just watched this video:



He seems to have used "volatile" for just about every variable in any way connected with the flash programming code, and I just don't get it. Take a look around 8:20.



Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11780
  • Country: us
    • Personal site
Re: GCC compiler optimisation
« Reply #223 on: August 22, 2021, 08:59:25 pm »
This is absolutely not necessary, but also does not hurt. He still uses high level APIs, so all those volatiles will be discarded anyway.
Alex
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #224 on: August 22, 2021, 09:06:19 pm »
Whenever I tried declaring say a buffer as "volatile", and passed its address to some function whose prototype didn't have a "volatile" on that parameter, I got a compiler warning that the "volatile" is being discarded.

Sure, it does not hurt, but then why not just make everything "volatile"? It makes no sense to do it as a precaution, just in case. AIUI, if you read location 0x0800000 into some variable then that variable must be "volatile", but code which subsequently accesses that variable does not have to be.
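A minimal sketch of that distinction, assuming a host-side stand-in for the memory-mapped location (fake_flash_word and the function names are hypothetical; on the target the pointer would instead be a fixed address cast to a pointer-to-volatile).  In this sketch the qualifier sits on the access itself, and the value then lives in an ordinary variable:

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware/flash location, so the sketch runs
   on a host.  On the target this would be a fixed address. */
static volatile uint32_t fake_flash_word = 0xDEADBEEFu;

/* Ordinary helper: its parameter is not volatile-qualified. */
uint32_t low_byte(uint32_t value)
{
    return value & 0xFFu;
}

uint32_t read_and_process(void)
{
    volatile uint32_t *reg = &fake_flash_word;

    /* The read through the pointer-to-volatile is the volatile access:
       the compiler must perform it exactly once, here. */
    uint32_t copy = *reg;

    /* From here on, copy is a plain value; nothing below is volatile. */
    return low_byte(copy) + low_byte(copy >> 8);
}
```

Passing copy by value to low_byte() produces no warning, because no qualifier is discarded; the warning only appears when a pointer to a volatile object is passed to a parameter declared without volatile.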
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

