Topic: GCC compiler optimisation


peter-h (topic starter)

  • Super Contributor
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
GCC compiler optimisation
« on: August 02, 2021, 01:31:11 pm »
Cube IDE, 32F417.

I've just had a funny one. I have a boot loader which transfers control to base+32k (0x08008000) and that is where the linker script places main.o. And I put main() right at the start.

However, if you used the SWV ITM / SWD debug interface, the compiler was putting ITM_SendChar() at the start of main.o! This is actually a macro doing some inline code.

It looks like the compiler is placing inline code before normal functions. This is news to me; I thought that compilers didn't change the order of functions in a .c file :) Why should they?

I fixed it initially by changing the function to a normal one:

Code: [Select]
static void ITM_SendChar_2 (uint32_t ch)
{
  if (((ITM->TCR & ITM_TCR_ITMENA_Msk) != 0UL) &&      /* ITM enabled */
      ((ITM->TER & 1UL               ) != 0UL)   )     /* ITM Port #0 enabled */
  {
    while (ITM->PORT[0U].u32 == 0UL)
    {
      __NOP();
    }
    ITM->PORT[0U].u8 = (uint8_t)ch;
  }
  return;
}

but the proper fix was to create main_stub.c, which contains just main(), which then calls real_main(); in the linkfile you then put main_stub.o first:

Code: [Select]

  /* The rest of the code goes here, loaded at base+32k, starting with a stub and then the real main() */

  .main_stub.o :
  {
    . = ALIGN(4);
    /* KEEP takes an input-section spec, file(sections); the original
       KEEP(*(.main_stub.o)) matched no input section and did nothing */
    KEEP(*main_stub.o(.text .text*))
    *main_stub.o (.rodata .rodata*)
    . = ALIGN(4);
  } >FLASH_APP

  .main.o :
  {
    . = ALIGN(4);
    *main.o (.text .text* .rodata .rodata*)
    . = ALIGN(4);
  } >FLASH_APP
 
/* This collects all other stuff, which gets loaded into FLASH after main.o above */
 
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH_APP
« Last Edit: August 02, 2021, 01:33:28 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

ComradeXavier

  • Contributor
  • Posts: 18
  • Country: us
Re: GCC compiler optimisation
« Reply #1 on: August 02, 2021, 02:00:22 pm »
Quote from: peter-h
It looks like the compiler is placing inline code before normal functions. This is news to me; I thought that compilers didn't change the order of functions in a .c file :) Why should they?
I don't think there's generally any guarantee that object code will be in the same order as the source.

But assuming that that source order generally holds, remember that the compiler's view of a .c file contains everything that was included (the translation unit). So if the compiler emits a body for an inline function in a header included before your .c file's first function, the inline function's object code will appear before your .c file's first function's object code.
 

gf

  • Super Contributor
  • Posts: 1353
  • Country: de
Re: GCC compiler optimisation
« Reply #2 on: August 02, 2021, 02:15:41 pm »
No, there is neither a guarantee for the order of functions, nor for the order of global variables.
If you want to place main() at 0x08008000, then you need to put main into a separate section, and place this section at 0x08008000 via the linker script.
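A minimal sketch of that approach. The section name `.text.app_entry` is made up for illustration; any name works as long as the linker script matches it. The matching linker-script lines are shown as a comment, and they would need to be the first output section in the application's FLASH region. The function body is a stand-in for the real entry point.

```c
/* Hypothetical section name; the linker script must place it first, e.g.
 *
 *   .app_entry :
 *   {
 *     KEEP(*(.text.app_entry))
 *   } >FLASH_APP
 *
 * "used" stops the compiler from discarding the function when nothing in
 * this translation unit appears to call it. */
__attribute__((section(".text.app_entry"), used))
int app_entry(int arg)
{
    /* Stand-in body: a real entry point would call real_main() here. */
    return arg + 1;
}
```

Only the placement comes from the attribute. The code inside the function is compiled exactly as it would be without it.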
 
The following users thanked this post: thm_w

magic

  • Super Contributor
  • Posts: 7249
  • Country: pl
Re: GCC compiler optimisation
« Reply #3 on: August 02, 2021, 02:21:11 pm »
A macro doesn't compile to a separate function. Most likely, your ITM_SendChar is a "static inline" function declared somewhere in some header and gets inserted near the beginning of your C file as described two posts above. A separate copy is also similarly inserted into each other C file which includes that header.
 

abyrvalg

  • Frequent Contributor
  • Posts: 837
  • Country: es
Re: GCC compiler optimisation
« Reply #4 on: August 02, 2021, 02:41:19 pm »
And we are back at square #1. You don't need to use tricks to solve this; spend some time on a better architecture and it will save you much more time and effort (especially if you plan to revisit this project in 10 years, as you say). There are better approaches suggested by several people in your original thread; just pick one and ask them to elaborate.

To your specific question: the only reliable way to place something at a known address with GCC is to specify the placement in the linker script (either by adding a section attribute to main() or by the name of the file containing main(): main.o(.text*)). But even if you solve this main() placement, the next question will be how to place some ISR at a fixed offset, because your "main" part doesn't have its own vector table and all interrupts land in your bootloader. Then you'll add another ISR and go round that circle again. A simpler solution would be to have a separate vector table for "main", placed correctly, and not bother with function placement at all.
 

peter-h (topic starter)

  • Super Contributor
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #5 on: August 02, 2021, 04:07:06 pm »
This is interesting - thank you.

Yes I have come across .h files generating code, previously.

This one was correctly (I believe) solved by the linker file method. With old simple tools, this was much more obvious. I recall doing a product into which the customer could load an application, linked to run at base of an EEPROM, 16k up, and that also had a similar stub, containing just one jump instruction, which was placed at the top of the module list in the linkfile.

I am not using interrupts in any of this code. Nothing before main() starts enables interrupts.
 

SiliconWizard

  • Super Contributor
  • Posts: 15444
  • Country: fr
Re: GCC compiler optimisation
« Reply #6 on: August 02, 2021, 04:20:53 pm »
Quote from: peter-h
Cube IDE, 32F417.

OK, but it has nothing to do with this actually.

Quote from: peter-h
I've just had a funny one. I have a boot loader which transfers control to base+32k (0x08008000) and that is where the linker script places main.o. And I put main() right at the start.
However, if you used the SWV ITM / SWD debug interface, the compiler was putting ITM_SendChar() at the start of main.o! This is actually a macro doing some inline code.
It looks like the compiler is placing inline code before normal functions. This is news to me; I thought that compilers didn't change the order of functions in a .c file :) Why should they?

I don't know where you got the idea that a C compiler would guarantee that the order of "functions" in object code would have anything to do with the order you put them in source code. Nothing of this kind has ever been guaranteed.

As said above, the only way to control code location in the final object code is to specify this at the link step, customizing the linker script.
You can control where code goes by putting it in a dedicated section.

For instance, the typical way of defining the location of the "startup code" would look like this (first section in the linker script):
Code: [Select]
.text.startupcode :
{
  . = ALIGN(4);
  KEEP(*(.text.startupcode))
  . = ALIGN(4);
} >INSTRMEM
 

ataradov

  • Super Contributor
  • Posts: 11780
  • Country: us
Re: GCC compiler optimisation
« Reply #7 on: August 02, 2021, 05:29:17 pm »
Your architecture is not correct. Just make the bootloader read the reset vector address from the binary vector table and you won't have to rely on fixed addresses and will not have to force the compiler to do unnatural things. This is how all bootloaders for ARM work.

Also, you can't just jump to a random address, you need to read out initial SP value anyway. Otherwise you have a potential to run into all sorts of issues as you recompile things.

Here is a typical code that bootloaders use to run the application:
Code: [Select]
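static void run_application(void)
{
  /* Word 0 of the application's vector table is its initial stack pointer,
     word 1 is its reset handler address (with the Thumb bit set). */
  uint32_t msp = *(uint32_t *)(APPLICATION_START);
  uint32_t reset_vector = *(uint32_t *)(APPLICATION_START + 4);

  __set_MSP(msp);

  /* Branch to the application's reset handler; it never returns. */
  __asm volatile("bx %0" :: "r" (reset_vector));
}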
APPLICATION_START is the address of the application image in the flash (address of the vector table).
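Before that jump, many bootloaders also sanity-check the two words so that an erased or half-written image is never executed. What follows is a host-testable sketch. The function name is made up. The address ranges assume the 32F417 memory map (128K SRAM at 0x20000000, 64K CCM at 0x10000000, 1MB flash at 0x08000000), so adjust them for your part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 32F417 memory map; check the datasheet for your exact part. */
#define SRAM_START   0x20000000UL
#define SRAM_END     0x20020000UL   /* 128K */
#define CCM_START    0x10000000UL
#define CCM_END      0x10010000UL   /* 64K CCM */
#define FLASH_START  0x08000000UL
#define FLASH_END    0x08100000UL   /* 1MB */

/* words[0] = initial MSP, words[1] = reset vector. */
static bool app_image_looks_valid(const uint32_t *words)
{
    uint32_t sp = words[0];
    uint32_t reset = words[1];

    /* The stack is full-descending, so the initial SP may equal the
     * end-of-RAM address itself. */
    bool sp_ok = (sp > SRAM_START && sp <= SRAM_END) ||
                 (sp > CCM_START && sp <= CCM_END);

    /* Cortex-M only executes Thumb code, so bit 0 of the vector must be 1. */
    bool reset_ok = (reset & 1U) != 0U &&
                    reset >= FLASH_START && reset < FLASH_END;

    return sp_ok && reset_ok;
}
```

On the target you would call it as `app_image_looks_valid((const uint32_t *)APPLICATION_START)` and stay in the bootloader if it returns false. Erased flash (all 0xFF) fails the stack-pointer range check.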
« Last Edit: August 02, 2021, 05:30:49 pm by ataradov »
Alex
 
The following users thanked this post: oPossum, thm_w

peter-h (topic starter)

  • Super Contributor
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #8 on: August 02, 2021, 05:33:42 pm »
In some ARM compilers, it appears from looking around the web, you can specify a particular function to be placed at a particular address. But apparently not, as far as I can find, with GCC. A while ago I spent ages on it and nothing worked.

If this was possible it would be a neater way around it than splitting off the module entry point into a little stub.c file and then putting stub.o at a particular address in the linkfile.

In GCC you can locate a RAM buffer that way, or a general variable, but not a function.

Anyway, the fact that this happens only when ITM_SendChar is used is a gotcha, because it will happen only when debugging.

I don't understand why the stack pointer is relevant. You can set the SP to, say, the top of CCM, and it should not need to be touched through the startup, copying the boot loader to RAM, running the boot loader in RAM, etc.
 

ataradov

  • Super Contributor
  • Posts: 11780
  • Country: us
Re: GCC compiler optimisation
« Reply #9 on: August 02, 2021, 05:38:10 pm »
With -ffunction-sections you can place individual functions anywhere you like in the normal linking process. Standard GNU LD does not support flowing code around fixed-address sections, so addresses still must increase monotonically. It is hard to place things in the middle of other things. The GOLD linker is supposed to address this, but I'm not sure of its progress or usability in real life.

SP is relevant because application and bootloader SPs are generally different. It is a bad idea to expect them to be the same. And even if you make them the same, if you are jumping to the application without modifying the SP, then the stack space already used by the bootloader at the time of call will be lost in the application.

You are going against established industry practices without gaining anything in return but long-term trouble. And short-term trouble too, given that you have to fight the compiler and force it to do strange things.
« Last Edit: August 02, 2021, 05:47:47 pm by ataradov »
 
The following users thanked this post: newbrain

SiliconWizard

  • Super Contributor
  • Posts: 15444
  • Country: fr
Re: GCC compiler optimisation
« Reply #10 on: August 02, 2021, 06:03:31 pm »
You can place any function in any section with just an attribute with GCC (and I guess Clang should support it too.) It's really as simple as it gets.

Code: [Select]
__attribute__((section("SomeSection"))) int SomeFunction(int n)
{
        return n*2;
}

No need to create a dedicated object file.
 

ataradov

  • Super Contributor
  • Posts: 11780
  • Country: us
Re: GCC compiler optimisation
« Reply #11 on: August 02, 2021, 06:13:09 pm »
But you can't easily place it at a fixed address in the middle of the rest of the code. Not that dedicated object file helps with that either.

The whole issue stems from the wrong architecture. Just doing things the right way would solve all of this instantly.
 

SiliconWizard

  • Super Contributor
  • Posts: 15444
  • Country: fr
Re: GCC compiler optimisation
« Reply #12 on: August 02, 2021, 06:22:24 pm »
Quote from: ataradov
But you can't easily place it at a fixed address in the middle of the rest of the code. Not that dedicated object file helps with that either.

Not that it makes any sense, as you said, but you can. You just need to lay out sections in the linker script appropriately and put code in the respective sections as needed.
For just an entry point, it's OK and usually how things are done. Most often, you just have two code sections: a 'startup' section (whatever you name it), and the 'text' section, where the rest of the code goes. Nothing prevents you from having more than two sections; it would just be annoying to maintain, and a very convoluted way of achieving what the OP wanted to achieve.

 

ataradov

  • Super Contributor
  • Posts: 11780
  • Country: us
Re: GCC compiler optimisation
« Reply #13 on: August 02, 2021, 06:27:53 pm »
That's why I said "can't easily". With a split address space like this you would have to do the linker's job manually. Nobody would realistically do that in real projects; it is just stupid.

So fixed things are just placed at the beginning or the end of the address space.

The linker in XC32 can do that automatically: you just specify the address for the function through an attribute, but it makes everything else so annoying that it is not worth it overall.
 

peter-h (topic starter)

  • Super Contributor
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #14 on: August 02, 2021, 07:32:46 pm »
" if you are jumping to the application without modifying the SP, then the stack space already used by the bootloader at the time of call will be lost in the application."

I don't understand that. I have the SP at top of CCM so there is 64k stack space.

The boot loader gets copied to base of the 128k RAM.

I don't see a problem with stack operation. In fact it looks like the code which transfers control to the RAM-resident boot loader could even pass function parameters to it, which is done via registers or the stack (if passed by value).
 

JOEBOBSICLE

  • Regular Contributor
  • Posts: 63
  • Country: gb
Re: GCC compiler optimisation
« Reply #15 on: August 02, 2021, 07:42:08 pm »
Have a look at your stack pointer when you jump (sp register). Is it exactly the same as your original stack pointer when you started into the reset vector? Chances are it won't be.



 

ataradov

  • Super Contributor
  • Posts: 11780
  • Country: us
Re: GCC compiler optimisation
« Reply #16 on: August 02, 2021, 08:13:41 pm »
Quote from: peter-h
I don't understand that. I have the SP at top of CCM so there is 64k stack space.

Look at what the stack looks like at the time you pass control to the application:
  • The bootloader reset handler is called; its local variables are on the stack.
  • main() is called; main()'s local variables and the return address to the reset handler are on the stack.
  • Some intermediate functions are called (the ones that decide that it is time to run the application); their variables and return addresses are on the stack.

All of those functions have a potential to keep stuff on the stack. Now you jump to the application without resetting the stack pointer and you have lost space at the end.

A bigger problem is that you are committing yourself to having the same SP for the remainder of the product life. You can't decide at a later time that CCM is more valuable for something else and you need to place the stack at some other location. You can switch it in the application, of course, but why even create this headache?

If you want to do things "your way", it is fine, but don't you think that the amount of topics you create with things going wrong is indicative of the fact that you are doing something wrong?

And yes, set a breakpoint at the entry point to your application and see exactly how much stack space you lose through that process.
« Last Edit: August 02, 2021, 08:16:27 pm by ataradov »
 

abyrvalg

  • Frequent Contributor
  • Posts: 837
  • Country: es
Re: GCC compiler optimisation
« Reply #17 on: August 02, 2021, 08:25:25 pm »
Quote from: peter-h
I am not using interrupts in any of this code. Nothing before main() starts enables interrupts.
But what will happen after the bootloader finishes its work, main() starts and enables interrupts? All interrupts will go into your bootloader (where the vector table is). How are you going to direct them to the main part? You'll either need to do your fixed placement trick again for all ISRs in the main part (so the vector table in the bootloader could point to them) or provide some kind of entry point table in the main part (reinventing the wheel, i.e. a vector table).
 

ataradov

  • Super Contributor
  • Posts: 11780
  • Country: us
Re: GCC compiler optimisation
« Reply #18 on: August 02, 2021, 08:39:16 pm »
Oh, yes, I assumed that application interrupt table follows that strange startup. With this approach interrupts will not work at all.

You can set up a new interrupt table through SCB->VTOR, but there must be a table somewhere in the first place.
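One constraint applies when pointing SCB->VTOR at the application's table. On ARMv7-M the table must be aligned to its size rounded up to a power of two, with a minimum of 128 bytes. Below is a small host-testable helper. The function names are made up. The figure of 82 external interrupts used in the note after it is an assumption for the 32F417, so check it against the reference manual for your part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Required VTOR alignment on Cortex-M3/M4: 16 system entries plus one per
 * external interrupt, 4 bytes each, rounded up to a power of two, and
 * never less than 128 bytes. */
static uint32_t vtor_alignment(uint32_t num_external_irqs)
{
    uint32_t size = (16U + num_external_irqs) * 4U;
    uint32_t align = 128U;
    while (align < size) {
        align <<= 1;
    }
    return align;
}

/* True if a vector table at table_addr satisfies that alignment. */
static bool vtor_address_ok(uint32_t table_addr, uint32_t num_external_irqs)
{
    return (table_addr & (vtor_alignment(num_external_irqs) - 1U)) == 0U;
}
```

With 82 external interrupts the table is (16 + 82) * 4 = 392 bytes, so it needs 512-byte alignment. 0x08008000 satisfies that.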

Also, you can't simply jump to the application main(), you need to have startup code that copies initialized data and resets the BSS section. Note that this code is completely separate from what the bootloader is doing.

You need to treat the application as its own entity. Your system must remain operational if the whole bootloader is just a simple stub that jumps to the application. If this is not the case, you are setting yourself up for a failure.
 

lucazader

  • Regular Contributor
  • Posts: 221
  • Country: au
Re: GCC compiler optimisation
« Reply #19 on: August 02, 2021, 08:46:21 pm »
Since you are using the CubeIDE you can follow a very similar method to what we do with all of our ST based products that use a custom bootloader:

in the linker script for your application, set the start of flash to be the location the application will be saved to (in your case 0x8008000)
Code: [Select]
/* Specify the memory areas */
MEMORY
{
RAM (xrw)      : ORIGIN = 0x20000000, LENGTH = 128K
FLASH (rx)      : ORIGIN = 0x8008000, LENGTH = 512K - 32K
}

Then in your bootloader you can create a jump-to-application function which will do a few things:
  • De-init any peripherals you were using in the bootloader. You may not have to do this, but it is good practice, as the HAL expects peripherals to be in their power-on reset state at startup and other states can cause issues (though not all the time).
  • Remap the vector table to your application and set the main stack pointer.
  • Jump to the application:

Code: [Select]
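void flash_updater_jump_to_application()
{
    platform_specific_hal_deinit(); // de-init all peripherals here

    // Stop SysTick so it cannot fire before the application sets it up
    SysTick->CTRL = 0;
    SysTick->LOAD = 0;
    SysTick->VAL = 0;

    SCB->VTOR = APPLICATION_ADDRESS;                   // application's vector table
    __set_MSP(*(__IO uint32_t *)APPLICATION_ADDRESS);  // word 0: initial stack pointer
    __DSB();
    __ISB();

    // Load word 1 (reset handler) and branch to it in ONE asm statement;
    // separate asm statements give no guarantee that r1 survives between them.
    __asm volatile(" ldr r1, [%0, #4]\n"
                   " bx  r1\n"
                   :
                   : "r"(APPLICATION_ADDRESS)
                   : "r1");
}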

An example of the HAL de-init could be something like this (but it obviously depends on the peripherals used by your bootloader):
Code: [Select]
void platform_specific_hal_deinit()
{
    HAL_UART_DMAStop(&COMMS_UART);
    HAL_UART_DeInit(&COMMS_UART);
    HAL_UART_DeInit(&TRACE_UART);
    HAL_CRC_DeInit(&hcrc);
    HAL_TIM_Base_Stop_IT(&htim6);
    HAL_TIM_Base_DeInit(&htim6);
}
 

ataradov

  • Super Contributor
  • Posts: 11780
  • Country: us
Re: GCC compiler optimisation
« Reply #20 on: August 02, 2021, 09:01:21 pm »
I would not do the deinit stuff. It is much easier to set some flag that the application needs to be run on the next reset, then request a device reset. If on a reset you see the flag is set, then just run the application without initializing anything else. This way everything is guaranteed to be reset regardless of things you may forget to reset, or later additions.

I typically use the first 4 words in the SRAM. I fill them with some known values to request the application run. But there are other options for that too.
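Here is a sketch of that flag scheme. The four magic values and the function names are made up for illustration. On the target, the words would have to sit in RAM that the startup code neither zeroes nor initialises. One way is a `.noinit` region at the start of SRAM, which you would have to add to the linker script yourself. The bootloader would call CMSIS `NVIC_SystemReset()` after setting the words.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical magic pattern: four words make an accidental match with
 * random RAM contents after power-up very unlikely. */
static const uint32_t RUN_APP_MAGIC[4] = {
    0x52554E41UL, 0x50504C49UL, 0xA5A5C3C3UL, 0x0F1E2D3CUL
};

/* Bootloader side: call just before requesting a software reset. */
static void request_app_run(volatile uint32_t flags[4])
{
    for (int i = 0; i < 4; i++) {
        flags[i] = RUN_APP_MAGIC[i];
    }
}

/* Called very early after reset, before any peripheral set-up. The flags
 * are always cleared, so the following reset takes the normal path. */
static bool app_run_requested(volatile uint32_t flags[4])
{
    bool match = true;
    for (int i = 0; i < 4; i++) {
        if (flags[i] != RUN_APP_MAGIC[i]) {
            match = false;
        }
        flags[i] = 0;
    }
    return match;
}
```

Clearing the words on every check means a later watchdog or manual reset does not jump straight into the application again.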
 

lucazader

  • Regular Contributor
  • Posts: 221
  • Country: au
Re: GCC compiler optimisation
« Reply #21 on: August 02, 2021, 09:08:36 pm »
For sure, that works if the bootloader uses the same config for peripherals as your main application.

In our case this isn't what happens, and we have had some weird system crashes even after re-initialising all the required peripherals in the application.
For some of our boards it worked fine to not de-init, but others it was a nightmare to track down the issues.
 

gf

  • Super Contributor
  • Posts: 1353
  • Country: de
Re: GCC compiler optimisation
« Reply #22 on: August 02, 2021, 09:09:15 pm »
Quote from: ataradov
Also, you can't simply jump to the application main(), you need to have startup code that copies initialized data and resets the BSS section.

Usually it is done in startup code outside main, but you could also do that with a memcpy() and memset() at the very beginning of main().
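Here is a sketch of what that looks like. The symbol names `_sidata`, `_sdata`, `_edata`, `_sbss` and `_ebss` are the ones Cube IDE's generated linker scripts normally define, but check yours. To keep the sketch testable on a PC, the copy is written as a function that takes the addresses as parameters. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy .data's initial values from flash to RAM and zero .bss.
 * Must run before anything reads an initialised global or static. */
static void init_data_and_bss(uint8_t *data_dst, const uint8_t *data_src,
                              size_t data_len, uint8_t *bss, size_t bss_len)
{
    memcpy(data_dst, data_src, data_len);
    memset(bss, 0, bss_len);
}

/* On the target, at the very top of main() (or in the reset handler):
 *
 *   extern uint8_t _sidata[], _sdata[], _edata[], _sbss[], _ebss[];
 *   init_data_and_bss(_sdata, _sidata, (size_t)(_edata - _sdata),
 *                     _sbss, (size_t)(_ebss - _sbss));
 */
```

Doing it in main() works as long as nothing that runs earlier depends on initialised data. That includes C++ constructors or a library init hook.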
 

peter-h (topic starter)

  • Super Contributor
  • Posts: 4162
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC compiler optimisation
« Reply #23 on: August 02, 2021, 09:17:33 pm »
This thread has digressed into boot loaders :)

My boot loader exists only for programming the CPU FLASH (with a field-installable application module, or in hopefully rare cases with a complete replacement for the original factory code; another story). If one is not flashing the CPU, there is no "boot loader". Why have one? The whole thing just starts up at 0x08000000, sets up the hardware, and starts up the RTOS.

If the boot loader is used (for CPU flashing) then after it has done that, it reboots. It can't do anything else because it got loaded into the main RAM and crapped over stuff which would otherwise be running in there. The stack will also have some boot loader related crap on it. Originally I was loading it at RAM base 0x20000000 and it worked fine, except that this address is also where some data+bss for the preceding code (the boot block) gets loaded (it has to go "somewhere") and since the loader was crapping over that, it was unable to call any boot block functions which relied on initialised data or bss. So I now put the loader halfway up the RAM, where there is absolutely nothing, it has the 64k CCM stack to play with, and it could even use most of the RAM underneath it. It's actually quite simple. The key is that it always reboots (with HAL_NVIC_SystemReset); the RAM resident code never returns anywhere.

And interrupts are not enabled at all in the loader. That complicates things too much because you have to switch the ISRs to RAM as well.

Unless this CPU is doing something totally weird I can't see why the SP should be affected. It gets initialised as the first instruction in the .s startup.

The loader code does use the same hardware functions as the rest of the product does normally, but in any case I am doing a reset after it has run.

Using the RTC SRAM for storage is a cunning trick and we use it to store a magic number which, if not present, causes the RTC to be initialised to some sensible values (Monday 1st Jan 2021 or whatever). The problem is that it is supercap powered and the supercap might not be charged if you do weird power up/down stuff. So I am using a few bytes in a 4MB serial FLASH chip to store flags which tell the CPU to enter the boot loader module on the next boot-up, and copy stuff in the serial FLASH into the CPU FLASH. I have all this working, with lots of verification, but I am stopping just short of actually flashing the CPU until I have 100% tested it all. I posted some Q here
https://www.eevblog.com/forum/microcontrollers/32f417-best-way-to-program-the-flash-from-ram-based-code/new/#new
But perhaps you meant using the normal SRAM for this; I would not risk that not getting corrupted by a reset, and in any case it gets wiped by the startup .s code (zeroing BSS etc).
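Serial FLASH erases to 0xFF, so one way to make such a flag robust is to store a magic word together with its bitwise complement. Then neither erased (all 0xFF) nor zeroed contents can look valid. Here is a hedged sketch. The record layout, magic value and function names are hypothetical and are not what the poster actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOT_FLAG_MAGIC ((uint32_t)0xB007F1A5u)   /* hypothetical value */

/* Hypothetical 8-byte record kept in the serial FLASH. */
typedef struct {
    uint32_t magic;
    uint32_t magic_inv;   /* must equal ~magic */
} boot_flag_record;

static boot_flag_record make_boot_flag(void)
{
    boot_flag_record r = { BOOT_FLAG_MAGIC, (uint32_t)~BOOT_FLAG_MAGIC };
    return r;
}

/* Erased flash (0xFFFFFFFF in both words), zeroed storage, and a record
 * whose second word was never written all fail this check. */
static bool boot_flag_set(const boot_flag_record *r)
{
    return r->magic == BOOT_FLAG_MAGIC && r->magic_inv == (uint32_t)~r->magic;
}
```

Clearing the flag after the loader has finished is then just a sector erase, which returns the record to the "not set" state.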
 

ataradov

  • Super Contributor
  • Posts: 11780
  • Country: us
Re: GCC compiler optimisation
« Reply #24 on: August 02, 2021, 09:19:34 pm »
Quote from: lucazader
For sure, that works if the bootloader uses the same config for peripherals as your main application.
What do you mean? The software-requested MCU reset in my scenario would reset all the peripherals to their default values.

There will never be any issues, unless reset of the MCU does not fully reset the peripherals, but that would be stupid and I don't think this ever happens for practical devices.

 

