Topic: STM8 - interesting compiler non-optimisation

Saimoun (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 570
  • Country: dk
STM8 - interesting compiler non-optimisation
« on: October 31, 2020, 04:13:03 pm »
Hi!

Just wanted to share something I found rather interesting. I have some simple code on my STM8 uC where the EEPROM values are stored in a struct.
To reduce EEPROM wear (erase/write cycles), I split each 16-bit value into two 8-bit values.

Code: [Select]
typedef struct EEPROM_Struct {
    __IO uint8_t param1HIGH;
    __IO uint8_t RESERVED1[3]; // separate each value 4 bytes apart to get minimum amount of erase/write cycles
    __IO uint8_t param1LOW;
    __IO uint8_t RESERVED2[3];
    __IO uint8_t param2;
} EEPROM_TypeDef;

EEPROM EEPROM_TypeDef EEPROM_ACCESS = { // initial values
    10,  // param1HIGH
    {0}, // RESERVED1[3]
    230, // param1LOW
    {0}, // RESERVED2[3]
    4    // param2
};

Then, to read param1, I wrote a simple expression which I hoped the compiler would understand and turn into a plain move of each byte to where it belongs:

Code: [Select]
uint16_t currentParam1 = 0; // global variable in RAM

void main(void) {
     currentParam1 = (uint16_t)EEPROM_ACCESS.param1HIGH << 8 | (uint16_t)EEPROM_ACCESS.param1LOW;
}

I was expecting the compiler to simply move the high byte into the high byte of currentParam1 and the low byte into the low byte. But it actually did the whole shift and the OR operation!

Code: [Select]
353  0007 c60004        ld a,_EEPROM_ACCESS+4
354  000a 97            ld xl,a
355  000b 1f01          ldw (OFST-10,sp),x
357  000d c60000        ld a,_EEPROM_ACCESS
358  0010 5f            clrw x
359  0011 97            ld xl,a
360  0012 7b02          ld a,(OFST-9,sp)
361  0014 01            rrwa x,a
362  0015 1a01          or a,(OFST-10,sp)
363  0017 01            rrwa x,a
364  0018 bf00          ldw _currentParam1,x

It turns out that the problem was my (uint16_t) casts, which prevented it from optimising correctly! If I change main to:
Code: [Select]
void main(void) {
     currentParam1 = EEPROM_ACCESS.param1HIGH << 8 | EEPROM_ACCESS.param1LOW;
}

then I get something much closer to what I expected:
Code: [Select]
353  0007 c60000        ld a,_EEPROM_ACCESS
354  000a 97            ld xl,a
355  000b c60004        ld a,_EEPROM_ACCESS+4
356  000e 02            rlwa x,a
357  000f bf00          ldw _currentParam1,x

Tricky!! A trap for young players, as someone would say ha ha :D

Simon



PS: I still do not understand why it does not simply use two mov instructions - I guess a memory-to-memory mov is quite slow?
I can get it to do that with a strange workaround:
Code: [Select]
void main(void) {
    uint8_t *tmp = (uint8_t*)&currentParam1;
    tmp[0] = EEPROM_ACCESS.param1HIGH; // STM8 is big-endian
    tmp[1] = EEPROM_ACCESS.param1LOW;
}

Then I get:
Code: [Select]
365  0007 5500000000    mov _currentParam1,_EEPROM_ACCESS
368  000c 5500040001    mov _currentParam1+1,_EEPROM_ACCESS+4
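
For anyone who wants to check on a PC that the three variants in this post produce the same value, here is a minimal sketch in plain C99. Nothing in it is Cosmic-specific, the function names are made up for illustration, and it only compares results; it says nothing about what code a given compiler emits. Because the byte-store variant depends on byte order, the sketch detects that at run time instead of assuming big-endian as on STM8.

```c
#include <assert.h>
#include <stdint.h>

/* Variant 1: explicit casts before shifting (as in the first main()). */
static uint16_t combine_shift_cast(uint8_t high, uint8_t low)
{
    return (uint16_t)((uint16_t)high << 8 | (uint16_t)low);
}

/* Variant 2: rely on integer promotion (as in the second main()).
   The uint8_t operands are promoted to int before the shift. Where int is
   only 16 bits, a high byte >= 0x80 shifts into the sign bit, which the C
   standard leaves undefined even if compilers usually give the expected
   bits. */
static uint16_t combine_shift(uint8_t high, uint8_t low)
{
    return (uint16_t)(high << 8 | low);
}

/* Variant 3: store the bytes directly (as in the workaround). Which index
   holds the high byte depends on the byte order, so it is probed here. */
static uint16_t combine_bytes(uint8_t high, uint8_t low)
{
    const uint16_t probe = 1;
    const int little_endian = (*(const uint8_t *)&probe == 1);
    uint16_t result = 0;
    uint8_t *bytes = (uint8_t *)&result;

    bytes[little_endian ? 1 : 0] = high;
    bytes[little_endian ? 0 : 1] = low;
    return result;
}

/* Hypothetical helper: aborts if the variants disagree. */
static void check_variants_agree(uint8_t high, uint8_t low)
{
    const uint16_t expected = combine_shift_cast(high, low);

    assert(combine_shift(high, low) == expected);
    assert(combine_bytes(high, low) == expected);
}
```

Calling check_variants_agree(10, 230) with the initial values from the struct passes silently, each variant giving 2790 (10 * 256 + 230).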
 

GromBeestje

  • Frequent Contributor
  • **
  • Posts: 285
  • Country: nl
Re: STM8 - interesting compiler non-optimisation
« Reply #1 on: October 31, 2020, 05:37:59 pm »
What compiler are you using? SDCC?
 

Saimoun (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 570
  • Country: dk
Re: STM8 - interesting compiler non-optimisation
« Reply #2 on: October 31, 2020, 06:27:41 pm »
Yes, I forgot to mention that :D
I am using the Cosmic compiler with the following options (in ST Visual Develop):

Code: [Select]
cxstm8 +warn +mods0 +debug -pxp +strict -pp -pc99 -l -iinc $(ToolsetIncOpts) -cl$(IntermPath) -co$(IntermPath) $(InputFile)
 

Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6947
  • Country: fi
Re: STM8 - interesting compiler non-optimisation
« Reply #3 on: October 31, 2020, 06:47:48 pm »
You could rely on type punning via union, too.  It was standardized in C11, but supported in practice by earlier C compilers as well:
Code: [Select]
/* TODO: Wrap the following in endianness detection for portable code */
#define  pack_u16  pack_u16be

static inline uint16_t pack_u16be(const uint8_t high, const uint8_t low)
{
    union {
        uint16_t  u16;
        uint8_t  u8[2];
    } data = { .u8 = { high, low } };
    return data.u16;
}

static inline uint16_t pack_u16le(const uint8_t high, const uint8_t low)
{
    union {
        uint16_t  u16;
        uint8_t  u8[2];
    } data = { .u8 = { low, high } };
    return data.u16;
}
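
One possible way to fill in the TODO above, as a sketch only: compile-time byte-order macros are compiler-specific, so this version probes the byte order at run time with the same union trick and picks the layout accordingly. The names u16_is_big_endian and pack_u16_auto are hypothetical, and whether a particular compiler folds the probe into a constant has not been checked here.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when the most significant byte of a
   uint16_t is stored first (big-endian, as on STM8), 0 otherwise. */
static inline int u16_is_big_endian(void)
{
    const union {
        uint16_t u16;
        uint8_t  u8[2];
    } probe = { .u16 = 0x0100 };

    return probe.u8[0] == 0x01;
}

/* Packs two bytes into a uint16_t via union type punning, choosing the
   byte positions from the detected byte order. */
static inline uint16_t pack_u16_auto(const uint8_t high, const uint8_t low)
{
    union {
        uint16_t u16;
        uint8_t  u8[2];
    } data;

    if (u16_is_big_endian()) {
        data.u8[0] = high;
        data.u8[1] = low;
    } else {
        data.u8[0] = low;
        data.u8[1] = high;
    }
    return data.u16;
}
```

With the values from the first post, pack_u16_auto(10, 230) gives 2790 regardless of the host byte order.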
 

Siwastaja

  • Super Contributor
  • ***
  • Posts: 8829
  • Country: fi
Re: STM8 - interesting compiler non-optimisation
« Reply #4 on: November 01, 2020, 01:12:55 pm »
The fields are declared __IO. If you grep the headers, you'll find that __IO is defined as shorthand for volatile (plus possibly something else).

This is highly important: usually you want IO port operations to happen exactly as you write them, not to be combined. This matters with many peripherals. For example, reading a status register may clear flags, and you want to do that exactly once. Or reading a data register pops one entry from the FIFO, and you want that to happen exactly as many times as you have written it.

It appears your particular use case would be fine with more optimized behavior. You still can't redefine this particular IO port without the volatile qualifier, because then the compiler might optimize too much, i.e., not just combine the two operations, but also perform the access at a different place altogether (as long as it has been performed by the time the function returns!)

The solution is indeed to type-pun via a union that provides all the access patterns you would like to use, while still keeping the volatile qualifier. If you don't want to modify the headers, you can also just use a type cast.

Often you need to write out the read-modify-write explicitly, which isn't a bad thing:

Code: [Select]
volatile type_t volatile_thing;

type_t tmp = volatile_thing;
tmp |= 0x123;
tmp += 42;
tmp *= 3;
volatile_thing = tmp;

This code reads volatile_thing exactly once and writes it exactly once, yet you can do multiple operations on the value and let the compiler optimise those, because the tmp variable isn't volatile.
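
Applied to the two-byte parameter from the first post, the same idea could look like the sketch below: each volatile byte is copied into a plain local exactly once, and the shift and OR then work on the non-volatile copies. The struct is a cut-down stand-in for EEPROM_TypeDef and read_param1 is a made-up name; whether Cosmic then emits the shorter sequence has not been verified.

```c
#include <stdint.h>

/* Cut-down stand-in for the EEPROM layout in the first post
   (an assumption for illustration, not the real header). */
typedef struct {
    volatile uint8_t param1HIGH;
    volatile uint8_t RESERVED1[3];
    volatile uint8_t param1LOW;
} eeprom_sketch_t;

/* Reads each volatile byte once, then combines the local copies. */
static uint16_t read_param1(const eeprom_sketch_t *ee)
{
    const uint8_t high = ee->param1HIGH; /* single volatile read */
    const uint8_t low  = ee->param1LOW;  /* single volatile read */

    return (uint16_t)((uint16_t)high << 8 | low);
}
```

The volatile accesses still happen in the order written, one per field; only the arithmetic on high and low is left free for the optimiser.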
« Last Edit: November 01, 2020, 01:18:01 pm by Siwastaja »