Author Topic: Fast code in ISR  (Read 3701 times)

0 Members and 1 Guest are viewing this topic.

Offline JesterTopic starter

  • Frequent Contributor
  • **
  • Posts: 887
  • Country: ca
Fast code in ISR
« on: July 23, 2023, 03:04:29 pm »
I'm tweaking some existing code and want to avoid slowing this section of code down because it's in an ISR. This question pertains to the GPIO_SET_STEP; line of code below.

The I/O pin being manipulated is used to drive the step pin of an ac servo controller. Presently the active state of this pin is #defined, so pre compile time defined. Need to make the active state of this pin user configurable. The user configurable aspect is now coded and a bool invertStep gets set or cleared during the initialization portion of the code and is then available for the code below. I want to avoid adding an if statement in the ISR where GPIO_SET_STEP; is (if possible). The actual pin manipulation code required is shown in the comments. Is there a way to substitute the required code and avoid the if statement in the ISR?

EDIT: uC = TMS320F280049
Code: [Select]

inline void ServoDrive :: ISR(void)
{
        switch( this->state ) {

        case 0:
            if( this->desiredPosition < this->currentPosition ) {
                GPIO_SET_STEP;                                   // clear if inverted and set if not inverted         GpioDataRegs.GPACLEAR.bit.GPIO0 = 1; OR  GpioDataRegs.GPASET.bit.GPIO0 = 1;  /// this line
                this->state = 2;
            }
            else if( this->desiredPosition > this->currentPosition ) {

« Last Edit: July 23, 2023, 10:40:34 pm by Jester »
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4778
  • Country: dk
Re: Fast code in ISR
« Reply #1 on: July 23, 2023, 03:27:03 pm »
have you looked at the assembly code and determined that it actually makes enough of a difference to be worth worrying about?

I guess you could make a pointer to the set/clear address, but that also needs to be read so it might be worse
 
The following users thanked this post: Jester

Offline DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6277
  • Country: es
Re: Fast code in ISR
« Reply #2 on: July 23, 2023, 03:42:01 pm »
Why so obsessed to avoid the if/else?
The switch will make a branch anyways, one more won't hurt.
Lacking actual code, first do what you  need, then optimize.
If invertStep is #defined then you can simply do:

Code: [Select]
GpioDataRegs.GPACLEAR.bit.GPIO0 = invertStep && 1;Or
Code: [Select]
GpioDataRegs.GPACLEAR.bit.GPIO0 = !invertStep;

Though I barely understood the message.
« Last Edit: July 23, 2023, 11:40:10 pm by DavidAlfa »
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 
The following users thanked this post: Jester

Offline radiolistener

  • Super Contributor
  • ***
  • Posts: 4064
  • Country: ua
Re: Fast code in ISR
« Reply #3 on: July 23, 2023, 03:55:24 pm »
The actual pin manipulation code required is shown in the comments. Is there a way to substitute the required code and avoid the if statement in the ISR?

Yes. It is possible, you can have two version of ISR handlers for each if statement state and write proper handler into a vector table when statement state is changed. But be careful, because interrupt may happens when you update vector table.

Additional suggestions to improve performance:
1) Do not use OOP and objects, because it requires to involve object pointer as context and as result needs to do additional operations
2) Enable max speed optimizations in your compiler
3) If your ISR code uses floating point operation, make sure that hardware acceleration for floating point is enabled.
« Last Edit: July 23, 2023, 03:57:27 pm by radiolistener »
 
The following users thanked this post: Jester

Offline JesterTopic starter

  • Frequent Contributor
  • **
  • Posts: 887
  • Country: ca
Re: Fast code in ISR
« Reply #4 on: July 23, 2023, 05:30:17 pm »

Additional suggestions to improve performance:
1) Do not use OOP and objects, because it requires to involve object pointer as context and as result needs to do additional operations
2) Enable max speed optimizations in your compiler
3) If your ISR code uses floating point operation, make sure that hardware acceleration for floating point is enabled.

That's interesting. Are you implying that if I were to rewrite this as a conventional C function that it would run a tiny bit quicker?

I have actually been converting other portions of this code to conventional C for simplicity.
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4778
  • Country: dk
Re: Fast code in ISR
« Reply #5 on: July 23, 2023, 06:03:51 pm »
The actual pin manipulation code required is shown in the comments. Is there a way to substitute the required code and avoid the if statement in the ISR?
Additional suggestions to improve performance:
1) Do not use OOP and objects, because it requires to involve object pointer as context and as result needs to do additional operations

are you sure? depending on platform I'm not sure a pointer makes much of a difference
 
The following users thanked this post: Jester

Offline jnk0le

  • Regular Contributor
  • *
  • Posts: 79
  • Country: pl
Re: Fast code in ISR
« Reply #6 on: July 23, 2023, 06:10:53 pm »
Unless virtual inheritance is used, this "OOP" code should perform exactly the same as C, given that all variables are kept in struct.
Otherwise "simple C" approach could get worse like in this thread:https://www.eevblog.com/forum/microcontrollers/stm32g431-how-can-it-be-that-the-amount-of-lines-of-code-in-main()-affect-isr/
(globals:2.56us vs structure:1.96us)
« Last Edit: July 23, 2023, 06:17:30 pm by jnk0le »
 
The following users thanked this post: Jester

Online Doctorandus_P

  • Super Contributor
  • ***
  • Posts: 3902
  • Country: nl
Re: Fast code in ISR
« Reply #7 on: July 23, 2023, 06:55:20 pm »
I've always been uncomfortable with an ISR as a part of a class.
Classes are built around having the ability to have multiple instances around, while having multiple instances of an ISR in general does not make much sense.
In a lot of ways the code looks "cleaner" when the ISR is part of a class, but I'm always struggling to attempt to do so without generating compiler errors or warnings, especially when compiling under the later C++ standards.

When writing the ISR as a generic C function, you can drop all the references to the this pointer, and use generic simple variables instead. Global variables are also "ugly", but you can use either automatic or static variables in your C ISR function.
 
The following users thanked this post: Jester

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 28102
  • Country: nl
    • NCT Developments
Re: Fast code in ISR
« Reply #8 on: July 23, 2023, 07:01:15 pm »
Unless virtual inheritance is used, this "OOP" code should perform exactly the same as C, given that all variables are kept in struct.
Otherwise "simple C" approach could get worse like in this thread:https://www.eevblog.com/forum/microcontrollers/stm32g431-how-can-it-be-that-the-amount-of-lines-of-code-in-main()-affect-isr/
(globals:2.56us vs structure:1.96us)
Indeed. With OOP the C++ compiler can do a much better job at optimising compared to functions that deal with function pointers. Also, typically ARM code fetches constants from PC+offset so you'll be looking at a pointer operation anyway because it is the fastest solution.
« Last Edit: July 23, 2023, 07:55:06 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: Jester

Offline jnk0le

  • Regular Contributor
  • *
  • Posts: 79
  • Country: pl
Re: Fast code in ISR
« Reply #9 on: July 23, 2023, 07:16:16 pm »
I've always been uncomfortable with an ISR as a part of a class.
Classes are built around having the ability to have multiple instances around, while having multiple instances of an ISR in general does not make much sense.
In a lot of ways the code looks "cleaner" when the ISR is part of a class, but I'm always struggling to attempt to do so without generating compiler errors or warnings, especially when compiling under the later C++ standards.

When writing the ISR as a generic C function, you can drop all the references to the this pointer, and use generic simple variables instead. Global variables are also "ugly", but you can use either automatic or static variables in your C ISR function.

It's usually something like:
Code: [Select]
ServoDrive instance1();

extern "C"
void TIM16_IRQHandler(void) {
    TIM16->SR = 0; // clear pending
    instance1.ISR();
}

 
The following users thanked this post: Jester

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 3932
  • Country: us
Re: Fast code in ISR
« Reply #10 on: July 23, 2023, 07:43:34 pm »

Additional suggestions to improve performance:
1) Do not use OOP and objects, because it requires to involve object pointer as context and as result needs to do additional operations
2) Enable max speed optimizations in your compiler
3) If your ISR code uses floating point operation, make sure that hardware acceleration for floating point is enabled.

That's interesting. Are you implying that if I were to rewrite this as a conventional C function that it would run a tiny bit quicker?

If you have C++ virtual function calls and you replace them direct C calls, it may be faster, but if you replace C++ objects with actually equivalent C it will probably be slower.  That's because C++ compilers are much more able to statically analyze the types and replace virtual function calls with inline code to the concrete method than they are to do the same with C using function pointers.
 
The following users thanked this post: Jester

Offline ajb

  • Super Contributor
  • ***
  • Posts: 2735
  • Country: us
Re: Fast code in ISR
« Reply #11 on: July 23, 2023, 07:45:40 pm »
Since the inversion is configured at startup and presumably fixed, the fastest "branchless" implementation would be to set up a pointer to whichever register should be used, as langwadt suggested in the first reply. 

A branchless implementation of the inversion logic in the ISR would just be a more complicated way of doing the same thing.  You might do something like:

Code: [Select]
*(uint32_t*)(( (!outputInverted) * (uint32_t)&outSet ) + ( outputInverted * (uint32_t)&outClr )) = pinMask;
or
Code: [Select]
   
const int32_t offset = ((uint32_t)&outClr) - ((uint32_t)&outSet);
*(uint32_t*)((uint32_t)&outSet + (offset * outputInverted)) = pinMask;

Some ugly casts are necessary to allow arbitrary arithmetic on pointer addresses, the specific types cast to will depend on the platform and register sizes.  This incurs 3-4 arithmetic operations not counting the final assignment, plus some of the arguments may need to be loaded with separate instructions, versus a single assignment from a constant.  The harder question is if that will be faster than a branch, which depends entirely on the platform.  An equally important question is what you care the most about: minimum latency, most consistent latency, or minimum time spent in the ISR?
 
The following users thanked this post: Jester

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15441
  • Country: fr
Re: Fast code in ISR
« Reply #12 on: July 23, 2023, 07:53:08 pm »
Use uintptr_t to cast pointers to integers and conversely in a portable way.
 
The following users thanked this post: ajb, newbrain, Jester

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3251
  • Country: ca
Re: Fast code in ISR
« Reply #13 on: July 23, 2023, 09:07:27 pm »
If you can limit your operations to pin toggling (that is pin always change the state when you manipulate it), then you simply toggle it (Many MCUs have registers to toggle pins). In such case, the polarity is only needs to be set once at the beginning.

Otherwise, if you MCU has SET and CLR registers for the pin, then store their addresses in pointer variables at the beginning, say

Code: [Select]
if (want_to_invert) {
  clear_reg = &SET;
  set_reg = &CLR;
} else {
  set_reg = &SET;
  clear_reg = &CLR;
}

Then you simply write a mask into the registers using these pointers.

Code: [Select]
*set_reg = MASK; // to set (clear if inverted)
*clear_reg = MASK; // to clear (set if inverted)

This is probably the fastest.

But the most important thing is this: How fast does it need to be? How fast it is now? Until you know the answer to these questions, trying to make it faster is simply waste of your time.

Then, try to optimize your whole ISR. Often it'll give you more gain than the micro level.
« Last Edit: July 23, 2023, 09:09:48 pm by NorthGuy »
 

Offline radiolistener

  • Super Contributor
  • ***
  • Posts: 4064
  • Country: ua
Re: Fast code in ISR
« Reply #14 on: July 23, 2023, 10:42:56 pm »
That's interesting. Are you implying that if I were to rewrite this as a conventional C function that it would run a tiny bit quicker?

Yes, but it doesn't means that any code written in C is faster, you're needs to write it properly in order to get max speed. If you wanna even higher speed optimization it requires to write in ASM, but it takes a lot of time and needs to take into account a lot of things, so it has a sense only if you're really want to be on the cutting edge of maximum performance. Using ASM will be time waste if you write usual code which is not performance critical.

are you sure? depending on platform I'm not sure a pointer makes much of a difference

yes, when even 1 extra CPU tact is critical it makes a big difference. But for a usual task where speed is not critical there is no big difference if you're using plain C or C++ with objects.

Using low level language is much more complicated and needs to keep in mind a lot of thigs, but it allows to be more flexible. On the other hand higher level language make things more easy and simple, but less flexibility for performance tuning. Both ways have pros and cons, and the best solution depends on your actual needs.
« Last Edit: July 23, 2023, 11:07:33 pm by radiolistener »
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4778
  • Country: dk
Re: Fast code in ISR
« Reply #15 on: July 23, 2023, 11:03:28 pm »
That's interesting. Are you implying that if I were to rewrite this as a conventional C function that it would run a tiny bit quicker?

Yes, but it doesn't means that any code written in C is faster, you're needs to write it properly in order to get max speed. If you wanna even higher speed optimization it requires to write in ASM, but it takes a lot of time and needs to take into account a lot of things, so it has a sense only if you're really want to be on the cutting edge of maximum performance. Using ASM will be time waste if you write usual code which is not performance critical.

are you sure? depending on platform I'm not sure a pointer makes much of a difference

yes, when even 1 extra CPU tact is critical it makes a big difference. But for a usual task where speed is not critical there is no big difference if you're using plain C or C++ with objects.

you don't know it if it takes more or less cycles unless you try it ...

 

Offline radiolistener

  • Super Contributor
  • ***
  • Posts: 4064
  • Country: ua
Re: Fast code in ISR
« Reply #16 on: July 23, 2023, 11:11:13 pm »
you don't know it if it takes more or less cycles unless you try it ...

if you know that it should do more things (like get object context pointer, etc), then you can be sure that it cannot take less cycles and will require more cycles. At best, the same amount of cycles, but only with a successful combination of circumstances. If it was written properly of course. Usually it will be slower and as practice shows, OOP code requires more memory, which is bad for MCU, where code FLASH is just 128 kB and RAM is just 4-32 kB.

Of course you can write functional code in a plain C which can be slower than object oriented code in C++. And can write functional code in a plain C which will be more faster. It all depends on how you write it. The main idea here is to use lower level coding in order to make sure that performance critical code (such as ISR handler) doesn't use extra operations (such as object context pointers, vtables, etc).


For example, I just bought Chinese blackpill board with STM32G431CBU6 and played with it's DAC. Adding just one assign operation in ISR handler leads to the fact that I need to reduce DAC sampling rate for about 300 kHz so that the TMR ISR handler has time to update the DAC data in time. So I get 3.8 MHz DAC sample rate without that assignment operation and 3.5 MHz DAC sample rate with adding just one assignment operation.

Tried to generate sine on it :) Works pretty good, it's 12 bit DAC produce pretty clean sine.
« Last Edit: July 23, 2023, 11:45:37 pm by radiolistener »
 

Offline JesterTopic starter

  • Frequent Contributor
  • **
  • Posts: 887
  • Country: ca
Re: Fast code in ISR
« Reply #17 on: July 23, 2023, 11:29:45 pm »
If you can limit your operations to pin toggling (that is pin always change the state when you manipulate it), then you simply toggle it (Many MCUs have registers to toggle pins). In such case, the polarity is only needs to be set once at the beginning.

Otherwise, if you MCU has SET and CLR registers for the pin, then store their addresses in pointer variables at the beginning, say

Code: [Select]
if (want_to_invert) {
  clear_reg = &SET;
  set_reg = &CLR;
} else {
  set_reg = &SET;
  clear_reg = &CLR;
}

Then you simply write a mask into the registers using these pointers.

Code: [Select]
*set_reg = MASK; // to set (clear if inverted)
*clear_reg = MASK; // to clear (set if inverted)

This is probably the fastest.

But the most important thing is this: How fast does it need to be? How fast it is now? Until you know the answer to these questions, trying to make it faster is simply waste of your time.


My goal is to not add any additional time while in the ISR. Latency is not important.

Your suggestion seems like what I'm looking for.
Bear with me I'm not a programmer. If I understand what your suggesting I need to determine the addresses of:
address 1: GpioDataRegs.GPACLEAR.bit.GPIO0 = 1
address 2: GpioDataRegs.GPASET.bit.GPIO0 = 1
 
Assuming:
1)  I have a global bool InvertStepPin that is set at startup and does not change

Then in the initialization code I use:

uint16_t clear_reg, set_reg;

Code: [Select]
if (InvertSetPin) {
  clear_reg = &GpioDataRegs.GPASET.bit.GPIO0 = 1;
  set_reg = &GpioDataRegs.GPACLEAR.bit.GPIO0 = 1;
} else {
  set_reg = &GpioDataRegs.GPASET.bit.GPIO0 = 1;
  clear_reg = &GpioDataRegs.GPACLEAR.bit.GPIO0 = 1;
}


assuming InvertSetPin is true, then this line:  GPIO_SET_STEP;               //  executes:  GpioDataRegs.GPACLEAR.bit.GPIO0 = 1 if InvertStepPin is true,  or exceutes:  GpioDataRegs.GPASET.bit.GPIO0 = 1 if Inv
gets replaced with: set_reg;
Does that make sense?
Did I get the syntax correct?
« Last Edit: July 23, 2023, 11:41:25 pm by Jester »
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4778
  • Country: dk
Re: Fast code in ISR
« Reply #18 on: July 23, 2023, 11:50:26 pm »
you don't know it if it takes more or less cycles unless you try it ...

if you know that it should do more things (like get object context pointer, etc), then you can be sure that it cannot take less cycles and will require more cycles. At best, the same amount of cycles, but only with a successful combination of circumstances.

maybe..

For example, I just bought Chinese blackpill board with STM32G431CBU6 and played with it's DAC. Adding just one assign operation in ISR handler leads to the fact that I need to reduce DAC sampling rate for about 300 kHz so that the TMR ISR handler has time to update the DAC data in time. So I get 3.8 MHz DAC sample rate without that assignment operation and 3.5 MHz DAC sample rate with adding just one assignment operation.

sure, but that doesn't show that one language will generate more code than the other

 

Offline HwAoRrDk

  • Super Contributor
  • ***
  • Posts: 1564
  • Country: gb
Re: Fast code in ISR
« Reply #19 on: July 24, 2023, 12:18:18 am »
Code: [Select]
if (InvertSetPin) {
  clear_reg = &GpioDataRegs.GPASET.bit.GPIO0 = 1;
  set_reg = &GpioDataRegs.GPACLEAR.bit.GPIO0 = 1;
} else {
  set_reg = &GpioDataRegs.GPASET.bit.GPIO0 = 1;
  clear_reg = &GpioDataRegs.GPACLEAR.bit.GPIO0 = 1;
}

You'll probably find that doesn't work, because it's likely that the .bit.GPIOx members are bitfields, and you can't take the address of a bitfield with the '&' operator. You'll need to use .all with a bitmask as suggested.
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 3932
  • Country: us
Re: Fast code in ISR
« Reply #20 on: July 24, 2023, 01:18:04 am »
you don't know it if it takes more or less cycles unless you try it ...

if you know that it should do more things (like get object context pointer, etc), then you can be sure that it cannot take less cycles and will require more cycles.

In the code discussed here, there is no reason  C++ code using  objects should need to do any more than a C function.  If there is no state data, the C++ code won't have to fetch it and if there is state data you would have to fetch it in C as well.

There are constructs in C++ that do have overhead, and some that have surprisingly high overhead (exceptions and RTTI being the usual examples).  But just putting your data in objects isn't usually one of them, and nothing about the specific code shown suggests that there would be any (performance) advantage to rewriting in C.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3251
  • Country: ca
Re: Fast code in ISR
« Reply #21 on: July 24, 2023, 01:30:51 am »
Did I get the syntax correct?

Not exactly. The pointer must point to whole bytes, so you do

Code: [Select]
uint16_t *clear_reg, *set_reg; // pointers, not integers

if (InvertSetPin) {
  clear_reg = &GpioDataRegs.GPASET; // whole registers, not bits
  set_reg = &GpioDataRegs.GPACLEAR;
} else {
  set_reg = &GpioDataRegs.GPASET;
  clear_reg = &GpioDataRegs.GPACLEAR;
}

...

*clear_reg = 1 << pin_number;
*clear_reg = 0x01; // clearing (setting if inverted) pin 0

 
The following users thanked this post: Jester

Offline radiolistener

  • Super Contributor
  • ***
  • Posts: 4064
  • Country: ua
Re: Fast code in ISR
« Reply #22 on: July 24, 2023, 01:59:24 am »
and nothing about the specific code shown suggests that there would be any (performance) advantage to rewriting in C.

nobody said that any kind of rewriting code in C is enough to give you speed advantage. It depends on how you rewrite it. You can use object oriented style and objects in C code, but it doesn't means that it will run faster. But usual functional C code style has better performance and less memory usage than object oriented C++ code style.

The same thing with ASM, rewriting code in ASM doesn't means that it give you any performance advantage. On the contrary, in most cases your first code rewritten in ASM will be worse than code in C. But ASM allows you more ways for optimizations and if you take into account all details and utilize all available options in ASM, then you can get the code which will be faster than code written in any high level language. But it will be much more complicated to write it in ASM and such ASM code will be almost non-readable for human.

« Last Edit: July 24, 2023, 02:09:55 am by radiolistener »
 
The following users thanked this post: Jester

Offline ajb

  • Super Contributor
  • ***
  • Posts: 2735
  • Country: us
Re: Fast code in ISR
« Reply #23 on: July 24, 2023, 02:14:15 am »
Did I get the syntax correct?

Not exactly. The pointer must point to whole bytes, so you do

Code: [Select]
uint16_t *clear_reg, *set_reg; // pointers, not integers

if (InvertSetPin) {
  clear_reg = &GpioDataRegs.GPASET; // whole registers, not bits
  set_reg = &GpioDataRegs.GPACLEAR;
} else {
  set_reg = &GpioDataRegs.GPASET;
  clear_reg = &GpioDataRegs.GPACLEAR;
}

...

*clear_reg = 1 << pin_number;
*clear_reg = 0x01; // clearing (setting if inverted) pin 0

If you really want to retain the bitfield syntax, you could find the type definition for the bitfield from the device header files and use that as the pointer type.  That would be sort of silly for sequential gpio bits, but might be more valuable for a more complex peripheral, like switching between UARTs or something.
 
The following users thanked this post: Jester

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 855
Re: Fast code in ISR
« Reply #24 on: July 24, 2023, 03:33:59 am »
As already mentioned, the use of a bitfield for a SET/CLR/TOGGLE type register makes no sense. You end up doing a read/modify which is not needed. The bitfields lead you to believe you are acting on a single bit, but that is not what actually happens when the compiler generates the code.

    GpioDataRegs.GPASET.bit.GPIO0 = 1; //read, modify, write, although the write is atomic, the read/modify is not needed

    GpioDataRegs.GPASET.all = 1<<0; //write pin bitmask, no need read/modify

If you want to tweak, there is no way to know what is better/worse until you see the asm output. If you are not viewing the asm output, then you are simply guessing and may still end up with something worse. I have a hard time believing just about any way you make a pin change would make any difference. Write readable code, use the registers in a optimal way as describe above, and forget about trying to shave a handful of cpu cycles in every piece of code as its not worth the effort.
 
The following users thanked this post: Jester


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf