Topic: STM 32F4 FPU registers and main() gotcha


DiTBho

  • Super Contributor
  • Posts: 4227
  • Country: gb
Re: STM 32F4 FPU registers and main() gotcha
« Reply #100 on: September 09, 2024, 07:00:53 pm »
I've written a small RTOS (for RISC-V for now), and this is how I implemented it. Each "task" has parameters, among them whether it uses the FPU (and soon, whether it uses vector instructions). If so, the FPU registers are saved/restored during a context switch; if not, only the regular registers are. Simple and effective. (But it requires determining in advance which tasks will use the FPU and which won't, so it requires some discipline while maintaining the software.)

Is it like uC/OS-II, or smaller?
« Last Edit: September 10, 2024, 06:25:55 am by DiTBho »
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

bson

  • Supporter
  • Posts: 2434
  • Country: us
Re: STM 32F4 FPU registers and main() gotcha
« Reply #101 on: September 09, 2024, 08:18:09 pm »
Quote
It doesn't work if there are two threads using FP and no ISR, as this will produce no stacking and the FP regs won't get saved across context switches.

IIRC, FreeRTOS saves the FPU state across task switches, so multiple tasks (threads) can all use the FPU, with lazy stacking. It certainly seems to work fine.
Yeah, actually, when performing a context switch the scheduler can do a dummy destructive FP operation, like zeroing an FP register, to force the state to be stored if it hasn't been already. I like this!
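To make the lazy-stacking trick concrete, here is a host-side C model of the bookkeeping involved; it is not driver code. It assumes a Cortex-M4F with automatic and lazy state preservation enabled (FPCCR.ASPEN and FPCCR.LSPEN, both set at reset). The struct and function names are hypothetical and only mirror what FPCCR.LSPACT and FPCAR do in hardware. It shows that exception entry merely reserves the frame slot, and that the first FP instruction (here the scheduler's throwaway one) is what actually stores S0-S15 and FPSCR.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side model of Cortex-M4F lazy FP stacking.
 * Assumes FPCCR.ASPEN = 1 and FPCCR.LSPEN = 1 (reset defaults). */
typedef struct {
    uint32_t  s[32];   /* S0..S31 */
    uint32_t  fpscr;
    bool      lspact;  /* like FPCCR.LSPACT: slot reserved, not yet written */
    uint32_t *fpcar;   /* like FPCAR: where S0-S15 and FPSCR will be stored */
} fpu_model_t;

static fpu_model_t fpu;

/* Exception entry from a context that has used FP: reserve space, write nothing. */
static void model_exception_entry(uint32_t *frame_slot) {
    fpu.fpcar  = frame_slot;
    fpu.lspact = true;
}

/* Every FP instruction first completes any pending lazy save. */
static void model_fp_instruction(void) {
    if (fpu.lspact) {
        memcpy(fpu.fpcar, fpu.s, 16 * sizeof fpu.s[0]);
        fpu.fpcar[16] = fpu.fpscr;
        fpu.lspact = false;
    }
}

/* The trick from the post: a throwaway, destructive FP op (zeroing S0) in the
 * scheduler forces the deferred save before registers are swapped.
 * S16-S31 are not covered by hardware stacking and still need a software save. */
static void scheduler_force_lazy_save(void) {
    model_fp_instruction();
    fpu.s[0] = 0;   /* safe: the interrupted task's S0 is already in the frame */
}
```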

It can be slightly inefficient, though, if the ISR uses FP and lazy-stacks the state, then tail-chains an SVC for a context switch, which stacks the state again. The state can't change during the chaining, so in theory it wouldn't need to be restacked, but I suppose that's a reasonable price to pay for using FP in an ISR.
 

bson

  • Supporter
  • Posts: 2434
  • Country: us
Re: STM 32F4 FPU registers and main() gotcha
« Reply #102 on: September 09, 2024, 08:28:17 pm »
It is what I would expect. The FPU must issue an internal WAIT while it is working; otherwise the compiled code would need a loop polling some FPU status, and that loop would be vulnerable to interrupts / RTOS task switching, so the FPU registers would need saving.
By default, FP operations complete out of order with respect to integer (GPR) operations, but this can be disabled.

If disabled, the main pipeline will stall on long-running FP operations.

I'm not sure what the interrupt behavior is in the first case; presumably, with OOO completion, the CPU has to stall until the FP operation finishes before dispatching an interrupt. In the second case it can abort and discard a running FP operation, so maybe this is why OOO can be disabled: to improve interrupt latency.
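If the control meant here is DISOOFP (bit 9) of the Cortex-M4 Auxiliary Control Register at 0xE000E008 (an assumption based on the Cortex-M4 TRM, not something stated in the thread), setting it makes FP instructions complete in order with respect to integer instructions. A minimal sketch; the helper name is hypothetical, and with CMSIS the same register is normally reached as SCnSCB->ACTLR. Check your device's reference manual before relying on the address or bit.

```c
#include <stdint.h>

/* Cortex-M4 Auxiliary Control Register and its DISOOFP bit
 * (assumed from the Cortex-M4 TRM; verify for your part). */
#define ACTLR_ADDR     0xE000E008u
#define ACTLR_DISOOFP  (1u << 9)   /* 1 = FP instructions complete in order */

/* Hypothetical helper. The register is passed as a pointer so the logic can
 * run off-target; on the MCU:
 *   fpu_set_in_order((volatile uint32_t *)ACTLR_ADDR, 1);                   */
static void fpu_set_in_order(volatile uint32_t *actlr, int in_order) {
    if (in_order)
        *actlr |= ACTLR_DISOOFP;    /* disable out-of-order FP completion */
    else
        *actlr &= ~ACTLR_DISOOFP;   /* back to the default behavior */
}
```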
 

SiliconWizard

  • Super Contributor
  • Posts: 15254
  • Country: fr
Re: STM 32F4 FPU registers and main() gotcha
« Reply #103 on: September 09, 2024, 08:58:13 pm »
I've written a small RTOS (for RISC-V for now), and this is how I implemented it. Each "task" has parameters, among them whether it uses the FPU (and soon, whether it uses vector instructions). If so, the FPU registers are saved/restored during a context switch; if not, only the regular registers are. Simple and effective. (But it requires determining in advance which tasks will use the FPU and which won't, so it requires some discipline while maintaining the software.)

Why do it like that?

OK, it's easier to get going at the start, I guess, but lazy save/restore is pretty easy.

Well, why not? It's simple, gives you full control and has negligible cost (just needs to check a flag in the task switching routine and branch accordingly), with zero potential for save/restore bugs as long as you don't use the FPU in tasks that are defined not to support it.
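A minimal C sketch of the flag-based scheme described above. It assumes a hypothetical regs_t standing in for the RISC-V register file; in a real RTOS the copies would be load/store sequences in the assembly switch routine, and these names are illustrative, not SiliconWizard's actual code. The only extra cost is one flag test in each direction.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register file; on hardware this is the CPU itself. */
typedef struct {
    uint64_t gpr[31];   /* x1..x31 (x0 is hardwired to zero) */
    uint64_t fpr[32];   /* f0..f31 */
    uint32_t fcsr;      /* rounding mode + accrued exception flags */
} regs_t;

static regs_t cpu;

/* Hypothetical task control block. */
typedef struct {
    bool   uses_fpu;    /* fixed when the task is created */
    regs_t saved;
} task_t;

static void context_switch(task_t *from, task_t *to) {
    memcpy(from->saved.gpr, cpu.gpr, sizeof cpu.gpr);
    if (from->uses_fpu) {                 /* flag test on the way out */
        memcpy(from->saved.fpr, cpu.fpr, sizeof cpu.fpr);
        from->saved.fcsr = cpu.fcsr;
    }
    memcpy(cpu.gpr, to->saved.gpr, sizeof cpu.gpr);
    if (to->uses_fpu) {                   /* flag test on the way in */
        memcpy(cpu.fpr, to->saved.fpr, sizeof cpu.fpr);
        cpu.fcsr = to->saved.fcsr;
    }
}
```

With this scheme a task flagged uses_fpu = false that executes FP instructions anyway does not corrupt the FPU tasks (their registers are saved when they are switched out), but it silently loses its own FP register contents whenever an FPU task runs in between, which is the discipline problem mentioned in the post.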

I haven't really thought about doing this differently so far, though: at this point it isn't meant to be a full general-purpose RTOS, widely used like FreeRTOS or such, so if it's slightly less developer-friendly, that's fine. But I admit I haven't thought about that lazy save/restore thing. How would you do this efficiently on RISC-V?
 

SiliconWizard

  • Super Contributor
  • Posts: 15254
  • Country: fr
Re: STM 32F4 FPU registers and main() gotcha
« Reply #104 on: September 09, 2024, 09:12:13 pm »
I've written a small RTOS (for RISC-V for now), and this is how I implemented it. Each "task" has parameters, among them whether it uses the FPU (and soon, whether it uses vector instructions). If so, the FPU registers are saved/restored during a context switch; if not, only the regular registers are. Simple and effective. (But it requires determining in advance which tasks will use the FPU and which won't, so it requires some discipline while maintaining the software.)

Is it like uC/OS-II, or smaller?

A bit in the same vein, but certainly smaller at this point. It's about 2k LOC in total (the low-level task-switching routine is about 300 lines of assembly; the rest is C), including generic RV32/RV64 support and a "port" for the specifics of the CH32V307, which is what I've tested it on so far. I intend to test it next on the CV1800B, which I'm currently working on.

It supports priorities, variable time slots, sleep/yield, and events that tasks can wait on and signal. There's a lot more that could be added, but so far it's enough to be usable.
 

brucehoult

  • Super Contributor
  • Posts: 4458
  • Country: nz
Re: STM 32F4 FPU registers and main() gotcha
« Reply #105 on: September 10, 2024, 12:39:45 am »
I admit I haven't thought about that lazy save/restore thing. How would you do this efficiently on RISC-V?

There are many strategies possible (policy), depending on what you want to optimise for, but the mechanism is well described over four pages in section 3.1.6.6 "Extension Context Status in mstatus Register" on e.g. p25 and following of riscv-privileged-20211203.pdf

Read that and ask if you have any remaining questions.
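For orientation, the fields that section describes live in mstatus: FS in bits 14:13 and VS in bits 10:9, both encoded Off = 0, Initial = 1, Clean = 2, Dirty = 3. Below is a sketch of one possible lazy-save policy (save only when Dirty, mark the state Clean after restoring the incoming task), modelled on a plain uint64_t so it runs anywhere. The helper names are hypothetical; on real hardware the value would come from csrr/csrw on mstatus in M-mode (or sstatus in S-mode), and the save/restore steps are only indicated in comments.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSTATUS_FS_SHIFT 13                       /* FS is mstatus[14:13]; VS is [10:9] */
#define MSTATUS_FS_MASK  (3ull << MSTATUS_FS_SHIFT)

enum ext_status { XS_OFF = 0, XS_INITIAL = 1, XS_CLEAN = 2, XS_DIRTY = 3 };

/* Hypothetical helpers operating on a copy of mstatus. */
static unsigned get_fs(uint64_t mstatus) {
    return (unsigned)((mstatus & MSTATUS_FS_MASK) >> MSTATUS_FS_SHIFT);
}

static uint64_t set_fs(uint64_t mstatus, unsigned fs) {
    return (mstatus & ~MSTATUS_FS_MASK) | ((uint64_t)(fs & 3u) << MSTATUS_FS_SHIFT);
}

/* Hypothetical switch step: the outgoing task's FP state needs saving only if
 * Dirty. After restoring the incoming task, hardware and saved copy match, so
 * mark it Clean; an untouched FPU is then never saved again. */
static uint64_t fp_switch(uint64_t mstatus, bool *save_outgoing) {
    *save_outgoing = (get_fs(mstatus) == (unsigned)XS_DIRTY);
    /* if (*save_outgoing) save_fp(prev);   store f0-f31 and fcsr
       restore_fp(next);                     load the incoming task's copy */
    return set_fs(mstatus, XS_CLEAN);
}
```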

Note that, according to the V ABI, the vector state is undefined after any function call, and system calls are a subset of function calls. So on any syscall the OS is entitled to set the current VS to "off"; any later use of V after the syscall returns will then trap, at which point the OS can clear the state to 0s. Alternatively, it can clear all the registers right away and set the current VS to "initial", which means the state won't need to be saved on the next task switch if no further V instructions are executed. So the (rather large) V state has to be saved and restored only on a task switch caused by timeslice expiry (and only if it is dirty), not on a task switch made during a synchronous system call.

So VS and FS differ in that way, by OS/ABI policy, not in hardware.
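The syscall policy above can be sketched as a small state model. This is a hypothetical illustration (names are assumptions, not from any real kernel), assuming the variant where syscall entry turns V off and the first later V instruction traps. It shows why a switch made inside a syscall never finds Dirty vector state to save, while a preemption can.

```c
#include <stdbool.h>

enum ext_status { XS_OFF = 0, XS_INITIAL = 1, XS_CLEAN = 2, XS_DIRTY = 3 };

/* Hypothetical per-task copy of mstatus.VS as seen by the scheduler. */
typedef struct { enum ext_status vs; } task_v_t;

/* Syscall entry: vector state is undefined after any call under the V ABI,
 * so the kernel may discard it. Variant 1: turn V off. (Variant 2 would zero
 * the registers here and set XS_INITIAL instead.) */
static void vs_on_syscall_entry(task_v_t *t) { t->vs = XS_OFF; }

/* Trap on the first V instruction after VS was turned off: zero the vector
 * registers (not modelled) and mark the state Initial; the faulting
 * instruction then re-executes and makes it Dirty. */
static void vs_on_first_use_trap(task_v_t *t) { t->vs = XS_INITIAL; }

/* Only Dirty state needs saving. Because syscall entry already turned V off,
 * only a switch caused by timeslice expiry can find Dirty state. */
static bool vs_needs_save(const task_v_t *t) { return t->vs == XS_DIRTY; }
```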

> zero potential for save/restore bugs as long as you don't use the FPU in tasks that are defined not to support it

That means you have to know for sure what all the code in all the libraries you call does. Practical in small embedded software, not in desktop software. How do you know printf() doesn't run any FP instructions at all, even if there is no %f in the format string? The same goes for any other library you didn't write.

On the other hand, your scheme does allow the skilled developer to lie to the OS: declare that a task doesn't use FP, but then actually do a small amount of FP computation using a limited register set, saving and restoring only the registers actually used (plus the FP state: rounding mode, accrued exception flags, ...).

That is, if the OS doesn't police it by turning the FPU off for threads that say they don't use it, and trapping and printing an error message if they do. That would also solve the "how do I know printf() doesn't use FP?" problem.

Or I guess you could start with "uses FP" turned off for all threads, and turn it on the first time they use an FP instruction -- and then leave it on permanently, saving and restoring their FP registers on every task switch from then on.

It might be ideal to have your "I use FP" / "I don't use FP" flag be instead "I use FP heavily, please eagerly restore my registers" / "I don't use FP, or rarely, please restore my FP registers lazily".
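The three policies in this post (forbid FP and trap on use, restore lazily on first use, and first use that then switches the task permanently to eager restore) can share one trap path. A sketch, assuming FS is managed as in the privileged spec and that the outgoing task's registers are saved whenever FS is Dirty at switch-out; the enum and function names are hypothetical.

```c
#include <stdbool.h>

enum ext_status { XS_OFF = 0, XS_INITIAL = 1, XS_CLEAN = 2, XS_DIRTY = 3 };

/* Hypothetical per-task FP policy combining the options in the post. */
enum fp_mode {
    FP_FORBIDDEN,   /* task says it never uses FP: first use is a bug         */
    FP_LAZY,        /* "rarely": FS stays Off, restore on the first trap       */
    FP_EAGER        /* "heavily": restore the registers at every switch-in     */
};

typedef struct {
    enum fp_mode mode;
    bool sticky;    /* promote FP_LAZY to FP_EAGER after the first trap */
} task_fp_t;

/* FS value to install when switching this task in. For FP_EAGER the caller
 * restores the saved FP registers first, so hardware matches the copy: Clean. */
static enum ext_status fp_switch_in(const task_fp_t *t) {
    return t->mode == FP_EAGER ? XS_CLEAN : XS_OFF;
}

/* Illegal-instruction trap taken because FS was Off. Returns the FS value to
 * set before retrying the instruction, or -1 to report the task as faulty. */
static int fp_on_trap(task_fp_t *t) {
    if (t->mode == FP_FORBIDDEN)
        return -1;                  /* print an error, kill the task, ... */
    /* restore this task's FP registers here (not modelled) */
    if (t->sticky)
        t->mode = FP_EAGER;         /* "leave it on permanently" */
    return XS_CLEAN;
}
```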
 
The following users thanked this post: Nominal Animal

SiliconWizard

  • Super Contributor
  • Posts: 15254
  • Country: fr
Re: STM 32F4 FPU registers and main() gotcha
« Reply #106 on: September 10, 2024, 06:24:49 am »
I admit I haven't thought about that lazy save/restore thing. How would you do this efficiently on RISC-V?

There are many strategies possible (policy), depending on what you want to optimise for, but the mechanism is well described over four pages in section 3.1.6.6 "Extension Context Status in mstatus Register" on e.g. p25 and following of riscv-privileged-20211203.pdf

Going to read that.

That means you have to know for sure what all the code in all the libraries you call does. Practical in small embedded software, not in desktop software. How do you know printf() doesn't run any FP instructions at all, even if there is no %f in the format string? The same goes for any other library you didn't write.

Yes, as I said, it's not meant as a general-purpose OS, let alone anything desktop-like!
But I may make it evolve in that direction. I'm certainly going to consider implementing lazy save/restore once I've grasped the concepts in those few pages.

On the other hand, your scheme does allow the skilled developer to lie to the OS: declare that a task doesn't use FP, but then actually do a small amount of FP computation using a limited register set, saving and restoring only the registers actually used (plus the FP state: rounding mode, accrued exception flags, ...).

That is, if the OS doesn't police it by turning the FPU off for threads that say they don't use it, and trapping and printing an error message if they do. That would also solve the "how do I know printf() doesn't use FP?" problem.

Or I guess you could start with "uses FP" turned off for all threads, and turn it on the first time they use an FP instruction -- and then leave it on permanently, saving and restoring their FP registers on every task switch from then on.

It might be ideal to have your "I use FP" / "I don't use FP" flag be instead "I use FP heavily, please eagerly restore my registers" / "I don't use FP, or rarely, please restore my FP registers lazily".

Well, eventually I'll probably keep the "manual" approach as an option (if you use it, you'll be expected to know what you're doing) and otherwise make lazy save/restore the default.
 

brucehoult

  • Super Contributor
  • Posts: 4458
  • Country: nz
Re: STM 32F4 FPU registers and main() gotcha
« Reply #107 on: September 10, 2024, 06:38:00 am »
section 3.1.6.6 "Extension Context Status in mstatus Register" on e.g. p25 and following of riscv-privileged-20211203.pdf

Going to read that.

One key thing to keep in mind right from the start:

- modifying mstatus.FS does not alter the FP registers or FP status CSRs in any way. Setting it to "off" does not clear the registers or anything like that: if you later set it to any of the other values, you can read and use the previous contents as if nothing happened. The same applies to setting FS to "initial". That doesn't clear the registers -- it's you recording the fact that you HAVE cleared them.

The only other thing is the effect that FS has on normal FP instructions, and vice versa:

- if FS is "off", any FP instruction will trap

- if FS is "initial" or "clean" then any FP instruction that alters FP registers/CSRs will change FS to "dirty".
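These rules amount to a small state machine. The toy model below follows them literally and uses assumed names, not a real API: writing FS never touches the registers, Off makes every FP instruction trap, and a register write from Initial or Clean makes the state Dirty. It tracks dirtiness precisely; real hardware may report Dirty more conservatively, so check the spec text before relying on Clean surviving read-only FP instructions.

```c
#include <stdbool.h>
#include <stdint.h>

enum ext_status { XS_OFF = 0, XS_INITIAL = 1, XS_CLEAN = 2, XS_DIRTY = 3 };

/* Hypothetical toy model of one hart's FP unit. */
typedef struct {
    uint64_t f[32];
    enum ext_status fs;
} hart_fp_t;

/* Writing mstatus.FS is pure bookkeeping: the registers are untouched.
 * Writing XS_INITIAL records that software HAS cleared them. */
static void write_fs(hart_fp_t *h, enum ext_status fs) { h->fs = fs; }

/* An FP instruction that writes a register. Returns false if it traps. */
static bool fp_write(hart_fp_t *h, int reg, uint64_t value) {
    if (h->fs == XS_OFF)
        return false;               /* illegal-instruction trap */
    h->f[reg] = value;
    h->fs = XS_DIRTY;               /* Initial/Clean/Dirty -> Dirty */
    return true;
}

/* An FP instruction that only reads a register (e.g. a move to an integer
 * register). Traps if Off; otherwise FS is left as it was in this model. */
static bool fp_read(const hart_fp_t *h, int reg, uint64_t *out) {
    if (h->fs == XS_OFF)
        return false;
    *out = h->f[reg];
    return true;
}
```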
 
The following users thanked this post: SiliconWizard, Nominal Animal

