I think I'll continue this here rather than in the more specific thread about how the magic return value works.
I'm looking at an NXP application note, AN12078:
https://www.nxp.com/docs/en/application-note/AN12078.pdf
It lists the interrupt latency of various Cortex-M cores as:
| CPU core | Interrupt latency (cycles) |
|---|---|
| Cortex-M0 | 16 |
| Cortex-M0+ | 15 |
| Cortex-M3/M4 | 12 |
| Cortex-M7 | 10 to 12 |
The note measures this with a timer interrupt (which also drives an output pin, so the interrupt request is visible on the scope) whose handler toggles a GPIO pin on and off, using the following code on an i.MX RT1050 (Cortex-M7) running from zero-wait-state memory:
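LDR.N R0, [PC, #0x78] ; R0 = address of GPIO2_DR (loaded from the literal pool)
MOV.W R1, #8388608    ; R1 = 0x800000 (bit 23 set)
STR   R1, [R0]        ; drive GPIO2 pin 23 high (other bits of the port written as 0)
MOVS  R1, #0          ; R1 = 0
STR   R1, [R0]        ; drive the pin low again
BX    LR              ; return via EXC_RETURN (not shown in the note, but I assume it's there)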
Measuring with an oscilloscope, they get 10 cycles to enter the interrupt handler, 34 cycles to drive the pin high, and 32 cycles to drive it low again. (A STR to I/O space is much slower than the core clock.)
Cortex-M is easy to use, and that's cool, but it's very "one size fits all": by the time you reach the first instruction of your own handler code, the hardware WILL have stacked 8 words (R0-R3, R12, LR, PC and xPSR).
RISC-V instead puts you in the handler after only a pipeline flush (typically 2-3 cycles), but no general-purpose registers have been saved: the hardware only records the return address in mepc and the cause in mcause. That does give you flexibility.
There are some examples in:
https://github.com/riscv/riscv-fast-interrupt/blob/master/clic.adoc#interrupt-handling-software
Here's a simple non-preemptible interrupt handler that just increments a counter in RAM:
addi sp, sp, -8 # Create a frame on stack.
sw a0, 0(sp) # Save working register.
sw a1, 4(sp) # Save working register.
lui a0, %hi(INTERRUPT_FLAG)
sw x0, %lo(INTERRUPT_FLAG)(a0) # Clear interrupt flag.
lui a1, %hi(COUNTER)
addi a1, a1, %lo(COUNTER) # Get counter address.
li a0, 1
amoadd.w x0, a0, (a1) # Increment counter in memory (syntax is rd, rs2, (rs1)).
lw a1, 4(sp) # Restore registers.
lw a0, 0(sp)
addi sp, sp, 8 # Free stack frame.
mret # Return from handler using saved mepc.
I've rearranged this slightly from the code at the link: I expanded two pseudo-instructions, assigned a concrete frame size, and scheduled and grouped the instructions for a hypothetical simple in-order dual-issue core that can do two stores (into a store buffer) or two ALU ops in the same clock cycle, where the second ALU op can depend on the first (skewed pipes). If I've understood the material I found correctly, that matches the Cortex-M7, so I'm assuming a similar µarch for the RISC-V core.
So we reach the first instruction of actually useful interrupt code, with two working registers available, on the 6th clock cycle (3rd for dual-issue), or probably the 9th and 6th respectively once you add the pipeline refill.
To set or clear a GPIO pin instead, only the amoadd in this example needs to change. Something like reading a character from a UART's receive buffer and writing it into a software buffer could be done with the same two working registers and a handful more instructions.
There is example code at
https://github.com/riscv/riscv-fast-interrupt/blob/master/clic.adoc#c-abi-trampoline-code
for enabling interrupt handlers to be written as standard-ABI C functions, with support for interrupt chaining and late arrival of higher-priority interrupts. There is extensive commentary there on which parts run with interrupts disabled and which with interrupts enabled, and on how it all works in general.
The code there is for the standard RISC-V ABI, which requires 16 registers to be saved (ra, t0-t6 and a0-a7), vs 8 (including PC and xPSR) on Cortex-M.
There are proposals to define an "embedded ABI" with fewer argument registers (perhaps 4 like ARM, vs 8 normally) and fewer temporary registers (perhaps 2 instead of 7), so that only about 7 registers would need to be saved. While this would certainly cut interrupt latency for C handlers considerably, experiments with a compiler modified for this ABI show slowdown and code growth of up to 30% in normal mainline (background) code, because of all the extra register spills.
So unless the interrupt rate is extremely high or the background processing undemanding, it's probably better to stick with the standard ABI! And if some particular interrupt needs very low latency, its handler can always be written in assembly language, or in C using __attribute__((interrupt)), which saves only the registers the function actually uses. (Calling a normal ABI function from such a handler, however, forces a full save of the caller-saved registers.)