AFAIK the STM32F42x has a bunch of new peripherals like the LCD controller and DMA2D, but also just more of the same peripherals. So perhaps the pinmux is slightly different. Backporting an F42x project to F40x sounds tougher if you're unlucky.
That's awesome work, Hans. Thank you.
Also, I don't know how comfortable I would be running an application with application code in HardFault_Handler, though.
Can it not be emulated transparently, so in effect the invalid address access generates an interrupt which emulates a real memory access?
Yes it can.. that's what the code is basically already doing. Stepping over the ptr accesses doesn't trigger any breakpoint or hint that it's handled by a HardFault handler. In principle, some hardfaults are recoverable, so perhaps that's what the ARM CPU designers had in mind when they wrote that.
The issue is that this has a very dramatic speed vs robustness trade off. If you want to add more sanity checks to the routine, then that will always cost more cycles. For example, the code I posted naively assumes that all hardfaults are the memory redirects. What if a genuine fault is generated (e.g. div by zero)? More checks would be needed to check the relevant hardfault/usage registers (and if it's not a memory redirect, then write debug trace, lock up, reset the program, etc.). More checks = bigger slowdown, even on top of what it already is.
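As a rough idea of what those extra checks could look like, here is a minimal sketch that separates a memory redirect from a genuine fault. It assumes the redirected access shows up as a precise bus fault escalated to HardFault with a valid BFAR, which may not match the code I posted. The bit positions are the ARMv7-M HFSR/CFSR ones; `VMEM_BASE`, `VMEM_SIZE` and `classify_fault` are made-up names for illustration. The register values are passed in as arguments so the logic can be tried on a PC; on target you would read them from SCB->HFSR, SCB->CFSR and SCB->BFAR.

```c
#include <assert.h>
#include <stdint.h>

/* Fault status bits as defined by the ARMv7-M architecture. */
#define HFSR_FORCED     (1u << 30)  /* HFSR: escalated configurable fault   */
#define CFSR_PRECISERR  (1u << 9)   /* BFSR: precise data bus error         */
#define CFSR_BFARVALID  (1u << 15)  /* BFSR: BFAR holds the faulting address */
#define CFSR_DIVBYZERO  (1u << 25)  /* UFSR: divide by zero                 */

/* Hypothetical address window that is redirected to the SPI RAM. */
#define VMEM_BASE 0x90000000u
#define VMEM_SIZE 0x00100000u

static_assert(VMEM_SIZE != 0, "redirect window must not be empty");

typedef enum { FAULT_REDIRECT, FAULT_GENUINE } fault_kind_t;

/* Decide whether a HardFault is one of our memory redirects. */
static fault_kind_t classify_fault(uint32_t hfsr, uint32_t cfsr, uint32_t bfar)
{
    const uint32_t need = CFSR_PRECISERR | CFSR_BFARVALID;

    if (!(hfsr & HFSR_FORCED))
        return FAULT_GENUINE;           /* not an escalated fault */
    if ((cfsr & need) != need)
        return FAULT_GENUINE;           /* e.g. div by zero, imprecise error */
    if ((uint32_t)(bfar - VMEM_BASE) >= VMEM_SIZE)
        return FAULT_GENUINE;           /* address outside the window */
    return FAULT_REDIRECT;
}
```

Every one of those comparisons runs on each redirected access, which is exactly the extra cycle cost mentioned above.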
Also the code I posted naively assumes all accesses are 32-bit wide. It sounds like that was fine, but what if it's not? Also, unaligned/aligned memory access is something to think about (Cortex-M3/M4/M7 support unaligned access, but M0+ does not, and AFAIK on RISC-V it's not mandatory). For an SPI RAM that's byte addressable it's probably fine, but the way I address the 32-bit vmem array, it won't be... Not all register indices are tested. Unfortunately, due to the way exception entry/exit works, the r0-r3 registers are in a different place than the r4-r11 registers. I hope they are in the right place, with the right arithmetic, and with the right handling of the stack push/pop.. It's all quite fiddly IMO, but I was just happy to get something working this morning to give an impression of how fast/slow it is.
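To illustrate the register placement issue, here is a small sketch of the lookup. The hardware-stacked exception frame holds r0, r1, r2, r3, r12, lr, pc, xPSR in that order; r4-r11 are not stacked by hardware, so this assumes the handler's entry stub has saved them in ascending order into a separate array. `reg_slot`, `frame` and `saved` are hypothetical names, not what the posted code uses, and the extended FPU frame is ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return a pointer to the saved copy of core register rN (N = 0..12).
 *
 * frame: hardware-stacked exception frame {r0, r1, r2, r3, r12, lr, pc, xPSR}
 * saved: r4..r11 as saved by the handler stub, saved[0] = r4 (assumption)
 */
static uint32_t *reg_slot(uint32_t *frame, uint32_t *saved, unsigned n)
{
    assert(n <= 12);            /* sp/lr/pc are not handled in this sketch */

    if (n <= 3)
        return &frame[n];       /* r0-r3: first four words of the frame */
    if (n <= 11)
        return &saved[n - 4];   /* r4-r11: handler-saved copy */
    return &frame[4];           /* r12: fifth word of the frame */
}
```

A load emulation would then write its result through `*reg_slot(frame, saved, rt)` and advance the stacked pc (`frame[6]`) past the faulting instruction before returning.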
In terms of "generating" an interrupt: you could use a full software-triggered interrupt for that, like an SVCall; however, that may be in use by some RTOSes. In this case you may also be fine using a plain C function callback instead. After all, the hardfault context is already an interrupt..
Re the suggestions on paging, yes, I was doing that in the Z80/Z180 days, but there you had say 1MB physical RAM and you selected a 4k bank (typically the top 4k of the 64k address space) with an 8-bit bank # register. One never actually copied data. IAR implemented this in their Z180 compiler, "large model", but for code only, and you could not have a function bigger than the bank size. This cannot be done in a 32F4 because you don't have the address lines available, and if you did (because you aren't using all the pins for GPIO, ETH, USB, etc) then this is all moot. What might work ok is if you had multiple RTOS threads and every time the RTOS switches to Thread X it invokes a DMA transfer to save/restore the memory contents from the SPI RAM. You would need to hook into the RTOS...
That sounds like a complex solution, modifying the RTOS to change its behaviour for this special "task". I assume you'd swap out the application RAM data with the 'proprietary' protocol data, and then proceed to run the protocol code like normal.
Then if that task is done, swap the application data back in.
I imagine you would also need to block the application tasks while you're swapping things around, which doesn't sound trivial nor optimal.
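Also the swapping is a write/read transaction for a pretty large block of memory. Say you would be (re)storing 4K of memory: that's 32768 bits to transfer twice (once out, once in), taking about 1.64 ms, which works out to an SPI clock of roughly 40 MHz and ignores command/address overhead. You'd also hope that the task then gets a long uninterrupted stretch of processor time, as switching back is just as costly.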
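The arithmetic behind that figure, as a small sketch. The 40 MHz SPI clock is an assumption that reproduces the 1.64 ms number; single-bit SPI is assumed, and command/address bytes and DMA setup time are ignored. `swap_time_us` is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in microseconds to write one block out to SPI RAM and read
 * another one back in, at one bit per SPI clock.
 */
static double swap_time_us(uint32_t bytes, double spi_hz)
{
    assert(spi_hz > 0.0);

    double bits = 2.0 * 8.0 * (double)bytes;   /* out + in, 8 bits per byte */
    return bits / spi_hz * 1e6;
}
```

For example, `swap_time_us(4096, 40e6)` gives about 1638 us, and halving the SPI clock doubles it.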
A context switch on this STM chip is probably on the order of single-digit microseconds, so the swap would cost a few hundred times more than the switch itself.