Author Topic: STM 32F4: any way to make it self program its FLASH from an external SPI FLASH?  (Read 1914 times)

Offline fchk

  • Frequent Contributor
  • **
  • Posts: 255
  • Country: de
Quote
Yes; the SWD debugger route is another way. Probably the most robust of them all.

I'd use the ROM bootloader and reserve JTAG/SWD for development only. You would have the option to disable SWD while keeping the non-brickable update. Plus the bootloader is easier to use.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4045
  • Country: gb
  • Doing electronics since the 1960s...
I tend to agree - because a 2nd CPU which controls /NRST and BOOTx pins should always be able to wake up the 1st CPU, since the factory bootloader cannot be corrupted. Well, in reality the factory bootloader is almost certainly in FLASH as well, but with an undocumented way of reprogramming ;)
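A rough sketch of what that 2nd CPU would do, assuming it drives the target's BOOT0 and NRST and talks to one of the bootloader USARTs. The `target_io_t` hooks, the delay values and the `fake_*` stand-in target are all made up for illustration; only the 0x7F sync byte and the 0x79 ACK come from ST's USART bootloader protocol.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical board hooks on the supervising MCU (not an ST API). */
typedef struct {
    void (*set_boot0)(bool high);   /* drive the target's BOOT0 pin */
    void (*set_nrst)(bool high);    /* drive the target's NRST pin (active low) */
    void (*delay_ms)(uint32_t ms);
    void (*uart_send)(uint8_t byte);
    bool (*uart_recv)(uint8_t *byte, uint32_t timeout_ms);
} target_io_t;

#define BL_SYNC_BYTE 0x7Fu  /* first byte the USART bootloader waits for */
#define BL_ACK_BYTE  0x79u  /* bootloader's positive acknowledge */

/* Reset the target into system memory and check that the ROM bootloader
 * answers the sync byte with ACK. Returns true on success. */
bool target_enter_bootloader(const target_io_t *io)
{
    uint8_t reply = 0;

    io->set_boot0(true);    /* BOOT0 high selects system memory (BOOT1 assumed low) */
    io->set_nrst(false);    /* assert reset */
    io->delay_ms(10);       /* assumed hold time, not a datasheet figure */
    io->set_nrst(true);     /* release reset; target samples BOOT0 */
    io->delay_ms(100);      /* assumed allowance for bootloader start-up */

    io->uart_send(BL_SYNC_BYTE);
    if (!io->uart_recv(&reply, 100)) {
        return false;       /* no answer at all */
    }
    return reply == BL_ACK_BYTE;
}

/* Stand-in target for trying the sequence on a PC: it ACKs the sync byte
 * only if it came out of reset with BOOT0 high. */
static bool fake_boot0, fake_nrst = true, fake_in_bootloader, fake_ack_pending;

static void fake_set_boot0(bool high) { fake_boot0 = high; }
static void fake_set_nrst(bool high)
{
    if (!fake_nrst && high) {            /* rising edge on NRST */
        fake_in_bootloader = fake_boot0;
    }
    fake_nrst = high;
}
static void fake_delay_ms(uint32_t ms) { (void)ms; }
static void fake_uart_send(uint8_t byte)
{
    fake_ack_pending = fake_in_bootloader && byte == BL_SYNC_BYTE;
}
static bool fake_uart_recv(uint8_t *byte, uint32_t timeout_ms)
{
    (void)timeout_ms;
    if (!fake_ack_pending) {
        return false;
    }
    fake_ack_pending = false;
    *byte = BL_ACK_BYTE;
    return true;
}

const target_io_t fake_target = {
    fake_set_boot0, fake_set_nrst, fake_delay_ms, fake_uart_send, fake_uart_recv
};
```

Calling `target_enter_bootloader(&fake_target)` exercises the sequence against the stand-in; on real hardware the hooks would wrap GPIO and UART drivers.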

I mentioned SWD earlier because - short of somebody having set RDP2 - it will always be able to bring up the CPU into a known state.

And if the firmware contains code for setting RDP2 then the risk goes way up. Ideally, RDP2 should be selected only in the factory (ST Utility or whatever it is called now, has a tab for the various option bytes) and not in the firmware.
« Last Edit: September 23, 2024, 01:38:29 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8724
  • Country: fi
Don't forget that ST's bootloader is a mysterious black box outside of your control. It might have bugs that pop up in some rare edge cases triggered by... something. And it's also subject to random changes between batches of chips. I'm specifically concerned about the mechanisms inside ST's code which decide which interface to listen to. That decision could be triggered by some other peripheral (e.g. I2C/UART) pins having some unlucky activity. Say, some timing thing where, a certain delay after NRST, some I2C device does something that causes the bootloader to enter I2C mode and stop listening to your desired SPI upgrade path. And maybe that never happens in the lab, but with some oscillator start-up time changing or a temperature shift or whatever, it now starts to happen repeatedly at a customer -> device effectively bricked. And as your 2nd MCU is a locked-down design you can't do anything about it, even if the fix would be simple in a lab.

I'm not saying failure is likely, but we are talking about proving the non-brickability, and the alternative scheme was dismissed because of a very improbable failure pattern (runaway PC hitting exactly after the function address input validation) to begin with, so I'm a bit sceptical about this interface-to-ST-bootloader approach being any more robust than the usual single-chip single-bootloader solution.

I'm always worried when some relatively simple and easily analyzable failure mode is replaced by a lot of extra complexity and unknowns.
« Last Edit: September 24, 2024, 10:54:39 am by Siwastaja »
 
The following users thanked this post: Nominal Animal

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4045
  • Country: gb
  • Doing electronics since the 1960s...
Quote
some other peripheral (e.g. I2C/UART) pins having some unlucky activity

Just looked it up in the bootloader appnote
https://www.st.com/resource/en/application_note/an2606-stm32-microcontroller-system-memory-boot-mode-stmicroelectronics.pdf

The 417 and 437 (I am using both of these) both look for 0x7f on either usart1 or usart3, and this is sufficient to make the bootloader look for the rest of the data. I am curious about what happens if a peripheral is not used, due to a lack of package pins. Its clock should not be enabled, obviously. In my project usart1 and 3 are both used but e.g. usart 4 and 5 are not. Now what happens if somebody is not using usart1? Could its RX pin be floating? Hopefully not!

I am not using the factory boot loader but I agree there are lots of funny gotchas. Especially as ST change the way it works (though perhaps not for the uarts).
 

Online asmi

  • Super Contributor
  • ***
  • Posts: 2790
  • Country: ca
Quote
Yes. Incidentally I recall an STM32 project (not mine) where the customer demanded a provably non-brickable firmware upgrade feature, and a 2nd CPU was the only way to do it.

That actually makes the problem worse, as you now have 2 different firmwares to manage and upgrade.
The proper way to implement this is to use a write-protected golden image (most QSPI flash devices allow write-protecting a portion of the array), plus one/two/however-many-you-want update "slots", so that the new slots are attempted first and the "golden" image is only booted if the new images fail to boot.
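The slot selection could look roughly like this. It is only a sketch: the `image_slot_t` layout, the boot attempt counter and the CRC-32 check are assumptions for illustration, not something taken from a particular bootloader.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative image descriptor; layout and field names are assumptions. */
typedef struct {
    const uint8_t *data;           /* start of the image in (Q)SPI flash */
    size_t         len;            /* image length in bytes */
    uint32_t       expected_crc;   /* CRC stored alongside the image */
    uint8_t        boot_attempts;  /* bumped before each try, cleared by a healthy app */
} image_slot_t;

#define MAX_BOOT_ATTEMPTS 3u

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), no lookup table. */
uint32_t crc32_calc(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Return the index of the first update slot that verifies and has not used
 * up its boot attempts, or -1 to tell the caller to boot the golden image. */
int select_boot_slot(const image_slot_t *slots, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        const image_slot_t *s = &slots[i];
        if (s->data == NULL || s->len == 0) {
            continue;                              /* empty slot */
        }
        if (s->boot_attempts >= MAX_BOOT_ATTEMPTS) {
            continue;                              /* kept failing to boot */
        }
        if (crc32_calc(s->data, s->len) != s->expected_crc) {
            continue;                              /* corrupt or partial update */
        }
        return (int)i;
    }
    return -1;                                     /* fall back to golden image */
}
```

The golden image itself is never selected by index here; it is the fallback when nothing else qualifies, which keeps the write-protected region out of the update logic entirely.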

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3522
  • Country: it
Quote
What's wrong with just writing your own bootloader?
Have I missed something?

I keep asking myself the same. I had to do a CAN bootloader once, then I went on to read what other manufacturers do, looking for inspiration. I stumbled upon the various ST bootloader APIs and I decided to look no further and do my own, again. It was as if the CAN standard didn't already define a way to handle multipacket transactions (CAN-TP), and they had to make the most basic, stupid protocol they could think of.
Then, as others mentioned, by trusting the manufacturer-provided bootloader you end up relying on a black box. Good luck. Your own bootloader has your own bugs, true, but also your own features.
 
The following users thanked this post: Siwastaja

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8724
  • Country: fi
Quote
I keep asking myself the same. I had to do a CAN bootloader once, then I went on to read what other manufacturers do, looking for inspiration. I stumbled upon the various ST bootloader APIs and I decided to look no further and do my own, again.

Heh, last time I did an upgrade-by-CAN thing on STM32 I also thought about taking a look at ST's own bootloader. After all, it could be handy - maybe I could save time and find some command line tool already written by someone else.

Go figure. ST thought that they didn't have enough different protocols already and had a shiny new protocol on the H7 series back then, not compatible with any existing code. And most funnily, the protocol was FD-only, so unable to flash the device using standard CAN frames. I didn't even have a CAN FD adapter at my disposal back then. What do you need to smoke to come up with this kind of design limitation?

The simple USART bootloader I have used a lot, but then again it also requires forking one of the many tool projects and making modifications to its codebase. Or using some Windows tool from ST. What a PITA. It's just simpler to decide on your own protocol and do both sides at once.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4045
  • Country: gb
  • Doing electronics since the 1960s...
One advantage of the factory loader is that it "cannot" be corrupted by a runaway program.

You can write protect your own loader (sitting at 0x08000000) but equally rogue code can remove that protection - unless the codes to do that are not present in your code :)
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8724
  • Country: fi
... or just put a little bit of thought into how to protect the code parts that modify flash from things like Program Counter corruption/runaway or jumps through random function pointers. Just good old input validation in the function which does the write is 99.99% foolproof against those: you would need to accidentally jump to a very specific location with the accuracy of a few instructions, which is easy to do maliciously but very unlikely to happen by accident.
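Something along these lines, as a sketch only. The region addresses, the key argument, the length limit and the `flash_program_fn` hook are all assumptions picked for illustration; the real limits depend on your own flash layout, and `fake_program` is just a stand-in so the checks can be tried on a PC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: bootloader at the start of flash, application after it. */
#define APP_REGION_START 0x08010000u
#define APP_REGION_END   0x08100000u  /* one past the last writable byte */
#define WRITE_KEY        0x5AFEC0DEu  /* arbitrary magic value */
#define MAX_WRITE_LEN    256u

typedef enum {
    FW_OK, FW_BAD_KEY, FW_BAD_ARGS, FW_BAD_RANGE, FW_PROGRAM_FAILED
} fw_status_t;

/* Hypothetical low-level programming hook supplied by the platform. */
typedef bool (*flash_program_fn)(uint32_t addr, const uint8_t *src, size_t len);

fw_status_t guarded_flash_write(uint32_t key, uint32_t key_inverted,
                                uint32_t addr, const uint8_t *src, size_t len,
                                flash_program_fn program)
{
    /* A stray jump to the function entry arrives with whatever happens to be
     * in the argument registers, so both key words have to match. */
    if (key != WRITE_KEY || key_inverted != (uint32_t)~WRITE_KEY) {
        return FW_BAD_KEY;
    }
    if (src == NULL || program == NULL || len == 0 || len > MAX_WRITE_LEN) {
        return FW_BAD_ARGS;
    }
    if ((addr & 3u) != 0) {
        return FW_BAD_ARGS;            /* word alignment assumed */
    }
    /* Never touch anything outside the application region, so the
     * bootloader sectors stay out of reach of this function. */
    if (addr < APP_REGION_START || addr >= APP_REGION_END ||
        len > APP_REGION_END - addr) {
        return FW_BAD_RANGE;
    }
    return program(addr, src, len) ? FW_OK : FW_PROGRAM_FAILED;
}

/* Stand-in for the real programming routine. */
unsigned fake_program_calls;
bool fake_program(uint32_t addr, const uint8_t *src, size_t len)
{
    (void)addr; (void)src; (void)len;
    fake_program_calls++;
    return true;
}
```

The checks sit directly in front of the programming call on purpose: the fewer instructions between the last check and the write, the smaller the window a random jump would have to hit.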
 
The following users thanked this post: nctnico, JPortici, Nominal Animal

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4045
  • Country: gb
  • Doing electronics since the 1960s...
Well... the case I mentioned of the 28C256 EEPROM getting corrupted was on a box which a customer installed next to the piezo igniter of some huge industrial boiler. The stuff should have gone into a metal box, shielded cables, etc, but evidently not. So spikes reaching PCB tracks (via EM, or conducted via wires) crashed the CPU repeatedly, and eventually it did, evidently, execute the EEPROM write function :)

There was a MAX706 hardware watchdog (1.6 sec fixed) but it didn't help.

The product has been selling since 1995 and is super solid, zero bugs ever found, except for this one customer.

"Very unlikely"? Not if a bunch of your customers start controlling piezo igniters with your box :)

One lesson - apart from customer education - is to avoid your firmware containing the magic unlock codes. Not always possible though. My firmware does not contain the RDP2 codes (depending on details, a great way to f**k your company if those got executed ;) ) but it does unavoidably contain codes for writing CPU FLASH (in the RAM loader for fw updates), codes for writing the Adesto SPI FLASH, etc.

Thinking about it, one could wrap the FLASH write code with a GPIO pin test, and have a jumper in there. That would stop a runaway PC. But very inconvenient.
« Last Edit: September 24, 2024, 04:10:23 pm by peter-h »
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 27766
  • Country: nl
    • NCT Developments
I have had similar issues a long time ago (back in the 8051 days). A protection scheme like the flash section layer I described earlier fixed those problems. The closer the check is to the actual write action in code, the less likely you are to have unwanted write actions.

Input checking in general is a very powerful way to halt rogue execution.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: Siwastaja, Nominal Animal

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8724
  • Country: fi
Quote
Well... the case I mentioned of the 28C256 EEPROM getting corrupted was on a box which a customer installed next to the piezo igniter of some huge industrial boiler. The stuff should have gone into a metal box, shielded cables, etc, but evidently not. So spikes reaching PCB tracks (via EM, or conducted via wires) crashed the CPU repeatedly, and eventually it did, evidently, execute the EEPROM write function :)

If this is what you fear, then surely this exposed on-PCB interface of NRST and SPI is orders of magnitude more likely to generate a valid programming pattern from EMI than EMI coupling into the inside of the STM32 chip and corrupting the Program Counter to hit within a few instructions in a 4 gigabyte address space. SPI specifically is quite susceptible since it lacks UART-like filtering.

You are just painting yourself deeper into a corner. In reality, you won't be hit by a corrupted PC. You will be hit by any other type of bug that can sneak in with the added firmware(s) of additional management devices and third-party black boxes. Replacing an EMI-resilient solution with one more affected by EMI because you fear EMI is ridiculous, when at the same time you add more surface for other types of bugs, too.
« Last Edit: September 24, 2024, 04:36:31 pm by Siwastaja »
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 27766
  • Country: nl
    • NCT Developments
Quote
Well... the case I mentioned of the 28C256 EEPROM getting corrupted was on a box which a customer installed next to the piezo igniter of some huge industrial boiler. The stuff should have gone into a metal box, shielded cables, etc, but evidently not. So spikes reaching PCB tracks (via EM, or conducted via wires) crashed the CPU repeatedly, and eventually it did, evidently, execute the EEPROM write function :)

There was a MAX706 hardware watchdog (1.6 sec fixed) but it didn't help.

The product has been selling since 1995 and is super solid, zero bugs ever found, except for this one customer.

The main question is: has this product been tested for CE compliance including immunity, surge and ESD? And how far was it tested beyond the required limits? For consumer devices the immunity test level is something like 3V/m, for example. In my experience this is too low for a product to be reliable when deployed in the field. My preference is to get to at least 20V/m (with and without modulation), which shouldn't cause any problems (except for ethernet links going down). ESD is similar. Step it up a few kV over the limit during pre-compliance testing to see how much margin there is.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8724
  • Country: fi
Yeah and old designs with CPUs and memories on parallel buses on PCBs, especially when just 2-layer, are quite susceptible to EMI. Comparing that to similar corruption happening inside a single microcontroller chip is far-fetched. If even that is of concern, I would recommend looking at higher end safety-critical microcontrollers, like those used in automotive brake systems etc., but this is where my expertise stops so I'll let others give further recommendations.
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4045
  • Country: gb
  • Doing electronics since the 1960s...
Yes - CE tested of course, by multiple customer labs and a "proper" lab.

It means very little. It's just an arbitrary figure. Something could produce 100V/m.

That design runs code from the EEPROM, so there you go... a 32F4 uC should be way better.
 

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3522
  • Country: it
Quote
One advantage of the factory loader is that it "cannot" be corrupted by a runaway program.

You can write protect your own loader (sitting at 0x08000000) but equally rogue code can remove that protection - unless the codes to do that are not present in your code :)

which can't happen if the main program can't modify flash, or if any access (read/write/execute) from the application to the "boot" partition (whatever it's called on your microcontroller of choice, even PIC16s have it now) raises a security exception. True, the F4 is old and may lack such a feature.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8724
  • Country: fi
As the threat model is EMI coupling inside the chip and corrupting internal CPU registers such as the PC, the STM32F4 is the wrong choice anyway; having no per-sector write-once flash locking fuses is the least of the concerns. peter-h needs to look at some high-reliability automotive or probably aerospace-grade rad-hard devices.
 

Offline pcprogrammer

  • Super Contributor
  • ***
  • Posts: 4280
  • Country: nl
I don't know, certainly not an expert, but is it not possible that bits in a FLASH or EEPROM flip due to EMI?

If so, the only safe solution would be to have a bootloader in mask-programmed ROM. On lots of MCUs that is not possible without support for booting from an external memory, which leads to the need to look for MCUs that do support this.

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8724
  • Country: fi
Quote
is it not possible that bits in a FLASH or EEPROM flip due to EMI?

No, not really, in the context we are talking about.

I mean, if you are developing an EMP device or something then surely you will be able to flip bits inside FLASH. But for any legal device on the market, accessible to normal fields of industry, this is not the case.

And then of course bits can flip on their own without any EMI. This falls under "reliability" and can be measured e.g. by metrics like MTBF. Every component has some failure rate.

But if you expose a SPI bus on a PCB and do a poor job on the layout (e.g. lacking a ground plane) then it's pretty easy to get data corruption from simple EMI. This is what needs to be tested at labs but, as nctnico points out, lab testing is not a guarantee of never having a problem; you may want to voluntarily test to higher standards.
 
The following users thanked this post: pcprogrammer

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4045
  • Country: gb
  • Doing electronics since the 1960s...
As I wrote, the EMC issue was probably corrupted program data being fetched from the EEPROM chip. Or anything else picked up on PCB tracks. This cannot be handled by software design. You need a shielded box!

The point of my example was to show that a runaway PC can be created if you crash the CPU enough times and in enough random ways. Not different to the RDP cracking schemes actually - another recent thread! They put spikes on VCC, basically, and move them relative to the clock, with picosecond resolution, until they get lucky.

All my CPU boards are 4 layer, 2 planes.
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6207
  • Country: es
Read AN2606, that's all the embedded bootloader can do, no magic tricks.
Either make your own bootloader or use a 2nd MCU as stated by others.

Quote
both look for 0x7f on either usart1 or usart3, and this is sufficient to make it look for the rest of the data.

Not that simple. You won't randomly perform erase or write operations: the frames have checksums and ACK/NACK, so not in a million years will it pick up random noise and do something.
See AN3155.
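For reference, the framing checks in the USART protocol boil down to something like the sketch below. The function names are made up for illustration; the rules themselves (each command byte followed by its complement, address and data frames closed by an XOR checksum, 0x79 for ACK and 0x1F for NACK) are as I understand them from AN3155.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AN3155_ACK  0x79u
#define AN3155_NACK 0x1Fu

/* A command is sent as two bytes: the opcode and its bitwise complement,
 * e.g. Write Memory is 0x31 followed by 0xCE. */
bool bl_command_valid(uint8_t opcode, uint8_t complement)
{
    return (uint8_t)(opcode ^ complement) == 0xFFu;
}

/* XOR of all bytes, used as the checksum of address and data frames. */
uint8_t bl_xor_checksum(const uint8_t *bytes, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum ^= bytes[i];
    }
    return sum;
}

/* frame = payload bytes followed by one checksum byte.
 * Returns the byte the receiver would answer with. */
uint8_t bl_check_frame(const uint8_t *frame, size_t len)
{
    if (frame == NULL || len < 2) {
        return AN3155_NACK;
    }
    return bl_xor_checksum(frame, len - 1) == frame[len - 1]
               ? AN3155_ACK : AN3155_NACK;
}
```

For uniformly random bytes, the complement check alone passes only one time in 256, and a write needs a valid command, a valid address frame and a valid data frame in sequence, each with its own check.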
« Last Edit: September 25, 2024, 10:05:44 pm by DavidAlfa »
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 
The following users thanked this post: peter-h

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4045
  • Country: gb
  • Doing electronics since the 1960s...
Yes; I forgot. This came up in that "breaking RDP2" thread, where the VCC pulsing method was used to execute the memory read command in the boot loader.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8724
  • Country: fi
Quote
Not that simple. You won't randomly perform erase or write operations, the frames have checksum, ack/nak, not in a million years it'll pick random noise and do something.

... except if it has a bug. And the point is moot anyway. Remember, in peter-h's scenario the threat model was EMI coupling inside the chip and corrupting the program counter. That can just as well happen to the 2nd MCU: jumping into its ST bootloader, right into the handling of an erase command, and wiping the 2nd MCU out. And I'm quite sure ST has not made any attempt to "rad-harden" the code of their bootloader; why should they? If you write your own bootloader, it is then at least possible to do very local error checking and nearly eliminate runaway PC issues.

But the main issue is peter-h's excessive threat model. ST very likely ignores the possibility of internal PC corruption, and that's a sane thing to do for a product line like this.

And my point about the interface noise is that it is enough to repeatedly pick up that 0x7f on either UART. It won't erase anything, but the bootloader locks out the other interfaces and the device is therefore effectively bricked. If the device is not user-accessible (no one to power cycle it), then one random noise event is enough.
« Last Edit: Yesterday at 06:41:42 am by Siwastaja »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 4045
  • Country: gb
  • Doing electronics since the 1960s...
Quote
EMI coupling inside the chip

No. Read again. It would be very hard to get EMI inside a chip.
« Last Edit: Yesterday at 07:14:50 am by peter-h »
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8724
  • Country: fi
Quote
EMI coupling inside the chip

No. Read again. It would be very hard to get EMI inside a chip.

That's what I have been saying all along, yet you bring up the example of EMI coupling into an external EEPROM bus causing wrong code to be run, as if the same could happen within a single STM32: a jump into the middle of flash code that writes to the bootloader area. I don't think it's a valid concern at all: as I have been saying, just protect the flash erase/write operations by making them functions with input argument verification and you are as safe as it gets. Sure, you can't prove it's 100% non-brickable, but you can't prove that for any other solution either, and I'm pretty sure every attempt to fix this by making things more complicated and adding second MCUs is only going to further increase the risk of bricking, possibly significantly, for reasons already mentioned.
 

