Welp, the way Teensy LC, 3.x, 4.x and Micromod have successfully used NXP MKL02 and MKL04 chips for exactly this purpose (loading the firmware update program into the RAM of the main MCU, which then does the actual firmware upload; see the basic schematic) indicates it is definitely possible to do this right. These buggers are surprisingly hard to brick (excluding actual electrical damage).
Note how these are implemented: the bootloader MCU is connected only to a button that initiates the firmware upload procedure, to the JTAG interface, and to the boot mode selection pins, DCDC_PSWITCH, and POR_B, so that it can reboot the i.MX RT1062 and handle its power sequencing details. It only transfers the bootloader code to the main MCU's RAM; the main MCU then takes over the bootloading procedure.
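A rough sketch of the sequence such a bootloader chip might run when the button is pressed, written in Python against a mock target so it runs anywhere. The pin names come from the description above; the pin levels, the order, the `jtag_*` helpers, and the load address are all hypothetical, not what the Teensy bootloader chip actually does. Note that nothing here touches the main MCU's flash: the updater only ever lives in RAM.

```python
from dataclasses import dataclass, field

# Hypothetical RAM address the updater is loaded to (illustration only).
LOAD_ADDR = 0x20200000


@dataclass
class MockTarget:
    """Stand-in for the main MCU as seen from the bootloader chip (hypothetical API)."""
    log: list = field(default_factory=list)
    ram: dict = field(default_factory=dict)

    def set_pin(self, name: str, level: int) -> None:
        self.log.append(("pin", name, level))

    def jtag_write_ram(self, addr: int, data: bytes) -> None:
        self.ram[addr] = bytes(data)
        self.log.append(("jtag_write", addr, len(data)))

    def jtag_run_from(self, addr: int) -> None:
        self.log.append(("run", addr))


def start_update(target: MockTarget, updater: bytes, load_addr: int = LOAD_ADDR) -> None:
    """Assumed button-press sequence: power, reset, boot mode, load to RAM, run."""
    # Make sure the main MCU's supply is enabled.
    target.set_pin("DCDC_PSWITCH", 1)
    # Hold the main MCU in reset while its boot configuration changes.
    target.set_pin("POR_B", 0)
    # Select a boot mode that does not start the application from flash.
    target.set_pin("BOOT_MODE", 1)
    # Release reset.
    target.set_pin("POR_B", 1)
    # Copy the updater program into the main MCU's RAM over JTAG.
    target.jtag_write_ram(load_addr, updater)
    # Start it; from here on the updater handles the actual firmware upload.
    target.jtag_run_from(load_addr)
```

Keeping the sequence this small is the point: the bootloader chip only sequences pins and copies one blob, and all the update logic lives in the RAM-resident updater.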
The only way I know of for software to brick this (without electrical damage, that is) is disabling the JTAG interface. As the "bootloader chip" is one of the secure ones, it can also verify the signature of the firmware image (in chunks as large as fit in the rest of the RAM).
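Verifying an image larger than the available RAM means hashing it piece by piece. A minimal sketch, assuming SHA-256 and assuming the expected digest has already been authenticated (in a real design by checking a signature over it, e.g. ECDSA, which is not shown here). `chunks_of`, `digest_in_chunks`, and `image_ok` are illustrative names, not any real bootloader's API.

```python
import hashlib
import hmac
from typing import Iterable, Iterator


def chunks_of(image: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield the image in pieces no larger than chunk_size (the free-RAM budget)."""
    for i in range(0, len(image), chunk_size):
        yield image[i:i + chunk_size]


def digest_in_chunks(chunks: Iterable[bytes]) -> bytes:
    """SHA-256 over a piecewise-delivered image; only one chunk is held at a time."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.digest()


def image_ok(chunks: Iterable[bytes], expected_digest: bytes) -> bool:
    """Compare against the (assumed already authenticated) expected digest."""
    return hmac.compare_digest(digest_in_chunks(chunks), expected_digest)
```

The chunk size does not affect the result: hashing in 7-byte or 4096-byte pieces gives the same digest, so it can simply be set to whatever RAM is left over.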
I do somewhat disagree with Siwastaja's "Adding a second MCU only adds even more potential points of failure" assessment, simply because it is not the number of potential points of failure but their complexity that tends to decide robustness or the lack thereof.
As an example, if you have, say, 3N potential points of failure, but each one is simple, that is preferable to, say, N complex points of failure, because a complex point is more than three times as likely to fail as a simple one.
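The arithmetic behind the 3N-versus-N example, as a small sketch assuming failures are independent and that any single failure is fatal (a series system). The function names and numbers are illustrative only.

```python
def p_any_failure(n: int, p: float) -> float:
    """Probability that at least one of n independent points fails, each with probability p."""
    return 1.0 - (1.0 - p) ** n


def simple_design_wins(n: int, p_simple: float, p_complex: float) -> bool:
    """True if 3n simple points are less likely to fail than n complex points."""
    return p_any_failure(3 * n, p_simple) < p_any_failure(n, p_complex)
```

Under these assumptions the simple design wins exactly when (1 - p_simple)^3 > 1 - p_complex, regardless of N. For small probabilities that threshold sits just below p_complex = 3 × p_simple, so "more than three times" is sufficient: `simple_design_wins(10, 0.001, 0.003)` is `True`, while `simple_design_wins(10, 0.001, 0.002)` is `False`.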
Here, using the secondary chip keeps the bootloader completely separate from the application on the main MCU. During normal operation, the bootloader is not on the main MCU at all; it is loaded only during firmware updates. This simplifies things a lot. Really, it boils down to how robust you can make three parts: the communication between the bootloader MCU and the main MCU, the part that loads the firmware updater program into the main MCU's RAM, and the firmware updater itself.
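One way to harden the load step is to read back every chunk written into the main MCU's RAM and retry on mismatch before starting the updater. A minimal sketch: `write` and `read` are hypothetical stand-ins for whatever link the bootloader chip actually uses (JTAG in the Teensy case), and the chunk size and retry count are arbitrary.

```python
from typing import Callable


def load_with_verify(write: Callable[[int, bytes], None],
                     read: Callable[[int, int], bytes],
                     addr: int, data: bytes,
                     chunk: int = 256, retries: int = 3) -> None:
    """Write data chunk by chunk, read each chunk back, and retry on mismatch."""
    for off in range(0, len(data), chunk):
        piece = data[off:off + chunk]
        for _ in range(retries):
            write(addr + off, piece)
            # Accept the chunk only if the target's RAM now holds exactly these bytes.
            if read(addr + off, len(piece)) == piece:
                break
        else:
            # Every attempt failed: refuse to start a corrupted updater.
            raise OSError(f"chunk at offset {off:#x} failed after {retries} attempts")
```

Failing loudly here is cheap, because nothing in flash has been touched yet: the user can simply press the button again.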
Unfortunately, I do not have enough design or practical implementation experience to say whether that simplification outweighs the added possible points of failure. It depends too much on the practical implementation.