You just need a diode between the STM32 Vcc and the main power rail. That way, when programming, the programmer/debugger can inject 3.3V into the STM32 power domain without affecting the rest of the system.
I don't like that. I don't like having a voltage drop between the MCU's VDD and the rest of the system in normal operation.
You are not supposed to program at lower voltages in the first place, as documented, and in most uses I know of the STM32 runs from a 3.3V rail. I feel that ST is just not bothering with the maybe-5% case anymore.
I think you're stuck on the F-series/"high-performance" STM32s. Operating voltages below 3.3V are very common in low-power, battery-operated designs. The STM32L series is targeted at this market, and is very well positioned actually. I have read nothing in the L4 series datasheets about a min programming voltage higher than the min operating voltage (1.71V, I reckon).
I'm looking for this info in an STM32F407 datasheet right now, and now I get why you said that. It seems specific to the F4 (and maybe other F-series parts). There's a stated Vprog which actually depends on the program width. 32-bit programming requires 2.7V min, but 8-bit programming can be done at 1.8V, so there's still a way to program them reliably down to 1.8V, provided that you can force the 8-bit programming mode. There's no such restriction that I know of for the L-series; they don't even mention specific Vprog figures.
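To make the width-vs-voltage trade-off concrete, here is a rough sketch in C of picking the widest program width for a given supply. The 2.7V (32-bit) and 1.8V (8-bit) thresholds are the ones quoted above; the 2.1V threshold for 16-bit is my assumption and should be checked against the datasheet for your exact part. The function name `pick_program_width_bits` is made up for illustration and is not part of any ST library.

```c
#include <stdint.h>

/*
 * Return the widest flash program width, in bits, that is allowed at the
 * given supply voltage (in millivolts), or 0 if the supply is too low to
 * program at all.
 *
 * 2700 mV -> 32-bit and 1800 mV -> 8-bit come from the figures in this thread.
 * 2100 mV -> 16-bit is an ASSUMPTION; verify it in the datasheet.
 */
unsigned pick_program_width_bits(uint32_t vdd_mv)
{
    if (vdd_mv >= 2700u) {
        return 32u;   /* word programming */
    }
    if (vdd_mv >= 2100u) {
        return 16u;   /* half-word programming (assumed threshold) */
    }
    if (vdd_mv >= 1800u) {
        return 8u;    /* byte programming */
    }
    return 0u;        /* below the stated minimum, do not program */
}
```

On the F4 the width is selected with the PSIZE field in FLASH_CR, so whatever tool or flash loader you use would have to set that to byte mode when the target runs at 1.8V.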
I'm guessing this limitation of the STLINK/V3 vs. the V2 comes more from a cost-reduction strategy than anything else. Since it's a stated "modular" design, they may add a wider operating voltage range through some adapter for JTAG/SWD later on.
As for me, I'll definitely stick to a JTAG-lock-pick Tiny 2 which is much faster than the STLINK (at least up to V2), works down to 1.4V and can be used with many other chips.