Yeah, IMO something still hasn't been completely cleared up.
Is the OP going to use an FPGA with embedded Flash or NVM, or not? I think the MachXO3 comes in different versions, some with Flash and some with NVM that can only be reprogrammed a couple of times. The iCE40 line, unless I've missed something, doesn't have any kind of Flash or NVM and thus must be configured on every power-on. The OP may also select a Xilinx part or whatever, which would be in the same situation.
So if the FPGA has embedded Flash, reprogramming it at every power-on doesn't really make sense; reprogramming should only be needed for updates. If it doesn't, you'll have to program (configure) it every time. These are already two rather different use cases.
In any case, if you're using a "biggish" FPGA (a large bitstream) and want to reconfigure it at each power-on, emulating an external SPI flash chip would probably yield the fastest "boot" time. For a small bitstream, JTAG would be perfectly fine, even at a slowish rate.
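To put rough numbers on the small vs. large bitstream point, here's a back-of-the-envelope sketch. It assumes configuration time is dominated by shifting the bits (bits / (clock x data lines)) and ignores command overhead, bit-banging inefficiency and the FPGA's own startup time. The bitstream sizes and clock rates below are made-up examples, not figures for any specific part, and whether the FPGA can read its flash over 1 or 4 data lines depends on the part.

```python
def config_time_s(bitstream_bytes: int, clock_hz: float, data_lines: int = 1) -> float:
    """Rough time (seconds) to shift a bitstream: total bits / (clock * data lines).

    Ignores protocol overhead and FPGA startup; a lower bound, not a prediction.
    """
    bits = bitstream_bytes * 8
    return bits / (clock_hz * data_lines)


if __name__ == "__main__":
    # Hypothetical cases, chosen only to show the order of magnitude.
    cases = [
        ("small bitstream, JTAG @ 1 MHz", 100_000, 1e6, 1),
        ("large bitstream, JTAG @ 1 MHz", 4_000_000, 1e6, 1),
        ("large bitstream, quad SPI @ 50 MHz", 4_000_000, 50e6, 4),
    ]
    for name, size, clk, lanes in cases:
        print(f"{name}: {config_time_s(size, clk, lanes):.2f} s")
```

With these assumed numbers, the small bitstream over a slow JTAG link takes under a second, while the large one takes tens of seconds over JTAG versus a fraction of a second over a fast (quad) SPI link.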
To get the best of both worlds, I'd suggest implementing both. It only takes a few more I/Os. Emulate an SPI flash IC, but also implement JTAG. With the added JTAG, you have additional means of debugging, re-reading the content, etc. The problem with emulating an SPI flash chip at a high clock frequency is that it may be impossible to do from a typical CPU (it's much easier with an MCU that has an SPI slave peripheral). If the CPU has an SPI slave mode, you run Linux on it, and proper drivers are available, that would be an option. You could also use an intermediate SPI Flash or RAM chip. Simply bit-banging JTAG from the CPU itself could be much slower than what you'd get with an MCU. Using an intermediate FTDI chip is of course an option, but the OP rejected it.
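To make "emulate an SPI flash IC" a bit more concrete, here's a byte-level model of what the emulator has to answer during one chip-select-low period, assuming the FPGA reads its bitstream with the common SPI NOR READ (0x03) or FAST READ (0x0B) commands followed by a 24-bit address. Which commands a given FPGA actually issues (and whether it sends something like a 0xAB wake-up first) is an assumption here; check the part's configuration guide. The class name is hypothetical. On real hardware this logic would live in an MCU's SPI slave interrupt/DMA handler, and the hard part is timing: the first data byte must be ready right after the last address bit arrives, which is exactly what gets difficult at high clock rates.

```python
READ = 0x03       # standard SPI NOR "read data": opcode + 24-bit address, then data
FAST_READ = 0x0B  # "fast read": same as READ plus one dummy byte before the data


class SpiFlashEmulator:
    """Hypothetical byte-level model of the slave side of an SPI NOR read."""

    def __init__(self, image: bytes):
        if not image:
            raise ValueError("bitstream image must not be empty")
        self.image = image

    def transaction(self, mosi: bytes) -> bytes:
        """Return the MISO bytes for one CS-low period (same length as mosi)."""
        miso = bytearray(b"\xff" * len(mosi))  # don't-care bytes default to 0xFF
        if len(mosi) < 4 or mosi[0] not in (READ, FAST_READ):
            return bytes(miso)  # other commands (e.g. a wake-up) return no data here
        addr = int.from_bytes(mosi[1:4], "big")  # 24-bit big-endian address
        header = 4 if mosi[0] == READ else 5      # FAST_READ adds one dummy byte
        for i in range(header, len(mosi)):
            # Data streams out from addr with auto-increment; wrap at end of image.
            miso[i] = self.image[(addr + i - header) % len(self.image)]
        return bytes(miso)


if __name__ == "__main__":
    flash = SpiFlashEmulator(bytes(range(16)))
    cmd = bytes([READ, 0x00, 0x00, 0x04]) + b"\x00" * 3  # read 3 bytes from 0x000004
    print(flash.transaction(cmd).hex())  # ffffffff040506
```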
There are of course other options as seen above, so that's just a suggestion.