Author Topic: Cheap in-system FPGA programming (Read 6452 times)

Harvs · « **on:** June 18, 2019, 11:24:06 pm »

I'm using a small FPGA tied to an A53 board running Linux via SPI to effectively glue a bunch of simultaneous sampling ADCs on at a decent data rate.

At the moment prototyping has been done with a machXO3, but I'll probably switch to something that's easily available in a smaller easily soldered package (e.g. the ICE40 have some nice QFN parts.)

But regardless, I want the A53 to be able to reprogram the device, so we can do over-the-network updates of all the firmware without physical access. Looking around, everyone is using an FT2232H as the JTAG programmer, which will easily double the cost and board area of using the FPGA.

So I was thinking of using a SPI flash and reprogramming that with the A53 while holding the FPGA in reset or something similar.

Has anyone done anything like this before? I figure there has to be a good way of doing this, seems like it should be a normal thing to do these days.

Thanks

Harvs · « **Reply #1 on:** June 18, 2019, 11:44:27 pm »

Thanks, I obviously missed that TN. I didn't realise the ICE40 could be an SPI slave, which makes the whole thing a lot easier.

In fact it looks like I should be able to use the pins to complete configuration, then swap over to the main SPI bus afterward, which would be ideal...

mikeselectricstuff · « **Reply #2 on:** June 18, 2019, 11:47:05 pm »

Wouldn't it be simpler to just soft-load the FPGA from the Linux system, and not use the SPI flash at all.

Harvs · « **Reply #3 on:** June 19, 2019, 12:44:59 am »

Quote from: mikeselectricstuff on June 18, 2019, 11:47:05 pm

Wouldn't it be simpler to just soft-load the FPGA from the Linux system, and not use the SPI flash at all.

Yep that's exactly what I was thinking with it being a slave. I figured it would be master only, which would be somewhat more complex to deal with. But being slave it looks like it should be easy to load, then use the same bus for the main 'application'.

mikeselectricstuff · « **Reply #4 on:** June 19, 2019, 01:22:54 am »

Quote from: Harvs on June 19, 2019, 12:44:59 am

Quote from: mikeselectricstuff on June 18, 2019, 11:47:05 pm
Wouldn't it be simpler to just soft-load the FPGA from the Linux system, and not use the SPI flash at all.

Yep that's exactly what I was thinking with it being a slave. I figured it would be master only, which would be somewhat more complex to deal with. But being slave it looks like it should be easy to load, then use the same bus for the main 'application'.

It also means you don't have to worry about the main app and FPGA having different versions as the app effectively carries the FPGA code with it.

OwO · « **Reply #5 on:** June 19, 2019, 05:13:20 am »

It doesn't matter if the FPGA is programmed via SPI or JTAG, you can always bitbang the protocol on some GPIOs. In the case of JTAG the tools can usually output a .svf file which contains commands that you can parse and replay (by bitbanging).
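To give an idea of what "parse and replay" involves, here is a minimal Python sketch that splits SVF text into statements. It relies only on two basic SVF conventions: statements end with `;`, and `!` or `//` starts a comment that runs to the end of the line. The sample text and the opcode in it are made up for illustration, and actually replaying the commands on GPIOs is left out entirely.

```python
import re

def svf_statements(text):
    """Yield (command, args) for each ';'-terminated SVF statement."""
    # Drop comments: '!' or '//' comments out the rest of the line.
    no_comments = re.sub(r"(!|//).*", "", text)
    for raw in no_comments.split(";"):
        words = raw.split()
        if words:
            yield words[0].upper(), words[1:]

# Made-up sample; the opcode is hypothetical and not taken from any real device.
sample = """
! reset the TAP, then shift an 8-bit instruction
STATE RESET;
SIR 8 TDI (C6);   // hypothetical opcode, for illustration only
RUNTEST 100 TCK;
"""

for cmd, args in svf_statements(sample):
    print(cmd, args)
```

A real player would still need to interpret each command (SIR, SDR, RUNTEST and so on) and rejoin hex data that vendor tools wrap across several lines, which this sketch does not attempt.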

aandrew · « **Reply #6 on:** June 26, 2019, 01:44:04 am »

This is exactly what I'm doing with an ICE40 design I'm using. There was a bit of a trick to it (CS# must be *low* when the FPGA comes out of reset so it knows it's a slave, and you must also toggle the clock a bunch of times after sending the image so that the FPGA exits configuration mode), but the trickiest part was just figuring out how to get the .bit file in the appropriate format:

under [tool options], make sure BitmapDisableHeader=yes is set, and then convert the .bin file to something like a C header that you can easily access from your program:

xxd -i foo_implmnt/sbt/outputs/bitmap/foo_bitmap.bin /tmp/fpga_image.c
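The load sequence described above (CS# low while reset is released, send the image, then clock some extra bits) could be sketched roughly as below. This is only an outline under assumptions: the four callables are hypothetical board-specific hooks, not a real library API, and the delays and the number of trailing bytes are placeholders that should be taken from the Lattice configuration documentation for the actual part.

```python
import time

def load_ice40(image, set_creset, set_cs, spi_write, read_cdone,
               reset_hold_s=0.001, clear_wait_s=0.002, trailing_bytes=8):
    """Load `image` (bytes) into an iCE40 acting as an SPI slave.

    Hypothetical hooks supplied by the caller:
      set_creset(level), set_cs(level): drive the reset and CS# pins (0 or 1)
      spi_write(data): clock bytes out without touching CS#
      read_cdone(): return the level of the CDONE pin
    Timing and trailing_bytes defaults are assumptions, not datasheet values.
    """
    set_cs(0)                  # CS# low before reset is released selects slave mode
    set_creset(0)              # hold the FPGA in reset
    time.sleep(reset_hold_s)
    set_creset(1)              # FPGA samples CS# as it leaves reset
    time.sleep(clear_wait_s)   # give it time to clear its configuration memory
    spi_write(image)           # the raw, headerless .bin contents
    spi_write(bytes(trailing_bytes))  # extra clocks so it exits configuration mode
    set_cs(1)
    return bool(read_cdone())  # True if the FPGA reports configuration done
```

Passing the pin and bus accessors in as functions keeps the sequence independent of whether the pins are driven through sysfs, libgpiod, spidev or something else.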

aandrew · « **Reply #7 on:** June 26, 2019, 01:45:07 am »

Quote from: OwO on June 19, 2019, 05:13:20 am

It doesn't matter if the FPGA is programmed via SPI or JTAG, you can always bitbang the protocol on some GPIOs. In the case of JTAG the tools can usually output a .svf file which contains commands that you can parse and replay (by bitbanging).

You can always take the bruteforce approach and JTAG it like this, but many FPGAs let you send an image in easier ways.

technix · « **Reply #8 on:** July 26, 2019, 01:32:46 pm »

AFAIK all recent-ish SRAM FPGAs from Xilinx and Intel can SPI boot. And it can be done using spidev driver and a shell script doing "cat /usr/lib/fpga/bitstream.bin > /dev/spidev0.0" at its heart. This means you can skip FPGA configuration memory and instead integrate the FPGA boot process into the Linux kernel boot process, and system updates to the CPU-side software can also update FPGA by packing in a new bitstream file. If you have wired the master reset pin of the FPGA to the CPU, you can even hot reinitialize the FPGA.
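As a variation on the `cat` one-liner, here is a small Python sketch that copies a bitstream in fixed-size writes. The 4096-byte chunk size is an assumption chosen to match what is, as far as I know, the default spidev buffer size; a larger single write may be rejected by the driver. The demo uses in-memory buffers so it runs anywhere. On a real target `src` would be the bitstream file and `dst` the spidev node opened unbuffered.

```python
import io

def stream_bitstream(src, dst, chunk_size=4096):
    """Copy src to dst in writes of at most chunk_size bytes; return bytes sent."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# Stand-ins for open("/usr/lib/fpga/bitstream.bin", "rb") and
# open("/dev/spidev0.0", "wb", buffering=0) from the post.
fake_bitstream = io.BytesIO(bytes(10000))
fake_spi = io.BytesIO()
sent = stream_bitstream(fake_bitstream, fake_spi)
print(sent, "bytes sent")
```

One caveat worth checking on real hardware: each write to spidev is a separate SPI transfer, so chip select may toggle between chunks. Whether the FPGA tolerates that depends on the part, and driving CS# from a GPIO is one way around it.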

SiliconWizard · « **Reply #9 on:** July 26, 2019, 02:08:25 pm »

Quote from: OwO on June 19, 2019, 05:13:20 am

It doesn't matter if the FPGA is programmed via SPI or JTAG, you can always bitbang the protocol on some GPIOs. In the case of JTAG the tools can usually output a .svf file which contains commands that you can parse and replay (by bitbanging).

Yes. You can either emulate an SPI flash chip, program it as an SPI slave or use JTAG. Many options.
The SPI slave approach is probably a bit easier? But yes from SVF files it's still relatively easy to use JTAG.

NorthGuy · « **Reply #10 on:** July 26, 2019, 03:05:24 pm »

It's not just SPI. You need to manipulate/test other pins to enter the programming mode and verify the results.

With JTAG you reset/check state through JTAG commands, and also can do other things - such as read device id or temperature/voltage sensors. You can also access JTAG from your design, so you can establish communications between FPGA and the other end.

Either way you can use configuration flash, but with SPI you need to manipulate mode pins to switch between SPI boots and flash boots. With JTAG you just do nothing and it configures from flash. If you wish you can program the configuration flash through JTAG. You probably can do it through SPI too, but it requires more work.

Thus, I would go with JTAG over SPI.

Bassman59 · « **Reply #11 on:** July 26, 2019, 07:23:56 pm »

Quote from: Harvs on June 18, 2019, 11:24:06 pm

Has anyone done anything like this before? I figure there has to be a good way of doing this, seems like it should be a normal thing to do these days.

Once upon a time I hung an SPI flash part off of an SiLabs 8051, and the 8051 read from that flash and bit-banged each byte to a Virtex-4's configuration port. The SPI flash had enough memory for two configurations, and one or the other was loaded when necessary if the user changed a system setting that demanded a different FPGA config.

The whole system connected to the host computer and a standard UART COM port was used to send commands to and get status from that 8051. One could use XMODEM to upload new FPGA configurations.

What I'd do now depends on the system design and what interfaces are available.

technix · « **Reply #12 on:** July 29, 2019, 01:03:23 pm »

Quote from: NorthGuy on July 26, 2019, 03:05:24 pm

It's not just SPI. You need to manipulate/test other pins to enter the programming mode and verify the results.

With JTAG you reset/check state through JTAG commands, and also can do other things - such as read device id or temperature/voltage sensors. You can also access JTAG from your design, so you can establish communications between FPGA and the other end.

Either way you can use configuration flash, but with SPI you need to manipulate mode pins to switch between SPI boots and flash boots. With JTAG you just do nothing and it configures from flash. If you wish you can program the configuration flash through JTAG. You probably can do it through SPI too, but it requires more work.

Thus, I would go with JTAG over SPI.

JTAG requires bit banging, thus the maximum speed is fairly restricted. If you are booting something like a XC6SLX9 it is tolerable, one step up to XC6SLX100 you would be asking for trouble using any bit banged interface, let alone beasts like XC7VX1140T.

For chips like XC6SLX100 I need hardware SPI for the speed. It is almost impossible to bit bang JTAG from the Linux kernel at a bus speed of a few megahertz, but piping a mmap'd file down SPI hardware using DMA can reach tens of megabits per second or more, depending on the SoC used.

As for that beast of a chip known as XC7VX1140T, I would have to use at least 16-bit SelectMAP attached to the SoC's EBI SRAM interface and dump the mmap'd bitstream in using memcpy. That can give hundreds of megabits per second.

NorthGuy · « **Reply #13 on:** July 29, 2019, 01:37:53 pm »

Quote from: technix on July 29, 2019, 01:03:23 pm

JTAG requires bit banging, thus the maximum speed is fairly restricted. If you are booting something like a XC6SLX9 it is tolerable, one step up to XC6SLX100 you would be asking for trouble using any bit banged interface, let alone beasts like XC7VX1140T.

You bit-bang it to the point where you're about to output the bitstream. Then, it's the same as SPI during bitstream loading as TMS doesn't need any manipulations. Thus, you can use the same method as with SPI. Then you go back to bit-banging to verify the status. So, it is not any slower than SPI. Other times, you can bit-bang it to do other neat things.
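The point about TMS can be checked against the standard IEEE 1149.1 TAP state machine. The Python sketch below encodes the 16-state transition table and walks it with a list of TMS values. The state names are just labels chosen for this sketch; nothing here talks to hardware.

```python
# TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE":    ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_DR":        ("CAPTURE_DR", "SELECT_IR"),
    "CAPTURE_DR":       ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR":         ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR":         ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR":         ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR":         ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR":        ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_IR":        ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR":       ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR":         ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR":         ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR":         ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR":         ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR":        ("RUN_TEST_IDLE", "SELECT_DR"),
}

def walk(state, tms_bits):
    """Return the TAP state reached after clocking in the given TMS values."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state

# Four bit-banged clocks get from reset into Shift-DR.
shifting = walk("TEST_LOGIC_RESET", [0, 1, 0, 0])
print(shifting)                    # SHIFT_DR
# With TMS held low the TAP stays there, so the data can be clocked like SPI.
print(walk(shifting, [0] * 1000))  # SHIFT_DR
# Three more bit-banged clocks leave Shift-DR and return to idle.
print(walk(shifting, [1, 1, 0]))   # RUN_TEST_IDLE
```

One detail the table makes visible: the final data bit has to be clocked with TMS high, since that same clock is the one that leaves Shift-DR, so the last bit of the bitstream still needs bit-banging.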

FTDI FT232H can do JTAG at 30 MHz. After you sold your house to buy XC7VX1140T, would you have $5 left for an FTDI chip?

Morgan127 · « **Reply #14 on:** July 29, 2019, 02:38:45 pm »

For in-field updates we use these Ethernet Cores. These work for Spartan 6 and other Xilinx FPGAs. However, that requires that you have an Ethernet port on your board and that your FPGA boots in master SPI mode.

SiliconWizard · « **Reply #15 on:** July 29, 2019, 03:00:59 pm »

Quote from: Morgan127 on July 29, 2019, 02:38:45 pm

For in-field updates we use these Ethernet Cores. These work for Spartan 6 and other Xilinx FPGAs. However, that requires that you have an Ethernet port on your board and that your FPGA boots in master SPI mode.

Well, as I understood it, the OP wants to be able to reprogram the FPGA via the main processor, which already has a network connection (ethernet or WIFI?), so adding a second network port dedicated to reprogramming would be cumbersome here and more costly anyway.

Also I don't know if your field update system implements full TCP/IP connections or just raw point-to-point ethernet, in which case it would be a lot less flexible than the OP's solution especially if they have to deal with several devices in the same network.

technix · « **Reply #16 on:** July 29, 2019, 04:06:32 pm »

Quote from: NorthGuy on July 29, 2019, 01:37:53 pm

You bit-bang it to the point where you're about to output the bitstream. Then, it's the same as SPI during bitstream loading as TMS doesn't need any manipulations. Thus, you can use the same method as with SPI. Then you go back to bit-banging to verify the status. So, it is not any slower than SPI. Other times, you can bit-bang it to do other neat things.

There would be pinmux conflicts all over the place if I am loading it from Linux, as Linux really doesn't want to permit dynamic pin reallocation. Otherwise I would have to use both SPI and GPIO pins, and add an external mux chip. SPI-only mode avoids all of that, and it still allows verification depending on the exact implementation.

An additional benefit of native SPI is that after the FPGA has booted, the logic programmed into it can take over the SPI pins, allowing the same set of pins to be reused for the actual application.

Quote from: NorthGuy on July 29, 2019, 01:37:53 pm

FTDI FT232H can do JTAG at 30 MHz. After you sold your house to buy XC7VX1140T, would you have $5 left for an FTDI chip?

I am just using that chip as an example of a "crazy big SRAM FPGA that has a ginormous bitstream file" that can require ages to be sent over either JTAG or SPI. The EMIF-to-SelectMAP link also allows both rapid FPGA configuration (using memcpy here as the FPGA is mapped directly into SoC memory space) and communication between SoC and FPGA post configuration with the same set of pins.

NorthGuy · « **Reply #17 on:** July 29, 2019, 04:21:05 pm »

Quote from: technix on July 29, 2019, 04:06:32 pm

There would be pinmux conflicts all over the place if I am loading it from Linux, as Linux really doesn't want to permit dynamic pin reallocation. Otherwise I would have to use both SPI and GPIO pins, and add an external mux chip.

Yes, people use these Linux systems thinking it's a great power (or great progress), then they cannot do a simple thing which is a snap on PIC16. Go figure.

Quote from: technix on July 29, 2019, 04:06:32 pm

Quote from: NorthGuy on July 29, 2019, 01:37:53 pm
FTDI FT232H can do JTAG at 30 MHz. After you sold your house to buy XC7VX1140T, would you have $5 left for an FTDI chip?
I am just using that chip as an example of a "crazy big SRAM FPGA that has a ginormous bitstream file" that can require ages to be sent over either JTAG or SPI.

Ages? It is not that bad. 385 Mbit bitstream. About 12 sec at 30 MHz. Compared to all that time you spend to generate a bitstream for it in Vivado ...
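The arithmetic behind that figure is simply bitstream size divided by bits moved per second. A small Python sketch, using the 385 Mbit and 30 MHz numbers from this post and ignoring any protocol overhead; the 16-bit case is a hypothetical parallel bus at the same clock, included only for comparison:

```python
def load_time_s(bitstream_bits, clock_hz, bus_width_bits=1):
    """Ideal transfer time: total bits / (clock rate * bits per clock)."""
    return bitstream_bits / (clock_hz * bus_width_bits)

serial = load_time_s(385e6, 30e6)        # 1 bit per clock (JTAG or SPI)
parallel = load_time_s(385e6, 30e6, 16)  # hypothetical 16-bit bus, same clock

print(round(serial, 1), "s")    # 12.8 s
print(round(parallel, 1), "s")  # 0.8 s
```

Real numbers will be somewhat worse once command overhead, driver latency and gaps between transfers are included.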

technix · « **Reply #18 on:** July 29, 2019, 04:43:25 pm »

Quote from: NorthGuy on July 29, 2019, 04:21:05 pm

Yes, people use these Linux systems thinking it's a great power (or great progress), then they cannot do a simple thing which is a snap on PIC16. Go figure.

Bare metal programming and embedded Linux each have their benefits and weaknesses. For example, AFAIK the network, security and GUI stacks on Linux are much more mature and proven than those for bare-metal embedded systems, as the Linux code for those components is derived from code intended for servers (network and security) and Android (GUI). Being a standard UNIX system, it allows direct code reuse from other platforms.

Quote from: NorthGuy on July 29, 2019, 04:21:05 pm

Ages? It is not that bad. 385 Mbit bitstream. About 12 sec at 30 MHz. Compared to all that time you spend to generate a bitstream for it in Vivado ...

I want the final system to boot in a few seconds, not minutes. The bitstream file is generated once per revision on a high powered workstation, but it is loaded a lot more frequently using a relatively low speed processor.

NorthGuy · « **Reply #19 on:** July 29, 2019, 07:53:45 pm »

Quote from: technix on July 29, 2019, 04:43:25 pm

I want the final system to boot in a few seconds, not minutes. The bitstream file is generated once per revision on a high powered workstation, but it is loaded a lot more frequently using a relatively low speed processor.

My Artixes load in under a second through JTAG with FT232H. A good portion of that time is loading of the FTDI DLL. If you use your own chip, you thereby eliminate DLL bloat and can use 66 MHz signaling. Then it'll be around 300 ms, even faster if you compress the bitstream.

technix · « **Reply #20 on:** July 30, 2019, 01:51:24 am »

Quote from: NorthGuy on July 29, 2019, 07:53:45 pm

My Artixes load in under a second through JTAG with FT232H. A good portion of that time is loading of the FTDI DLL. If you use your own chip, you thereby eliminate DLL bloat and can use 66 MHz signaling. Then it'll be around 300 ms, even faster if you compress the bitstream.

Combine that with the SPI/SelectMAP signal reusing, that is why I prefer SPI over bit bang JTAG.

legacy · « **Reply #21 on:** July 30, 2019, 11:56:11 am »

What does the FPGA need to implement?

Cannot you have an SPI-flash multiplexed to both FPGA and SoC?

This way when the mux connects the flash to the SoC it can program the flash once and for all (even with a slow method), while in the default configuration the mux connects the flash to the FPGA so it can bootstrap in parallel to the SoC.

Why do you need to load the bitstream at every bootstrap?

SiliconWizard · « **Reply #22 on:** July 30, 2019, 02:38:11 pm »

Yeah, something has not been completely cleared up yet IMO.

Is the OP going to use an FPGA with embedded Flash, NVM or not? The MachXO3 can come in different versions I think, some with Flash, some with NVM that can be reprogrammed just a couple of times. The iCE40 line, unless I've missed something, doesn't have any kind of Flash or NVM and thus requires to be configured upon every power-on. The OP may also select a Xilinx part or whatever, which would be in the same case.

So if the FPGA has embedded Flash, reprogramming it at every power-on doesn't really seem to make sense. Reprogramming should only be required for updates. But if it doesn't, you'll have to program (configure) it every time. Not exactly the same use cases already.

In any case, if using a "biggish" FPGA (a large bitstream), and you want to reconfigure it at each power-on, emulating an external SPI flash chip would probably yield the fastest "boot" time. If it's a small bitstream, JTAG would be perfectly fine, even at a slowish rate.

To get the best of both worlds, I may suggest implementing both. Just a few more I/Os are needed. Emulate an SPI flash IC, but also implement JTAG. With the added JTAG, you have additional means of debugging, re-reading the content, etc. The problem with emulating an SPI flash chip at a high clock frequency is that it may be impossible to do from a typical CPU (much easier to do with an MCU with an SPI slave peripheral). If the CPU can do SPI slave, and you run Linux on it, and you have proper drivers available, that would be an option. You could also use an intermediate SPI Flash or RAM chip. Simply bit-banging JTAG from the CPU itself could be much slower than what you'd get with an MCU. Using an intermediate FTDI chip is of course an option, but the OP rejected it.

There are of course other options as seen above, so that's just a suggestion.

technix · « **Reply #23 on:** July 30, 2019, 07:19:27 pm »

Quote from: SiliconWizard on July 30, 2019, 02:38:11 pm

In any case, if using a "biggish" FPGA (a large bitstream), and you want to reconfigure it at each power-on, emulating an external SPI flash chip would probably yield the fastest "boot" time. If it's a small bitstream, JTAG would be perfectly fine, even at a slowish rate.

Flash emulation may not be possible given certain SoC used. For example BCM2837 of Raspberry Pi 3 lacked either SPI slave or I2C slave interface. If you have a SoC it is likely better to put the FPGA in SPI slave mode instead of SPI master mode for the sake of compatibility.

SiliconWizard · « **Reply #24 on:** July 30, 2019, 08:07:26 pm »

Quote from: technix on July 30, 2019, 07:19:27 pm

Quote from: SiliconWizard on July 30, 2019, 02:38:11 pm
In any case, if using a "biggish" FPGA (a large bitstream), and you want to reconfigure it at each power-on, emulating an external SPI flash chip would probably yield the fastest "boot" time. If it's a small bitstream, JTAG would be perfectly fine, even at a slowish rate.
Flash emulation may not be possible given certain SoC used. For example BCM2837 of Raspberry Pi 3 lacked either SPI slave or I2C slave interface. If you have a SoC it is likely better to put the FPGA in SPI slave mode instead of SPI master mode for the sake of compatibility.

Sure, hence all the "ifs" in my last paragraph. I just think it would be the faster option if it's possible. One additional thing is that yes, MachXO2/3 and iCE40 support SPI slave configuration, but (correct me if I'm wrong) I don't think the Xilinx parts do. So it would also all depend on the FPGA you select.

One last remark, and a really obvious one: we don't know what the FPGA is going to be used for. In the system, it may be required to be fully operational after a short while and BEFORE the SoC has finished booting and gotten a chance to upload the bitstream to the FPGA. Of course in that case, unless you're using a very lightweight OS which boots blazingly fast or you are running bare metal, using an FPGA with embedded Flash (or using an external Flash IC) would be the only viable option. The SoC could then just update the Flash content if needed.

