Author Topic: Designing an FFT co-processor. How Parallel can FFT be? (Read 1573 times)

InterestedTom · « **on:** June 17, 2018, 03:27:33 pm »

I want to design an FFT co-processor to fit in the Cyclone V SoC on the Terasic DE10-Nano.

I have seen a paper about implementing FFT in an FPGA, but they only used one "Butterfly Unit" and just clocked through the data and twiddle factor RAM inside the FPGA in the correct sequence for a radix-2 algorithm.

I want to explore implementing an FFT co-processor with built in configurability, i.e. an array of butterfly units with a configurable number of points for the FFT.

How feasible is this?

donmr · « **Reply #1 on:** June 17, 2018, 06:15:17 pm »

This depends on things like:

- How many points of what size do you want to handle?
- Can you present all of the inputs at once?
- How big an FPGA do you have?
- How fast does it need to run?

Besides the gate count of the butterfly units, large multiplexers can get big and slow too.

Try drawing some diagrams of what it would look like and estimate the sizes.

Any function can be make fully parallel if you can present all of the inputs, you have enough gates, and you are willing to take long enough for the result. Think of a huge ROM.

tggzzz · « **Reply #2 on:** June 17, 2018, 06:28:28 pm »

Concentrate on the routing. ISTR there is an irreducible minimum number of wires that cross over each other.

With FPGAs the routing is always the problem. (I exaggerate, of course, but there's truth in that jest!)

coppice · « **Reply #3 on:** June 17, 2018, 06:54:15 pm »

There is limited potential for parallel operations in an FFT, so extremely low latency is a problem. There is a lot of potential for pipelining, so getting high throughput is straightforward.

InterestedTom · « **Reply #4 on:** June 17, 2018, 08:18:19 pm »

Quote from: coppice on June 17, 2018, 06:54:15 pm

There is limited potential for parallel operations in an FFT, so extremely low latency is a problem. There is a lot of potential for pipelining, so getting high throughput is straightforward.

Ok so pipelining would suggest to me that I split the FFT into 2ⁿ parts and then combine them with a number of subsequent stages. Or did you mean pipeline the single butterfly unit by adding registers between each operation?

jbb · « **Reply #5 on:** June 17, 2018, 08:26:34 pm »

Quote from: coppice on June 17, 2018, 06:54:15 pm

There is limited potential for parallel operations in an FFT, so extremely low latency is a problem. There is a lot of potential for pipelining, so getting high throughput is straightforward.

This is true. Each output term will eventually map back to all the input terms, so there’s a lot of data path to think about.

FFTs also require you to have all of the inputs before you can begin, so there’s always the latency of collecting your data samples. Might it be possible to begin the FFT as your samples come in? I did an FIR filter bank this way once; it precomputed as much as possible and then waited for the final sample to come in.

Finally, you mentioned variable size. I expect that this would, in hardware terms, involve more MUXing, which is difficult for place and route of the FPGA. I suspect the way forward there is to look at a 2^n radix, and use software to zero pad inputs of less than 2^n

coppice · « **Reply #6 on:** June 17, 2018, 08:39:27 pm »

Quote from: InterestedTom on June 17, 2018, 08:18:19 pm

Quote from: coppice on June 17, 2018, 06:54:15 pm
There is limited potential for parallel operations in an FFT, so extremely low latency is a problem. There is a lot of potential for pipelining, so getting high throughput is straightforward.

Ok so pipelining would suggest to me that I split the FFT into 2ⁿ parts and then combine them with a number of subsequent stages. Or did you mean pipeline the single butterfly unit by adding registers between each operation?

You can pipeline within the butterfly, so that you can crank up the clock rate of the butterfly. You can pipeline the columns of butterflies, using multiple sets of butterfly hardware, to increase throughput for the same clock rate. There is a lot of potential for pipelining in the FFT, depending how much hardware you want to throw at the problem.

chris_leyson · « **Reply #7 on:** June 17, 2018, 09:06:55 pm »

Quote

There is limited potential for parallel operations in an FFT

There is lots of potential for parallel operations. Consider a 144 point complex FFT, if you do it radix 12 then you have to calculate 24 12-point butterflies, it's 24 because there are two stages. If you use the prime factor algorithm then you don't need twiddle factors for the 12-point FFTs or butterflies so that save a bit of hardware, however, you still need them for the 144-point FFT. With two pipelined 12-point FFT engines you can do a 144-point FFT in 12 clock cycles. Add two more 12-point FFT engines and you can do fast convolution with the same throughput. It was for doing fast 2D convolution and I only got as far as implementing one fully parallel pipelined 12-point FFT and it was huge. I never got to looking at adding another one as the company went bust.

rstofer · « **Reply #8 on:** June 17, 2018, 11:02:29 pm »

Google...

Try searching for 'pipeline fft architectures optimized for fpgas' or any other combination of fpga & fft

rhb · « **Reply #9 on:** June 18, 2018, 12:37:09 am »

Texas Memory Systems made dedicated FFT engines which they sold for medical and military applications. The company was founded by Holly Frost and later acquired by IBM.

The FFT is fundamentally a matrix multiplication problem which can be decomposed into prime factor sizes. So it is "trivially" parallel in one dimension. Things get stickier as you move to higher dimensions because of the data access.

In any case, the way to implement an FFT co-processor today is with an FPGA.


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

EEVblog Electronics Community Forum

Author Topic: Designing an FFT co-processor. How Parallel can FFT be? (Read 1573 times)

InterestedTom

Designing an FFT co-processor. How Parallel can FFT be?

donmr

Re: Designing an FFT co-processor. How Parallel can FFT be?

tggzzz

Re: Designing an FFT co-processor. How Parallel can FFT be?

coppice

Re: Designing an FFT co-processor. How Parallel can FFT be?

InterestedTom

Re: Designing an FFT co-processor. How Parallel can FFT be?

jbb

Re: Designing an FFT co-processor. How Parallel can FFT be?

coppice

Re: Designing an FFT co-processor. How Parallel can FFT be?

chris_leyson

Re: Designing an FFT co-processor. How Parallel can FFT be?

rstofer

Re: Designing an FFT co-processor. How Parallel can FFT be?

rhb

Re: Designing an FFT co-processor. How Parallel can FFT be?

Share me