Author Topic: How to create SPI slave in FPGA when SCLK frequency similar to FPGA system clock (Read 7344 times)

matrixofdynamism · « **on:** March 19, 2021, 01:16:13 pm »

I need to design an SPI slave peripheral inside an FPGA that shall be used to communicate with a Microcontroller and configure the behaviour of the FPGA design. I have a few questions.

If the FPGA clock frequency is significantly higher than the input SCLK frequency, it is possible to sample the SCLK and detect the rising and falling edges. The design can use this information to shift or latch data. But,
(a) What if the SCLK input clock is of almost the same order as the FPGA system clock? Does the clock signal connect directly into the FPGA registers?
(b) If not then what is the alternative?
(c) If (a) above is true then how do we write the timing constraints?
(d) The SCLK does not need to use global clock routing. Does this mean that any FPGA pin can be used for it?

I am in the situation where the FPGA system clock is less than 4 times the SCLK frequency.

How should such a slave be designed and constrained for timing so that it works reliably?

AndyC_772 · « **Reply #1 on:** March 19, 2021, 02:03:59 pm »

One way to do this is as follows:

- Call the SPI clock and the FPGA's main clock 'sclk' and 'mclk' respectively
- Create a process in the sclk domain which contains an implementation of the SPI slave interface
- When the SPI master writes a register, your SPI slave process latches the address and data, and asserts a signal to indicate that a write has occurred (call this 'write_sig_s')
- write_sig_s is double sampled to cross it into the mclk domain - call the resulting signal write_sig_m
- A process in the mclk domain monitors write_sig_m, and samples the latched address and data when write_sig_m has gone active

This isn't perfect. Even though it's guaranteed that the address and data will have been valid for some time before they're sampled in the mclk domain, it's hard to actually guarantee that timing is met. To fix that you might instantiate a DPRAM and use that to transfer the data from one domain into the other - and if you're supporting reads too then this is likely mandatory.
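The steps above can be sketched as a small behavioural model. This is only an illustration in Python, not HDL: the class name `WriteCapture` and the method `mclk_tick` are made up for this sketch, and the model is ideal (no metastability, no routing skew between the flag and the latched bus).

```python
class WriteCapture:
    """mclk-domain side: two-flop synchroniser on write_sig_s plus edge-triggered capture."""

    def __init__(self):
        self.sync1 = 0      # first synchroniser flop
        self.sync2 = 0      # second flop, i.e. write_sig_m
        self.prev = 0       # previous write_sig_m, used for edge detection
        self.captured = []  # (address, data) pairs taken over by the mclk domain

    def mclk_tick(self, write_sig_s, addr_latched, data_latched):
        # Capture only on the rising edge of write_sig_m. By then the latched
        # address/data have been held for a couple of mclk cycles.
        if self.sync2 == 1 and self.prev == 0:
            self.captured.append((addr_latched, data_latched))
        # Update all flops together, as a clocked process would.
        self.prev, self.sync2, self.sync1 = self.sync2, self.sync1, write_sig_s


rx = WriteCapture()
# write_sig_s is low for two mclk cycles, then held high with addr=0x10, data=0xAB.
stimulus = [(0, 0x00, 0x00)] * 2 + [(1, 0x10, 0xAB)] * 5
for sig, addr, data in stimulus:
    rx.mclk_tick(sig, addr, data)
print(rx.captured)
```

The model captures the pair exactly once, even though the flag stays high for several mclk cycles. What it cannot show is the real-hardware concern raised above: nothing here proves that the address and data bits actually arrive in time relative to the synchronised flag.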

NorthGuy · « **Reply #2 on:** March 19, 2021, 03:25:10 pm »

If the SPI clock is continuous (that is it never stops and another signal marks transactions) and fast enough, you can bring it into FPGA and use as a clock.

If the SPI clock isn't continuous (that is, it stops between transfers), you cannot really use it as an FPGA clock because all the logic clocked by this clock will stop along with the clock. In this case, create an internal clock which is at least 4-5 times faster than SPI and sample the SPI clock and data as needed.
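A rough way to see why the oversampling ratio matters is to model the sampled SCLK and count the rising edges that a synchroniser plus edge detector would report. The sketch below is plain Python, not HDL; the function names are invented for illustration, and the model assumes an ideal 50% duty-cycle SCLK with no jitter or metastability.

```python
def sample_sclk(f_sclk_hz, f_sys_hz, n_samples):
    """Sample an ideal SCLK (high during the first half of each period) at t = i / f_sys."""
    return [1 if (2 * i * f_sclk_hz // f_sys_hz) % 2 == 0 else 0
            for i in range(n_samples)]


def count_rising_edges(samples):
    """Two-flop synchroniser followed by a one-cycle rising-edge detector."""
    ff1 = ff2 = prev = 0
    edges = 0
    for s in samples:
        if ff2 == 1 and prev == 0:
            edges += 1
        # All three flops update on the same system clock edge.
        prev, ff2, ff1 = ff2, ff1, s
    return edges


# 30 system clock samples in each case.
edges_3x = count_rising_edges(sample_sclk(20_000_000, 60_000_000, 30))    # 10 SCLK periods
edges_1p5x = count_rising_edges(sample_sclk(20_000_000, 30_000_000, 30))  # 20 SCLK periods
print(edges_3x, edges_1p5x)
```

In this idealised model a 3x ratio still sees every edge, while a 1.5x ratio misses some. Real signals have duty-cycle distortion and jitter, which is one reason a margin of 4-5x is usually suggested.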

rstofer · « **Reply #3 on:** March 19, 2021, 05:02:55 pm »

The PLL on the Artix 7 will generate an 800 MHz clock.
See PLL_F_OUTMAX on page 30:

https://www.xilinx.com/support/documentation/data_sheets/ds181_Artix_7_Data_Sheet.pdf

That must be some uC to be able to clock SPI at 100 MHz or is the FPGA being clocked slow? Numbers would help...

Not knowing which uC and which FPGA makes any hardware specific response meaningless.

SiliconWizard · « **Reply #4 on:** March 19, 2021, 05:30:55 pm »

Quote from: NorthGuy on March 19, 2021, 03:25:10 pm

If the SPI clock is continuous (that is it never stops and another signal marks transactions) and fast enough, you can bring it into FPGA and use as a clock.

If the SPI clock isn't continuous (that is, it stops between transfers), you cannot really use it as an FPGA clock because all the logic clocked by this clock will stop along with the clock. In this case, create an internal clock which is at least 4-5 times faster than SPI and sample the SPI clock and data as needed.

But why would you do this instead of just maintaining two different clock domains and synchronizing between them?
A relatively straightforward approach is to just use TDP RX and TX FIFOs, then you don't even have to bother with clock domain crossing.

NorthGuy · « **Reply #5 on:** March 19, 2021, 07:52:08 pm »

Quote from: SiliconWizard on March 19, 2021, 05:30:55 pm

Quote from: NorthGuy on March 19, 2021, 03:25:10 pm
If the SPI clock is continuous (that is it never stops and another signal marks transactions) and fast enough, you can bring it into FPGA and use as a clock.

If the SPI clock isn't continuous (that is, it stops between transfers), you cannot really use it as an FPGA clock because all the logic clocked by this clock will stop along with the clock. In this case, create an internal clock which is at least 4-5 times faster than SPI and sample the SPI clock and data as needed.

But why would you do this instead of just maintaining two different clock domains and synchronizing between them?
A relatively straightforward approach is to just use TDP RX and TX FIFOs, then you don't even have to bother with clock domain crossing.

Which one are you asking about? You quoted two paragraphs from my post which are opposite to each other.

The first one, I would do when the SPI clock is relatively fast and continuous. For example, I did this with FT601 (100 MHz clock), although this is not exactly SPI. I routed FT601's clock through a clock-capable pin to FPGA, used it to clock all the related logic, and then used IO FIFOs to cross to faster FPGA's internal clock.

The second I usually use to communicate with MCUs, where the SPI clock isn't continuous. Typically, my FPGA clock is already fast enough to sample MCU's SPI, so there's no need for different clock domains, but if it wasn't (I guess this is the situation the OP has in mind) I would generate a faster clock to sample SPI lines. I'm sure with SERDES and DDR (or even QDR) sampling I could handle frequencies which are way above anything that any MCU's SPI can produce.

asmi · « **Reply #6 on:** March 19, 2021, 07:56:06 pm »

If SPI clock is static and known in advance, you can try using ISERDES in 4x oversampling mode to find a sampling point. I think there is even an appnote about this approach.

matrixofdynamism · « **Reply #7 on:** March 19, 2021, 08:02:36 pm »

The uC SPI SCLK is 20MHz. The FPGA system clock is 60MHz.

julian1 · « **Reply #8 on:** March 19, 2021, 08:42:05 pm »

Haven't tried it, but maybe something like this strategy.

Shift the data into a tmp register on the uC clock (global in).
On spi CS deassert - latch the tmp value into the register in the fpga clock domain.

So CDC edge detection - is only needed for the CS signal, instead of incoming data and clk.

NorthGuy · « **Reply #9 on:** March 19, 2021, 08:46:53 pm »

Quote from: matrixofdynamism on March 19, 2021, 08:02:36 pm

The uC SPI SCLK is 20MHz. The FPGA system clock is 60MHz.

The sampling will certainly work when receiving from MCU.

The transmission is more problematic.

If you respond on the opposite edge of where MCU samples, that's 25 ns between the edge you see and the edge where MCU samples your response. With 60 MHz, it's 17 ns between your sample points, which leaves 8 ns for the round-trip delay and the MCU's setup time. This may not meet formal STA, but is likely to work in real life.

If you respond on the same edge, you have 50 ns, which is plenty.
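The arithmetic behind these numbers can be written out as follows. This is a sketch of the budget as described in this post only: the function name is made up, and it deliberately ignores synchroniser latency, I/O buffer delays and the MCU's setup time, all of which have to fit inside the returned margin. The margin for the same-edge case is obtained by applying the same subtraction and is not a figure from the post.

```python
def response_budget_ns(f_sclk_hz, f_sys_hz, same_edge):
    """Return (window_ns, margin_ns) for a slave that samples SCLK with its own clock.

    window_ns: time from the SCLK edge the slave reacts to until the MCU samples.
    margin_ns: what is left after waiting up to one system clock period to see the edge.
    """
    sclk_period_ns = 1e9 / f_sclk_hz
    sys_period_ns = 1e9 / f_sys_hz
    window_ns = sclk_period_ns if same_edge else sclk_period_ns / 2
    margin_ns = window_ns - sys_period_ns
    return window_ns, margin_ns


for same_edge in (False, True):
    window, margin = response_budget_ns(20e6, 60e6, same_edge)
    print(f"same_edge={same_edge}: window {window:.1f} ns, margin {margin:.1f} ns")
```

With 20 MHz SCLK and a 60 MHz system clock, the opposite-edge case gives a 25 ns window with roughly 8 ns left over, matching the figures above.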

asmi · « **Reply #10 on:** March 19, 2021, 08:57:46 pm »

Quote from: NorthGuy on March 19, 2021, 08:46:53 pm

With 60 MHz, it's 17 ns between your sample points, which leaves 8 ns for the round-trip delay and the MCU's setup time. This may not meet formal STA, but is likely to work in real life.

Just for reference, 8 ns of delay equals a 1.2 meter long trace.
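As a quick check of that figure, assuming a propagation velocity of about 0.15 m/ns (roughly half the speed of light, a common rule of thumb for FR-4 traces; the exact value is an assumption here and depends on the board stack-up):

```python
PROPAGATION_M_PER_NS = 0.15  # assumed rule-of-thumb velocity, not a measured value


def trace_length_m(delay_ns, velocity_m_per_ns=PROPAGATION_M_PER_NS):
    """Length of trace that produces the given one-way propagation delay."""
    return delay_ns * velocity_m_per_ns


print(f"{trace_length_m(8):.2f} m")
```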

mikeselectricstuff · « **Reply #11 on:** March 19, 2021, 09:09:31 pm »

If all you are doing is reading and writing config/status registers (as opposed to streaming blocks of data), then the whole SPI interface would be clocked by the SPI clock.

Resampling it with the 60MHz clock is asking for trouble, especially if the MCU and FPGA clocks aren't synchronised as you have to consider possible metastability issues.

(d) yes, SPI clock could be on any FPGA pin

NorthGuy · « **Reply #12 on:** March 19, 2021, 09:14:40 pm »

Quote from: asmi on March 19, 2021, 08:57:46 pm

Quote from: NorthGuy on March 19, 2021, 08:46:53 pm
With 60 MHz, it's 17 ns between your sample points, which leaves 8 ns for the round-trip delay and the MCU's setup time. This may not meet formal STA, but is likely to work in real life.
Just for reference, 8 ns of delay equals a 1.2 meter long trace.

The wire is the least of the worries. In addition to the wire, there's input buffer, some logic delay, then output buffer, then allowance for the setup time of the receiving MCU. So, 8 ns may or may not be enough.

SiliconWizard · « **Reply #13 on:** March 19, 2021, 09:20:48 pm »

Quote from: NorthGuy on March 19, 2021, 07:52:08 pm

Quote from: SiliconWizard on March 19, 2021, 05:30:55 pm
Quote from: NorthGuy on March 19, 2021, 03:25:10 pm
If the SPI clock is continuous (that is it never stops and another signal marks transactions) and fast enough, you can bring it into FPGA and use as a clock.

If the SPI clock isn't continuous (that is, it stops between transfers), you cannot really use it as an FPGA clock because all the logic clocked by this clock will stop along with the clock. In this case, create an internal clock which is at least 4-5 times faster than SPI and sample the SPI clock and data as needed.

But why would you do this instead of just maintaining two different clock domains and synchronizing between them?
A relatively straightforward approach is to just use TDP RX and TX FIFOs, then you don't even have to bother with clock domain crossing.

Which one are you asking about? You quoted two paragraphs from my post which are opposite to each other.

Yeah, I should have trimmed the quote. I was replying to the second one.

But thing is: it depends. On the SPI clock frequency, and the internal clock frequency. Sometimes sampling is not the best option, especially if the oversampling factor is too low.
For the OP's case, 20 MHz SPI, 60 MHz internal clock - it doesn't look too great.

Even if it probably looks horrible to many, especially on FPGAs, the following approach has proven to work as long as the SPI clock is low enough (that would work fine up to at least 50 MHz on typical mid-range FPGAs IME): use the SPI clock as a clock input. That part doesn't look bad. But for "low" frequencies, you don't even need to use a dedicated clock input pin or internal clock network. A typical SPI slave block is small and simple enough that using fabric logic for the clock usually won't cause any timing issues (again for low enough frequencies). The received word can be clocked into the RX FIFO at the last SPI clock pulse, which can also trigger the read of the next word to send from the TX FIFO. And now the uglier part is using the CS signal as an asynchronous reset for your SPI state machine. It looks ugly, but since the SPI clock, in classic SPI, starts after a minimum delay after the first edge of CS, and stops before the last edge of CS, there should be no metastability issues to fear. And as long as the SPI clock is not too high, no timing issues to fear either.

At 20 MHz, I would definitely try this option first. I'm sure some people would cringe at the thought of doing that, though.
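To make the structure concrete, here is a behavioural sketch of the receive side of that approach in Python (not HDL). The class and helper names are invented for illustration, SPI mode 0 with MSB-first 8-bit words is assumed, and the RX FIFO is just a `deque` standing in for a real dual-clock FIFO.

```python
from collections import deque


class SpiSlaveModel:
    """Shift register clocked by SCLK, with CS acting as an asynchronous reset."""

    def __init__(self, word_bits=8):
        self.word_bits = word_bits
        self.shift = 0
        self.count = 0
        self.rx_fifo = deque()

    def cs_inactive(self):
        # CS deasserted: clear the bit counter and shifter, as an async reset would.
        self.shift = 0
        self.count = 0

    def sclk_rising(self, mosi):
        # Shift in one bit, MSB first, on each SCLK rising edge.
        mask = (1 << self.word_bits) - 1
        self.shift = ((self.shift << 1) | (mosi & 1)) & mask
        self.count += 1
        if self.count == self.word_bits:
            # The word is written to the RX FIFO on the last clock pulse of the word.
            self.rx_fifo.append(self.shift)
            self.count = 0


def clock_in(slave, value, nbits=8):
    """Shift the top nbits of an 8-bit value into the slave, MSB first."""
    for i in range(nbits):
        slave.sclk_rising((value >> (7 - i)) & 1)


slave = SpiSlaveModel()
clock_in(slave, 0xA5)           # complete word
clock_in(slave, 0xFF, nbits=3)  # aborted transfer: only 3 clocks arrive
slave.cs_inactive()             # CS going inactive discards the partial word
clock_in(slave, 0x3C)           # next word is correctly aligned again
print([hex(w) for w in slave.rx_fifo])
```

The aborted transfer shows why the CS reset is useful: without it, the three stray bits would shift the alignment of every later word.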

nctnico · « **Reply #14 on:** March 19, 2021, 09:27:49 pm »

Quote from: AndyC_772 on March 19, 2021, 02:03:59 pm

One way to do this is as follows:

- Call the SPI clock and the FPGA's main clock 'sclk' and 'mclk' respectively
- Create a process in the sclk domain which contains an implementation of the SPI slave interface
- When the SPI master writes a register, your SPI slave process latches the address and data, and asserts a signal to indicate that a write has occurred (call this 'write_sig_s')
- write_sig_s is double sampled to cross it into the mclk domain - call the resulting signal write_sig_m
- A process in the mclk domain monitors write_sig_m, and samples the latched address and data when write_sig_m has gone active

This isn't perfect. Even though it's guaranteed that the address and data will have been valid for some time before they're sampled in the mclk domain, it's hard to actually guarantee that timing is met. To fix that you might instantiate a DPRAM and use that to transfer the data from one domain into the other - and if you're supporting reads too then this is likely mandatory.

Using a DPRAM isn't necessary at all. What is needed is a way to tell the mclk domain that a transaction has occurred. A 1-bit signal which toggles on every transfer can do that. After that, the only requirement is that the delay from the sclk domain to the mclk domain is not more than the mclk cycle time. So the mclk domain keeps sampling the status bit, and when it changes, the information is taken over from the SPI interface (or vice versa). Vice versa has the same timing constraint, but reading data will require some dummy clock cycles from the sclk in order to give the mclk domain time to react. This is a common way to implement SPI reads on chips as well. The SPI clock doesn't need to be continuous at all.
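A behavioural sketch of the toggle scheme, in Python rather than HDL. The names `ToggleReceiver` and `mclk_tick` are hypothetical, the model is ideal (no metastability), and it assumes the sclk domain holds the word stable until the mclk domain has taken it over.

```python
class ToggleReceiver:
    """mclk-domain side: synchronise the toggle bit and capture on any change."""

    def __init__(self):
        self.sync1 = 0   # first synchroniser flop
        self.sync2 = 0   # second synchroniser flop
        self.prev = 0    # previous synchronised value
        self.words = []  # words taken over from the SPI side

    def mclk_tick(self, toggle_s, word_s):
        # Any change of the synchronised toggle (0->1 or 1->0) marks one transfer.
        if self.sync2 != self.prev:
            self.words.append(word_s)
        self.prev, self.sync2, self.sync1 = self.sync2, self.sync1, toggle_s


rx = ToggleReceiver()
# The toggle flips once per completed SPI transfer; the word is held stable meanwhile.
stimulus = [(0, 0x00)] * 2 + [(1, 0xA5)] * 4 + [(0, 0x3C)] * 4
for toggle, word in stimulus:
    rx.mclk_tick(toggle, word)
print([hex(w) for w in rx.words])
```

Because both edges of the toggle count as events, the flag never has to be cleared from the other domain, which is what makes this simpler than a level flag with an acknowledge.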

NorthGuy · « **Reply #15 on:** March 19, 2021, 09:56:18 pm »

Quote from: SiliconWizard on March 19, 2021, 09:20:48 pm

Even if it probably looks horrible to many, especially on FPGAs, the following approach has proven to work as long as the SPI clock is low enough (that would work fine up to at least 50 MHz on typical mid-range FPGAs IME): use the SPI clock as a clock input. That part doesn't look bad. But for "low" frequencies, you don't even need to use a dedicated clock input pin or internal clock network. A typical SPI slave block is small and simple enough that using fabric logic for the clock usually won't cause any timing issues (again for low enough frequencies). The received word can be clocked into the RX FIFO at the last SPI clock pulse, which can also trigger the read of the next word to send from the TX FIFO. And now the uglier part is using the CS signal as an asynchronous reset for your SPI state machine. It looks ugly, but since the SPI clock, in classic SPI, starts after a minimum delay after the first edge of CS, and stops before the last edge of CS, there should be no metastability issues to fear. And as long as the SPI clock is not too high, no timing issues to fear either.

Of course, if you can do everything with the SPI clock, there's no reason to use a different clock. Especially if you have a friendly MCU which can clock a byte or two for you. But imagine you get a command from an MCU and you need to decide what to respond on the next clock edge, but there's no clock to do the processing.

My FPGA clocks are typically many times faster than any data coming from MCU, so I don't even think about using the external SPI clock.

Of course, 60 MHz to sample 20 MHz signal is not great.

SiliconWizard · « **Reply #16 on:** March 19, 2021, 10:04:42 pm »

Quote from: NorthGuy on March 19, 2021, 09:56:18 pm

But imagine you get a command from an MCU and you need to decide what to respond on the next clock edge, but there's no clock to do the processing.

Ah of course. In this case, this becomes tricky. But I don't think it's impossible.
At the last clock pulse, you get the current received word, and can decide on this pulse what to respond next, and place the corresponding word in the "next word to transmit" register. This should be workable. Of course, this will require a longish logic path, so it will work only for relatively low frequencies. I can't see a scenario which would prevent this from being done, otherwise, though, but I may miss it right now.

asmi · « **Reply #17 on:** March 19, 2021, 10:05:21 pm »

Quote from: NorthGuy on March 19, 2021, 09:56:18 pm

Of course, 60 MHz to sample 20 MHz signal is not great.

I'm kind of wondering what the reason is for the FPGA system clock to be 60 MHz. Too low-end an FPGA? I think even Cyclone-4 and Max10 can run pretty much anything at 100 MHz. Can't remember how fast ice40 devices are, but I suspect they will also have little problem running stuff at 100 MHz.

So what's the device in question??

NorthGuy · « **Reply #18 on:** March 19, 2021, 10:05:43 pm »

Quote from: nctnico on March 19, 2021, 09:27:49 pm

A 1 bit signal which toggles for every transfer can do that.

Sampling your 1-bit signal is not conceptually different than sampling SPI clock and data signals directly. As long as the timing is Ok.

langwadt · « **Reply #19 on:** March 19, 2021, 10:41:35 pm »

Quote from: NorthGuy on March 19, 2021, 10:05:43 pm

Quote from: nctnico on March 19, 2021, 09:27:49 pm
A 1 bit signal which toggles for every transfer can do that.

Sampling your 1-bit signal is not conceptually different than sampling SPI clock and data signals directly. As long as the timing is Ok.

a signal that toggles per transfer is going to be much slower than the spi clock/data, so there is much more time to do synchronization

nctnico · « **Reply #20 on:** March 19, 2021, 10:49:04 pm »

Quote from: NorthGuy on March 19, 2021, 10:05:43 pm

Quote from: nctnico on March 19, 2021, 09:27:49 pm
A 1 bit signal which toggles for every transfer can do that.

Sampling your 1-bit signal is not conceptually different than sampling SPI clock and data signals directly. As long as the timing is Ok.

It is completely different because sampling the 1 bit signal doesn't require the FPGA clock to be substantially higher compared to the SPI clock. In fact the SPI clock can be much higher compared to the FPGA clock. Using an internal PLL inside the FPGA to create a higher clock to sample the SPI clock can be an option but you generally choose SPI to have fast transfers. So the SPI clock may be in the 50MHz to 200MHz range.

Oversampling would take a 200 MHz to 500 MHz clock, and timing towards external pins will be iffy due to internal routing delays. On top of that, running logic at such high speeds quickly diminishes the complexity of the logic you can achieve and/or restricts the ability of the FPGA place & route software to put the design together quickly. Last but not least, you'll still need a clock domain transfer from the high frequency oversampling clock to a lower housekeeping clock. Running all FPGA logic at clock speeds over 50 MHz is usually going to limit the routability of an FPGA design (especially if you have address / data busses and intertwined logic throughout the design). In all my FPGA designs I use the lowest possible clock frequencies in order to avoid trouble getting the FPGA design to meet timing requirements.

Say you have 1024 registers in an FPGA with a 32-bit word size. Each transfer (read or write) consists of a 64-bit word with the register address, mode and data. Using a 200 MHz SPI clock you can do about 3 million transfers per second. Housekeeping logic clocked at 10 MHz which transfers 32-bit words at a time can easily keep up.
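Spelling out that arithmetic (the function names are made up for this sketch, and it assumes one bit per SCLK cycle with no gaps between transfers, which is the best case for the SPI side):

```python
def transfers_per_second(f_sclk_hz, bits_per_transfer):
    """Back-to-back transfer rate when one bit is moved per SCLK cycle."""
    return f_sclk_hz / bits_per_transfer


def housekeeping_cycles_per_transfer(f_hk_hz, f_sclk_hz, bits_per_transfer):
    """How many housekeeping clock cycles are available for each SPI transfer."""
    return f_hk_hz / transfers_per_second(f_sclk_hz, bits_per_transfer)


rate = transfers_per_second(200e6, 64)
cycles = housekeeping_cycles_per_transfer(10e6, 200e6, 64)
print(f"{rate / 1e6:.3f} M transfers/s, {cycles:.1f} housekeeping cycles per transfer")
```

So even at full SPI rate the 10 MHz housekeeping logic has a little over three clock cycles to handle each 32-bit data word.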

Someone · « **Reply #21 on:** March 19, 2021, 11:15:11 pm »

Quote from: SiliconWizard on March 19, 2021, 10:04:42 pm

Quote from: NorthGuy on March 19, 2021, 09:56:18 pm
But imagine you get a command from an MCU and you need to decide what to respond on the next clock edge, but there's no clock to do the processing.

Ah of course. In this case, this becomes tricky. But I don't think it's impossible.
At the last clock pulse, you get the current received word, and can decide on this pulse what to respond next, and place the corresponding word in the "next word to transmit" register. This should be workable. Of course, this will require a longish logic path, so it will work only for relatively low frequencies. I can't see a scenario which would prevent this from being done, otherwise, though, but I may miss it right now.

Works fine for a generic r/w register interface up to a reasonable size, flatten the design and make it mostly combinatorial on the SPI clock side. Proper constraints will make sure it meets the timing of the host.

If you're stressing about the last 1/16th or 1/128th of a clock cycle of latency, it's time to rethink a 20MHz SPI link.

NorthGuy · « **Reply #22 on:** March 19, 2021, 11:43:29 pm »

Quote from: SiliconWizard on March 19, 2021, 10:04:42 pm

Ah of course. In this case, this becomes tricky. But I don't think it's impossible.
At the last clock pulse, you get the current received word, and can decide on this pulse what to respond next, and place the corresponding word in the "next word to transmit" register. This should be workable. Of course, this will require a longish logic path, so it will work only for relatively low frequencies. I can't see a scenario which would prevent this from being done, otherwise, though, but I may miss it right now.

Anything which requires clock cannot be done in a combinatorial way. For example, if you need to access BRAM multiple times.

You can probably work out something in the majority of the cases, but it is likely to be more complicated than doing things with FPGA clock.

SiliconWizard · « **Reply #23 on:** March 19, 2021, 11:48:27 pm »

Quote from: NorthGuy on March 19, 2021, 11:43:29 pm

Quote from: SiliconWizard on March 19, 2021, 10:04:42 pm
Ah of course. In this case, this becomes tricky. But I don't think it's impossible.
At the last clock pulse, you get the current received word, and can decide on this pulse what to respond next, and place the corresponding word in the "next word to transmit" register. This should be workable. Of course, this will require a longish logic path, so it will work only for relatively low frequencies. I can't see a scenario which would prevent this from being done, otherwise, though, but I may miss it right now.

Anything which requires clock cannot be done in a combinatorial way. For example, if you need to access BRAM multiple times.

You can probably work out something in the majority of the cases, but it is likely to be more complicated than doing things with FPGA clock.

Yes of course.
But, in this case, you can still use this approach for SPI handling, and do some clock domain sync to handle whatever would require several clock cycles, in the other clock domain.

NorthGuy · « **Reply #24 on:** March 20, 2021, 01:03:05 am »

Quote from: nctnico on March 19, 2021, 10:49:04 pm

So the SPI clock may be in the 50MHz to 200MHz range.

200+ MHz transmissions are probably all source-synchronous, so you're not concerned with latencies, rather you just collect everything in the FIFO and process somehow on the other end of the FIFO. FIFO crosses the clock domain boundary for you, so there's no need for anything else, unless you want to spare the FIFO. This has nothing to do with the OP situation.

What I had in mind is rather slow SPI transfers at frequencies which are much less than the FPGA clock. In such a situation, I always sample the SPI signals with my FPGA clock. This lets me do all the processing with the same clock which is used for many other things in FPGA. Slave SPI responses require very low latency as measured in SPI clocks - one clock only, otherwise the master has to insert delays or extra clocks. IMHO, fast FPGA clock helps here.

What to do if the FPGA clock is very slow, so that sampling the signal may be difficult, such as this 60 MHz/20 MHz situation? I agree that this is not the same situation where you already use a fast clock to sample. Creating a faster sampling clock may not be a good idea because you still need clock domain crossing back to your normal clock. DDR sampling may be better. Using SPI clock seems right for simple cases, but IMHO may over-complicate things if you need more complex processing. In short, there's no good catch-all solution. I'm certainly glad I do not run my designs at 60 MHz.

