Author Topic: Cheap 250Mbs Link between Boads  (Read 7129 times)

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4735
  • Country: dk
Re: Cheap 250Mbs Link between Boads
« Reply #50 on: August 15, 2024, 10:49:58 pm »
When we say '1x0', what we are saying is that if you look at the serial output on a scope, you will see:


...HxLHxLHxLHxLH.... and so on...

The 'x' will be set high or low according to the data you wish to transmit.

The 'H' will always be high and the 'L' will always be low.  Every transition L->H looks like a positive edge clock.  If you were to lock your scope on this rising edge, you will see a clean data bit right after 'H'.

So, if we choose a 24 bit parallel to 1 bit serial SERDES transmitter, the 24 bits on the parallel input will look like:

"1x01x01x01x01x01x01x01x0", where the eight 'x' bits together carry one 8 bit word...

But, the receiving end needs to know which 'x' is the true first bit to begin.
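To make the pattern concrete, here is a minimal Python sketch of the '1x0' expansion described above. It is only a software model for checking bit patterns, not FPGA code, and the function names (`encode_1x0`, `to_string`) are made up for illustration.

```python
def encode_1x0(data_bits):
    """Expand each data bit into the 3-bit line group: 1, data bit, 0."""
    out = []
    for bit in data_bits:
        if bit not in (0, 1):
            raise ValueError("data bits must be 0 or 1")
        out.extend((1, bit, 0))
    return out


def to_string(bits):
    """Render a list of 0/1 integers as a compact string."""
    return "".join(str(b) for b in bits)


# One 8-bit word, first transmitted bit first (arbitrary example value).
word = [1, 0, 1, 1, 0, 0, 1, 0]
serial = encode_1x0(word)

# 8 data bits become 24 line bits, matching the 24-bit parallel SERDES input.
print(len(serial), to_string(serial))
```

Every third line bit starting at position 0 is the fixed '1', every third bit starting at position 2 is the fixed '0', and the data sits in between, which is what gives the receiver a rising edge at a fixed place in every group.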

For the DLL/PLL clock input, we will use the DLL: at every '01' transition it will divide that by 2, making a clean 125MHz reference clock, as the 'x' data bit after the 1 will be random noise.  A simple PLL (the Spartan 6 has one of those) can create single or multiple clock outputs by multiplying and dividing the source clock by a set of fixed integers, like 1,2,3,4,5,6,7,8,9...1023, and can also deliver optional multiple output phase offsets.  I'm assuming the Spartan 6 DLL is a bit simpler than its PLL, as it can only multiply by integers of 1,2,4,8,16 and offer something like 4 or 8 optional different phases.

Ok, back to your bandwidth requirements:
125 bits x 10 000 x 200 = 250 000 000.  No room for anything.

Let's say we go for 130 bits: 1 start bit, 125 data bits and 4 stop bits.
(130*10000*200) = 260 000 000 baud, x4 bits = 1040mbaud.  That is the speed limit of the OSERDES2 for the Spartan 6.  (I'm using x4 instead of x3.)  This means H[xx]LH[xx]LH[xx]LH...  The two bits in each [xx] need to be the same, i.e. 1 bit value; they are just twice as wide instead of using the x3 pattern.  This means compatibility with the DLL, saving your PLL for something else if needed.

Every time you get a start bit, start your next sample, while feeding through the previous data appending your previous sample to the end of the stream being fed from your previous board to the next board.
Will this work for you?

Each board will sample in parallel with an approximate 5-10ns delay + cable length from each other, since they need to decode the serial stream looking for that first 'start/go' bit.  So, board #200's sample will be delayed by ~100-200ns.  Though, with additional coding, you can counteract that 10ns by predicting the start bit's arrival because of its perfectly repetitive nature.  Basically your internal 10khz clock will be set to begin sampling early by the 4-8 clocks on the 260MHz side it takes to see the 'start/go' bit.  I'm not sure how you will deal with the cable length, but with a 10khz sample rate, I don't think a global 200ns offset can be interpreted.

If everything is ok, these are my recommended next steps:

Design a SystemVerilog test-bench which will synthesize your master board's serial data chain.  Then when you begin coding for your Spartan 6, you will add that Spartan 6 code to your testbench, feeding it your custom 125 bit serial input, and see if your FPGA will lock onto your clock data and create a new internal clock from it while decoding and passing all the data through.

Then you can append your own Spartan 6 temporary dummy data onto the stream.

Then you can modify your Spartan 6 code to synthesize its own master serial data option to replace your test-bench's beginner stream, basically clocking that Spartan 6 from a regular crystal, with an IO pin set high or low to define whether it will run as a slave serial input, or run as the first master board from a crystal oscillator input.

Next, add multiple boards of your Spartan 6 code to your testbench, chained together as if wired in real life, to verify each board adds its own data into the stream without errors or missing bits, and verify the phase of your internally generated 10khz sampling clock.

Then you may add your data acquisition sampling IOs to feed the true data into the chain.  (This will be a separate testbench just to verify your sampler connections, as you already did the com; then you may merge the 200 board setup with the sampling IO version if you like.)

The goal is to create your entire 200 board system in something like ModelSim (so long as Xilinx has its DLL and OSERDES models for ModelSim, or whichever simulator Xilinx uses) and see the entire board-to-board system power up and function.

You want to test everything before even creating a schematic so you know what you build will work.

afaict using the start bit to trigger sampling would make the last device sample 200 "frames" later than the first, unless you make each device aware of its position so it can delay sampling N bits depending on position
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #51 on: August 15, 2024, 11:03:35 pm »
 :palm: The maximum for the -2 Spartan 6 is 950mb, not 1050mb.

So, we need to do a few modifications:

Let's say we go for 136 bits, 1 start bit, 125 data bits and 10 stop bits.
(136*10000*200)= 272 000 000 baud, x3bits = 816mbaud.

But, we will need to use the PLL to divide our input clock by 2, then multiply it by 3 to get a 408MHz reference clock, while running the OSERDES2 in 4:1 mode, making its parallel 4 bit IO port run at 204MHz.

136 bits TIMES (1 data bit + 2 clock bits) = 408 bit word frame.
408 bit word frame / 4 bit OSERDES2 = 102 4-bit parallel words.  (Good, no ugly odd numbers or half-filled 4 bit parallel cycles to deal with.)
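The arithmetic above is easy to get wrong by one factor, so here is a small Python check of it. The constant names are mine, and the PLL line reflects my reading that the "input clock" is the 272MHz data-rate clock.

```python
FRAME_BITS = 1 + 125 + 10        # start bit + data bits + stop bits
BOARDS = 200
SAMPLE_RATE_HZ = 10_000
LINE_BITS_PER_DATA_BIT = 3       # the '1x0' pattern: 2 clock bits + 1 data bit
SERDES_RATIO = 4                 # OSERDES2 in 4:1 mode

data_rate = FRAME_BITS * BOARDS * SAMPLE_RATE_HZ          # bits/s of framed data
line_rate = data_rate * LINE_BITS_PER_DATA_BIT            # bits/s on the wire
pll_clock = data_rate // 2 * 3                            # input clock /2, then x3
parallel_clock = line_rate // SERDES_RATIO                # 4-bit port clock

line_bits_per_frame = FRAME_BITS * LINE_BITS_PER_DATA_BIT
words_per_frame, leftover_bits = divmod(line_bits_per_frame, SERDES_RATIO)

print(f"data rate      : {data_rate / 1e6:.0f} Mbit/s")
print(f"line rate      : {line_rate / 1e6:.0f} Mbit/s")
print(f"PLL clock      : {pll_clock / 1e6:.0f} MHz")
print(f"parallel clock : {parallel_clock / 1e6:.0f} MHz")
print(f"4-bit words    : {words_per_frame} per frame, {leftover_bits} bits left over")
```

The zero remainder on the last line is the "no ugly odd numbers" property: every board's frame starts on a 4-bit word boundary.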

With all 200 boards installed, how, at the end, will you know which was the first board's sample?
I guess now that we have an extra 10 stop bits, use 1 or 2 bits to signify an index and a final board in the list.
« Last Edit: August 15, 2024, 11:07:47 pm by BrianHG »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #52 on: August 15, 2024, 11:05:44 pm »
afaict using the start bit to trigger sampling would make the last device sample 200 "frames" later than the first, unless you make each device aware of its position so it can delay sampling N bits depending on position
Each board sampled ahead by 1 clock from the beginning of the first/index start bit, while presenting that stored result data, the FPGA should have at the same time begun the sample for the next packet.  So, all samples will be delayed by 1 clock, all acquisitions beginning in parallel, but the data coming out throughout the 200 board packet.
« Last Edit: August 15, 2024, 11:10:28 pm by BrianHG »
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4735
  • Country: dk
Re: Cheap 250Mbs Link between Boads
« Reply #53 on: August 15, 2024, 11:10:09 pm »
yeh, much much simpler with a clock on a separate pair, they could trigger the sampling with a pause in the clock
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #54 on: August 15, 2024, 11:14:30 pm »
yeh, much much simpler with a clock on a separate pair, they could trigger the sampling with a pause in the clock

We now have the extra bits to embed it all.

I will not help ali_asadzadeh any further if he doesn't do a real test bench and fully test his chained design there, so we will see if everything clocks and syncs up before attempting to build anything.

That is, I believe you can simulate a Xilinx Spartan 6's PLL and OSERDES2, right?  (Otherwise, I made the right choice using Altera, as you can simulate all their FPGAs' IP primitives.)
« Last Edit: August 15, 2024, 11:17:04 pm by BrianHG »
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4735
  • Country: dk
Re: Cheap 250Mbs Link between Boads
« Reply #55 on: August 15, 2024, 11:17:47 pm »
yes you can simulate it all in ISIM that comes with ISE

(or in any other simulator once you figure out how to point to all the right libs)
« Last Edit: August 15, 2024, 11:32:35 pm by langwadt »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #56 on: August 15, 2024, 11:37:10 pm »
https://docs.amd.com/v/u/en-US/spartan6_hdl
DLL 'DCM_SP', page 83.

Yes, you can use a multiple of 3 or 6 for the CLKFX clock output multiplier.

If you can use the CLKFX to drive the OSERDES2 transceiver, then you are clear to go with my 816mb calculation, so long as you can clock the rest of the FPGA logic from a CLKFX/2 output.  Maybe use the CLKDV output set to divide by 2, if its source is the CLKFX.

Use the switch clock input to select the external serial bus clock VS the PLL's local reference 204MHz clock.

Page 234, PLL_BASE, the normal full PLL.  Connect to a 12MHz or 24MHz crystal oscillator to generate the 204MHz core clock when you need it for the first master acquisition PCB.

Then you might need to do more in the PLL.

The full Spartan 6 clocking datasheet: https://docs.amd.com/v/u/en-US/ug382
« Last Edit: August 15, 2024, 11:44:06 pm by BrianHG »
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4735
  • Country: dk
Re: Cheap 250Mbs Link between Boads
« Reply #57 on: August 15, 2024, 11:47:49 pm »
and before that make it all work with sampling and synchronization with just a continuous running separate clock and data
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #58 on: August 16, 2024, 12:05:02 am »
He wanted a single serial link.
Having his testbench create a '1x0' pattern at a specific speed to feed his fpga, then having that FPGA's DLL lock onto the pattern where the 'x' is random junk is the first basic thing to achieve.  Seeing the simulator output a stable locked clock on a dummy IO or signal tap from the Spartan 6 code is all you want to see here.

Reading and sorting the junk with a connected ISERDES2 (the input-side counterpart of the OSERDES2) is part 2.

Making the testbench replace the random 'x' data with something useful, a dummy 136 bit packet and spaced to 10khz is next.

Having his FPGA now reading stuff which is no longer junk, and sorting that stuff into a 136 bit serial register, is next.  The start bit will begin the 136 bit register, and the index 10khz clock bit should be the next after the start bit.  Everything else is the 125 bit packet.

After receiving the first 125 bit packet, the next one should already be ready to merge into the output OSERDES2.

If you are having trouble with this, then this project will be too tough to debug in the field.

Take these small steps, 1 at a time is all you need to do to move towards success.
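For anyone who wants to sanity-check the framing before writing any HDL, here is a rough Python model of the receive side of the steps above. It is only a sketch: the field layout after the start and index bits (125 payload bits, then 9 spare bits) is my assumption to make the total come to 136, and the names `decode_1x0` and `split_frame` are invented for the example.

```python
FRAME_BITS = 136
PAYLOAD_BITS = 125


def decode_1x0(line_bits):
    """Strip the fixed 1/0 framing bits from an already-aligned '1x0' stream."""
    if len(line_bits) % 3 != 0:
        raise ValueError("line stream must be a whole number of 3-bit groups")
    data = []
    for i in range(0, len(line_bits), 3):
        high, x, low = line_bits[i:i + 3]
        if (high, low) != (1, 0):
            raise ValueError(f"framing error in group {i // 3}")
        data.append(x)
    return data


def split_frame(frame):
    """Split one 136-bit frame into fields (spare-bit layout is assumed)."""
    if len(frame) != FRAME_BITS:
        raise ValueError("expected a 136-bit frame")
    return {
        "start": frame[0],
        "index": frame[1],
        "payload": frame[2:2 + PAYLOAD_BITS],
        "spare": frame[2 + PAYLOAD_BITS:],
    }


# Build a dummy frame the way a testbench might, then run it through the model.
payload = [i % 2 for i in range(PAYLOAD_BITS)]
frame = [1, 1] + payload + [0] * 9
line = [b for x in frame for b in (1, x, 0)]   # '1x0' expansion, 408 line bits

fields = split_frame(decode_1x0(line))
print(len(line), fields["start"], fields["index"], len(fields["payload"]))
```

The same dummy frame can be dumped to a file and replayed by the testbench, so the simulator output can be compared bit for bit against this model.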
« Last Edit: August 16, 2024, 12:08:36 am by BrianHG »
 

Online ali_asadzadehTopic starter

  • Super Contributor
  • ***
  • Posts: 1931
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #59 on: August 16, 2024, 10:24:10 am »
Thanks BrianHG for your feedback, I will try to make a test project in Xilinx ISE and make some simulation,
But I need help on setting up the IP,


First we need a transmitter block. Please check if I'm selecting the right choices here:
Page 1 of the IP wizard

Page 2 of the IP wizard

Page 3 of the IP wizard

Page 4 of the IP wizard

Page 5 of the IP wizard

Page 6 of the IP wizard


Then we need a receiver part, so here are the pages for the receiver side IP
Page 1 of the IP wizard

Page 2 of the IP wizard

Page 3 of the IP wizard

Page 4 of the IP wizard

Page 5 of the IP wizard

Page 6 of the IP wizard


Please check if I made the right choices in the wizard. These questions also come to mind: for the transmitter part, is clk_in the output of a PLL that has multiplied, for example, the onboard 50MHz clock to 1040MHz?
Also, for the receiver part there is a clk_in; where should I connect it? I assume that I should send data to the transmitter and connect the diff outputs to the receiver to receive the data.
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #60 on: August 16, 2024, 11:41:53 am »
Thanks BrianHG for your feedback, I will try to make a test project in Xilinx ISE and make some simulation,
But I need help on setting up the IP,

Wait, you went ahead 1 step too quickly...
1 last thing first....

Option 1:
You need to decide whether to use 1 twisted pair to transmit data and clock.
Basically an ~1ghz link using our 3 bit trick using OSERDES2 & a special DLL setup.

Or,

Option 2:
Use the 2 balanced lines in the SATA cable to transmit data and clock separately.
Basically an ~300mb link with an ~150MHz clock, using DDRIO buffers as 2:1 serializers.


Here are the pros and cons:
Option 1 means you will generate code which will be heavily tied down to Spartan 6 specific DLL and PLL.
The important data bit will be a 1ns wide pulse in the signal, meaning with IO tolerances, looking at an ~ +/- 250ps jitter at the IO pins, to read capture this bit, your PCB wiring will need to be good enough to operate in an ~ +/-250ps wide read window.
This method will be a little tricky to get working right.

Option 2, (langwadt recommended and my preferred method)
Means you will generate code which will work not only on Spartan6, but, it will also work on Efinix, Lattice, Altera, and any other FPGA which has IO with a general purpose IO DDR capable pins.
The jitter at the IO pins will still be ~ +/-250ps, however, the data bits will now be 3ns long, meaning your serial capture window will be around +/- 1250ps.  Far easier for your PCB routing and cable tolerances.
Your core clock will operate around 150mhz, fairly easy on modern FPGA, even slow ones.
No special DLL/PLL timing considerations, all vendors FPGAs can handle a 150Mhz 1:1 clock in, clock out.
Much easier to code everything.
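The two read windows quoted above can be reproduced with a very simple model: half the bit width, minus the IO jitter. That model is my own simplification (it ignores setup/hold, cable skew and so on), so treat the Python below as a back-of-envelope check rather than a timing analysis.

```python
def capture_window_ps(bit_width_ps, io_jitter_ps):
    """Usable +/- sampling window around the bit centre, in picoseconds."""
    return bit_width_ps / 2 - io_jitter_ps


IO_JITTER_PS = 250  # the ~ +/-250ps IO jitter figure quoted in this post

option_1 = capture_window_ps(1000, IO_JITTER_PS)   # 1ns data pulse, '1x0' link
option_2 = capture_window_ps(3000, IO_JITTER_PS)   # 3ns data bits, separate clock

print(f"Option 1: +/-{option_1:.0f} ps")
print(f"Option 2: +/-{option_2:.0f} ps")
print(f"Option 2 is {option_2 / option_1:.0f}x more forgiving")
```

With the thread's numbers this gives +/-250ps for option 1 and +/-1250ps for option 2, a factor of five in margin for routing and cable tolerances.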


(I know ASMI said that the SATA spec doesn't specify twisted pair timing matching, but come on, a +/-1250ps read window between both pairs in the cable means to mess this up, you will need 1 pair in the cable to be over 10 inches longer or shorter than the adjacent pair and looking at the crimped SATA ribbon cable connector, I do not think that is mechanically possible within the laws of our universe.  If you were doing the speeds of Option 1 with a 250ps/bit timing window, then yes, things would be a lot more critical here.)


Which option will you choose 1 or 2?

And, if you do not have experience handling 1gbit routing on a PCB or ordering impedance controlled PCBs, then to be safe, I would recommend only going with option #2.  Though, if you need to send data in the opposite direction, depending on how much bandwidth you need, you might be stuck with option #1 with a new problem: you cannot simply reuse the embedded clock for the data going in the opposite direction at the same speed.  Otherwise, your PCB routing and coding will be a nightmare, as now each cable length matters and you will need to use 2x DLLs / PLLs in each FPGA to custom clock each direction.
« Last Edit: August 16, 2024, 11:49:51 am by BrianHG »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #61 on: August 16, 2024, 12:18:14 pm »
Let me put it this way, do you even have a scope to inspect a 1GHz serial link?
This means at least 1GHz bandwidth, 10gsps for 1 channel, 5gsps with 2 channels on.
Proper 1GHz probes?
Preferably J-Fet amplified for low capacitance loads when reading data?

If you are not properly equipped, you might be forced into option #2, otherwise, you will be operating completely in the blind, relying purely on the FPGA internal digital signal spying utilities.
« Last Edit: August 16, 2024, 12:32:59 pm by BrianHG »
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4735
  • Country: dk
Re: Cheap 250Mbs Link between Boads
« Reply #62 on: August 16, 2024, 02:00:26 pm »
with the two pairs of SATA you could also split the string in two, every other device on one pair, the other devices on the other pair. That would halve the bandwidth requirement for each pair
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #63 on: August 16, 2024, 04:47:17 pm »
with the two pairs of SATA you could also split the string in two, every other device on one pair, the other devices on the other pair. That would halve the bandwidth requirement for each pair
Once you try to embed a clock somewhere on one pair or both, you're pushed back up to the higher bandwidth.

For 320mb, you need a 160mhz channel for a clock with a 160mhz bandwidth for the 320mb serial stream.
Remember, at 320mb, 1 bit high, the next low, and back equals a 160mhz clock.

(Yes, I'm bumping up the bitrate so there is room for 224 packets, 142 bits each, instead of 200x125.  This gives a huge 24 packet break in between every 200 packet chunk for the option of a reverse direction message or an optional checksum, while the extra 17 bits on top of the 125 offer initial start, message direction and board number signature IDs for each of the 200 sequential boards.  IE: sequential boards may now be placed out of sequence, and for the final FPGA, they will each be able to write directly into dual port memory at a correct address, instead of using a dumb address increment counter where the chain must always be in order.)

 ((224*142)+192)*10000 = 320 000 000 = 320 mbits per second.  Even the absolute worst FPGA DDRIOs can do this 320mb.
The dedicated clock line is nothing more than that, a pure 160.000 MHz clock.  The stupidest FPGA, even ones without a PLL can pass this one through.

Operating like this, ali_asadzadeh can potentially go to a bottom end 2-3$ Lattice FPGA, or a 2$ PLD with 512 macrocells, though you're better off with the FPGA for cell density.


(Note: ((224*142)+192)*10000 = 320mbps.  320MHz / 16 = a cheap 20MHz reference crystal.  Those 192 extra cycles are dumb empty filler to make the 10khz sample clock and packet size a perfect multiple of 20/40/80/160MHz.  Think of it as preamble and a guaranteed dead zone which cannot be filled, for sync alignment.)
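Here is the same budget written out as a quick Python check, in case anyone wants to play with the packet counts. The variable names are just mine; the numbers are the ones from this post.

```python
PACKETS_PER_PERIOD = 224     # 200 board packets + 24 spare packet slots
BITS_PER_PACKET = 142        # 125 data bits + 17 extra bits
FILLER_BITS = 192            # dead zone / preamble per sample period
SAMPLE_RATE_HZ = 10_000

bits_per_period = PACKETS_PER_PERIOD * BITS_PER_PACKET + FILLER_BITS
bit_rate = bits_per_period * SAMPLE_RATE_HZ
clock_hz = bit_rate // 2             # alternating 1/0 bits look like this clock
crystal_hz = bit_rate // 16          # reference crystal suggested in the note

print(f"bits per 10kHz period : {bits_per_period}")
print(f"bit rate              : {bit_rate / 1e6:.0f} Mbit/s")
print(f"dedicated clock       : {clock_hz / 1e6:.0f} MHz")
print(f"reference crystal     : {crystal_hz / 1e6:.0f} MHz")
print(f"spare packet slots    : {PACKETS_PER_PERIOD - 200}")
print(f"extra bits per packet : {BITS_PER_PACKET - 125}")
```

Changing any one of the three sizes and re-running shows quickly whether the period still lands on a whole multiple of the crystal frequency.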
« Last Edit: August 16, 2024, 07:00:42 pm by BrianHG »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Cheap 250Mbs Link between Boads
« Reply #64 on: August 17, 2024, 03:47:18 am »
Just to chime in... a simple encoding system like 8b10b would:

- Keep bit rate to 300Mb/s - (Only a 25% coding overhead rather than 200%)

- Provide the timing information as edge transitions are present in the bit stream

- Ensure that the signals are DC balanced, could be AC coupled.

- You would have the K symbols to allow for framing of data (e.g. you could wrap the data blocks with chosen "Start of Frame" and "End of Frame" symbols)

- Would allow a crude form of error detection through invalid symbols.

- Could be implemented with IDDR and ODDR primitives and a 150MHz clock domain, rather than the work needed to get SERDES blocks running.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online ali_asadzadehTopic starter

  • Super Contributor
  • ***
  • Posts: 1931
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #65 on: August 17, 2024, 09:50:32 am »
Quote
Which option will you choose 1 or 2?
I have designed high speed PCBs before and I would choose Option 1. I also noticed something: if we shoot for the 1GHz link with only a 4x serializer, the actual core would have to run at 250MHz, and I think the best speed I could get from a Spartan 6 is around 100MHz.


Quote
Just to chime in... a simple encoding system like 8b10b would:

- Keep bit rate to 300Mb/s - (Only a 25% coding overhead rather than 200%)

- Provide the timing information as edge transitions are present in the bit stream

- Ensure that the signals are DC balanced, could be AC coupled.

- You would have the K symbols to allow for framing of data (e.g. you could wrap the data blocks with chosen "Start of Frame" and "End of Frame" symbols)

- Would allow a crude form of error detection through invalid symbols.

- Could be implemented with IDDR and ODDR primitives and a 150MHz clock domain, rather than the work needed to get SERDES blocks running.
I'm not familiar with 8b10b encoding. Would you explain more? Consider me a total noob in this area.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Cheap 250Mbs Link between Boads
« Reply #66 on: August 17, 2024, 11:00:46 am »
8b10b is really neat. It was originally invented by IBM, and the patent has now expired. It is a great entry into coding theory.

Much of this is from memory, so you really want to refer to the Wikipedia page - https://en.wikipedia.org/wiki/8b/10b_encoding

At a very simple level, you replace every 8-bit byte with a 10-bit symbol and transmit that instead. The symbols are special.

The symbol set is 256 'D' (data) symbols, and 12 'K' symbols (Control Symbols - which don't represent data bytes).

Simplicity

To generate the codes, the data byte is split into its low 5 bits and its high 3 bits, giving two numbers: one between 0 and 31, the other between 0 and 7. These parts are mapped to 6 bits and 4 bits to build the final symbol, so (almost) two simple lookup tables.

This gives the naming system of D.x.y or K.x.y, e.g. D.10.7 is the data symbol for 11101010 (low five bits '01010' = 10 dec, high three bits '111' = 7 dec).
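As a small illustration of the naming only (not the encoding itself), here is a Python sketch that turns a byte into its D.x.y / K.x.y name. It follows the standard 8b/10b convention that x is the low five bits of the byte and y is the high three bits; the function name `symbol_name` is invented for this example.

```python
def symbol_name(byte, control=False):
    """Return the 8b/10b name (D.x.y or K.x.y) for an 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    x = byte & 0x1F          # low five bits  -> 5b/6b part, 0..31
    y = byte >> 5            # high three bits -> 3b/4b part, 0..7
    prefix = "K" if control else "D"
    return f"{prefix}.{x}.{y}"


print(symbol_name(0b11101010))            # data byte 0xEA
print(symbol_name(0xBC, control=True))    # the byte value behind K.28.5
```

Note that the name does not say which 10-bit pattern goes on the wire; that still depends on the lookup tables and on the running disparity.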

Traditionally it was implemented using a relatively small number of logic gates, but of course you use lookup tables in FPGAs to do  the same mapping.

DC Balanced

Each D or K symbol has either five 1s and five 0s, or there are two encodings for the symbol, one with four 1s, one with six 1s. You choose between the two encodings based on the running disparity of the symbols before it, so the disparity jumps between +1 and -1.

This ensures that any stream has an equal number of 1s and 0s (to within +/-1 bit). The resulting stream has a long term average of 0.5, and can therefore be AC coupled.
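The disparity rule is easier to see with a toy example. The Python below is a sketch that only handles one symbol, K.28.5, using the two 10-bit encodings quoted elsewhere in this post; the dictionary keyed by running disparity and the helper names are my own.

```python
# The two K.28.5 encodings, keyed by the running disparity BEFORE the symbol.
# With a negative running disparity we send the version with more 1s.
K28_5 = {-1: "0011111010", +1: "1100000101"}


def disparity(symbol):
    """Number of 1s minus number of 0s in a 10-bit symbol string."""
    return symbol.count("1") - symbol.count("0")


def send_repeated(encodings, count, running_disparity=-1):
    """Emit `count` copies of a symbol, picking the encoding that rebalances."""
    sent = []
    for _ in range(count):
        symbol = encodings[running_disparity]
        sent.append(symbol)
        running_disparity += disparity(symbol)
    return sent, running_disparity


sent, final_rd = send_repeated(K28_5, 4)
stream = "".join(sent)
print(sent, final_rd)
print(stream.count("1"), stream.count("0"))
```

Each symbol on its own is unbalanced by two bits, but because the encodings alternate, the running disparity never leaves +1 / -1 and the stream as a whole stays balanced.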

Clock recovery
Because of the way the symbols are constructed there will be a large number of transitions between 1s and 0s (a run of identical bits is never longer than five). This allows the receiver to clearly identify bit boundaries and recover the data clock. You will never transmit "011111111111111111110", so the receiver will never be left wondering if there are 19 or 20 bits in that long run of 1s.

Synchronization

If the K.28.7 symbol (0011111000 or 1100000111) is not used, then the sequence 00111110 or 11000001 will never appear in an 8b10b stream, other than in the K.28.1 (0011111001 or 1100000110) and K.28.5 (0011111010 or 1100000101) symbols. This is neat because if you use one of these as your 'idle' symbol, you can look for that pattern and recover symbol alignment.
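A rough Python sketch of that alignment search, using the two 8-bit patterns quoted above. It works on a string of '0'/'1' characters purely for readability (a real receiver would do this on a shift register), and `find_symbol_start` is a made-up name.

```python
ALIGN_PATTERNS = ("00111110", "11000001")
SYMBOL_BITS = 10


def find_symbol_start(bits):
    """Return the index of the first alignment pattern, or None if absent.

    The patterns sit at the start of K.28.1 / K.28.5, so the index found is
    also the index of a symbol boundary.
    """
    hits = [bits.find(pattern) for pattern in ALIGN_PATTERNS]
    hits = [h for h in hits if h >= 0]
    return min(hits) if hits else None


# Three stray bits from a partial symbol, then two K.28.5 idle symbols.
stream = "101" + "0011111010" + "1100000101"
start = find_symbol_start(stream)
symbols = [stream[i:i + SYMBOL_BITS]
           for i in range(start, len(stream) - SYMBOL_BITS + 1, SYMBOL_BITS)]
print(start, symbols)
```

Once the offset is known, everything after it can be cut into 10-bit symbols and fed to the decoder tables.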

The other K symbols can be used as non-data symbols. In your case assigning a 'start of frame' and 'end of frame' K symbol might be a good idea.

Efficiency
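Because every 8 bits is mapped to a 10-bit symbol, the overhead is fixed: 2 of every 10 transmitted bits, i.e. 20% of the line rate (or 25% on top of the payload). This is the same as RS232 (8 data bits, one start bit, one stop bit), but is much better than schemes like Manchester coding, which doubles the bit rate.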
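To put numbers on the comparison, here is a short Python sketch that works out the line rate for the 250 Mbit/s payload discussed in this thread under each scheme. The '1x0' row is the embedded-clock pattern proposed earlier in the thread; the table layout and function names are mine.

```python
PAYLOAD_BPS = 250_000_000   # 125 bits x 200 boards x 10 kHz

# (payload bits, line bits) per coding unit
SCHEMES = {
    "8b10b": (8, 10),
    "RS232 8N1": (8, 10),
    "Manchester": (1, 2),
    "1x0": (1, 3),
}


def line_rate(payload_bps, data_bits, line_bits):
    """Bits per second on the wire for a given payload rate."""
    return payload_bps * line_bits // data_bits


def overhead_percent(data_bits, line_bits):
    """Overhead as (% of line bits, % on top of payload bits)."""
    extra = line_bits - data_bits
    return extra * 100 / line_bits, extra * 100 / data_bits


for name, (d, l) in SCHEMES.items():
    of_line, over_payload = overhead_percent(d, l)
    print(f"{name:11s} {line_rate(PAYLOAD_BPS, d, l) / 1e6:6.1f} Mbit/s  "
          f"{of_line:5.1f}% of line, {over_payload:5.1f}% over payload")
```

This also shows where the two different percentages for 8b10b come from: the same two extra bits are 20% of the ten line bits but 25% of the eight payload bits.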

« Last Edit: August 17, 2024, 11:21:56 am by hamster_nz »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #67 on: August 17, 2024, 01:58:49 pm »
@hamster_nz, I don't think the OP has room for any async clock recovery in his com, at least according to the way the project was described.  You may get the com working, but what happens to the perfect sync required for all the 10khz samplers?  What if one slightly slips ahead, where does the extra bit of 125 bits go?

On the other hand, I'll let you explain to the OP how to program his clock recovery system and how to make a new synchronous global PLL from it for a clean sampler clock.

The only reason for the original option#1, was to make a pure PLL locked parallel shared system clock between boards with data on 1 cable with the SERDES capabilities built into a Spartan 6.  I'll let you 2 work out how to do this without an embedded clock and how to simulate it verifying proper error free functionality.

I know with 3x or 4x oversampling, you can do software clock realignment by watching for the slipping edge bit.  But this means the boards will not share a dead perfect 10khz acquisition clock.  They will all have a go signal and skew the start of their 10khz samplers either a few clocks early or late.

The '1x0' trick was the same as a 3x oversampling clock ('1xx0' is basically a 4x oversampling equivalent), but you have a new guaranteed timing bit at the beginning of each transmitted bit.  In fact, the pattern is so small and repetitive, you can now PLL lock onto the rising edge of that signal so long as it is broadcast continuously.  This trick doesn't require any SERDES trickery: you literally have bits 0 and 3 hard tied high and low, while bits 1 and 2 are tied together to transmit a 125+extra bit serial shift register containing your packet.

If the OP wants to continue with a clock embedded into every transmitted bit going out, then I need him to verify if the OSERDES2 can do 6:1 instead of 4:1 or 8:1.

No matter where he goes, DDR oversampling gives him a maximum bitrate of 750mb, while the OSERDES2's QDR oversampling gives him 950mb on a -2 speed grade Spartan.  A -3 speed grade Spartan can go to 1080mb, but that one might not be available in the industrial temperature range which the OP needs, unless the OP isn't afraid of overclocking.


Also, the OP's original break neck 125 bits by 200 packets exactly without room for an extended break might be too close for comfort, though I see how he may want to originally minimize bandwidth.
« Last Edit: August 17, 2024, 02:24:43 pm by BrianHG »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #68 on: August 17, 2024, 02:38:38 pm »
Also, don't forget that with the async design, you are repeating a message coming in from the previous board in the chain.  If your reference crystal is ever so slightly slower than the previous board's, you may clean up and soft-reclock the incoming data correctly, but your repeat transmission will be going out ever so slightly slower, meaning your 4-8 bit buffer may overflow, breaking your transmission.
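As a rough feel for how quickly that buffer fills, here is a back-of-the-envelope Python sketch. The 50 ppm crystal mismatch is a made-up illustrative figure, not something measured on the OP's boards, and the function name is hypothetical.

```python
def seconds_until_overflow(bit_rate_hz, ppm_mismatch, buffer_bits):
    """Time for a repeater's small buffer to fill when its TX clock runs
    `ppm_mismatch` parts-per-million slower than the incoming data."""
    surplus_bits_per_second = bit_rate_hz * ppm_mismatch * 1e-6
    return buffer_bits / surplus_bits_per_second


if __name__ == "__main__":
    # Assumed numbers: 250 Mbit/s stream, 50 ppm mismatch, 8 bit buffer.
    t = seconds_until_overflow(250e6, 50, 8)
    print(f"buffer full after about {t * 1e6:.0f} us of continuous traffic")
```

With those assumed numbers, an 8 bit buffer lasts well under a millisecond of gap-free traffic, so the stream needs idle breaks where the buffer can drain, or some re-pacing logic.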

I'm not going to help the OP deal with all those implications, as I thought all he wanted was a simple synchronous bit pass-through pipe so it can fit into the cheapest tiny FPGA without any grand memory buffering and message re-timing/re-pacing code.  I was thinking of a complete com section being about 1 page of very dumb in->out Verilog code, with a 2-4 bit shift register to transfer in to out and the other 125 bit shift register to pipe through the current board's stored acquisition sample.
« Last Edit: August 17, 2024, 02:58:28 pm by BrianHG »
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4735
  • Country: dk
Re: Cheap 250Mbs Link between Boads
« Reply #69 on: August 17, 2024, 08:22:54 pm »

I had a play with it in simulation and OSERDES will do 3:1, but I'd say forget it, it works in simulation but getting clocking and routing just right to meet timing will likely take lots of work

I had a much more naughty idea that seems it could work at 200MHz, so two strings would do 400Mbit

use a PLL to (re)generate 200MHz from the 200Mbit data, same PLL can generate 200MHz with 70% duty cycle, and 200MHz with 40% duty cycle

then use a mux controlled by tx data to switch between the two clocks, should be possible to do glitch free since the switch always happens in the high period
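a rough Python model of the duty cycle idea, just to show the data can be pulled back out. assumptions of mine: 70% duty means a 1 and 40% means a 0, and the receiver samples 55% of a period after each rising edge. STEPS is only the simulation resolution, nothing to do with the FPGA.

```python
STEPS = 20  # simulation samples per bit period (arbitrary resolution)


def modulate(bits, duty_one=0.70, duty_zero=0.40):
    """One clock period per bit; the high time carries the data."""
    wave = []
    for b in bits:
        high = round(STEPS * (duty_one if b else duty_zero))
        wave.extend([1] * high + [0] * (STEPS - high))
    return wave


def demodulate(wave, threshold=0.55):
    """Sample a fixed delay after every rising edge (between the two duty cycles)."""
    bits = []
    prev = 0  # assumption: line is low before the first period
    offset = round(STEPS * threshold)
    for i, level in enumerate(wave):
        if prev == 0 and level == 1 and i + offset < len(wave):
            bits.append(wave[i + offset])
        prev = level
    return bits


if __name__ == "__main__":
    data = [1, 0, 0, 1, 1, 0]
    print(demodulate(modulate(data)) == data)
```

every period still starts with a rising edge at the same spot, so the receiving PLL sees a plain 200MHz clock with a wobbling falling edge.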



 

 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #70 on: August 17, 2024, 09:38:36 pm »
Your sim is not showing the DLL status.
Was the DLL already locked in and stable so soon in your sim?
Did you set the DLL input to divide by 2?
Why didn't you show your DLL's output 150MHz clock in your sim VS source data?

On the receive end, if tuned right, the 6:1 deserializer should show a parallel output of 6 bits, 6'b{1,data,0,1,data,0}, on every 150MHz clock.

How did your home made auto-phase lock adjustment tuning algorithm do?
Over how many phase steps / degrees of valid data window did the deserializer output the proper mask 6'b{1,x,0,1,x,0}?

A 6:1 OSERDES2 running at 150MHz should give you 900mb.
Since there are 2 data bits running at 150MHz, you should have a 300 megabit link.
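For anyone checking their own sim output, here is a small Python sketch of the alignment test and the rate arithmetic. It assumes the 6 bit word is listed in received order, first bit first. The real ISERDES2 output ordering should be checked against the Xilinx documentation, and the function names are hypothetical.

```python
SERDES_RATIO = 6
PARALLEL_CLOCK_HZ = 150e6


def frame_ok(word):
    """True if a 6 bit word fits the 1,x,0,1,x,0 template."""
    return (len(word) == 6 and word[0] == 1 and word[2] == 0
            and word[3] == 1 and word[5] == 0)


def payload(word):
    """The two data bits carried by a correctly aligned word."""
    return [word[1], word[4]]


line_rate = SERDES_RATIO * PARALLEL_CLOCK_HZ  # bits per second on the wire
data_rate = 2 * PARALLEL_CLOCK_HZ             # payload bits per second

if __name__ == "__main__":
    print(frame_ok([1, 1, 0, 1, 0, 0]), payload([1, 1, 0, 1, 0, 0]))
    print(frame_ok([1, 0, 1, 1, 0, 1]))  # same stream, slipped by one bit
    print(line_rate, data_rate)
```

Counting how many phase steps in a row give frame_ok on every word is one way to measure the valid data window.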

Remember, you cannot use the PLL_BASE, it does not have that crucial divide by 2 on the reference clock input.
The PLL's phase comparator will get all confused with the noisy bit.

Note that there is still an async way to do this with 300mb straight serial data, but the OP wanted a synced clock.
« Last Edit: August 17, 2024, 09:46:54 pm by BrianHG »
 

Offline glenenglish

  • Frequent Contributor
  • **
  • Posts: 454
  • Country: au
  • RF engineer. AI6UM / VK1XX . Aviation pilot. MTBr
Re: Cheap 250Mbs Link between Boads
« Reply #71 on: August 17, 2024, 10:24:55 pm »
It would be fine at 125 MHz/ 250Mbps, or even double that if you used a T20 or a S7.....

but why why why restrict yourself to Spartan 6 ? A part from 15 years ago ?
There are plenty of easy to use modern parts...

Why design a product for a customer using obsolete parts ? Obsolete parts and obsolete + unsupported tools? This isn't doing any favours for the customer....

Modern parts will easily do 400 MHz / 800 Mbps in even the lowest speed grade.
« Last Edit: August 17, 2024, 10:38:16 pm by glenenglish »
 
The following users thanked this post: BrianHG

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #72 on: August 17, 2024, 10:38:55 pm »
900mb for the slowest spartan 6, and we need it anyway we approach the project.  But yes, I wouldn't use such an old part.

1280mb RX, ie 4x over sampling a 320mb serial com would make a simple async solution.
960mb RX, ie 3x oversampling a 320mb serial com, would make an async solution where you have to shift an odd number of bits on the RX end.  Just a bit more annoying to work the logic.

350mb would give us breathing room.
Above, I'm speaking in old fashioned 8N1 serial characters.
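The arithmetic behind those figures, as a quick Python sketch. The 250 Mbit/s requirement is the OP's 125 bits x 10 000 x 200 from earlier in the thread. The helper name is hypothetical.

```python
REQUIRED_PAYLOAD = 125 * 10_000 * 200  # bits per second the OP has to move


def payload_rate(line_rate, data_bits=8, start_bits=1, stop_bits=1):
    """Useful bits per second once start/stop framing is removed (default 8N1)."""
    return line_rate * data_bits / (data_bits + start_bits + stop_bits)


if __name__ == "__main__":
    for line_rate in (320e6, 350e6):
        useful = payload_rate(line_rate)
        print(f"{line_rate / 1e6:.0f}mb line -> {useful / 1e6:.0f}mb payload, "
              f"margin {(useful - REQUIRED_PAYLOAD) / 1e6:.0f}mb")
    # Receiver sample rates for 3x and 4x oversampling of a 320mb line
    print(3 * 320e6, 4 * 320e6)
```

So 320mb in 8N1 only just covers the requirement, which is why something like 350mb leaves breathing room.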
« Last Edit: August 17, 2024, 11:28:46 pm by BrianHG »
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4735
  • Country: dk
Re: Cheap 250Mbs Link between Boads
« Reply #73 on: August 17, 2024, 11:07:19 pm »
the duty cycle is (barely) within limits for 200MHz; the PLL only uses the rising edge, why else would it have such a wide allowable duty cycle range?
but you can divide input before the pll


 

 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8101
  • Country: ca
Re: Cheap 250Mbs Link between Boads
« Reply #74 on: August 17, 2024, 11:35:18 pm »
With 2 channel com, he could have gone with a $1.50 Efinix FPGA.  (800mb peak LVDS)
Using an HDMI cable, he would have had 4 twisted pairs, 2 for TX and another 2 for RX, allowing for a sync clock plus additional.  I am not certain of the consistency of available HDMI cables, though running at a slow 300mb should make even the worst cable AOK.  With HDMI cables, you also get power pins plus another 5 wires to share between PCBs.  SATA is exclusively 2 twisted pairs and GND.

In Async mode, it is that slipping bit you need to keep track of which is why I recommend at least 3x oversampling when running in async mode.  (Async having no clock embedded in the serial stream.  You are literally re-adjusting the read data position based on every detected transition in the RX stream in real time, so the transmitter runs at 300mb, but the receiver reads with a local crystal oscillator at 900mb, slipping the read position to the next adjacent bit at every detected input transition.  You would want something like a 1-8N2, or 1-16N2 serial pattern to allow slippage up or down. 1 start bit, 2 stop bits.)
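Here is a minimal Python sketch of that slipping read position, assuming 3x oversampling and a sample list that starts on a bit boundary. It is a software model of the idea only, not FPGA code, and the function name is made up.

```python
def recover_bits(samples, oversample=3):
    """Pick one sample per bit, re-centring the read position on every transition.

    Assumes `samples` is non-empty and starts at the first sample of a bit.
    """
    bits = []
    phase = 0
    prev = samples[0]
    for s in samples:
        if s != prev:
            phase = 0                 # edge seen: this is the first sample of a new bit
        if phase == oversample // 2:
            bits.append(s)            # read in the middle of the bit
        phase = (phase + 1) % oversample
        prev = s
    return bits


if __name__ == "__main__":
    ideal = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]     # bits 1,0,0,1 at exactly 3x
    slipped = [1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0]   # bits 1,0,1,0 with a short and a long bit
    print(recover_bits(ideal))
    print(recover_bits(slipped))
```

The read position only gets corrected when a transition arrives, which is why the start and stop bits matter: they guarantee an edge often enough that the slip never builds up to a whole bit.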
 

