Author Topic: Are there any standard reference designs for a high-speed inter-FPGA bridge?  (Read 7440 times)


Online tszaboo

  • Super Contributor
  • ***
  • Posts: 7755
  • Country: nl
  • Current job: ATEX product design
Cost of a Zynq 7010: $15 (new).
Not everyone is willing to put up with dodgy Chinese sellers selling you random parts while trying to fool you into believing it's something else. Some like to only buy through official channels. This is especially so for commercial products, when you need to be sure the seller you bought chips from will still be there 5 years from now. I tried it personally and the seller sent me an SG1 device instead of the SG2 I ordered, and somehow thought that was OK to do :palm: I got my money back (and still got a chip, though I haven't soldered it yet to see if it works at all), but if it were a commercial project, I'd be totally screwed, as for some time I'd have neither money nor chips.
Oh, and please stop calling this crap "new". It's not. It might be from old stock, or it may be desoldered, reballed junk that may or may not work, or (in my case) it might be a different chip altogether from the one you ordered.
You do realize that FPGAs are not actually sold at the prices "as seen on Digikey"? The real prices are a lot lower.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
You do realize that FPGAs are not actually sold at the prices "as seen on Digikey"? The real prices are a lot lower.
Nowhere in my post will you see any mention of price. That is intentional.
The problem with buying FPGAs on AliExpress is not price. It's the fact that it's not an official channel, so there are no guarantees whatsoever that you will actually get what you ordered, that the chips will be brand new, or that they will work. It's true that you can get your money back if things go south, but for commercial projects that is not good enough, because time wasted costs money too.

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 27558
  • Country: nl
    • NCT Developments
For example, I have a board with a Zynq-7020 and a Cyclone IV connected using some traces. Now I want to implement some kind of high speed bridge that exposes an AMBA AHB interface from the Zynq to the Cyclone IV. Is there any standard implementation for that, which can encapsulate AMBA AHB transactions into some form of external signaling format and shoot it over the traces, and decapsulate it on the other end?

I want to avoid SerDes at all costs (since my chips don't have any to begin with), but differential serial transmission is preferred (so that length matching between pairs can be less critical).
Some people at CERN have cooked up something for the Wishbone bus called Etherbone ( http://cds.cern.ch/record/1563867 ).
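Just to illustrate the general idea (a rough sketch in C for readability only - the real thing would be HDL on both FPGAs, and this is NOT Etherbone's or anyone else's actual wire format; every field name below is made up): the bridge on each side packs the fields of one AHB-style transaction into a small frame, shifts it out over the differential pair, and the far end unpacks it and replays it on its local bus.

Code: [Select]
#include <stdint.h>
#include <stddef.h>

/* Hypothetical frame layout for carrying one AHB-style transaction over a
 * serial link. Field names and sizes are illustrative only. */
typedef struct {
    uint8_t  is_write;   /* 1 = write request, 0 = read request          */
    uint8_t  hsize;      /* transfer size encoding (byte/half/word)      */
    uint16_t seq;        /* sequence number for ordering/retry           */
    uint32_t haddr;      /* AHB address                                  */
    uint32_t hwdata;     /* write data (ignored for reads)               */
    uint16_t crc;        /* link-level integrity check, filled by caller */
} bridge_frame_t;

/* Pack the frame into the byte buffer that the serializer shifts out over
 * the pair; the far end does the reverse and drives its local bus master. */
size_t frame_pack(const bridge_frame_t *f, uint8_t buf[14])
{
    buf[0] = f->is_write;
    buf[1] = f->hsize;
    buf[2] = (uint8_t)(f->seq >> 8);
    buf[3] = (uint8_t)(f->seq & 0xFF);
    for (int i = 0; i < 4; i++) {        /* big-endian on the wire (arbitrary) */
        buf[4 + i] = (uint8_t)(f->haddr  >> (24 - 8 * i));
        buf[8 + i] = (uint8_t)(f->hwdata >> (24 - 8 * i));
    }
    buf[12] = (uint8_t)(f->crc >> 8);
    buf[13] = (uint8_t)(f->crc & 0xFF);
    return 14;
}
In practice you also need a response frame going the other way (read data, error/OKAY status) and some ack/retry handling, which is exactly the part Etherbone already specifies for Wishbone.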
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
For inter-chip or inter-board, we have "ChipLink". For example this is used to send coherent memory traffic between the HiFive Unleashed board and either a Xilinx VC707 FPGA board, or MicroSemi "HiFive Unleashed Expansion Board" over an FMC connector using 2x35 pins @200 MHz. The boards bridge this to PCIe (and other things) and we run video cards, SSDs etc there.
Why haven't you just integrated a PCIe root complex instead? This would allow building a RISC-V PC without having to use $5k FPGA devboards, not to mention all the schematic complexity this brings. If it had PCIe, I would've bought these chips in a heartbeat and designed a motherboard for such a PC. This feels like such a missed opportunity. Nobody in their right mind would make a baseboard with a super-expensive FPGA only to get PCIe capability. And the decision to use a home-brewed bus is even more annoying.
« Last Edit: June 06, 2019, 03:40:50 pm by asmi »
 

Offline technixTopic starter

  • Super Contributor
  • ***
  • Posts: 3508
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
For inter-chip or inter-board, we have "ChipLink". For example this is used to send coherent memory traffic between the HiFive Unleashed board and either a Xilinx VC707 FPGA board, or MicroSemi "HiFive Unleashed Expansion Board" over an FMC connector using 2x35 pins @200 MHz. The boards bridge this to PCIe (and other things) and we run video cards, SSDs etc there.
Why haven't you just integrated a PCIe root complex instead? This would allow building a RISC-V PC without having to use $5k FPGA devboards, not to mention all the schematic complexity this brings. If it had PCIe, I would've bought these chips in a heartbeat and designed a motherboard for such a PC. This feels like such a missed opportunity. Nobody in their right mind would make a baseboard with a super-expensive FPGA only to get PCIe capability. And the decision to use a home-brewed bus is even more annoying.
Actually, how about just adopting the AMD AM4 socket in the next ASIC version? That socket only really needs DDR4 and PCIe (to both the graphics card slot and the southbridge) to work, so RISC-V in AM4 means people could just buy the chip and drop it into a standard COTS AMD Ryzen motherboard, and it would more or less work.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
Actually, how about just adopting the AMD AM4 socket in the next ASIC version? That socket only really needs DDR4 and PCIe (to both the graphics card slot and the southbridge) to work, so RISC-V in AM4 means people could just buy the chip and drop it into a standard COTS AMD Ryzen motherboard, and it would more or less work.
No, it absolutely would not work. For example, BIOS/UEFI code is an x86 binary which RV would obviously not be able to execute. The same goes for video and network BIOSes - they also contain x86 code. Using a socket would only increase the price of the chip (and a baseboard) for zero gain.

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4327
  • Country: nz
For inter-chip or inter-board, we have "ChipLink". For example this is used to send coherent memory traffic between the HiFive Unleashed board and either a Xilinx VC707 FPGA board, or MicroSemi "HiFive Unleashed Expansion Board" over an FMC connector using 2x35 pins @200 MHz. The boards bridge this to PCIe (and other things) and we run video cards, SSDs etc there.
Why haven't you just integrated a PCIe root complex instead? This would allow building a RISC-V PC without having to use $5k FPGA devboards, not to mention all the schematic complexity this brings. If it had PCIe, I would've bought these chips in a heartbeat and designed a motherboard for such a PC. This feels like such a missed opportunity. Nobody in their right mind would make a baseboard with a super-expensive FPGA only to get PCIe capability.

"just"

Because it's a test chip, SiFive's first 28 nm chip, the world's first Linux-capable 64-bit quad-core 1.5 GHz RISC-V, and far more complicated than SiFive's only previous chip, a single-core 32-bit low-speed FE310 microcontroller.

*Obviously* future volume production chips are highly likely to have PCIe built in -- that's hardly giving any secrets away -- but it's a needless complication for a first test chip when PCIe can be provided externally with off-the-shelf boards without severely limiting performance (unlike, say, DDR).

If those boards cost $2000 or $3500 (vc707), that's an absolute bargain compared to the cost and time and risks of developing in-house PCIe and associated SerDes and PHY technology for a test chip that's only ever going to have a few hundred made. (or licensing it, obviously, but people already complain enough about the proprietary DDR4 IP that was used)

Quote
And the decision to use a home-brewed bus is even more annoying.

Not exactly home-brewed. TileLink has been under development and used in numerous Berkeley projects for a number of years.

Your "home brewed" is other people's "free and open non-proprietary standard" that is just as good as AMBA4 that is controlled by the company with the most to lose if RISC-V or other open ISAs gain significant market share.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3237
  • Country: ca
... the cost and time and risks of developing in-house PCIe and associated SerDes and PHY technology for a test chip that's only ever going to have a few hundred made. (or licensing it, obviously, but people already complain enough about the proprietary DDR4 IP that was used)

I guess by the time it gathers enough financing to climb to PC-level, it won't be free and open source any more :(
 

Offline technixTopic starter

  • Super Contributor
  • ***
  • Posts: 3508
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Actually, how about just adopting the AMD AM4 socket in the next ASIC version? That socket only really needs DDR4 and PCIe (to both the graphics card slot and the southbridge) to work, so RISC-V in AM4 means people could just buy the chip and drop it into a standard COTS AMD Ryzen motherboard, and it would more or less work.
No, it absolutely would not work. For example, BIOS/UEFI code is an x86 binary which RV would obviously not be able to execute. The same goes for video and network BIOSes - they also contain x86 code. Using a socket would only increase the price of the chip (and a baseboard) for zero gain.
If Loongson boards are anything to go by, the CPU itself contains some internal ROM code serving the purpose of a BIOS. The x86 UEFI is just skipped and unused.
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 5017
  • Country: si
Don't know how we ended up on RISC-V, but reusing an AMD motherboard doesn't make much sense.

These days motherboards only really provide IO ports and power, that's it. All of the major parts connect directly to the CPU socket: DDR memory, PCIe slots, integrated video, etc. What the motherboard actually does is turn some of those PCIe lanes into other ports (SATA, USB, audio, network...) using the southbridge and some extra chips around it. Since some of these ports are critical for booting, it also provides a BIOS that can bring those ports up and load the OS from them. It also converts 12 V into whatever core voltages are needed. That's about it for a typical consumer motherboard.

If RISC-V had a built-in PCIe controller (as a modern motherboard would require), then making a dev board with PCIe would just be a matter of running a few traces to a PCIe card-edge connector.

Devboards for brand new technology will always be expensive. You need large volumes to bring production costs down and to cover the huge engineering costs.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4327
  • Country: nz
... the cost and time and risks of developing in-house PCIe and associated SerDes and PHY technology for a test chip that's only ever going to have a few hundred made. (or licensing it, obviously, but people already complain enough about the proprietary DDR4 IP that was used)

I guess by the time it gathers enough financing to climb to PC-level, it won't be free and open source any more :(

That's entirely up to whoever develops said PC-level chip. If it's you, you're free to open-source the whole thing if you want.

Right now we're at about early-2000s PC level, or today's low-end Android phone level (plenty of them for sale with quad-core A53 only).

My guess is the proprietary RISC-V chips will stay three to five years ahead of the totally open ones, but not much more than that.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
These days motherboards only really provide IO ports and power, that's it. All of the major parts connect directly to the CPU socket: DDR memory, PCIe slots, integrated video, etc. What the motherboard actually does is turn some of those PCIe lanes into other ports (SATA, USB, audio, network...) using the southbridge and some extra chips around it. Since some of these ports are critical for booting, it also provides a BIOS that can bring those ports up and load the OS from them. It also converts 12 V into whatever core voltages are needed. That's about it for a typical consumer motherboard.
BIOS/UEFI is also responsible for detecting memory modules and configuring the memory controller(s), performing PCIe enumeration, bus assignment and BAR mapping (among other things). Also, the southbridge (at least on the Intel platform) is connected using the DMI bus, which is substantially different from PCIe.
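As a rough illustration of the enumeration/BAR-mapping part, here is a minimal sketch in C, assuming a flat memory-mapped ECAM configuration window at a made-up base address and scanning bus 0 only - real firmware also walks bridges, assigns bus numbers, and handles 64-bit and I/O BARs:

Code: [Select]
#include <stdint.h>
#include <stdio.h>

/* Hypothetical ECAM base - on real hardware this comes from the root
 * complex / chipset configuration, not a hard-coded constant. */
#define ECAM_BASE 0xE0000000UL

/* Standard ECAM layout: bus << 20 | device << 15 | function << 12 | offset. */
volatile uint32_t *cfg(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off)
{
    return (volatile uint32_t *)(ECAM_BASE +
        ((uintptr_t)bus << 20) + ((uintptr_t)dev << 15) +
        ((uintptr_t)fn << 12) + (off & 0xFFCu));
}

/* Scan bus 0: read vendor/device IDs and size BAR0 with the usual
 * write-all-ones-and-read-back trick (32-bit memory BARs only). */
void enumerate_bus0(void)
{
    for (uint8_t dev = 0; dev < 32; dev++) {
        uint32_t id = *cfg(0, dev, 0, 0x00);
        if ((id & 0xFFFFu) == 0xFFFFu)        /* no device present */
            continue;

        volatile uint32_t *bar0 = cfg(0, dev, 0, 0x10);
        uint32_t orig = *bar0;
        *bar0 = 0xFFFFFFFFu;                  /* probe the BAR size */
        uint32_t size = ~(*bar0 & ~0xFu) + 1;
        *bar0 = orig;                         /* restore original   */

        printf("00:%02x.0 vendor %04x device %04x BAR0 size %u bytes\n",
               (unsigned)dev, (unsigned)(id & 0xFFFFu), (unsigned)(id >> 16),
               (unsigned)size);
    }
}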
« Last Edit: June 07, 2019, 12:35:28 pm by asmi »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
That's entirely up to whoever develops said PC-level chip. If it's you, you're free to open-source the whole thing if you want.
I'm actually working on a PCIe root complex for my own hobby RISC-V PC, which is going to be FPGA-based (since I have no money for ASICs), and which I plan to open source once there is enough of it to actually work. This is why I got so excited when I heard about your chip, and was so disappointed to see the lack of PCIe on board - this is a really major showstopper. At least in my case I don't have to design my own PCIe PHY, data link and transaction layers, as there is pre-existing hard IP for that purpose integrated into the FPGAs I'm planning to use, so I only have to design the higher-level logic for handling TLPs. The idea for the module is to be as similar to x86 PCIe as possible, as that's by far the most straightforward implementation I've ever seen (all the PCIe IPs I've read about in ARM SoCs are just a big mess). I want using it to be as easy as it is on x86, where, once configured and mapped, PCIe endpoints become essentially the same as any other memory-mapped peripheral, with zero PCIe-related programming required.
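To show what I mean by "zero PCIe-related programming": once the root complex has enumerated the endpoint and programmed its BAR, software just does plain loads and stores. A sketch (the base address and register offsets below are invented for the example):

Code: [Select]
#include <stdint.h>

/* Invented example: after enumeration the root complex has programmed the
 * endpoint's BAR0 to this address, so from software it is just MMIO. */
#define DEV_BAR0    0x40000000UL
#define REG_CTRL    0x00u   /* hypothetical control register */
#define REG_STATUS  0x04u   /* hypothetical status register  */

static inline void mmio_write32(uintptr_t addr, uint32_t val)
{
    *(volatile uint32_t *)addr = val;
}

static inline uint32_t mmio_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

void start_device(void)
{
    /* No TLPs, config cycles or PCIe-specific calls in sight: accesses to
     * the mapped BAR are plain loads and stores, and the root complex turns
     * them into memory read/write TLPs transparently. */
    mmio_write32(DEV_BAR0 + REG_CTRL, 1u);                /* enable           */
    while ((mmio_read32(DEV_BAR0 + REG_STATUS) & 1u) == 0)
        ;                                                 /* wait until ready */
}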

Right now we're at about early-2000s PC level, or today's low-end Android phone level (plenty of them for sale with quad-core A53 only).
The latter is probably more correct, as the cornerstone of the PC (and the reason it was so successful) has always been extensibility (via the ISA, PCI, and PCIe buses).

My guess is the proprietary RISC-V chips will stay three to five years ahead of the totally open ones, but not much more than that.
I hope they will be open from the beginning, but I'm afraid you'll probably be right. All the ideals of open source tend to be tossed aside when big money gets involved :(
« Last Edit: June 07, 2019, 12:37:32 pm by asmi »
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 5017
  • Country: si
BIOS/UEFI is also responsible for detecting memory modules and configuring the memory controller(s), performing PCIe enumeration, bus assignment and BAR mapping (among other things). Also, the southbridge (at least on the Intel platform) is connected using the DMI bus, which is substantially different from PCIe.

Yes, PCIe enumeration is part of "getting it to boot into an OS", because PCI and PCIe cards can act as boot devices: they can be PCIe SSDs, SATA controller cards, network cards with network boot support, etc. Same reason why USB is enumerated in the BIOS, since it might be holding a USB drive with bootable code. Legacy software also might not support a USB mouse/keyboard, so it enumerates those and provides its keyboard abstraction driver for them. The boot code and all of this also need to fit somewhere, so it also brings up the RAM so it has room to put it in.

Once the OS really gets onto its feet, it often throws a lot of this out, such as tossing out the BIOS keyboard driver and grabbing the USB controller to talk to it directly. Something similar is done with a lot of the other hardware. (This is why older Windows versions are so difficult to boot from NVMe SSDs: they start booting fine, but then, upon tossing out the default BIOS disk driver, find themselves without a working system drive.)

On other platforms such as ARM this whole job is typically done by U-Boot or similar. That U-Boot firmware is only compatible with one board/CPU, so it's not portable at all.

And yes, Intel does use a different bus to the southbridge, but it really does the same thing. AMD pretty much uses a repurposed PCIe bus for this.
 

Offline technixTopic starter

  • Super Contributor
  • ***
  • Posts: 3508
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
BIOS/UEFI is also responsible for detecting memory modules and configuring the memory controller(s), performing PCIe enumeration, bus assignment and BAR mapping (among other things). Also, the southbridge (at least on the Intel platform) is connected using the DMI bus, which is substantially different from PCIe.

Yes, PCIe enumeration is part of "getting it to boot into an OS", because PCI and PCIe cards can act as boot devices: they can be PCIe SSDs, SATA controller cards, network cards with network boot support, etc. Same reason why USB is enumerated in the BIOS, since it might be holding a USB drive with bootable code. Legacy software also might not support a USB mouse/keyboard, so it enumerates those and provides its keyboard abstraction driver for them. The boot code and all of this also need to fit somewhere, so it also brings up the RAM so it has room to put it in.

Once the OS really gets onto its feet, it often throws a lot of this out, such as tossing out the BIOS keyboard driver and grabbing the USB controller to talk to it directly. Something similar is done with a lot of the other hardware. (This is why older Windows versions are so difficult to boot from NVMe SSDs: they start booting fine, but then, upon tossing out the default BIOS disk driver, find themselves without a working system drive.)

On other platforms such as ARM this whole job is typically done by U-Boot or similar. That U-Boot firmware is only compatible with one board/CPU, so it's not portable at all.

And yes, Intel does use a different bus to the southbridge, but it really does the same thing. AMD pretty much uses a repurposed PCIe bus for this.
Most boot-related PCIe and USB drivers are standardized: PCIe itself; xHCI, EHCI, OHCI and UHCI for USB; AHCI and NVMe for local storage; and USB MSC for USB-attached storage. It is also normal for a modern OS to perform a full system re-enumeration during boot to decide which drivers to use.

The whole point of reusing an AMD motherboard is to let the ASIC carry just the processing cores, DRAM controller, PCIe root complex and an alternative BIOS, while achieving a usable system with little external effort.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
The whole point of reusing an AMD motherboard is to let the ASIC carry just the processing cores, DRAM controller, PCIe root complex and an alternative BIOS, while achieving a usable system with little external effort.
1. It will increase the price of the chip. BGA packages are cheap, specialized sockets/connectors are not - especially if you only want to do a (relatively) small batch.
2. The AM4 socket is probably proprietary AMD IP. Who says AMD is going to let them use it?
3. The voltage rails are going to be incompatible. The ASIC is powered by a 0.9 V rail, while AMD CPUs use 1.37-1.4 V. If you power the RV ASIC with 1.4 V, chances are it's going to go bust before you can say "oops".
4. A DRAM controller designed to work only with a small selection of DRAM chips and one that can work with pretty much anything out there on the market are two very different things in terms of complexity.
5. There is much more going on on a motherboard than meets the eye. It's far from being just a bunch of connectors.
6. Most importantly, PCIe connectors are cheap, and so are PCBs and regular BGA assembly. So there is little sense in trying to save anything.

But fundamentally, what's the point? Anyone who wanted to build an RV PC would have to buy all the components anyway. And I seriously doubt the motherboard is going to be the most expensive part.

Offline technixTopic starter

  • Super Contributor
  • ***
  • Posts: 3508
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
The whole point of reusing an AMD motherboard is allowing the ASIC to carry just the processing cores, DRAM controller, PCIe root complex and an alternative BIOS, while with little external effort achieve a usable system.
1. It will increase the price of the chip. BGA packages are cheap, specialized sockets/connectors are not - especially if you only want to do a (relatively) small batch.
2. The AM4 socket is probably proprietary AMD IP. Who says AMD is going to let them use it?
3. The voltage rails are going to be incompatible. The ASIC is powered by a 0.9 V rail, while AMD CPUs use 1.37-1.4 V. If you power the RV ASIC with 1.4 V, chances are it's going to go bust before you can say "oops".
4. A DRAM controller designed to work only with a small selection of DRAM chips and one that can work with pretty much anything out there on the market are two very different things in terms of complexity.
5. There is much more going on on a motherboard than meets the eye. It's far from being just a bunch of connectors.
6. Most importantly, PCIe connectors are cheap, and so are PCBs and regular BGA assembly. So there is little sense in trying to save anything.

But fundamentally, what's the point? Anyone who wanted to build an RV PC would have to buy all the components anyway. And I seriously doubt the motherboard is going to be the most expensive part.
1. Since the RV chip is significantly smaller physically than a Ryzen, instead of a specialized PGA package it can be an RV BGA chip plus a QSPI flash chip for the alternative BIOS on a piece of FR4 interposer. This is actually a trick some Shenzhen sellers use to convert Intel laptop chips for desktop motherboards, by adding an FR4 interposer.
2. I don't think the pinout of a chip socket is copyrightable or patentable. And all the signals we are interested in here are open standards.
3. Depending on the chip's power consumption and the size of the interposer, it should be possible to integrate a Vcore regulator on the interposer. Alternatively, since this level of design change more likely than not requires a new chip revision anyway, the new revision could be made with a maximum Vcore of 1.45 V to tolerate both the native BGA design and the AM4-based design.
4. Self-calibration of the DRAM controller is a problem you need to solve before RV chips can be taken seriously, as the use of DIMMs is practically required as soon as you leave the embedded/mobile market. Virtually all desktop computers and all servers use DIMMs (reading the module's SPD is where that starts; see the sketch after this list).
5. What might go wrong then? All the interfaces involved here are either open standards, or at least have open-source drivers in the Linux kernel you can learn from.
6. A small FR4 interposer is cheaper than even a mini-ITX board, and a mass-market COTS AM4 board is more likely than not to be cheaper than a small-volume native BGA board too, if matched feature for feature.
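On point 4: before any training or calibration can happen, the controller (or boot firmware) first has to read the module's SPD EEPROM over SMBus to learn its geometry and timings. A minimal sketch of that first step from Linux userspace via i2c-dev, assuming the SPD sits on /dev/i2c-0 at the usual 0x50 address (the bus number is an assumption, and a real DDR4 SPD is 512 bytes with page switching - this only grabs the first 256 bytes):

Code: [Select]
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>

#define SPD_ADDR 0x50   /* DIMM SPD EEPROMs normally answer at 0x50-0x57 */

int main(void)
{
    uint8_t spd[256];
    int fd = open("/dev/i2c-0", O_RDWR);   /* bus number is an assumption */
    if (fd < 0 || ioctl(fd, I2C_SLAVE, SPD_ADDR) < 0) {
        perror("spd");
        return 1;
    }

    /* Dummy write sets the EEPROM's internal address pointer to 0,
     * then a sequential read pulls the whole lower SPD page out. */
    uint8_t zero = 0;
    if (write(fd, &zero, 1) != 1 ||
        read(fd, spd, sizeof spd) != (ssize_t)sizeof spd) {
        perror("spd read");
        close(fd);
        return 1;
    }

    /* Byte 2 is the DRAM device type in both DDR3 and DDR4 SPD layouts
     * (0x0B = DDR3, 0x0C = DDR4); the rest of the decode differs per type. */
    printf("DRAM device type byte: 0x%02x\n", (unsigned)spd[2]);

    close(fd);
    return 0;
}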
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4327
  • Country: nz
That's entirely up to whoever develops said PC-level chip. If it's you, you're free to open-source the whole thing if you want.
I'm actually working on a PCIe root complex for my own hobby RISC-V PC, which is going to be FPGA-based (since I have no money for ASICs), and which I plan to open source once there is enough of it to actually work. This is why I got so excited when I heard about your chip, and was so disappointed to see the lack of PCIe on board - this is a really major showstopper.

When the FU540 taped out about 1.7 years ago (and first working boards were ready 1.3 years ago) SiFive had fewer than 30 employees. Now there are over 400. You might imagine there are a lot of things in the pipeline.

Quote
At least in my case I don't have to design my own PCIe PHY, data link and transaction layers, as there is pre-existing hard IP for that purpose integrated into the FPGAs I'm planning to use

Yes, FPGAs give you a lot. That's why we use them for the PCIe too, for the first test boards.

They make really low performance CPUs though :-(

MicroSemi (a Microchip subsidiary) will soon be selling new PolarFire FPGAs (the "PolarFire SOC") with SiFive's quad core (plus 1) 64 bit FU540 as a hard macro inside the FPGA, the same way Xilinx Zynq has ARM cores.

https://www.microsemi.com/product-directory/soc-fpgas/5498-polarfire-soc-fpga
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
When the FU540 taped out about 1.7 years ago (and first working boards were ready 1.3 years ago) SiFive had fewer than 30 employees. Now there are over 400. You might imagine there are a lot of things in the pipeline.
Well, I hope one day I will see a CPU with PCIe and DRAM that is available in a reasonable package (not a 0.3 mm pitch BGA with a bazillion balls; something like a 0.7 mm and up BGA with <500 balls), and that doesn't have a million useless (for a PC) peripherals (which is my personal gripe with existing ARM SoCs - all the PCIe-enabled ones have a crapload of peripherals that I don't want).

They make really low performance CPUs though :-(
Low per-core performance can be somewhat compensated for by having a large number of cores, which is my current goal - I'm working on a 64-bit core that will run at 250 MHz on cheap Artix-7 FPGAs, which can be pushed up to ~300 MHz (maybe even 350 MHz) on a Kintex if I can get my hands on one for reasonable money, and then just bunch up however many of them will fit inside the FPGA. All the FPGA 64-bit RV cores I've tried so far can barely go above 100 MHz on an A7, which is why I'm on a mission to change that. MUL/DIV will obviously be multicycle, but the rest should give close to 1 IPC (fully pipelined).
MicroSemi (a Microchip subsidiary) will soon be selling new PolarFire FPGAs (the "PolarFire SOC") with SiFive's quad core (plus 1) 64 bit FU540 as a hard macro inside the FPGA, the same way Xilinx Zynq has ARM cores.

https://www.microsemi.com/product-directory/soc-fpgas/5498-polarfire-soc-fpga
Are their design tools as free as Xilinx's? I've never encountered their FPGAs yet.

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4327
  • Country: nz
Low per-core performance can be somewhat compensated for by having a large number of cores, which is my current goal - I'm working on a 64-bit core that will run at 250 MHz on cheap Artix-7 FPGAs

That would be very cool! Good luck!

As well as 1 IPC, are you expecting to be able to run dependent integer instructions back to back?

PicoRV32 runs at a pretty high clock rate, but doesn't even try to be 1 IPC.

SiFive evaluation FPGA bitstreams are 1 IPC, but are deliberately generated from RTL designed for the SoC, not for the FPGA, to help verify that the eventual SoC will work, even though this results in lower clock frequency and higher LUT usage than FPGA-optimised RTL would give.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
That would be very cool! Good luck!
Thank you! Right now summer is somewhat getting in the way (my wife doesn't like me staring at the monitor while the weather is so nice outside :) ), but I already have some blocks ready, and the project is slowly progressing.

As well as 1 IPC, are you expecting to be able to run dependent integer instructions back to back?
Yes, absolutely. The DSP blocks that I use for the ALU have a dedicated feedback path from the output back to the input, so there shouldn't be any issues with that. I have actually already designed the ALU and tested it in hardware at 250 MHz, and it worked OK, but of course when you integrate modules together there might be some routing issues. These DSP tiles can run as high as 500 MHz when fully pipelined, but I'm trying to avoid making the pipeline too long, as pipeline flushes become progressively more expensive.

PicoRV32 runs at a pretty high clock rate, but doesn't even try to be 1 IPC.
Well, as I understand it, its design goal was to be small. I don't have such a constraint - there are lots of LUTs and registers in A7 FPGAs, so I can afford to "invest" them into a proper pipeline.

