Does anyone know what fabric is in the Zynq UltraScale+?
It doesn't seem to be the same as the 7 series devices, since it's on a 16nm process (Spartan/Artix/Kintex/Virtex-7 are all 28nm).
I'll have to give the DDR3 MIG a second thought. But I don't think memory bandwidth is the ultimate limit here unless we're looking at sampling rates above 2.5GSa/s, and those start requiring esoteric ADC parts with large BOM costs attached to them.
We could build a $3000 oscilloscope, but would people really buy it in enough volume to make it worthwhile?
You also need to think about how long it will take to process the data. In my own USB design, data came in at 200MS/s, but it could process acquired data at over 1000MS/s. Say you have 4 channels with 500Mpts of memory and a maximum sample rate of 250MS/s. A memory bandwidth of 1GS/s would be enough for acquisition purposes. However, in such a case you don't want the memory bandwidth available to the processing part (whether inside the FPGA or external) to become a bottleneck, especially if that bandwidth has to be shared between sampling and processing (think about double buffering here). Otherwise things like decoding and full-record math will become painfully slow.
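A rough budget for the example above, as a minimal Python sketch. It assumes the 500Mpts is the total record shared across all four channels and one byte per sample (8-bit ADC); `acquisition_bandwidth` and `full_record_time` are hypothetical helper names, and the memory bandwidth figures in the loop are illustrative only.

```python
# Rough memory-bandwidth budget: acquisition vs. full-record processing.
# Assumed figures (from the example above): 4 channels, 250 MS/s per channel,
# 500 Mpts of total record memory, 1 byte per sample (8-bit ADC).

CHANNELS = 4
SAMPLE_RATE = 250e6       # samples/s per channel
RECORD_POINTS = 500e6     # total points in memory (assumed shared by all channels)
BYTES_PER_SAMPLE = 1      # assumed 8-bit samples


def acquisition_bandwidth() -> float:
    """Bytes/s needed just to write incoming samples to memory."""
    return CHANNELS * SAMPLE_RATE * BYTES_PER_SAMPLE


def full_record_time(memory_bandwidth: float, shared: bool) -> float:
    """Seconds to read the whole record back once for processing.

    If `shared` is True, acquisition keeps writing at full rate (e.g. double
    buffering), so processing only gets whatever bandwidth is left over.
    """
    if shared:
        available = memory_bandwidth - acquisition_bandwidth()
    else:
        available = memory_bandwidth
    if available <= 0:
        raise ValueError("no bandwidth left for processing")
    return RECORD_POINTS * BYTES_PER_SAMPLE / available


if __name__ == "__main__":
    print(f"acquisition needs {acquisition_bandwidth() / 1e9:.1f} GB/s")
    for bw in (1e9, 2e9, 4e9):
        try:
            t = full_record_time(bw, shared=True)
            print(f"{bw / 1e9:.0f} GB/s memory, shared: {t:.2f} s per full-record pass")
        except ValueError as err:
            print(f"{bw / 1e9:.0f} GB/s memory, shared: {err}")
```

With exactly 1GB/s of memory bandwidth, acquisition alone consumes all of it, so a double-buffered design has nothing left for decoding or math; only headroom above the acquisition rate shortens a full-record pass.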
Yes, so that is the goal: a 32-bit DDR3 interface at 667MHz, assuming a standard Zynq-7020 in an enhanced speed grade, gets us up to 5.3GB/s of peak bandwidth.
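The 5.3GB/s figure is the theoretical peak: DDR moves data on both clock edges, so 4 bytes × 2 × 667MHz ≈ 5.3GB/s. A small Python check, with `ddr_peak_bandwidth` as a hypothetical helper name; sustained throughput after refresh, bank conflicts and read/write turnaround will be lower than this.

```python
def ddr_peak_bandwidth(bus_width_bits: int, clock_hz: float) -> float:
    """Theoretical peak DDR bandwidth in bytes/s.

    DDR transfers data on both clock edges, so transfers/s = 2 x clock.
    Real controllers sustain noticeably less than this figure.
    """
    transfers_per_second = 2 * clock_hz
    return bus_width_bits / 8 * transfers_per_second


if __name__ == "__main__":
    # 32-bit DDR3 at 667 MHz (DDR3-1333), as quoted above
    print(f"32-bit: {ddr_peak_bandwidth(32, 667e6) / 1e9:.1f} GB/s")  # 32-bit: 5.3 GB/s
    # For comparison, a 16-bit bus at the same clock
    print(f"16-bit: {ddr_peak_bandwidth(16, 667e6) / 1e9:.1f} GB/s")  # 16-bit: 2.7 GB/s
```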
100,000 wfm/s at ~600 points per waveform is only a read-back bandwidth of 60MB/s (at one byte per point). It's mostly write bandwidth you need. (You need the write bandwidth for the pre-trigger, assuming you want pre-trigger × nwaves to be larger than the block RAM can hold.)
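Both halves of that argument as a quick Python sketch, assuming 8-bit samples. The 512KiB on-chip budget is a hypothetical figure; the real number depends on the part and on what else is using block RAM. `readback_bandwidth` and `pretrigger_fits_onchip` are illustrative names.

```python
BYTES_PER_POINT = 1  # assumed 8-bit ADC samples


def readback_bandwidth(wfm_per_s: float, points_per_wfm: int) -> float:
    """Bytes/s needed to read captured waveforms back out of memory."""
    return wfm_per_s * points_per_wfm * BYTES_PER_POINT


def pretrigger_fits_onchip(pretrigger_pts: int, nwaves: int, onchip_bytes: int) -> bool:
    """True if pre-trigger storage for `nwaves` segments fits in on-chip RAM.

    If it doesn't, pre-trigger data has to be streamed into DDR continuously,
    which is what drives the write-bandwidth requirement.
    """
    return pretrigger_pts * nwaves * BYTES_PER_POINT <= onchip_bytes


if __name__ == "__main__":
    print(f"read-back: {readback_bandwidth(100_000, 600) / 1e6:.0f} MB/s")  # read-back: 60 MB/s
    ONCHIP = 512 * 1024  # hypothetical block-RAM budget for pre-trigger storage
    print(pretrigger_fits_onchip(300, 1_000, ONCHIP))   # True  (300 kB)
    print(pretrigger_fits_onchip(300, 10_000, ONCHIP))  # False (3 MB, must go to DDR)
```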
Strangely enough, read bandwidth starts trending higher as you go to a longer timebase and the blind time shrinks as a fraction of the active acquisition time. At that point the current limitation is the CSI-2 bus to the Pi, and the Pi itself has memory bandwidth issues.
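One way to see why, as a simple model rather than a measurement: assume each trigger captures sample_rate × timebase points followed by a fixed blind time, and that the whole record is read back rather than decimated to a fixed screen width. Average read bandwidth is then sample_rate × T / (T + t_blind), which approaches the full sample rate as the timebase grows. The 1GS/s rate and 10µs blind time below are hypothetical, as is the `sustained_readback` name.

```python
def sustained_readback(sample_rate: float, timebase_s: float,
                       blind_time_s: float, bytes_per_point: int = 1) -> float:
    """Average read bandwidth (bytes/s) if every captured sample is read back.

    Model: each trigger captures sample_rate * timebase_s points, then a fixed
    blind (re-arm/processing) time passes before the next acquisition.
    """
    points_per_wfm = sample_rate * timebase_s
    wfm_per_s = 1.0 / (timebase_s + blind_time_s)
    return points_per_wfm * wfm_per_s * bytes_per_point


if __name__ == "__main__":
    SAMPLE_RATE = 1e9   # hypothetical 1 GS/s, 1 byte per point
    BLIND_TIME = 10e-6  # hypothetical fixed gap per trigger
    for timebase in (1e-6, 10e-6, 100e-6, 1e-3):
        bw = sustained_readback(SAMPLE_RATE, timebase, BLIND_TIME)
        live = timebase / (timebase + BLIND_TIME)
        print(f"timebase {timebase * 1e6:6.0f} us -> {bw / 1e6:6.1f} MB/s ({live:.0%} live)")
```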
I have the capability to implement a 4-lane CSI-2 peripheral, which doubles bandwidth to around 3.2Gbit/s (400MB/s). At that point we are nearing the capacity of PCIe or USB3, although only in one direction.
One reason to go to a 32-bit interface is that we then have the performance available to do a write-read-DSP-write-read cycle: we can start using the DSP blocks to work on the waveform data we just acquired, and then render it for the next frame. That would allow the DSP fabric to be used in a (pseudo-)pipelined manner. I'm working on a concept for the render engine on the FPGA to see how practical it would be.
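Counting memory passes makes this concrete. In the write-read-DSP-write-read cycle, each acquired byte crosses the DDR interface roughly four times (assuming the DSP output is the same size as its input), so the sustainable ingest rate is about a quarter of the usable bandwidth. The sketch below takes the 5.3GB/s peak quoted earlier and a hypothetical 70% controller efficiency; `max_ingest_rate` and the efficiency figure are assumptions, not measured values.

```python
PEAK_BW = 5.3e9      # bytes/s: 32-bit DDR3 @ 667 MHz peak, as quoted above
EFFICIENCY = 0.7     # hypothetical fraction of peak a real controller sustains

# Each acquired byte crosses the DDR interface once per pass in the cycle.
PASSES = (
    "write: acquire from ADC",
    "read: feed DSP blocks",
    "write: store DSP result",
    "read: render next frame",
)


def max_ingest_rate(peak_bw: float = PEAK_BW, efficiency: float = EFFICIENCY) -> float:
    """Highest ADC data rate (bytes/s) the cycle can sustain, assuming the
    DSP output is the same size as its input."""
    return peak_bw * efficiency / len(PASSES)


if __name__ == "__main__":
    print(f"32-bit bus: ~{max_ingest_rate() / 1e6:.0f} MB/s sustainable ingest")
    # Halving the bus width (e.g. a 16-bit DDR3 interface) halves the budget.
    print(f"16-bit bus: ~{max_ingest_rate(PEAK_BW / 2) / 1e6:.0f} MB/s sustainable ingest")
```

Whether the cycle fits at a given sample rate depends heavily on the efficiency figure, which is worth measuring on real hardware before committing to a design.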