tom66, thank you, this is super impressive work!
A few years ago I worked a bit on the Siglent SDS1x0xX-E reverse-engineering/hack (https://github.com/360nosc0pe/fpga). We got it to the point where we could control the frontends (coupling, attenuation, BW), capture the ADC data, and push that to memory on the PL. On the PS, we had Linux and some test code to pull data out of RAM and display it. The eventual goal was to render into accumulation buffers in block RAM (which is what Siglent does, hence the crappy resolution to make it fit), but we never got further than basically driving the hardware correctly. That part worked well, though.
Without going too much into the topic of creating new hardware vs. hacking existing hardware, I think the two designs share a lot of the same choices, so I would be very interested in cooperating on an open source oscilloscope and/or potentially porting your code to this platform.
Also, nice work on the CSI-2 interface! How does your CSI-2 Phy look on the FPGA side? Do you need to implement LP support or only high-speed? This is a very elegant, cheap and fast way to capture a lot of data into an RPi. (So far I've always used an FT2232H in FIFO mode, but it adds significant cost, and especially on an RPi3 the USB alone eats a full CPU core due to the bad USB controller design.) I assume receiving data on CSI-2 doesn't take up a lot of CPU resources on the RPi if you can DMA large blocks.
Interesting project! I am impressed someone managed to do that.
There may be some 'scope' for collaboration, so let's keep talking and see if we can help each other out. Not sure how much would be reusable, but maybe some would be.
Regarding CSI-2: I was able to write a PLL register on an authentic Pi camera to get the clock down to 12MHz. The image goes bad (too dark because the shutter times etc. are wrong) but you can then switch the camera into a test pattern mode. Using this, you can reverse-engineer the protocol on as little as a Rigol DS1074Z. I built a board to allow me to do this. It sits between a Pi camera and a Pi and allows me to 'snoop' on the bus between the two (see attached).
The CSI-2 Phy on the FPGA side is an implementation of Xilinx XAPP894 using the passive circuit they suggest, with custom Verilog driving a pair of OSERDESE2 blocks and a bloody complex FSM to manage the whole process of generating packets and data streams. I prototyped this on a smaller PCB in the first run and spent a few months reverse engineering the protocol using what documentation I could find. It is something I really need to re-engineer at some point. It was initially designed with a BlockRAM interface, i.e. data would be copied into BRAM and output from there. That was sufficient for testing but eventually I ended up bolting on an AXI stream interface. So you set up a transfer of X lines of video data, each 2048 bytes long, and the AXI DMA manages the rest. To simplify things, the two lanes terminate at the same moment (i.e. odd data lengths are fundamentally unsupported). But I want to add the capability (as CSI-2 supports) for odd line lengths and jumbo packets at some point.
Annoyingly with a Pi it is 'all or nothing'... if you don't get it all right it doesn't work at all.
One consequence of this design choice is that all packets have to be multiples of 2048 bytes. If they are not, they are padded with null bytes. So it's not useful for small packets, and those are sent over the SPI bus right now. But the protocol is fairly robust. I can reliably transfer 180MB/s from Zynq RAM to the Pi for hours on end with zero bit errors.
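To illustrate the padding rule, here is a minimal Python sketch (not the actual Zynq-side code). The 2048-byte line length comes from the description above; the function name `pad_to_lines` is made up for the example. It pads a payload with null bytes up to a whole number of lines and returns the line count that the transfer would be set up with.

```python
LINE_BYTES = 2048  # line length quoted above


def pad_to_lines(payload: bytes, line_bytes: int = LINE_BYTES) -> tuple:
    """Pad payload with null bytes to a whole number of lines.

    Returns (padded_buffer, line_count). Hypothetical helper for illustration.
    """
    lines = -(-len(payload) // line_bytes)  # ceiling division
    padded = payload.ljust(lines * line_bytes, b"\x00")
    return padded, lines


if __name__ == "__main__":
    padded, lines = pad_to_lines(b"\xAA" * 5000)
    print(len(padded), lines)  # 6144 3
```

A 5000-byte payload ends up occupying three lines, with 1144 bytes of padding, which is why small packets are wasteful on this path.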
I don't implement the true LP protocol as the Pi camera doesn't use it so I don't support e.g. lane turnaround or low speed communication over that. I do of course implement the start-of-transmission and end-of-transmission signals, and the small packet header format for SoF/EoF and the larger packet format. Presently the CRC is set to all zeroes ... the Pi doesn't seem to use this and it makes the logic easier.
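For anyone who does want to fill in the checksum rather than send zeroes, here is a rough Python sketch of the long-packet CRC as I understand it from the CSI-2 specification: a 16-bit CRC over the payload bytes only, polynomial x^16 + x^12 + x^5 + 1, seeded with 0xFFFF and processed LSB first. Treat those parameters and the little-endian footer byte order as assumptions to check against the spec. The function names are made up for the example, and real hardware would do this in logic rather than software.

```python
def csi2_crc16(payload: bytes) -> int:
    """Bitwise CRC-16, reflected polynomial 0x8408, seed 0xFFFF, no final XOR.

    Parameters are assumed from the CSI-2 spec; verify before relying on them.
    """
    crc = 0xFFFF
    for byte in payload:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x8408
            else:
                crc >>= 1
    return crc


def crc_footer(payload: bytes) -> bytes:
    """Two-byte packet footer, low byte first (assumed byte order)."""
    return csi2_crc16(payload).to_bytes(2, "little")


if __name__ == "__main__":
    print(crc_footer(bytes(2048)).hex())
```

With a receiver that ignores the checksum, as the Pi appears to, this only matters if the link is ever pointed at something stricter.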
I also implement start and end on the clock lane, putting the clock lane into LP when not transmitting. The specification does not require this, but it improves reliability: without it, if the Pi failed to sync onto the first SoT packet it would never see any data for the duration of operation. It also saves power (about 0.1W).
One interesting way you can determine if a device is actually utilising the checksum is to deliberately degrade the link. In my case I added some 10pF to the D1+/D1- pair. I got plenty of bit corruption, but all lines still appeared in the data and the frame was otherwise intact. That told me the Pi ignores the checksum (or sets an ignorable error flag/increments some counter), which meant I could avoid implementing that part of the specification.
You are correct that on the Pi side this is all DMA driven so the data essentially arrives in memory at a given point and you can read it from there. You need to be careful of a few things:
- The Pi and the transmitter both need to know how big the packet is (so if you send 2045 lines, set the receiver to 2045 lines), otherwise you get an odd effect where the first few lines are offset with garbage
- The Pi needs to be 'ready' to receive before the FPGA starts otherwise the CSI core gets into an error state
At present only the process that uses MMAL can access the data at a given pointer, which creates a few headaches. That would be good to solve. If you want to share the data between processes, it requires a memcpy because the MMAL data is private to a given process. There is a Linux-kernel solution to this that a friend was looking into for me, but I need to revive that.
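As a rough illustration of the memcpy workaround (not the kernel-level fix, and not the actual code on the Pi), here is a Python sketch where the receiving process copies each frame into a named shared-memory block that other processes can attach to. It uses the standard library's `multiprocessing.shared_memory`; the function names are made up, and `frame` stands in for whatever buffer MMAL hands back.

```python
from multiprocessing import shared_memory
from typing import Optional


def publish_frame(frame: bytes, name: Optional[str] = None):
    """Copy a frame into a new shared-memory block (this is the memcpy)."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(frame))
    shm.buf[:len(frame)] = frame
    return shm  # caller must close() and unlink() when done


def read_frame(name: str, nbytes: int) -> bytes:
    """Attach to an existing block by name and copy out nbytes."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # The block may be larger than requested on some platforms,
        # so the reader slices to the known frame size.
        return bytes(shm.buf[:nbytes])
    finally:
        shm.close()


if __name__ == "__main__":
    shm = publish_frame(bytes(range(16)))
    try:
        print(read_frame(shm.name, 16).hex())
    finally:
        shm.close()
        shm.unlink()
```

This still costs one copy per frame, which is exactly the overhead a zero-copy kernel solution would remove.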