Shifting the dots is computationally simple even with sinx/x (which is not yet implemented). It's just offsetting a read pointer plus a 64-bit rotate by a multiple of 8 bits, practically perfect FPGA territory. In the present implementation I simply read 0..3 dummy words from the FIFO, then rotate two words to get the last byte offset.
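As a rough software model of that word-skip plus byte-rotate step, here is a minimal sketch. It assumes 8-bit samples packed eight to a 64-bit word, little-endian (sample 0 in bits 7..0); the names `shift_samples`, `pack` and `unpack` are made up for illustration and are not taken from the actual gateware.

```python
MASK64 = (1 << 64) - 1

def shift_samples(words, byte_offset):
    """Drop `byte_offset` 8-bit samples from the front of a list of 64-bit words.

    Assumption: little-endian packing, sample 0 in bits 7..0 of word 0.
    """
    skip_words, skip_bytes = divmod(byte_offset, 8)
    words = words[skip_words:]            # the "dummy reads" from the FIFO
    if skip_bytes == 0:
        return list(words)
    shift = 8 * skip_bytes                # always a multiple of 8 bits
    out = []
    for lo, hi in zip(words, words[1:]):
        # Combine two adjacent words: upper bytes of `lo`, lower bytes of `hi`.
        out.append(((lo >> shift) | (hi << (64 - shift))) & MASK64)
    return out

def pack(samples):
    """Pack 8-bit samples into 64-bit words (length must be a multiple of 8)."""
    return [int.from_bytes(bytes(samples[i:i + 8]), "little")
            for i in range(0, len(samples), 8)]

def unpack(words):
    """Inverse of pack()."""
    return [b for w in words for b in w.to_bytes(8, "little")]

shifted = unpack(shift_samples(pack(list(range(32))), 11))
```

With 32 input samples and an offset of 11, one whole word is skipped and the remaining words are combined pairwise, so the result starts at sample 11 and is one word shorter than the remaining input.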
Note that triggers in modern scopes are aligned more finely than the sample period (by interpolation), and the reconstruction and interpolation methods also depend on the front-end characteristics. Expect the rendering speeds to collapse in a software/GPU approach once you put in that phase alignment and sinc interpolation.
In better news, if you're going down an all-digital trigger route (probably a good idea) then the vast majority of "trigger" types are simply combinations of 2 thresholds and a one-shot timer, which are easy enough. That can then be passed off to slower state machines for protocol/serial triggers. But without going down the route of dynamic reconfiguration or multiple FPGA images, supporting a variety of serial trigger types becomes an interesting problem all of its own.
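To illustrate the "two thresholds and a one-shot timer" idea, here is a minimal sketch of a pulse-width trigger in software. The function name, the hysteresis scheme and the choice to fire on the falling edge are assumptions for the example, not a description of any particular scope's trigger core.

```python
def pulse_width_trigger(samples, low, high, min_width):
    """Return indices where a positive pulse of at least `min_width` samples ends.

    Two thresholds give a comparator with hysteresis: the signal counts as
    high once it reaches `high`, and as low again once it falls to `low`.
    The timer counts how many samples the signal has been high.
    """
    triggers = []
    is_high = False
    timer = 0
    for i, s in enumerate(samples):
        if not is_high and s >= high:
            is_high = True
            timer = 1                 # start the timer on the rising edge
        elif is_high and s <= low:
            is_high = False
            if timer >= min_width:    # pulse was wide enough: fire
                triggers.append(i)
        elif is_high:
            timer += 1
    return triggers

wave = [0, 0, 200, 200, 0, 0, 200, 200, 200, 200, 200, 0, 0]
hits = pulse_width_trigger(wave, low=50, high=150, min_width=3)
```

The first pulse is only two samples wide and is ignored; the second is five samples wide and fires on its falling edge. Runt, slope and timeout triggers can be modelled the same way by changing which threshold crossings start and stop the timer.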
As I understand it, and please DSP gurus do correct me if I am wrong, if the front-end has a fixed response to an impulse (which it should do if designed correctly), and you get a trigger at value X but intend the trigger to be at value Y, then you can calculate the real time offset based on the difference between these samples which can be looked up in a trivial 8-bit LUT (for an 8-bit ADC). It's reasonably likely the LUT would be device-dependent for the best accuracy (as filters would vary slightly in bandwidth) but this could be part of the calibration process and the data burned into the 1-Wire EEPROM or MCU.
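I have not worked out what the calibrated, device-dependent LUT described above would contain, so the sketch below swaps in a simpler and well-known stand-in: linear interpolation between the two samples that straddle the trigger level, with a 256-entry reciprocal table replacing the divider (which is how it might be done in fabric). The table contents, the fixed-point scaling and the name `trigger_offset` are all assumptions for illustration.

```python
# 256-entry reciprocal table: RECIP[d] ~= 65536 / d for d = 1..255.
# Index 0 is unused (two equal samples cannot straddle a level).
RECIP = [0] + [round((1 << 16) / d) for d in range(1, 256)]

def trigger_offset(s0, s1, level):
    """Sub-sample position of a rising-edge crossing, in 1/256ths of a sample.

    s0 and s1 are consecutive 8-bit ADC codes with s0 < level <= s1.
    Returns 0..256, i.e. how far past s0 the crossing lies.
    Assumption: the edge is straight between the two samples.
    """
    if not (s0 < level <= s1):
        raise ValueError("samples do not straddle the trigger level")
    delta = s1 - s0                       # 1..255, used as the table index
    num = level - s0                      # 1..delta
    return (num * RECIP[delta]) >> 8      # scale the 16-bit fraction to 8 bits

frac = trigger_offset(100, 200, 150)      # crossing roughly halfway between samples
```

A per-device table built during calibration would replace the straight-line assumption with the measured edge shape, but the lookup structure would stay about this small.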
In any case there is a nice trade-off that happens as the timebase drops: you are processing fewer and fewer samples. So, while you might have to do sinx/x interpolation on that data and more complex reconstructions on trigger points to reduce jitter, a sinx/x interpolator will have most of its input data zeroed when doing 8x interpolation, so the read memory bandwidth falls. I've still yet to decide whether the sinx/x is best done on the FPGA side or on the RasPi - if it's done on the FPGA then you're piping extra samples over the CSI bus, which is bandwidth constrained, although not particularly so at the faster timebases, so it may not be an issue. The FPGA has a really nice DSP fabric we might use for this purpose.
I don't think it will be computationally practical to do filtering or phase correction on the digital side on the actual samples. While there are DSP blocks in the Zynq, they are limited to an Fmax of around 300MHz, which would require a considerably complex multiplexing system to run a filter at the full 1GSa/s. And that would only give you ~60 taps, which isn't hugely useful except for a very gentle rolloff.
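The point about zeroed inputs can be seen in a small model of an 8x sinx/x interpolator. This is a sketch only: the Hann window, the kernel span of +-4 input samples and the function names are assumptions, and a real implementation would use fixed-point arithmetic.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def make_kernel(factor, half_width):
    """Hann-windowed sinc spanning +-half_width input samples (assumed design)."""
    n = factor * half_width
    return [sinc(k / factor) * 0.5 * (1.0 + math.cos(math.pi * k / n))
            for k in range(-n, n + 1)]

def upsample(samples, factor=8, half_width=4):
    """Interpolate by `factor`, returning (output samples, MACs per output)."""
    kernel = make_kernel(factor, half_width)
    centre = factor * half_width
    out, macs = [], []
    for m in range(len(samples) * factor):        # index on the fine grid
        # After zero-stuffing, only fine-grid positions i*factor hold data,
        # so only about 1/factor of the kernel taps ever see a non-zero input.
        i_lo = max(0, (m - centre + factor - 1) // factor)
        i_hi = min(len(samples) - 1, (m + centre) // factor)
        acc = 0.0
        for i in range(i_lo, i_hi + 1):
            acc += samples[i] * kernel[m - i * factor + centre]
        out.append(acc)
        macs.append(i_hi - i_lo + 1)
    return out, macs

data = [0, 10, 20, 10, 0, -10, -20, -10]
fine, macs = upsample(data)
```

Here the kernel has 65 taps, but each output point needs at most 9 multiply-accumulates and reads at most 9 input samples, which is where the memory bandwidth saving comes from.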
I think you could do more if filters are run on post-processed, triggered data. Total numeric 'capacity' is approx 300MHz * 210 DSPs = 63 GMAC/s. But at that point it comes down to how fast you can get data through your DSP blocks and they are spread across the fabric, which requires very careful design when crossing columns as that's where the fabric routing resource is more constrained. I'd also be curious what the power consumption of the Zynq looks like when 63 GMAC/s of number crunching is being done - but it can't be low.
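For reference, the arithmetic behind the ~60 tap and 63 GMAC/s figures can be written out as follows. The numbers are the ones quoted above; the assumption is that every DSP block does one multiply-accumulate per clock.

```python
DSP_BLOCKS = 210                  # DSP block count quoted above
DSP_FMAX_HZ = 300_000_000         # ~300 MHz per DSP block
SAMPLE_RATE_HZ = 1_000_000_000    # 1 GSa/s ADC stream

# Assumption: one multiply-accumulate per DSP block per clock cycle.
total_macs_per_s = DSP_BLOCKS * DSP_FMAX_HZ

# An FIR filter needs one MAC per tap per input sample, so at the full
# sample rate the whole DSP budget buys this many taps:
max_taps_full_rate = total_macs_per_s // SAMPLE_RATE_HZ

print(total_macs_per_s, max_taps_full_rate)
```

That works out to 63 GMAC/s in total and 63 taps at 1GSa/s, with every DSP block in use and nothing left over for anything else.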
I hate fans with a passion. This scope will be completely fanless. It will heatsink everything into the extruded aluminum case.
Regarding digital (serial) triggers, my thought was a small configurable FSM that can use the digital comparator outputs from any channel. The FSM would have a number of programmable states and generate a trigger pulse when it reaches the correct end state. This is a big project in itself: it would need to be designed, simulated and tested, which is why I have stuck with a fairly simple edge trigger. (The pulse width, slope, runt and timeout triggers are fairly trivial and the core technically supports them, although they are unimplemented in software for now.) The FSM for complex triggers could have a fairly large 'program', and the program could be computed dynamically. For an I2C address trigger, for example, it would start with a match for a start condition, then look for the relevant rising edges on each clock and compare SDA at that cycle. The Python application would be able to customise the sequence of states that need to be passed through to generate triggers in a -very- basic assembly language.
Serial decode itself would likely use Sigrok, though its pure-Python implementation may cause performance issues, in which case a compiled RPython variant may be usable instead. There is some advantage to doing this on the Zynq in spare cycles if using e.g. a 7020, with the FPGA accelerating the level comparison stage so the ARM just needs to shift bits out of a register to decide what to do with each data bit.
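To make the idea of a dynamically generated trigger 'program' concrete, here is a minimal software model of an I2C address match. Everything in it is hypothetical: the two opcodes (`START`, `BIT`), the tuple encoding and the helper names are invented for the example, and the real FSM would run in fabric on the comparator outputs rather than on Python lists.

```python
def i2c_address_program(address):
    """Build the state sequence: START condition, then 7 address bits, MSB first."""
    program = [("START", None)]
    for bit in range(6, -1, -1):
        program.append(("BIT", (address >> bit) & 1))
    return program

def run_trigger_fsm(program, scl, sda):
    """Step the program over sampled SCL/SDA levels; return trigger indices."""
    triggers = []
    pc = 0
    for i in range(1, len(scl)):
        op, arg = program[pc]
        scl_rise = scl[i - 1] == 0 and scl[i] == 1
        start = (scl[i - 1] == 1 and scl[i] == 1
                 and sda[i - 1] == 1 and sda[i] == 0)   # SDA falls while SCL high
        if op == "START":
            if start:
                pc += 1
        elif op == "BIT" and scl_rise:
            pc = pc + 1 if sda[i] == arg else 0         # mismatch: start over
        if pc == len(program):                          # end state reached
            triggers.append(i)
            pc = 0
    return triggers

def i2c_waveform(bits):
    """Test helper: idle, START, then each bit clocked by one SCL low/high pair."""
    scl, sda = [1, 1], [1, 0]
    for b in bits:
        scl += [0, 1]
        sda += [b, b]
    return scl, sda

scl, sda = i2c_waveform([1, 0, 1, 0, 0, 0, 0])          # address 0x50
hits = run_trigger_fsm(i2c_address_program(0x50), scl, sda)
```

The application only has to regenerate the state list when the user changes the address, so the FSM itself stays fixed. Other protocols would need more opcodes (timeouts, don't-care bits, counters), which is where the design effort goes.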