Just got this working.
~130k waves/sec, versus the original, buggy GL renderer, which managed to persistently lock up the Pi's GPU driver. Now, I know we're not necessarily competing on waveform rate here, but this has barely been optimised and already outpaces ArmWave by 6x, which should come as no great surprise. It's doing dot join too (you can't necessarily see that from the rendered output as it's a sine wave, but it is joining each point with a vector, which used to kill the performance of the older renderer).
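For anyone wondering what dot join amounts to: rather than plotting each sample as an isolated dot, adjacent samples are connected so fast edges don't leave gaps. A minimal CPU-side sketch, in NumPy, purely illustrative (the real renderer does this on the GPU, and the simple vertical-span join below is only one way to draw the connecting vector):

```python
import numpy as np

def render_wave(samples, width=1536, height=256):
    """Accumulate one waveform into an intensity buffer, joining each
    pair of adjacent points with a vertical span (a simple dot join).
    Assumes samples are normalised to [0, 1)."""
    buf = np.zeros((height, width), dtype=np.uint16)
    xs = np.linspace(0, width - 1, len(samples)).astype(int)
    ys = np.clip((samples * height).astype(int), 0, height - 1)
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        lo, hi = sorted((int(y0), int(y1)))
        buf[lo:hi + 1, x0] += 1  # fill the gap between the two points
    return buf
```

Summing many such buffers per frame is what gives the intensity-graded display; the per-pair loop is exactly the part that's cheap on a GPU and slow on a CPU.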
Caveats: internally generated waveform, not connected to the Zynq yet. It seems to make use of only about half of the GPU's compute units, and is probably inefficiently designed. It does seem to struggle with longer waveform lengths (>4k points); I need to investigate why. No zero-copy for now, so every waveform buffer is copied into GPU memory on each frame, but this should be fairly trivial to figure out.
Currently I render to an offscreen 1536x256 buffer and then scale that up to the window size using linear interpolation. This seems to hide some quantisation artefacts but might be undesirable. However, comparing e.g. the Rigol DS1000Z and Keysight 2000X, both seem to do some kind of linear interpolation on the vertical axis, so I think this is quite normal. I could use a nearest-neighbour scaler instead, but it looks awfully ugly.
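The difference between the two scalers is easy to show in a few lines. A hedged NumPy sketch of the bilinear upscale (the actual renderer presumably lets the GPU's texture filtering do this; nearest-neighbour would just snap each output pixel to its closest source pixel, keeping the blocky quantisation steps):

```python
import numpy as np

def upscale_linear(buf, out_h, out_w):
    """Bilinearly scale a 2D intensity buffer up to (out_h, out_w).
    Blends the four nearest source pixels, smoothing quantisation steps."""
    h, w = buf.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    fy = (ys - y0)[:, None]      # vertical blend fractions
    fx = (xs - x0)[None, :]      # horizontal blend fractions
    top = buf[y0][:, x0] * (1 - fx) + buf[y0][:, x1] * fx
    bot = buf[y1][:, x0] * (1 - fx) + buf[y1][:, x1] * fx
    return top * (1 - fy) + bot * fy
```

The vertical blend is the part that matters here: a one-pixel step in the 256-line offscreen buffer becomes a smooth ramp across several window pixels instead of a hard edge.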
Next week's challenge, I think, will be to play with some DSP. But for now, it's time to sleep.