Sorry if this is a dumb question - I'm not a DSP guy.
Why do these scopes need such a huge number of FFT points? Basically, if you have 1024 horizontal pixels on the screen, shouldn't a 1024-point FFT (+/- some slack) be good enough? Couldn't they mix the selected window down to DC (like a direct-conversion receiver), filter, maybe resample, and then do a 1024-point FFT covering DC to the span width? Software mixing, filtering, resampling and a 1k-point FFT should be much cheaper than a direct 128k-point FFT, shouldn't they?
Yes, you can do this, and there is a rather cleverer technique along similar lines called 'zoom FFT' processing. What Dave didn't mention, and indeed almost nobody does, is that the FFT mode in these scopes is not just an FFT: it also includes a power calculation, so that the trace on the screen is a power spectrum. They all do this because the output from an FFT is actually complex, and scopes have difficulty displaying complex waveforms. The power in a complex waveform is the sum of the squares of the real and imaginary components, so that is easy to show. However, if you hold off on the power calculation, you can do additional processing on the complex FFT output before display.
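For comparison, the mix/filter/decimate approach from the question can be sketched in NumPy. This is a hedged illustration, not what any particular scope does: the function name `baseband_zoom`, the windowed-sinc FIR low-pass, the integer decimation factor and the Hann window before the final FFT are all assumptions. Note that the power calculation is the very last step, after all the complex processing.

```python
import numpy as np

def baseband_zoom(x, fs, f_centre, span, n_fft=1024, n_taps=255):
    """Mix down, low-pass filter, decimate, then a short FFT (illustrative sketch).

    Assumptions: windowed-sinc FIR low-pass, integer decimation by fs // span,
    Hann window on the final FFT. Returns (frequencies in Hz, power).
    """
    x = np.asarray(x)
    n = np.arange(len(x))
    # 1. Mix: shift f_centre to DC with a complex local oscillator (I and Q at once).
    bb = x * np.exp(-2j * np.pi * f_centre * n / fs)
    # 2. Filter: windowed-sinc low-pass with its cutoff at half the span.
    taps = np.arange(n_taps) - (n_taps - 1) / 2
    cutoff = (span / 2) / fs                       # in cycles per input sample
    h = 2 * cutoff * np.sinc(2 * cutoff * taps) * np.hamming(n_taps)
    bb = np.convolve(bb, h, mode="valid")
    # 3. Decimate: the complex baseband signal now needs only ~span samples/s.
    decim = int(fs // span)
    if decim < 1:
        raise ValueError("span must not exceed the sample rate")
    bb = bb[::decim]
    if len(bb) < n_fft:
        raise ValueError("not enough input samples for this span and n_fft")
    # 4. Short FFT of the complex baseband, shifted so f_centre is mid-screen.
    X = np.fft.fftshift(np.fft.fft(bb[:n_fft] * np.hanning(n_fft)))
    freqs = f_centre + np.fft.fftshift(np.fft.fftfreq(n_fft, d=decim / fs))
    # 5. Power calculation last: re^2 + im^2, which is what the scope displays.
    power = X.real ** 2 + X.imag ** 2
    return freqs, power


fs, f_centre, span = 10_000.0, 1_000.0, 100.0
t = np.arange(1024 * 100 + 255) / fs
# One tone inside the 100 Hz span and one well outside it, which the filter rejects
x = np.cos(2 * np.pi * 1012.5 * t) + np.cos(2 * np.pi * 1300.0 * t)
freqs, power = baseband_zoom(x, fs, f_centre, span)
print(freqs[np.argmax(power)])  # close to 1012.5
```

The FFT is only 1024 points, but the capture still has to be 1024 × 100 samples long here (about 10 s at 10 kHz for a 100 Hz span), because frequency resolution is set by observation time, not by FFT size.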
Suppose we make a 4096-point FFT. For a real input, the output comprises 2048 frequency bins plus a DC bin. You can think of each of these bins as the output of a narrow-band filter, with an amplitude and phase response that depends on the window used, and a centre frequency which is a multiple of the basic frequency resolution of the 4096-point FFT, i.e. 1 over the time duration of the 4096 samples. The real and imaginary components of the FFT bin output give you the signal that gets through this filter, down-converted to DC, like the I and Q outputs of a 'zero IF' receiver. If we want finer frequency resolution, we have to acquire more points over a longer time, so that the bins are more closely spaced, and make a longer FFT.
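The "bin as a down-converting filter" view can be checked numerically. A small sketch, assuming NumPy, a Hann window, a random test input and an arbitrary 48 kHz sample rate (none of these come from the post): bin k of the windowed FFT is exactly the input mixed to DC by a complex oscillator at bin k's centre frequency, then summed with the window as the weights.

```python
import numpy as np

fs = 48_000.0                 # assumed sample rate, for illustration only
N = 4096
n = np.arange(N)
w = np.hanning(N)             # the window sets the shape of each bin's filter
rng = np.random.default_rng(0)
x = rng.standard_normal(N)    # any real input will do

X = np.fft.rfft(x * w)        # real input: 2048 bins plus DC, 2049 values in all
print(len(X))                 # 2049

k = 300
# The same bin "by hand": mix to DC with a complex LO at bin k's centre
# frequency, then sum the window-weighted product (the sum is the filter).
lo = np.exp(-2j * np.pi * k * n / N)
bin_k = np.sum(x * w * lo)
print(np.allclose(bin_k, X[k]))   # True; real/imag parts play the role of I/Q

# Bin spacing is 1 / (capture duration) = fs / N
resolution_hz = fs / N            # 11.71875 Hz here
centre_hz = k * resolution_hz     # centre frequency of bin k
```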
However, if we don't want to see the whole frequency range, an alternative is to make a series of short FFTs of successive 4096-sample sections of the signal, and look at the successive complex outputs of the bins in just the frequency range we are interested in. This is another signal, complex-valued this time, which varies over time and represents how the output of the filter corresponding to that FFT bin changes over time. For example, 8 successive 4096-point FFTs give, for each frequency bin, a time sequence of 8 complex numbers. Here's the clever bit: for each bin we are interested in, we make another, 8-point FFT of the complex bin output. Because the data is complex, this gives us 8 frequency mini-bins in place of the original big bin, each 1/8 of its width, which is the same resolution as a single 32768-point FFT over the same 8 × 4096 samples. We apply the power calculation to the output of these FFTs and display the result. You can do this to as many of the original FFT bins as you want, but because of the extra overhead in the short FFTs it's only really worth doing if you are interested in just a small part of the frequency range. But it does let you show a low-resolution, full-bandwidth power spectrum (based on the successive 4096-point FFTs), updated rapidly, at the same time as the higher-resolution zoomed section, updated less often. Hence the name of the technique.
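A minimal NumPy sketch of the two-stage zoom described above, assuming non-overlapping blocks and a rectangular first-stage window by default; the helper name `zoom_fft_bin` is hypothetical. Because each block is exactly `n_fft` samples long, a tone at the centre of bin k completes a whole number of cycles per block, so no phase correction between blocks is needed (overlapping blocks would need one). One caveat: each bin's filter is wider than one bin spacing, especially with a window, so strong tones in neighbouring bins alias into the mini-bins; the test signal keeps both tones inside bin 100.

```python
import numpy as np

def zoom_fft_bin(x, n_fft, k, n_blocks, window=None):
    """Two-stage zoom FFT of a single first-stage bin (illustrative sketch).

    x        : input samples, at least n_fft * n_blocks long
    n_fft    : length of each first-stage FFT (4096 in the post)
    k        : index of the first-stage bin to zoom into
    n_blocks : number of successive, non-overlapping blocks (8 in the post)
    window   : optional first-stage window of length n_fft (default rectangular)

    Returns (offsets, power): each mini-bin's offset from the centre of bin k,
    in units of the first-stage bin spacing, and the power in that mini-bin.
    """
    x = np.asarray(x)
    if len(x) < n_fft * n_blocks:
        raise ValueError("need at least n_fft * n_blocks samples")
    if window is None:
        window = np.ones(n_fft)
    blocks = x[: n_fft * n_blocks].reshape(n_blocks, n_fft)
    # Stage 1: one n_fft-point FFT per block, keeping only the complex output of
    # bin k. This is that bin's filter output, mixed to DC, sampled once per block.
    bin_series = np.fft.fft(blocks * window, axis=1)[:, k]
    # Stage 2: a short FFT across the blocks. The input is complex, so positive
    # and negative offsets are distinct: n_blocks mini-bins replace one big bin.
    mini = np.fft.fftshift(np.fft.fft(bin_series))
    offsets = np.fft.fftshift(np.fft.fftfreq(n_blocks))  # fractions of one big bin
    # Power calculation only now, at the very end.
    power = mini.real ** 2 + mini.imag ** 2
    return offsets, power


fs = 4096.0                      # chosen so that one first-stage bin is exactly 1 Hz
n_fft, n_blocks, k = 4096, 8, 100
t = np.arange(n_fft * n_blocks) / fs
# Two tones 0.5 Hz apart, both inside first-stage bin 100 (99.5 Hz to 100.5 Hz)
x = np.cos(2 * np.pi * 99.75 * t) + np.cos(2 * np.pi * 100.25 * t)

offsets, power = zoom_fft_bin(x, n_fft, k, n_blocks)
freqs = (k + offsets) * fs / n_fft              # mini-bin centres in Hz, 1/8 Hz apart
print(np.sort(freqs[np.argsort(power)[-2:]]))   # two peaks: 99.75 Hz and 100.25 Hz
```

A single 4096-point FFT of one block puts both tones into the same 1 Hz bin; the 8 mini-bins separate them, while the first-stage FFTs remain available for the fast, full-bandwidth display.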
We were using this approach in the '80s on a 68000 micro in a dynamic signal analyser for machinery vibration, but I believe it was first used in sonar processing some years earlier. Since memory is cheap and processors are fast, it seems to have fallen by the wayside, but heck, it still works!
Max