It's always a party! Part of why we love our ASICs so much is that they help us streamline a lot of the plotting/data analysis challenges. They have their downsides, of course (e.g. no memory controller built into MegaZoom, so we only have 4M), but from a performance standpoint they've been great.
The 4M of memory is located on the ASICs directly, right?
I can hardly imagine that a memory controller would eat up so much ASIC space that you had to put the memory directly into the ASICs for that reason alone, so what was the reason for putting the memory directly onto the ASICs? Speed? Reduced latency?
I wouldn't be surprised if the performance of external memory back when the ASICs were designed simply wasn't enough to meet the performance requirements. Modern computer designs seem to use multiple levels of cache in order to achieve decent throughput, but the nature of the demands on an oscilloscope's memory is such that I don't know that such an architecture would be suitable (indeed, I rather suspect it's not).
It's been quite some time since I've looked at how computers are architected at that level, so something may have changed. In any case, densities are so high these days that I'm surprised a substantial amount of additional memory wasn't added to the ASICs for the 3000T series, at least. Are you guys using an older fab process or something? (That would be entirely understandable; going to a new fab process is apparently quite expensive.)
Latency is key to the problem of building a histogram (here, rendering waveforms with intensity grading). A back-of-the-envelope calculation shows why it all adds up quickly; we can take your example below and add some numbers to it.
Isn't there a problem with this approach that when you hit Stop, you won't know what memory depth you will get? If the first trigger has only just occurred, then there may only be just enough samples to see what is already on the screen, so while you can zoom, you can't pan outside the screen window. Only if there have been enough triggers to completely fill the sample memory will you get the full depth. This could be very confusing to the user. It seems better to maintain the one trigger = one memory buffer rule for consistency.
That won't be a problem with the circular buffer approach. That's because with the circular buffer, once enough initial time has passed to fill the entire buffer with samples, the buffer is always full, which means you can always scroll backwards, at the very least, to see what has happened before, for roughly however much time it took to fill the buffer. The real question is how much additional sampling the scope should do if it is set up to stop on a trigger and it encounters the triggering condition. For that, it seems logical that it will depend on the amount of memory that has already been filled and the amount of time it would take to fill the remainder. If the amount of memory that has been filled is small, then there's little point in actually stopping until the remainder is filled, as long as doing so doesn't take too long (that cutoff time is something that could be defined by the user).
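Here's a minimal sketch of that stop decision in Python, with all names and the cutoff parameter hypothetical: after the trigger, keep sampling only for as long as it takes to top up the remaining memory, capped at a user-defined maximum wait.

```python
# Hypothetical sketch of the post-trigger fill decision described above.
# Nothing here is a real scope API; it only illustrates the arithmetic.

def post_trigger_fill_time(samples_filled, buffer_size, sample_rate_hz, max_wait_s):
    """Return how long (seconds) to keep sampling after the trigger fires."""
    remaining = buffer_size - samples_filled      # samples still empty in the circular buffer
    time_to_fill = remaining / sample_rate_hz     # seconds needed to fill the remainder
    return min(time_to_fill, max_wait_s)          # never wait longer than the user's cutoff

# Example: 1 Mpts buffer at 10 GS/s, only 200 kpts captured so far, 1 ms cutoff
print(post_trigger_fill_time(200_000, 1_000_000, 10e9, 1e-3))  # 8e-05 -> keep sampling 80 us
```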
If you can search the memory for trigger events after stopping the acquisition, it's difficult to see what practical advantages this approach brings - beyond increasing the data sheet waveforms/sec number, for bragging rights, of course!
The main advantage is that you're guaranteed to have the maximum possible amount of history at your disposal, while also being able to run the triggering system at its maximum speed. It makes the speed of the triggering system independent of the memory depth, too. And it makes it possible to use higher sampling rates with longer timebases.
The approach is so blindingly obvious to me that I must be missing something crucial here; I would have expected scope manufacturers to already be implementing it unless it has some sort of showstopper property. But it does sound like Keysight does something similar. Their solution does take care of the case where trigger events are rare, and it could be integrated into the single circular-buffer approach by splitting the memory into two pieces if you don't see another trigger event within half the buffer fill time of the previous one.
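A rough, purely illustrative sketch of that splitting rule (all names hypothetical): if no new trigger arrives within half the buffer fill time, freeze the half holding the previous capture and keep acquiring into the other half, so rare events are never overwritten.

```python
# Hypothetical sketch of splitting the capture memory into two halves when triggers are rare.

class SplitCaptureBuffer:
    def __init__(self, buffer_size, sample_rate_hz):
        self.half_fill_time = (buffer_size / 2) / sample_rate_hz  # seconds to fill one half
        self.frozen_half = None                                   # half holding the last rare trigger

    def on_tick(self, time_since_last_trigger, active_half):
        # No trigger for half a buffer fill time: protect the half with the previous capture.
        if time_since_last_trigger > self.half_fill_time and self.frozen_half is None:
            self.frozen_half = active_half
            return 1 - active_half        # continue acquiring into the other half (halves are 0/1)
        return active_half

    def on_trigger(self):
        self.frozen_half = None           # new trigger seen; release the frozen half

# Example: 1 Mpts at 10 GS/s -> half-fill time is 50 us; 80 us without a trigger forces a switch
buf = SplitCaptureBuffer(1_000_000, 10e9)
print(buf.on_tick(time_since_last_trigger=80e-6, active_half=0))  # -> 1
```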
A hypothetical scope with:
  8-bit, 10 GS/s, 2 GHz bandwidth
  1,000,000 samples of memory
  100 us capture
  screen with a 1000 px display for the waveform

"Ideal" case: 10,000 wfms/s, i.e. 1 trigger + display per 100 us
  acquisition memory:
    wr bandwidth 10 GB/s
    rd bandwidth 10 GB/s
  filter and pipe to display histogram:
    rd bandwidth 0.01 GB/s
    wr bandwidth 0.01 GB/s

2 GHz trigger: 200,000 triggers + displays per 100 us, i.e. 2,000,000,000 wfms/s
  acquisition memory:
    wr bandwidth 10 GB/s
    rd bandwidth 2,000,000 GB/s
  filter and pipe to display histogram:
    rd bandwidth 2000 GB/s
    wr bandwidth 2000 GB/s
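As a sanity check, those figures fall out of a few lines of arithmetic. This is just the working shown in Python, using only the constants listed above:

```python
# Back-of-the-envelope bandwidths for the hypothetical 8-bit 10 GS/s scope above.

SAMPLE_RATE  = 10e9        # samples/s, 1 byte each at 8 bits
MEMORY_DEPTH = 1_000_000   # samples (= 100 us capture)
DISPLAY_PX   = 1000        # horizontal pixels in the waveform window

def bandwidths_gb_per_s(wfms_per_s):
    acq_write = SAMPLE_RATE * 1              # always writing at the full sample rate
    acq_read  = wfms_per_s * MEMORY_DEPTH    # read the whole capture once per waveform
    hist_rw   = wfms_per_s * DISPLAY_PX      # one histogram read-modify-write per pixel column
    return acq_write / 1e9, acq_read / 1e9, hist_rw / 1e9

print(bandwidths_gb_per_s(10_000))         # ideal case:   (10.0, 10.0, 0.01)
print(bandwidths_gb_per_s(2_000_000_000))  # 2 GHz trigger: (10.0, 2000000.0, 2000.0)
```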
Current histogram memory, for a 1,000,000 wfms/s peak into an approximately 500 px window as in the 3000 X discussed in this thread:
  rd and wr bandwidth of 0.5 GB/s

Add overhead for clearing and so on, double it for the zoom window, multiply per channel, etc. This would take 16, probably 32 or more, parallel 1 MB histogram RAMs at a few hundred MHz each. The histogram memory needs to be very low latency, because the match pipeline grows in resources with the square of the latency, so to get three orders of magnitude more performance you're hunting for some magical technology.
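The same arithmetic applied to the quoted 3000 X-class figures, before any of that overhead:

```python
# Histogram read-modify-write bandwidth for ~1,000,000 wfms/s into a ~500 px window.

wfms_per_s = 1_000_000
window_px  = 500
hist_bw_gb = wfms_per_s * window_px / 1e9
print(hist_bw_gb)   # 0.5 GB/s, versus the ~2000 GB/s the 2 GHz trigger case would demand
```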