Well, how much do you need?
A typical DSO display is only a few hundred pixels tall, so you can't *see* anything finer than about 8 bits (more like 9 bits these days, i.e. 512 pixels of height, or even 10 bits for fancier or PC-based scopes). If you need to zoom in without taking additional readings, you might need more, but then you run into problems with dynamic range vs. offset, and bandwidth capability.
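To put a number on that, the display itself can only render about log2(height) bits, whatever the ADC delivers. A trivial back-of-the-envelope check (the pixel heights here are just illustrative examples, not any particular model):

```python
import math

# How many bits can the display itself resolve? Heights are illustrative only.
for pixels in (256, 512, 1024):
    bits = math.log2(pixels)
    print(f"{pixels:5d} px tall -> about {bits:.0f} bits visible on screen")
```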
I can't see the detail, but won't the onscreen measurements benefit from higher bit depth?
Not really: most of the ones I've seen display about three digits more than they can actually resolve anyway! It would be very nice if they put a sigma on those readouts as well; sadly, most don't (LeCroy is, I think, one of the few that do, unless that's only standard on the pricey embedded-PC-based models).
A simple measure of accuracy is how many digits are dancing around in a given steady-state display.
Errors arise from the signal itself (voltage and phase noise), trigger noise (and delay if shifted from the trigger point), quantization noise (including time quantization*) and thresholding.
*Which depends on whether it's a "deep memory" scope; on how the readings are reduced for display: decimated to screen resolution, averaged to screen resolution (Hi-Res), or accumulated as a histogram (DPO or heat-map style); and on how the measuring algorithm operates on that data (does it work on what's on screen, on that segment of the buffer, or on the whole buffer?). A rough sketch of those reduction methods follows.
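Here's a rough Python sketch of decimation, Hi-Res averaging, and DPO-style histogramming applied to the same long buffer. It's illustrative only: the buffer length, column count, and test signal are invented numbers, not any vendor's actual pipeline.

```python
import numpy as np

# One long acquisition reduced to screen columns three different ways.
rng = np.random.default_rng(0)
n_samples, n_columns = 1_000_000, 500
t = np.linspace(0.0, 10.0, n_samples)
trace = np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(n_samples)

per_column = trace.reshape(n_columns, -1)   # the samples behind each pixel column

decimated = per_column[:, 0]                # decimation: keep one sample per column
hires     = per_column.mean(axis=1)         # Hi-Res: average each column (gains effective bits)
bins      = np.linspace(trace.min(), trace.max(), 257)
heatmap   = np.stack([np.histogram(col, bins=bins)[0] for col in per_column])
                                            # DPO/heat-map: vertical histogram per column

print(decimated.shape, hires.shape, heatmap.shape)   # (500,) (500,) (500, 256)
```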
From what little I know of the usual measuring algorithms, many of them have threshold limitations: for example, to measure the "top" of a square wave, how do you decide which points belong to the "top"? What if you try measuring it on a wave that's decidedly not square: a sine, a switching waveform with ringing, or white noise? There should be some measure of confidence attached to that measurement and relayed to the user, but this is rarely provided (one rough way to attach such a figure is sketched below).
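For instance, here's a rough sketch of a histogram-mode top/base estimate that also reports the spread of the samples it called "top" (in the spirit of IEEE-181-style pulse definitions, but not claiming this is what any particular scope actually runs). On a clean square wave the spread is just the noise; on a sine it balloons, which is exactly the sort of confidence figure one would want next to the readout:

```python
import numpy as np

# Histogram-mode "top"/"base" with a spread figure attached -- a sketch of the
# general idea, not any specific instrument's algorithm.
def top_base(samples, bins=256):
    counts, edges = np.histogram(samples, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mid = 0.5 * (samples.min() + samples.max())

    upper = centers >= mid                      # split the histogram at mid-level
    top  = centers[upper][np.argmax(counts[upper])]
    base = centers[~upper][np.argmax(counts[~upper])]

    top_pts = samples[samples >= mid]           # samples assigned to the "top" state
    spread = top_pts.std()                      # how well-defined that state really is
    return top, base, spread

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 100_000)
square = np.sign(np.sin(2 * np.pi * t)) + 0.05 * rng.standard_normal(t.size)
sine   = np.sin(2 * np.pi * t)          + 0.05 * rng.standard_normal(t.size)

print("square:", top_base(square))   # spread ~ the 0.05 noise: "top" is meaningful
print("sine:  ", top_base(sine))     # spread ~ 0.3: a "top" reading here is dubious
```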
(More broadly, signal and image processing needs to take these things into account. It's amusing when you have a picture of a fire hydrant identified as "dog" by some AI; it's less amusing when your digital communications channel is fading out and it makes bad assumptions about what it thinks the signal is, rather than reducing bandwidth to a more confident level.)
In any case, a hard threshold is about the worst possible signal-processing method: its noise is that of a single sample. Ideally, you'd like the measurement to have 1/sqrt(N) noise, for N samples in the measurement span (whether they're piled on top of each other in a histogram or stretched out along a very long buffer).
And yes, when you have megs of buffer, the random errors can be refined very well indeed (1e6 points --> a sqrt(1e6) = 1e3 reduction in noise!), leaving you with only the systematic errors (like the nonuniform step sizes of the ADC). If the measurement algorithm harnesses this resource, it can be as accurate as those systematic limitations allow; my concern is that most won't go to this length, and worst of all, you have no way to know, because they rarely tell you the uncertainty.
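A quick sanity check of that scaling (all numbers invented for illustration): averaging a noisy DC level over progressively longer buffers shrinks the random error as 1/sqrt(N), so 1e6 points buys roughly a 1000x reduction, after which only the systematics are left.

```python
import numpy as np

# Estimate a DC level buried in noise from progressively longer buffers.
rng = np.random.default_rng(2)
true_level, noise_rms = 1.000, 0.010      # a 1 V level with 10 mV of random noise

for n in (1, 10_000, 1_000_000):
    samples = true_level + noise_rms * rng.standard_normal(n)
    print(f"N = {n:>9,}: estimate = {samples.mean():.6f} V, "
          f"expected error ~ {noise_rms / np.sqrt(n):.6f} V")
# With 1e6 samples the random error shrinks by sqrt(1e6) = 1000x; what's left
# is dominated by systematic effects, which averaging cannot remove.
```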
Tim