Can you clarify what 'gain drift' refers to? The internal amps of the DMM?
Amplifiers, dividers, the internal voltage reference, the ADC and everything in between. All of them drift (very slightly). With shorted inputs you're just measuring offset (which is usually compensated by auto-zero) and noise. A change in gain is irrelevant for this measurement: if your voltage reference drifted from 10 V to 9.9 V, you wouldn't notice any difference with shorted inputs. Not that I expect much drift from a well-designed DMM like the HP 3456A, but I can't imagine it'd be zero; just look at the 24 h and tempco specs. Of course those are worst case. The gain error is specified by the % of value part of the uncertainty; the offset error and noise are represented by the % of full scale part.
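To make the two terms concrete, here's a quick Python sketch; the spec percentages are made-up placeholders, not the actual 3456A figures:

```python
# Worst-case DMM uncertainty = gain term (% of reading) + offset term (% of range).
# The spec numbers used below are illustrative placeholders, NOT real 3456A specs.

def worst_case_uncertainty(reading, rng, pct_of_reading, pct_of_range):
    """Return the worst-case uncertainty in volts for a given reading."""
    gain_term = abs(reading) * pct_of_reading / 100.0   # scales with the value: gain drift
    offset_term = rng * pct_of_range / 100.0            # fixed per range: offset/noise
    return gain_term + offset_term

# On a 10 V range with shorted inputs (reading ~ 0 V), only the offset term remains:
print(worst_case_uncertainty(0.0, 10.0, 0.002, 0.0005))   # 5e-05 V, offset term only
# At full scale, the gain term dominates:
print(worst_case_uncertainty(10.0, 10.0, 0.002, 0.0005))  # 0.00025 V
```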
After 12 hours overnight with 26000 readings, the SD was 180.979 nV.
This is very close to the 170 nV saturation reported for n ≈ 12000. The number of samples is large enough that both should be very good estimates of the population standard deviation, assuming it's purely determined by noise. Temperature shouldn't really matter for this measurement, although you might expect slightly more noise at higher temperatures; the difference between, say, 293 K and 300 K is unlikely to matter much, I would expect. I believe the HP 3456A has better accuracy specs than the Fluke 8846A (it was a reference-class meter in its day, similar to the 3458A now); not sure about noise.
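As a quick sanity check on "large enough": for roughly Gaussian noise, the relative standard error of a sample SD is about 1/sqrt(2(n-1)), so both sample sizes pin the SD down to well under 1%:

```python
import math

# Relative standard error of the sample SD for ~Gaussian noise: 1/sqrt(2(n-1)).
def sd_rel_error(n):
    return 1.0 / math.sqrt(2 * (n - 1))

for n in (12000, 26000):
    print(f"n={n}: ~{100 * sd_rel_error(n):.2f}% relative error on the SD")
# n=12000: ~0.65%    n=26000: ~0.44%
```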
I have noticed that the 8846A works internally (on the lowest range only) at 1000X higher resolution than the displayed LSD (I do know the difference between resolution and accuracy). You can see the full readings in stats mode or in the trend plot. Max DCV resolution is 0.1 uV but internally it's 0.1 nV, max Ohms resolution is 10 uOhm but internally it's 10 nOhm, etc. That 1000X resolution is there from 0.02 NPLC to 100 NPLC, with the digital filter on or off. It still only has 6.5 digits, but on the lowest range it seems to internally change amplification a decade at a time based on the reading, to maximize resolution down to a limit of 0.1 nV. Do all 6.5 digit meters do this?
Some do, especially Keithley. A random result via GPIB from a 5.5-digit meter with shorted inputs on its 300 mV range (expected resolution 1 uV):
+000.0042E-3 (actual resolution 0.1 uV)
And from a 6.5-digit meter on its 100 mV range:
4.75412547E-08 (0.1 fV resolution!)
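For reference, readings like these can be pulled with a few lines of pyvisa; this is just a sketch assuming a SCPI-capable meter, and the GPIB address is a placeholder for whatever your setup uses:

```python
import pyvisa

# Minimal sketch for pulling full-resolution readings over GPIB.
# Assumes a SCPI-capable meter; the address below is a placeholder.
rm = pyvisa.ResourceManager()
dmm = rm.open_resource("GPIB0::22::INSTR")

dmm.write("CONF:VOLT:DC 0.1")   # select the 100 mV DC range (check your meter's manual)
raw = dmm.query("READ?")        # returns e.g. "+4.75412547E-08"
print(raw.strip(), "=", float(raw), "V")
```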
Especially that second reading (0.1 fV!) is completely ludicrous; I consider those extra digits just artifacts of the calculation, completely insignificant compared to the noise floor. The only advantage of the extra resolution is that it reduces quantization error, although the quantization should already be randomized by the noise here (I haven't studied it in that much detail). Quantization becomes a real issue when averaging down to the last digit. Datron actually introduced noise into the measurement and then averaged it out to get better resolution; that's how they achieved 7.5 digits, and I wouldn't be surprised if the technique is still used in the Fluke 8508, since Fluke bought Datron to get their high-accuracy DMM technology.

Most other brands return the same resolution via GPIB as on the display, except for the 3457A, which is specified as a 6.5-digit meter but gives 7.5 digits of resolution via GPIB. That extra digit does not translate into improved accuracy, however; the 3456A is superior in that regard.
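The dither trick Datron used is easy to demonstrate in simulation; a rough numpy sketch with made-up numbers (1 uV LSB, a DC value sitting at 0.34 LSB):

```python
import numpy as np

# Simulate averaging a DC value that sits between ADC codes,
# with and without added dither noise. All numbers are made up.
gen = np.random.default_rng(0)
lsb = 1e-6            # 1 uV quantization step
true_value = 3.4e-7   # 0.34 LSB: invisible to a noiseless quantizer
n = 100_000

def quantize(x):
    return np.round(x / lsb) * lsb

no_dither = quantize(np.full(n, true_value)).mean()
dithered = quantize(true_value + gen.normal(0.0, lsb, n)).mean()

print(f"true value: {true_value:.3e} V")
print(f"no dither:  {no_dither:.3e} V")  # stuck at 0: every sample rounds the same way
print(f"dithered:   {dithered:.3e} V")   # averages out close to the true sub-LSB value
```

Without the added noise, every sample quantizes to the same code and averaging gains nothing; with roughly an LSB of rms noise, the average converges on the true sub-LSB value.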