The lower-frequency noise (after the 50-point filtering) comes to a large part from the LM399 reference, though there could also be some 1/f noise from the resistors.
The higher-frequency noise seen for the single 10 PLC conversion looks quite high for an LM399 reference:
I have looked at my test with an external LM399 reference and an LM399 ref. at the ADC, and there I get around 6 µV peak to peak in between the popcorn events even with 1 PLC, and some 12 µV peak to peak including the popcorn noise / longer time scales. 2 x LM399 reference, one at each end, would be comparable to the 7 V scaled up to 10 V. I have some reference filtering, but this would no longer be effective for 10 PLC and only 1 reference.
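As a rough check of why the two setups are comparable, here is a small sketch. It assumes the two LM399 references are uncorrelated, so their noise adds in quadrature, and that scaling 7 V up to 10 V multiplies the reference noise by the same gain with no extra noise from the scaling stage. The function names are made up for illustration.

```python
import math

def two_reference_factor() -> float:
    """Noise increase from two equal, uncorrelated references (assumption)."""
    return math.sqrt(2.0)

def scaling_factor(v_ref: float = 7.0, v_out: float = 10.0) -> float:
    """Noise increase from scaling one reference from v_ref up to v_out."""
    return v_out / v_ref

if __name__ == "__main__":
    print(f"two references : x{two_reference_factor():.3f}")
    print(f"7 V -> 10 V    : x{scaling_factor():.3f}")
```

Both factors come out close to 1.4, so under these assumptions the two-reference test is a fair stand-in for a single reference scaled to 10 V.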
The reported noise for the 34401 that I found is somewhere around 2-2.5 µV RMS (12-15 µV peak to peak) for 10 PLC, and I don't see much of the noise coming from the input stage, though there is a little. So this would explain much of the observed noise.
It would be easy to measure that noise with a short at the input.
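The RMS to peak-to-peak conversion above uses the usual rule of thumb of about 6 x RMS for Gaussian noise (roughly ±3 sigma). A minimal sketch of that conversion, plus a helper to get both numbers from a series of shorted-input readings, could look like this. The factor of 6 is an assumption that depends on how many readings are taken, and the sample readings are placeholders, not measured data.

```python
import statistics

PP_PER_RMS = 6.0  # assumed peak-to-peak / RMS ratio for Gaussian noise

def rms_to_pp(rms_uv: float, factor: float = PP_PER_RMS) -> float:
    """Estimate peak-to-peak noise from RMS noise."""
    return rms_uv * factor

def noise_stats(readings_uv):
    """Return (RMS about the mean, peak-to-peak) of a list of readings in µV."""
    rms = statistics.pstdev(readings_uv)
    pp = max(readings_uv) - min(readings_uv)
    return rms, pp

if __name__ == "__main__":
    for rms in (2.0, 2.5):
        print(f"{rms} µV RMS -> about {rms_to_pp(rms):.0f} µV peak to peak")
    # Placeholder readings only, to show the call; replace with logged data.
    print(noise_stats([1.0, -1.0, 1.0, -1.0]))
```

With real data from a shorted input, comparing the two values from `noise_stats` also shows whether the noise is roughly Gaussian or dominated by occasional jumps.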
I have not seen much theoretical analysis of the 34401 type ADC noise, though I have just done some calculations on that. There are a few uncertainties, and my calculations still give a slightly lower noise (some 1.4 µV RMS) from the noise sources I included. The noise is not just one large noise source, but more a combination of many sources. From my analysis the worst offenders are the current noise of the OP27 in the integrator, higher-frequency reference noise, jitter and quantization noise. In theory one could improve on the current noise and a little on jitter, but not on many of the other noise sources, especially not the quantization noise.
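To put numbers on the gap between the calculated and the reported noise, here is a small sketch that assumes all sources are uncorrelated and therefore add in quadrature. It only uses the figures quoted above (1.4 µV RMS calculated, 2-2.5 µV RMS reported). The four equal contributions at the end are purely hypothetical, just to show how little the total moves when one of several similar sources is removed.

```python
import math

def rss(*sources_uv: float) -> float:
    """Root-sum-square of uncorrelated noise sources (µV RMS)."""
    return math.sqrt(sum(s * s for s in sources_uv))

def missing_noise(total_uv: float, modelled_uv: float) -> float:
    """Noise that would have to be added in quadrature to reach the total."""
    return math.sqrt(max(total_uv ** 2 - modelled_uv ** 2, 0.0))

if __name__ == "__main__":
    modelled = 1.4  # µV RMS from the calculation
    for reported in (2.0, 2.5):
        extra = missing_noise(reported, modelled)
        print(f"reported {reported} µV RMS -> unexplained {extra:.2f} µV RMS")

    # Hypothetical split into four equal sources, NOT the actual breakdown.
    equal = [0.7] * 4
    print(f"all four sources : {rss(*equal):.2f} µV RMS")
    print(f"one source gone  : {rss(*equal[:3]):.2f} µV RMS")
```

Under these assumptions the unexplained part is about as large as the modelled part, and fully removing one of four equal sources only lowers the total from 1.4 to about 1.2 µV RMS.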
There is one unknown in how the ADC is handling the first / last reference steps - that is the reference signal just before the auxiliary ADC is read. This part naturally will not be 100% effective, and there is a chance that this factor is not perfectly treated in the firmware. This would give some extra errors of discrete size that can look rather random, though they also change with the input voltage. So some readings may be affected more than others.
Ideally there would be a procedure in the factory calibration to minimize this error part, but I am not so sure they actually do this, and even if so, that factor may also change over time.