Evaluating an ADR1399 (or an LM399 and better refs) with a 6.5-digit voltmeter is difficult. You have to know your meter well on the 10V range: you have to know its TC so you can compensate for its internal and/or ambient temperature, you have to log an 8-10 hour measurement (at 100 NPLC), and you have to evaluate several thousand samples at minimum (analyze the graphs, compensate for internal/ambient temperature, remove the outliers, apply filtering/averaging/smoothing). And at the end you get a single number (this is how I do it). All of that rests on the assumption (not a valid one, btw.) that the long-term drift of your meter is zero.
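A rough sketch of that processing chain in Python, just to make the steps concrete. It is not my actual script: the function names, the linear TC model, the 23 °C reference temperature and the median/MAD outlier rule are all assumptions picked for illustration, and you have to plug in the TC you measured for your own meter.

```python
import statistics

def compensate_tc(readings, temps, tc_ppm_per_k, t_ref):
    """Remove the meter's temperature error, assuming a linear TC in ppm/K."""
    return [v / (1.0 + tc_ppm_per_k * 1e-6 * (t - t_ref))
            for v, t in zip(readings, temps)]

def reject_outliers(values, k=5.0):
    """Drop samples farther than k robust sigmas from the median (MAD rule)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)  # no spread to judge by, keep everything
    limit = k * 1.4826 * mad  # 1.4826 scales MAD to sigma for Gaussian noise
    return [v for v in values if abs(v - med) <= limit]

def evaluate_log(readings, temps, tc_ppm_per_k, t_ref=23.0, k=5.0):
    """Return (mean of the cleaned log in volts, number of samples kept)."""
    corrected = compensate_tc(readings, temps, tc_ppm_per_k, t_ref)
    kept = reject_outliers(corrected, k)
    return statistics.fmean(kept), len(kept)
```

Call it as `evaluate_log(volts, temps_c, tc_ppm_per_k=...)` with the two columns of your log. Note that it still treats the meter's own long-term drift as zero, exactly like the manual procedure does.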
The LM399 inside those 6.5-digit meters does a random walk, typically within 5-10uV on the 10V range (therefore the minimal resolution with those meters is 10uV at 10V), and there are sporadic random jumps of 4-5uV (the popcorn noise) on top of that walk as well. And your DUT basically does the same.
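To get a feel for why this matters, here is a toy simulation of the reference behaviour described above, one instance for the meter and one for the DUT. It is only a sketch: the model (a clamped random walk plus a two-state popcorn offset) and all parameters other than the roughly 10uV band and 4uV jumps are made up for illustration, not measured.

```python
import random

def simulate_ref(n, rng, wander_uv=10.0, step_uv=0.2, pop_uv=4.0, pop_prob=0.002):
    """Toy reference noise in uV: bounded random walk plus popcorn jumps."""
    walk, pop, out = 0.0, 0.0, []
    half = wander_uv / 2.0
    for _ in range(n):
        walk += rng.gauss(0.0, step_uv)
        walk = max(-half, min(half, walk))       # keep the walk inside its band
        if rng.random() < pop_prob:
            pop = pop_uv if pop == 0.0 else 0.0  # toggle the popcorn state
        out.append(walk + pop)
    return out

rng = random.Random(1)           # fixed seed so the run is repeatable
meter = simulate_ref(5000, rng)  # drift of the meter's internal reference
dut = simulate_ref(5000, rng)    # drift of the reference under test
seen = [d - m for d, m in zip(dut, meter)]  # what the logged readings contain
```

The log only ever contains `seen`, so from a single meter you cannot tell which of the two references moved.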