OK, to be serious about this for a minute (Let's see if I can last that long):
And this is exactly why I purchased a 2nd reference. I wasn't sure if I could trust it. Now is it possible that BOTH are inaccurate? Of course but based upon what I've observed, and to the limits of my equipment, I don't think so. There appears to be this perception that I'm a loose cannon and I make decisions in a vacuum. Just the opposite. I think I'm very meticulous and draw valid conclusions based upon what I test and observe.
Two references probably aren't enough, particularly when they are of the same pattern and therefore likely to have similar systematic errors. I think for good results one needs to go one of two ways.
(1) Go for one high quality reference (think one of the Fluke 10V references or similar), have it calibrated regularly and generally follow good metrology practice. If you do this you're in the "Man with one clock knows what time it is" territory. If the 'clock' is of good enough quality then you've a very high probability that your clock agrees to within some tolerance with the clocks of other people. The error in your estimate of your reference quantity (volt, second, ohm, whatever) is, roughly speaking, inversely related to the amount of money you're willing to spend on it - a Fluke 10V reference and regular calibrations get you 'the volt' to perhaps within 1-2 ppm, a second-hand 34461A like mine gets you perhaps within 35-50 ppm if you're lucky.
(2) Have many sources of reference. If you do this you're in the "Man with n clocks doesn't know what time it is" territory and you need to get from there to knowing the true value of the reference quantity (to within some tolerance) with what you've got. The trick here is to rely on statistical inference from the multiple 'reference' values that you have to hand. To do that you need the 'clocks' to be statistically independent of each other.
If your references are all of the same type you have to find a way of ferreting out the common systematic errors from your readings - possible but quite hard to do without a long period of calibration against outside 'official' references. For instance, some voltage reference architectures tend to have positive tempcos, some tend to have negative tempcos - if you only have one type there's likely no statistical independence in their tempcos and therefore you will have to calibrate their tempcos against an external reference.
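To put rough numbers on why a shared systematic error matters, here is a minimal Python sketch. It assumes each reference's error splits into a part common to all units and a part independent per unit, both zero-mean; the function name and the 10 ppm / 3 ppm figures are made up purely for illustration, not measured values.

```python
import math

def averaged_error_ppm(n_refs, sigma_indep_ppm, sigma_common_ppm):
    """Standard deviation (ppm) of the average of n_refs references.

    Assumed model: error_i = common + independent_i. Averaging shrinks
    the independent part by sqrt(n_refs) but leaves the common part alone.
    """
    indep_part = sigma_indep_ppm ** 2 / n_refs
    return math.sqrt(sigma_common_ppm ** 2 + indep_part)

# Hypothetical figures: 10 ppm independent scatter, 3 ppm shared error.
for n in (1, 4, 16, 64):
    print(n, round(averaged_error_ppm(n, 10.0, 3.0), 2))
```

Under this model, no number of same-type references gets the average below the shared 3 ppm; only an outside calibration, or references of a different type, can get at that term.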
That's why I'd advocate for having references that differ in type - one LM399 based, one 1N891 based etc. etc. That way the drifts and noise aren't likely to be based on some common feature of structure or design. If you have enough statistically independent sources you just average them to get an estimate of the 'true' value that they are all trying to reproduce. The error in doing so scales inversely with the square root of the number of sources that you combine statistically (assuming their errors are independent and of similar size) - more sources, lower error. If you've enough spread of sources and a mixture of positive and negative tempcos then you can generally get a 'good' approximation of the 'true' value that you're aiming for.
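The square-root scaling is easy to check numerically. The following is only a sketch: it assumes every reference has an independent, zero-mean Gaussian error of the same size, which real references will only approximate, and the function name and the 10 ppm figure are hypothetical.

```python
import random
import statistics

def simulate_spread_ppm(n_refs, sigma_ppm, trials=2000, seed=1):
    """Empirical std dev (ppm) of the average of n_refs references.

    Each simulated reference error is an independent Gaussian draw with
    standard deviation sigma_ppm (an assumption, not a measured figure).
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        errors = [rng.gauss(0.0, sigma_ppm) for _ in range(n_refs)]
        averages.append(statistics.fmean(errors))
    return statistics.pstdev(averages)

# Quadrupling the number of references should roughly halve the spread.
for n in (1, 4, 16):
    print(n, round(simulate_spread_ppm(n, 10.0), 2))
```

If the references differ a lot in quality, a plain average is probably not the best choice; weighting each one by the inverse of its variance is the usual refinement.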
In all this remember that in precision metrology there never is a 'true' value, there is only ever a measurement with a certain confidence interval. The best calibration you will ever get against 'the volt' is to a confidence of 0.03 ppm (k=2), and that's inside a top notch national metrology laboratory (NPL, PTB, NIST etc.). On the timenut front, the true value of 'time to UTC' is worked out after the fact by statistically pooling the estimates made by the world's national metrology bodies into a central notion of the (estimated) value of 'time to UTC'.
If this whole screed contains a lot of weasel words - 'likely', 'probably', 'estimate' - it's because that is the territory when talking about precision metrology.