If, for example, your newly calibrated Siglent measures something different than the two that are no longer calibrated, it may well be that it is nevertheless closer to the "truth".
The way calibration intervals work is that manufacturers come up with specifications that, at some confidence level, the meter will stay well within for the whole calibration interval. This is generally a number that's padded to some degree to make sure the long tails of the bell curve are still within spec. No company that cares about calibration would use a meter if only 68% of the meters were still in spec at the end of the calibration interval; in that case the calibration interval would need to be shortened. Actually, any meter being out of spec at the end of its calibration interval is a problem, because it means you can't rely on any of the measurements you took since the previous calibration, and would have to recall all products that rely on those measurements. I'm not saying it doesn't happen, but it's generally very rare, especially for the low-voltage DCV ranges.
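To put a number on the 68% figure: that is roughly the share of a normal distribution that falls within one standard deviation of the mean. A minimal Python sketch, assuming the drift of a population of meters is normally distributed and centered on zero (an assumption for illustration only, real drift need not behave that way), shows how quickly the in-spec share grows as the spec limit is padded out to more standard deviations:

```python
import math

def fraction_in_spec(k):
    """Share of a centered normal distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

# Spec limit placed at 1, 2 and 3 standard deviations of the drift distribution.
for k in (1, 2, 3):
    print(f"spec limit at {k} sigma: {fraction_in_spec(k):.2%} in spec")
```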
So while it would certainly not convince an auditor, I'd have a fair amount of confidence that the Keysight and Keithley meters will still be in their one-year spec after a year. For example, in a data set I downloaded from the PMEL forum a long time ago, out of 250 calibrations of a HPAK 34401A at 1 V DC with a roughly one-year interval, zero were out of tolerance. Out of 118 Keithley 2000 calibrations at 1 V DC, one was out of tolerance. Both companies have decades of experience burning in and selecting the LM399 voltage references they use.
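As a rough sanity check on what those counts can and cannot show, here is a small Python sketch that computes a one-sided 95% upper bound (Clopper-Pearson) on the true out-of-tolerance rate. It assumes the calibrations are independent and representative of the meters in question, which a forum data set does not guarantee. The function names are my own.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_bound(failures, n, confidence=0.95):
    """One-sided Clopper-Pearson upper bound on the failure probability."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on p
        mid = (lo + hi) / 2
        if binom_cdf(failures, n, mid) > 1 - confidence:
            lo = mid  # this rate is still compatible with the data, go higher
        else:
            hi = mid
    return hi

for name, failures, n in (("34401A", 0, 250), ("Keithley 2000", 1, 118)):
    print(f"{name}: {failures}/{n} out of tolerance, "
          f"95% upper bound on the rate = {upper_bound(failures, n):.1%}")
```

With these counts the bound comes out at roughly 1.2% for the 34401A and about 4% for the Keithley 2000, so the data support "rare" but cannot prove "never".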
From what I've read, for example here, Siglent uses LM399 references without such selection and burn-in. So it would be very plausible that the Siglent meter drifted rather than the other three devices all drifting at the same rate and in the same direction. If you buy the meter and leave it on for a couple of months, and then have it calibrated, it may well drift less.
You should properly run the numbers with the uncertainties of all meters to figure out if this is actually out of spec, because a 50 ppm difference might well be borderline in spec. Note that the other devices will likely also be somewhat off in either direction. Consider it four overlapping bell curves with the reading as their mean and 3.33 * their uncertainty as their standard deviation (per GUM).
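One common way to run those numbers for a pair of meters is the normalized error used in interlaboratory comparisons: the difference between the two readings divided by the root-sum-square of their uncertainties. A value at or below 1 means the two readings agree within their combined uncertainty. The sketch below is Python. The uncertainty values in it are hypothetical placeholders, not the actual specifications of any of these meters, so substitute the figures from the data sheets and calibration certificates.

```python
import math

def normalized_error(diff_ppm, u_a_ppm, u_b_ppm):
    """|difference| divided by the root-sum-square of both uncertainties."""
    return abs(diff_ppm) / math.hypot(u_a_ppm, u_b_ppm)

# Hypothetical example: 50 ppm disagreement between two meters whose
# uncertainties (stated at the same coverage level) are 35 ppm and 40 ppm.
en = normalized_error(50, 35, 40)
print(f"E_n = {en:.2f} ->", "consistent" if en <= 1 else "inconsistent")
```

With these made-up numbers the result lands just under 1, which is what "borderline in spec" looks like in practice.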