All in all, the conclusion seems to be that long term (>24 hour) stability is OK, but in the short term the receivers seem to drift apart in equal amounts.
In my opinion, this is the outcome to be expected. I've looked at most of the GPSDO circuits that have appeared online, and almost universally they use a phase locked loop (PLL) to bring the local oscillator in line with the received GPS signal. Each PLL has a different setup, so any variation in the GPS signal will cause different effects in different receivers. Even in two supposedly identical receivers, the oscillators will behave differently. I recently built two GPSDOs using the same model of oscillator: one has a control voltage around 1.9V and a control sensitivity of 0.11 V/Hz, the other 2.8V and 0.13 V/Hz. So the same PLL circuit will behave slightly differently on each.
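To put rough numbers on that difference, here is a small sketch using only the two sensitivity figures quoted above. The function name and the 1 mV step are my own choices for illustration, nothing more.

```python
def hz_per_volt(sensitivity_v_per_hz):
    """Invert a sensitivity quoted in V/Hz to get the tuning slope in Hz/V."""
    return 1.0 / sensitivity_v_per_hz


def shift_for_step_hz(step_v, sensitivity_v_per_hz):
    """Frequency shift in Hz produced by a control voltage step in volts."""
    return step_v * hz_per_volt(sensitivity_v_per_hz)


if __name__ == "__main__":
    step_v = 0.001  # illustrative 1 mV control step (assumed, not measured)
    for name, sens in (("unit 1", 0.11), ("unit 2", 0.13)):
        shift_mhz = shift_for_step_hz(step_v, sens) * 1000.0
        print(f"{name}: {shift_mhz:.2f} mHz per 1 mV step")
```

The same voltage step moves the first oscillator about 18% further than the second, so an identical loop filter ends up with a different effective gain on each unit.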
The general solution seems to be to make the GPS signal as accurate as feasible. This means it has less instability and allows the PLL to have longer time constants. In the end you are dealing with a whole series of variables - intrinsic variability of GPS data due to swapping between satellites, varying delays because the satellites are moving, ionosphere effects, etc. Then the GPS receiver may or may not faithfully follow the received signal (I use the NEO-6M, the cheapest and possibly dirtiest of all). Then there are the varying PLL strategies (which are also affected by the stability of the power supply, since the PLL has to compensate for unwanted variations due to voltage changes on the control voltage supply). And then how good is the oscillator?
Using a PLL should guarantee long term agreement - the name after all is phase locked. Loss of agreement means loss of lock, which means something is wrong.
My approach has been to use microprocessor power to avoid using a PLL. Instead the oscillator is allowed to "free run" (no change to the control voltage) for an extended period, then a single correction is applied to remove any deviation. If you were to compare it to a PLL you'd see the two diverge for some minutes then converge over more minutes. The advantage of this approach is that the oscillator is mostly running at the limits of its own stability. It may not be accurate but it will be stable. It is also possible to quantify the frequency error. If the oscillator diverges by one cycle in 1000 seconds then it is within 1 mHz of the GPS. I have seen better results than this under ideal conditions.
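The arithmetic behind that figure can be sketched as below. This is only an illustration, not my actual firmware: the function names are made up, the sign convention assumes a positive error means the oscillator is running fast and that raising the control voltage raises the frequency, and the 10 MHz nominal frequency is assumed purely to show a fractional error.

```python
def frequency_error_hz(cycles_slipped, interval_s):
    """Average frequency offset over a free-run interval, in Hz."""
    return cycles_slipped / interval_s


def fractional_error(error_hz, nominal_hz):
    """Offset expressed as a fraction of the nominal frequency."""
    return error_hz / nominal_hz


def correction_volts(error_hz, sensitivity_v_per_hz):
    """Single control voltage step that would cancel the measured offset.

    Assumes a positive tuning slope, so a fast oscillator needs a lower voltage.
    """
    return -error_hz * sensitivity_v_per_hz


if __name__ == "__main__":
    err = frequency_error_hz(1, 1000)       # one cycle slipped in 1000 s
    print("error (Hz):", err)
    print("fractional:", fractional_error(err, 10e6))  # 10 MHz is assumed
    print("step (V):", correction_volts(err, 0.11))    # 0.11 V/Hz unit
```

With a sensitivity of 0.11 V/Hz, a 1 mHz error calls for a step of only about 0.1 mV, which is one reason the stability of the control voltage matters so much.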
Another advantage is that the GPS data can be 'dirty', since it all averages out over time. Improvement to the system comes almost exclusively from improving the conditions for the oscillator - stable control voltage, stable supply voltage, stable temperature. This is what I am currently exploring. The holy grail is to get less than one cycle of drift in a day. I think the satellites come into roughly the same position each day, so the data from one whole day should be similar to the next. Then it will become possible to work out an ageing factor and apply that as a second continuous correction. If this is achievable, it is a better standard than rubidium. May not be possible but we can dream.
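For scale, one cycle per day is 1/86400 Hz, or about 11.6 µHz. As for the ageing factor, one possible way to estimate it (I have not built this yet, so treat it as a sketch) is to log the frequency offset at the same time each day and fit a straight line through the readings. The function names and the sample readings below are invented for illustration.

```python
def ageing_rate_hz_per_day(days, offsets_hz):
    """Least-squares slope of frequency offset against time, in Hz per day."""
    if len(days) != len(offsets_hz) or len(days) < 2:
        raise ValueError("need at least two matched readings")
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(offsets_hz) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("readings must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, offsets_hz))
    return sxy / sxx


def predicted_offset_hz(last_offset_hz, rate_hz_per_day, days_ahead):
    """Offset expected after days_ahead if the ageing stays linear."""
    return last_offset_hz + rate_hz_per_day * days_ahead


if __name__ == "__main__":
    days = [0, 1, 2, 3]
    offsets = [0.0, 0.001, 0.002, 0.003]  # made-up readings, Hz
    rate = ageing_rate_hz_per_day(days, offsets)
    print("ageing rate (Hz/day):", rate)
    print("one cycle per day (Hz):", 1 / 86400)
```

The continuous correction would then be a slow ramp on the control voltage that cancels the fitted rate, with the once-per-interval correction mopping up whatever is left.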