How have you connected the same signal to EXT-REF and CH1? I am using a distribution amplifier.
If I connect the signal from the REF output to CH1, I get an Avg error of +0.005 mHz with a gate time of 10 s and N=60. Maybe the error decreases with more samples.
This is much less than the 0.023 mHz I got using the distribution amplifier.
I just used an SMA T adapter (on the FA2, converted with BNC adapters) to feed both the REF input and the CH1 input. As the REF input is not 50 Ohm (it is some high impedance), this even means some reflections...
A GPSDO aims for better accuracy and sacrifices noise for that: GPS disciplining means small pushes and pulls, the controlling DAC may not have enough bits, and the control wire may pick up noise. So it's not a good source for this, which is why I have not used the GPSDO output.
The error definitely decreases with more samples. In some cases it may actually take a huge number of samples to go entirely to zero (or it may never happen, I don't know). I've made a new, longer run, and the attached picture shows an example of that.
The measurement results are assumed to be random, although this may not be true. It could be cross-checked against the distribution of the measurement results, which may even be worth doing, especially in your case. With too few samples, a short bad series can bias the average; in a longer run that should not happen. The 0.005 mHz error with just N=60 is not a terrible result at all (although in my case it's probably in the range of 2-3 µHz in the worst case), especially if you use a less stable source like a GPSDO.
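To illustrate why a short run can look worse than a long one, here is a minimal Python sketch. It assumes the per-reading errors are independent, zero-mean and roughly Gaussian, which is exactly the assumption flagged above as unverified. The 0.04 mHz per-reading spread and the function name `mean_error_vs_n` are made up for illustration, not measured on any counter.

```python
import math
import random
import statistics


def mean_error_vs_n(sigma_mhz, sample_counts, seed=1):
    """Simulate reading errors and return (N, observed mean, expected spread) rows.

    sigma_mhz is a hypothetical per-reading standard deviation in mHz.
    The expected spread of the average is sigma / sqrt(N).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rows = []
    for n in sample_counts:
        errors = [rng.gauss(0.0, sigma_mhz) for _ in range(n)]
        observed = statistics.fmean(errors)
        expected = sigma_mhz / math.sqrt(n)
        rows.append((n, observed, expected))
    return rows


if __name__ == "__main__":
    # 0.04 mHz is an assumed value, chosen only to make the trend visible.
    for n, observed, expected in mean_error_vs_n(0.04, [60, 600, 6000]):
        print(f"N={n:5d}  mean error={observed:+.5f} mHz  "
              f"expected spread=+/-{expected:.5f} mHz")
```

Under these assumptions the spread of the average shrinks only with the square root of N, so going from N=60 to N=6000 gains a factor of 10. If the real readings drift or are correlated, the average will not follow this rule, which is why checking the distribution is worthwhile.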
If you have very high expectations, you're probably better off with a branded device that specifies in detail what you can expect from it.
E.g. the FA2 does not have the resolution for low-frequency signals at all.
Update: Fixed soft photo