The GPSDO can drift in the short term, but over the long term its average drift should be zero (at least relative to the GPS timing reference). Short-term differences are therefore most likely due to heating cycles in the OCXO, the GPSDO's adjustment steps, or both. Viewed over a longer period, the frequency readings should converge to exactly 10 MHz (if your counter is referenced to a GPSDO and your source is a GPSDO). Without disciplining, even very good crystals drift and age, so you should expect to see the reading fluctuate slightly. The point of disciplining the oscillator is to cancel out the aging and longer-term drift of crystal oscillators.
I don't know if I entirely understand your setup, but if you measure:
your counter on its own internal timebase measuring the GPSDO output: you see some short-term wander from the GPSDO's normal adjustments, plus the drift of the counter's undisciplined oscillator
your counter referenced to the GPSDO measuring another source: you see the same short-term wander from the GPSDO's adjustments, but over time the averaged reading approaches an 'absolute' measurement, with GPS time as the reference
This means that if your counter is running on its own timebase and reads the GPSDO's output as low (with the GPSDO warmed up and locked), then your counter's oscillator is actually running fast, so it reads the more accurate GPSDO output as slightly low.
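To make the sign reasoning concrete, here is a minimal Python sketch. It assumes a simple model in which a fast timebase shortens the real gate time, so the counter displays f_true / (1 + y) for a fractional timebase offset y. The function names `timebase_offset` and `correct_reading` are hypothetical, and the example frequencies are illustrative rather than measured.

```python
# Sketch: infer a counter's timebase error from its reading of a known reference.
# Assumed (hypothetical) model: a fast timebase shortens the real gate time, so
#   displayed = f_true / (1 + y_timebase)
# where y_timebase is the fractional frequency offset of the counter's oscillator.


def timebase_offset(f_reference: float, displayed: float) -> float:
    """Fractional offset of the counter timebase (positive means it runs fast)."""
    return f_reference / displayed - 1.0


def correct_reading(displayed: float, f_reference: float, displayed_reference: float) -> float:
    """Rescale a reading of another source using the offset seen on the reference."""
    return displayed * (f_reference / displayed_reference)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    gpsdo_nominal = 10e6          # locked, warmed-up GPSDO output, Hz
    gpsdo_as_read = 9_999_999.5   # counter on its internal timebase reads it low
    y = timebase_offset(gpsdo_nominal, gpsdo_as_read)
    # A positive value here means the counter's own oscillator is running fast.
    print(f"timebase offset: {y:+.2e}")
    other = correct_reading(9_999_998.0, gpsdo_nominal, gpsdo_as_read)
    print(f"other source, corrected: {other:.2f} Hz")
```

Rescaling by the ratio seen on the reference is the same idea as locking the counter to the GPSDO, just applied after the fact, and it only holds as long as the timebase offset stays constant between the two readings.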
At any single moment you will measure some difference, but when the counter is referenced to a disciplined source, that random variation averages out over time. It also means you never have an aged OCXO simply running at a different frequency and skewing your measurements: the noise in your measurement is dominated by the short-term stability spec (10^-8 or 10^-9 ballpark), instead of the 10^-6 or 10^-7 spec of an undisciplined OCXO, which includes aging, thermal drift, and so on.
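The averaging argument can be illustrated with a small simulation. This is a sketch under stated assumptions: each reading gets white fractional noise of 1e-9 (the short-term ballpark above), the undisciplined case adds a fixed fractional error of 2e-7 standing in for aging and thermal drift, and real oscillator noise is not purely white. The names and numbers are hypothetical.

```python
import random
import statistics

# Assumed, illustrative values (not from any datasheet).
F_NOMINAL = 10e6               # true source frequency, Hz
SHORT_TERM_NOISE = 1e-9        # white fractional noise per reading (simplification)
UNDISCIPLINED_OFFSET = 2e-7    # fixed fractional error from aging/thermal drift


def simulate_readings(n: int, fixed_offset: float, rng: random.Random) -> list[float]:
    """Readings of an ideal 10 MHz source, each with a fixed plus a random fractional error."""
    return [
        F_NOMINAL * (1.0 + fixed_offset + rng.gauss(0.0, SHORT_TERM_NOISE))
        for _ in range(n)
    ]


def averaged_error_hz(n: int, fixed_offset: float, seed: int = 1) -> float:
    """Error of the mean of n readings, in Hz, relative to the true frequency."""
    rng = random.Random(seed)
    return statistics.fmean(simulate_readings(n, fixed_offset, rng)) - F_NOMINAL


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        print(
            f"N={n:>6}: disciplined {averaged_error_hz(n, 0.0):+.5f} Hz, "
            f"undisciplined {averaged_error_hz(n, UNDISCIPLINED_OFFSET):+.5f} Hz"
        )
```

As N grows, the disciplined average closes in on 10 MHz, while the undisciplined one settles around a constant error of about 2 Hz that no amount of averaging removes.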
Of course, a better oscillator would improve this, but a disciplined reference feeding the counter should give you at least a couple more digits of base accuracy than the internal reference in whatever condition it's in, especially given the counter's age (it will certainly have drifted from its initial calibration).
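The "couple more digits" figure is just arithmetic on the fractional specs. A minimal sketch, using the ballpark numbers quoted earlier (1e-6 undisciplined versus 1e-8 disciplined) and treating -log10 of the fractional uncertainty as a rough count of trustworthy digits, which is a rule of thumb rather than a formal definition:

```python
import math

# Ballpark fractional specs from the discussion, not datasheet numbers.


def error_hz(f_hz: float, fractional: float) -> float:
    """Absolute frequency error implied by a fractional uncertainty."""
    return f_hz * fractional


def trustworthy_digits(fractional: float) -> float:
    """Rule of thumb: roughly -log10 of the fractional uncertainty."""
    return -math.log10(fractional)


if __name__ == "__main__":
    for label, frac in (("undisciplined OCXO", 1e-6), ("disciplined reference", 1e-8)):
        print(f"{label}: ±{error_hz(10e6, frac):g} Hz, ~{trustworthy_digits(frac):.0f} digits")
```

At 10 MHz that works out to roughly ±10 Hz versus ±0.1 Hz, which is the two-digit gap.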