Let me try to summarize some of the GPSDO system design parameters we have discussed.
Counting the 10MHz pulses: 1 second gives 1E7 counts, 1000 seconds gives 1E10 counts. To reach 1E-10 accuracy you must count for at least 1000 seconds to know whether you are within the 1E-10 boundary. To correct and stay within 1E-10 you must count longer.
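As a sanity check on the counting arithmetic, here is a minimal sketch. It assumes the counter resolves exactly one cycle of the oscillator and ignores every other error source; the function name is made up for illustration.

```python
def gate_time_for_resolution(f_osc_hz: float, target_fractional: float) -> float:
    """Gate time in seconds at which one count equals target_fractional of the total."""
    # After T seconds the counter holds f_osc_hz * T counts, so a single
    # count corresponds to a fractional resolution of 1 / (f_osc_hz * T).
    return 1.0 / (f_osc_hz * target_fractional)


# 10MHz oscillator, 1E-10 target: about 1000 seconds
print(gate_time_for_resolution(10e6, 1e-10))
```

The same function shows why a 1 second gate only resolves 1E-7 at 10MHz.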
Measuring the phase of the PPS (once locked) from 2.5MHz can be done with 0.5ns accuracy or better; using the 10MHz for the phase may give 0.1ns accuracy. Without taking the fractional time of the GPS PPS into account, measuring one PPS with 0.1ns accuracy is sufficient to reach 1E-10 (0.1ns over 1 second is 1E-10).
Including the 21ns fractional clock jitter of the PPS from the NEO-x modules makes the time measurement 210 times less accurate (21ns / 0.1ns), so you need to measure 210 times longer: to reach 1E-10 you need to measure the phase for 210 seconds. This 21ns clock jitter has exactly the same impact on counting the pulses, but for 1E-10 accuracy using a 10MHz clock you already need to measure for 1000 seconds, so the 21ns PPS jitter has no impact when counting pulses.
This indicates that an FLL should count for at least 100 minutes to be able to adjust and remain within 1E-10, so the drift of the 10MHz oscillator over 100 minutes should be well below 1E-10.
Adding the phase measurement could reduce the measurement time needed to reach 1E-10 by a factor of about 5 (1000 seconds versus 210 seconds). When using a GPS PPS without the 21ns PPS jitter this reduction could be much bigger.
The GPS position has an uncertainty in the order of 10ns, leading to noise in the PPS of the same magnitude. Eliminating the position noise to reach 1E-10 requires at least 100 seconds of measurement. That is not relevant for the phase measurement method (as it is smaller than 210 seconds) and also irrelevant for the pulse counting method, as the measurement time is already 1000 seconds to reach 1E-10.
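The numbers in this summary all follow from one worst-case rule: a timing error dt over a measurement time T looks like a fractional frequency error of dt / T. A rough sketch of that rule, using only the figures quoted above (the helper name is hypothetical, and no credit is taken for averaging):

```python
TARGET = 1e-10  # fractional frequency goal from the post


def time_needed_s(timing_error_s, target=TARGET):
    # Worst case: a timing error dt over an interval T looks like
    # a fractional frequency error of dt / T, so T = dt / target.
    return timing_error_s / target


phase_resolution = time_needed_s(0.1e-9)  # 0.1 ns phase measurement
pps_jitter = time_needed_s(21e-9)         # 21 ns PPS clock jitter
position_noise = time_needed_s(10e-9)     # 10 ns position noise
one_count = time_needed_s(100e-9)         # one period of 10 MHz

# For each method the largest term sets the measurement time.
phase_method = max(phase_resolution, pps_jitter, position_noise)
counting_method = max(one_count, pps_jitter, position_noise)

print(round(phase_method), round(counting_method))   # seconds
print(round(counting_method / phase_method, 1))      # reduction factor
```

With these inputs the phase method is limited by the 21ns jitter and the counting method by the 100ns count resolution, which is where the factor of roughly 5 comes from.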
Is this a correct summary?
It's a bit more complex.
Regarding the jitter in the 1pps: assuming the GPS module outputs the 1pps at the next available clock cycle after the moment it should be produced, it will be delayed by a random amount between 0 and 21ns. If it is truly random (uniformly distributed), the mean delay is 10.5ns and the statistical uncertainty of the measured mean is about +- 6ns/sqrt(n), where n is the number of observations (a uniform spread of 21ns has a standard deviation of 21ns/sqrt(12), about 6ns). So for 100 observations the mean delay will be 10.5ns +- about 0.6ns. If a PLL is used with a time constant > 100s then it is insignificant. In reality the constraint 'truly random' does not hold: the jitter will show beats with the 10MHz, but a 48MHz clock should beat quickly enough that a period of 100 observations is pseudo-random. So the jitter can be ignored if the measurement period is long.
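A quick way to check the averaging argument under the 'truly random' assumption (delay uniformly distributed between 0 and 21ns, independent from second to second) is a small simulation. The textbook spread of the mean of n such samples is 21ns/sqrt(12n), roughly 0.6ns for n = 100. This is only a sketch of the idealised case; it does not model the beat patterns mentioned above, and the function names are made up for illustration.

```python
import math
import random
import statistics

STEP_NS = 21.0  # quantisation step quoted above


def mean_delay_ns(n, rng):
    # Average of n independent delays, each uniform in [0, STEP_NS].
    return statistics.fmean(rng.uniform(0.0, STEP_NS) for _ in range(n))


def predicted_std_of_mean_ns(n):
    # A uniform distribution of width w has standard deviation w / sqrt(12);
    # averaging n independent samples divides that by sqrt(n).
    return STEP_NS / math.sqrt(12.0 * n)


rng = random.Random(1)  # fixed seed so the run is repeatable
trials = [mean_delay_ns(100, rng) for _ in range(2000)]

print(statistics.fmean(trials))        # close to 10.5
print(statistics.stdev(trials))        # close to the prediction below
print(predicted_std_of_mean_ns(100))   # about 0.61
```

Either way the residual is well under 1ns after 100 seconds, so the conclusion that the jitter averages out over a long measurement period is unchanged.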
The situation with the GPS module is not so clear cut. The uncertainty of the position (and by inference, the 1pps) depends on the specific module, the position of the satellites used, the number of satellites used, the constellations they are in (GPS, GLONASS, BeiDou, etc.) and the strength of the received signals. The receiver calculates a DOP (dilution of precision), and if that is consistently high then the 1pps can have a lot of variation (in addition to the jitter). There is a specification somewhere that says something like 99% of 1pps within 30ns of true with good signal. This may be a bell curve distribution (does anyone know?), so more observations means higher confidence. Again, 100 seconds of data should be enough to reduce errors to the point where the data is usable if the signal is good.
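If the distribution really is a bell curve, the "99% within 30ns" figure can be turned into a rough sigma and an uncertainty for a 100 second average. This is only a sketch under two assumptions that are not confirmed anywhere above: the errors are Gaussian, and they are independent from one second to the next. If the errors are correlated over time, the real improvement from averaging would be smaller.

```python
import math
from statistics import NormalDist


def sigma_from_bound_ns(bound_ns, coverage):
    # Two-sided bound: 'coverage' of all samples lie within +-bound_ns,
    # so bound_ns sits at the (0.5 + coverage / 2) quantile of the normal.
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return bound_ns / z


sigma_ns = sigma_from_bound_ns(30.0, 0.99)   # about 11.6 ns
sem_100_ns = sigma_ns / math.sqrt(100)       # about 1.2 ns after 100 samples

print(sigma_ns, sem_100_ns)
```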
If the oscillator is running at 10MHz with an error of less than 1E-10, and assuming the oscillator is stable and the 1pps arrival time is known accurately (even though the 1pps itself is jumping around a bit), then 100 seconds should be plenty of time to verify it. The uncertainty in measurement should be small enough to make that call.
The real problem is if the measurement shows the oscillator is off by more than 1E-10. Then the mechanism to correct comes into play. That is a whole new can of worms; each strategy has benefits and drawbacks. With most schemes, the better they are at averaging, the slower they are to converge. So getting the oscillator error within 1E-10 may take longer than verifying that it has been achieved.
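The averaging versus convergence trade-off can be illustrated with a plain first-order exponential filter, y = y + alpha * (x - y). This is not any particular GPSDO control loop, just a stand-in chosen because both properties have simple closed forms; the function names are made up for illustration.

```python
import math


def settle_steps(alpha, fraction=0.01):
    # After k updates the remaining part of a step input is (1 - alpha) ** k;
    # return the first k at which it has dropped to 'fraction' or less.
    return math.ceil(math.log(fraction) / math.log(1.0 - alpha))


def noise_gain(alpha):
    # Ratio of output to input standard deviation for white input noise.
    return math.sqrt(alpha / (2.0 - alpha))


for alpha in (0.1, 0.01, 0.001):
    print(alpha, settle_steps(alpha), round(noise_gain(alpha), 4))
```

Each tenfold reduction in alpha cuts the noise by roughly a factor of three but makes the filter take about ten times as many updates to settle after a step.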