There is some short-term oscillation fuzziness in all three devices: the PPS, the RTC, and even the Nano. I do the comparison every five minutes and average the last 16 readings to get a number. If I print out those 16 values, I see a fair amount of jitter, so I think averaging makes sense. If left to run for a couple of hours, the result tends to settle down and switch back and forth between two values. It's never going to be exactly perfect, so one value will be a little fast and the next a little slow, and you have to pick the one that looks best.
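A minimal sketch of the 16-reading average, written in Python for illustration rather than as the actual Nano code. RollingAverage is a hypothetical name, and what each reading holds (for example, the RTC's error against the PPS over one five-minute window) is an assumption. The point is that a fixed-length buffer drops the oldest reading as each new one arrives, so the mean always covers the most recent 16.

```python
from collections import deque


class RollingAverage:
    """Mean of the most recent `size` readings (16 here, matching the post).

    What a "reading" is (e.g. RTC error vs. PPS per five-minute window)
    is an assumption; this class only does the averaging.
    """

    def __init__(self, size: int = 16) -> None:
        # deque with maxlen discards the oldest entry automatically
        self.readings = deque(maxlen=size)

    def add(self, value: float) -> None:
        self.readings.append(value)

    def full(self) -> bool:
        # True once a complete window of readings has been collected
        return len(self.readings) == self.readings.maxlen

    def mean(self) -> float:
        if not self.readings:
            raise ValueError("no readings yet")
        return sum(self.readings) / len(self.readings)
```

Until 16 readings have accumulated, full() returns False, so you'd probably hold off on judging a result until then.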
The point of this was to avoid depending on zero being the right Aging value, which is how it comes from the factory. Zero will work pretty well on a DS3231SN, but a DS3231M can be way off (my two Ms optimize at -23 and -45). So why not get as close as you can? The GPS just lets you find the best number pretty quickly.
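For picking a starting point, a one-step estimate can be made from the measured drift. This sketch assumes the DS3231 datasheet's typical figure of about 0.1 ppm per Aging LSB (quoted for the SN; treat it as a rough guess for the M), and that positive Aging values add load capacitance and slow the oscillator, as the datasheet describes. The function names are hypothetical, and in practice you'd still iterate with the GPS, since the ppm-per-LSB figure varies with temperature.

```python
PPM_PER_LSB = 0.1  # typical DS3231SN figure; an assumption for the DS3231M


def drift_ppm(rtc_seconds: float, gps_seconds: float) -> float:
    """RTC error in ppm over one interval; positive means the RTC runs fast."""
    return (rtc_seconds - gps_seconds) / gps_seconds * 1e6


def suggest_aging(current: int, measured_ppm: float) -> int:
    """One-step estimate of a new Aging value.

    Positive Aging adds capacitance and slows the oscillator, so a fast RTC
    (positive ppm) needs the value moved up. Result is clamped to a signed byte.
    """
    step = round(measured_ppm / PPM_PER_LSB)
    return max(-128, min(127, current + step))


def to_register(aging: int) -> int:
    """Encode a signed Aging value as the two's-complement byte for register 0x10."""
    if not -128 <= aging <= 127:
        raise ValueError("aging must fit in a signed byte")
    return aging & 0xFF


def from_register(byte: int) -> int:
    """Decode the register 0x10 byte back to a signed Aging value."""
    return byte - 256 if byte & 0x80 else byte
```

For example, -23 is stored in register 0x10 as 0xE9 and -45 as 0xD3. The datasheet also notes that a new Aging value only takes effect at the next temperature conversion, which you can force by setting the CONV bit.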
But I think it's true that temperature and supply voltage can affect timekeeping significantly. The DS3231 has built-in temperature compensation, but voltage is another matter. When running the Aging code, the closer you can get to the temperature and supply voltage the RTC will experience in the field, the better the answer will be.
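Since the result depends on conditions, it may help to log the temperature alongside each comparison. This sketch decodes the DS3231's temperature registers (0x11 and 0x12, 0.25 °C resolution) and bundles the value with the other numbers in a hypothetical CalibrationSample record. The DS3231 doesn't report its supply voltage, so that field assumes you measure it some other way.

```python
from dataclasses import dataclass


def decode_temperature(msb: int, lsb: int) -> float:
    """Convert DS3231 registers 0x11 (MSB) and 0x12 (LSB) to degrees C.

    The reading is a 10-bit two's-complement value in 0.25 C steps:
    0x11 holds the signed integer part, bits 7:6 of 0x12 the quarter degrees.
    """
    whole = msb - 256 if msb & 0x80 else msb
    return whole + (lsb >> 6) * 0.25


@dataclass
class CalibrationSample:
    """One logged comparison; the field names are hypothetical."""
    aging: int
    mean_drift_ppm: float
    temperature_c: float
    supply_volts: float  # measured separately; the DS3231 doesn't report it
```

Comparing samples taken at different temperatures or voltages would then show how far the best Aging value moves with conditions.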