In the 53131A/53132A the official HP option board references all override the crappy "no-option" internal reference. When the option board is used as the reference clock it is calibrated by supplying a 10 MHz input on Channel 1 from a frequency standard and running the calibration procedure from the menus. This procedure adjusts the option board's VCOCXO by setting a DAC that controls the EFC (electronic frequency control) voltage input to the VCOCXO. Once it hits agreement with the input standard it stores the DAC value in non-volatile memory and you're done.
From this point onward the option board VCOCXO just free runs and supplies the timebase for the rest of the counter.
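For anyone who wants to picture that calibration step, here's a toy sketch in Python. I don't know what algorithm the counter's firmware actually uses, so the binary search, the 12-bit DAC width, the tuning sensitivity and the starting frequency are all assumptions made up for illustration. The only part taken from the description above is the idea: step the DAC until the oscillator agrees with the standard, then store that DAC value.

```python
# Toy model only: every number below is an assumption, not an HP spec.
DAC_BITS = 12               # assumed DAC width
HZ_PER_CODE = 0.001         # assumed EFC tuning sensitivity, Hz per DAC step
FREE_RUN_HZ = 9_999_998.7   # assumed oscillator frequency at mid-scale
TARGET_HZ = 10_000_000.0    # the external frequency standard


def measured_hz(dac_code):
    """Pretend measurement: frequency rises linearly with the DAC code."""
    mid_scale = 1 << (DAC_BITS - 1)
    return FREE_RUN_HZ + HZ_PER_CODE * (dac_code - mid_scale)


def calibrate(target_hz=TARGET_HZ):
    """Binary-search for the DAC code that lands closest to target_hz."""
    lo, hi = 0, (1 << DAC_BITS) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if measured_hz(mid) < target_hz:
            lo = mid + 1    # too slow: need a higher code
        else:
            hi = mid        # fast enough: answer is here or below
    # lo is the first code at or above target; the code below may be closer.
    candidates = [c for c in (lo - 1, lo) if c >= 0]
    return min(candidates, key=lambda c: abs(measured_hz(c) - target_hz))


# This is the value that would be written to non-volatile memory.
stored_code = calibrate()
print(stored_code, f"{measured_hz(stored_code):,.4f} Hz")
```

After that the stored code is simply replayed at power-up, which is why the oscillator free runs from then on.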
The way these GPSDO add-ons have to work is by replicating the 'supplying a clock' bit of the option board. They don't need calibration because it's the job of a GPSDO to constantly recalibrate itself using GPS. Running the counter's calibration, even if the third-party option card replicated or simulated the DAC from the HP option card, would achieve nothing.
However, you might need to initially fool the 53131A/53132A into running the calibration just so that it thinks it's calibrated. If you insert one of the timebase option cards in a 53131A/53132A that's never had a timebase option card it'll say "UNCALIBRATED" on the front and do nothing, so a one-time fake calibration may be necessary.
That does look neat. I would have hoped for a set of flat zeros using a GPSDO. As you say, it doesn't tell you much about adjustment.
The specifications in the 53131A/53132A manuals include a fearsomely detailed section that lets you work out the error limits you can expect on a measurement. It's too complex for me to want to get into a worked example here. But, to give you a flavour of the errors other than pure timebase accuracy that mean you will only see a 'perfect' result by chance: the spec for the 53131A has a best resolution of 650 ps. An error of +650 ps on a 10 MHz clock would come out as 10,000,000.01 Hz (taking a 1 s gate, that's about 0.0065 Hz, which rounds up to 0.01). Add a systematic uncertainty specified as 350 ps best case, 1.25 ns worst case, due to trigger jitter and other factors, and it's clear that you're not always going to get 10,000,000.00 Hz as the answer.
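If you want to play with those numbers, here's a rough sketch of the arithmetic in Python. It is not the manual's full error formula. It only assumes that a timing error spread over the gate time turns into the same fractional error in frequency, and the 1 s gate is my assumption. The function name is made up for this post.

```python
def freq_error_hz(nominal_hz, timing_error_s, gate_s=1.0):
    """Size of the frequency error from a timing error over one gate time.

    Assumes fractional frequency error = timing error / gate time.
    """
    return nominal_hz * timing_error_s / gate_s


NOMINAL_HZ = 10e6  # the 10 MHz clock being measured

cases = [
    ("resolution, 650 ps", 650e-12),
    ("systematic, best case 350 ps", 350e-12),
    ("systematic, worst case 1.25 ns", 1.25e-9),
]

for label, err_s in cases:
    delta = freq_error_hz(NOMINAL_HZ, err_s)
    # Show the raw error and what a two-decimal display would read.
    print(f"{label}: {delta:.4f} Hz -> {NOMINAL_HZ + delta:,.2f} Hz")
```

Shorten `gate_s` and the same timing error costs you proportionally more hertz, which is why longer gate times buy you more digits.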
I've got a 53132A with the super-duper option 012 "ultra-stability" option board in it. The OCXO on the 012 board is good enough, according to the specification, that it doesn't drift more than 3 parts per billion in a month, and I've tested mine against a GPSDO and it does better than the spec. If the 53132A is fed from its own reference out, it wanders one or two counts in the last digit between measurements. Sometimes it's 9,999,999.999 9, sometimes 10,000,000.000 0, sometimes 10,000,000.000 1. That last digit represents 10 parts per trillion - which is about 1/1,000,000th of half a bee's dick.
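To put those two figures on the same scale, here's a quick sanity check in Python. It only uses the numbers quoted above (a 0.0001 Hz last digit at 10 MHz and the 3 ppb per month drift spec). The variable and function names are just mine.

```python
NOMINAL_HZ = 10e6  # 10 MHz reference


def fractional(delta_hz, nominal_hz=NOMINAL_HZ):
    """Express a frequency offset as a fraction of the nominal frequency."""
    return delta_hz / nominal_hz


# One count in the last displayed digit of 10,000,000.000 1 Hz.
last_digit_hz = 0.0001
last_digit_ppt = fractional(last_digit_hz) * 1e12

# The option 012 drift limit of 3 parts per billion per month, in hertz.
monthly_drift_hz = 3e-9 * NOMINAL_HZ
drift_in_counts = monthly_drift_hz / last_digit_hz

print(f"last digit    = {last_digit_ppt:.0f} parts per trillion")
print(f"monthly drift = {monthly_drift_hz:.2f} Hz")
print(f"              = {drift_in_counts:.0f} counts of the last digit")
```

So a board sitting right on the drift limit could move a few hundred last-digit counts over a month, which puts the one or two counts of wander between readings in perspective.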
The OCXOs used in those GPSDO references probably have 0.05 ppb stability over 1 second if they're one of the decent stratum 3E OCXOs that are kicking about in fair quantities on the Chinese used ex-telecom market.