A couple of new findings:
I captured the START, STOP and TRIGG signals with my scope. The START and STOP traces showed that the signals are indeed locked against each other, with no apparent jitter (at least none I could see on the scope). I conclude that this is a viable setup for a performance test.
Then I used the TRIGG signal as an indicator of when the start command from the MCU is executed by the TDC. I set up a trigger criterion to test the assumption that TRIGG to START < 5 ns causes an invalid measurement. The result wasn't entirely conclusive: when TRIGG to START is < 5 ns I do get a bogus measurement, but the frequency of bogus measurements is much higher than the rate of such events seen on the scope. This could mean the critical window is much larger than the 5 ns from the datasheet. Playing around with the trigger window hints that it might be more in the tens or even hundreds of nanoseconds.
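One way to pin the window down more systematically than by eyeballing scope triggers would be to log the TRIGG-to-START delay alongside each TOF reading and sweep a candidate window threshold in software. The sketch below is only an illustration of that idea; the file name, column names and the 2 µs plausibility cut are my assumptions, not part of the original data set.

```python
# Sketch: estimate the critical TRIGG->START window from logged data.
# Assumes each measurement was logged as a CSV row with a "TRIGG_TO_START"
# delay and a "TOF" value, both in seconds, and that a reading counts as
# bogus when it falls outside the 1 us cluster (>= 2 us). File name, column
# names and thresholds are hypothetical.
import csv

def bogus_rate_below(samples, window_s):
    """Fraction of bogus readings among samples with TRIGG->START delay < window_s."""
    hits = [(d, tof) for d, tof in samples if d < window_s]
    if not hits:
        return None
    return sum(1 for _, tof in hits if tof >= 2e-6) / len(hits)

with open("tdc_log.csv", newline="") as f:
    samples = [(float(r["TRIGG_TO_START"]), float(r["TOF"]))
               for r in csv.DictReader(f)]

# Sweep candidate windows from the datasheet's 5 ns up to a few hundred ns.
for window in (5e-9, 20e-9, 50e-9, 100e-9, 200e-9, 500e-9):
    rate = bogus_rate_below(samples, window)
    print(f"window < {window * 1e9:5.0f} ns: bogus rate = {rate}")
```

If the bogus rate only approaches 100 % for windows well above 5 ns, that would support the suspicion that the real critical window is larger than the datasheet figure.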
I then decided to discard all measurements with an implausible TIME1. Using only the samples in the 1 µs cluster as the population, the standard deviation works out to around 38.5 ps.
You can confirm the calculation from the previously posted data set in Excel: filter the TOF column to show only values < 2 µs, then use SUBTOTAL() with the STDEV function number (7), so that only the visible rows are included.
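For anyone who prefers to check this outside Excel, here is a minimal Python sketch of the same filter-and-stdev step. The file name and the assumption that the TOF column is in seconds are mine; adjust them to however the data set was exported.

```python
# Minimal sketch: discard implausible TOF values and compute the sample
# standard deviation of the remaining 1 us cluster. Assumes the posted data
# set was exported to a CSV (hypothetically "tdc_samples.csv") with a "TOF"
# column in seconds.
import csv
import statistics

with open("tdc_samples.csv", newline="") as f:
    tof = [float(row["TOF"]) for row in csv.DictReader(f)]

# Keep only the 1 us cluster: drop everything with TOF >= 2 us.
cluster = [t for t in tof if t < 2e-6]

# statistics.stdev() is the sample standard deviation, same as Excel's STDEV.
stdev_ps = statistics.stdev(cluster) * 1e12
print(f"samples in cluster: {len(cluster)}, stdev ~ {stdev_ps:.1f} ps")
```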
So far, so good; however, the next head-scratcher is right around the corner. When I feed the 1PPS input with an actual 1-second pulse signal, I get four clusters, spaced exactly 500 ns apart. Confirmed with the scope: the signals are not properly locked, and the phase shifts by exactly 90 degrees every second. That might be some peculiarity of the instrument itself; I'll investigate. I wanted to update the software of the Siglent anyway.