That jitter is not even related to the samples themselves but to the bits. Now, say you have a 24-bit, 192 kHz stereo system where all the data goes through I2S: that is about 9 Mbit/s (24 × 192,000 × 2 = 9.216 Mbit/s), so one bit lasts roughly 108 ns. That 0.5 ps of jitter per bit suddenly becomes about 5 ppm of a bit period. If you take the noise analogy, that is about -106 dB, almost in the audible range, so I would say he is right.
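For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function name is made up for illustration, and it assumes the raw payload rate (24 bits per sample, no padding to 32-bit slots as some I2S setups use). Treating the jitter-to-bit-period ratio as an amplitude ratio in dB is just the noise analogy from above, not a proper jitter analysis.

```python
import math

def jitter_relative_to_bit(bits_per_sample, sample_rate_hz, channels, jitter_s):
    """Express timing jitter as a fraction of one serial bit period.

    Hypothetical helper: assumes the serial bit rate is exactly
    bits_per_sample * sample_rate_hz * channels (no slot padding).
    """
    bit_rate = bits_per_sample * sample_rate_hz * channels  # bits per second
    bit_period_s = 1.0 / bit_rate                           # duration of one bit
    ratio = jitter_s / bit_period_s                         # jitter as a fraction of a bit
    ppm = ratio * 1e6
    level_db = 20.0 * math.log10(ratio)                     # "noise analogy" only
    return bit_rate, ppm, level_db

bit_rate, ppm, level_db = jitter_relative_to_bit(24, 192_000, 2, 0.5e-12)
print(f"bit rate: {bit_rate / 1e6:.3f} Mbit/s")
print(f"jitter:   {ppm:.2f} ppm of a bit period")
print(f"level:    {level_db:.1f} dB")
```

Without rounding this comes out a little under 5 ppm and close to -107 dB, which is in line with the rounded figures above.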
It is actually not that simple. I have the impression that engineers oversimplify audio all the time. Jitter changes the frequency of the signal, so you cannot make such simple calculations.
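To put a number on "jitter changes the frequency", here is a minimal sketch under a strong assumption: the jitter is a pure sine wave in time, so a tone sampled with it behaves like a phase-modulated tone. The function name and the example values (10 kHz tone, 1 kHz jitter rate) are my own picks for illustration; real jitter is usually a mix of random and periodic components, so take this as a toy model only.

```python
import math

def sinusoidal_jitter_deviation(tone_hz, jitter_peak_s, jitter_rate_hz):
    """Peak phase and frequency deviation of a tone under sinusoidal jitter.

    Toy model (assumption): timing error = jitter_peak_s * sin(2*pi*jitter_rate_hz*t),
    so the tone becomes sin(2*pi*tone_hz*t + beta*sin(2*pi*jitter_rate_hz*t)).
    """
    beta = 2.0 * math.pi * tone_hz * jitter_peak_s  # peak phase deviation, radians
    peak_dev_hz = beta * jitter_rate_hz             # peak instantaneous frequency shift, Hz
    return beta, peak_dev_hz

# Hypothetical example values: 10 kHz tone, 0.5 ps peak jitter wobbling at 1 kHz.
beta, peak_dev_hz = sinusoidal_jitter_deviation(10_000.0, 0.5e-12, 1_000.0)
print(f"peak phase deviation:     {beta:.3e} rad")
print(f"peak frequency deviation: {peak_dev_hz:.3e} Hz")
```

The deviation scales with both the tone frequency and the jitter rate, so the same 0.5 ps does not map to one fixed number the way the per-bit calculation suggests.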
Remember, this is about frequency. The brain does fancy stuff with frequency; that is how you locate things by sound alone. If one of your ears receives a changing frequency while the other receives a stable one, your brain interprets this as the object emitting the sound moving.
If you want to analyze an audio system, don't ever forget that the full signal chain includes the human at the end, with two ears and a brain that does fancy DSP stuff. I think every audio system designer makes this mistake, and that is why we cannot come close to live music with recordings.