Imagine a sine wave that you're sampling at exactly twice its frequency.
a) You might sample the signal exactly on the peaks/troughs, in which case you'll be fine.
b) OTOH you might sample it exactly on the zero-crossing points, in which case you'll see nothing at all.
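You can also get every possible value in between (a) and (b); which one you get depends only on the phase, so it's just dumb luck.
If you sample at 99.99999% of the Nyquist rate, the sampling phase drifts slowly between (a) and (b) and you see the amplitude varying on screen (the "AM effect").
Sampling at 2.5x the signal frequency (1.25x the Nyquist rate) is the minimum needed to avoid this AM effect.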
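To make the dumb-luck point concrete: at exactly twice the frequency, sin(2πf·n/(2f) + φ) = sin(πn + φ) = (-1)^n sin φ, so the samples simply alternate in sign with an apparent amplitude of |sin φ|. The sketch below checks this numerically; the helper names are hypothetical and the tone frequency is normalized to 1 because it cancels out.

```python
import math

def samples_at_twice_frequency(phase: float, n_samples: int = 8) -> list[float]:
    """Samples of sin(2*pi*f*t + phase) taken at exactly fs = 2f (f = 1 here; it cancels out)."""
    f = 1.0
    fs = 2.0 * f
    return [math.sin(2 * math.pi * f * n / fs + phase) for n in range(n_samples)]

def apparent_amplitude(phase: float) -> float:
    """The largest |sample|, i.e. the amplitude you would read off the screen."""
    return max(abs(v) for v in samples_at_twice_frequency(phase))

for phase in (0.0, math.pi / 6, math.pi / 4, math.pi / 2):
    # Phase 0 is case (b), phase pi/2 is case (a), everything else lands in between.
    print(f"phase = {phase:.3f} rad -> apparent amplitude {apparent_amplitude(phase):.3f}")
```

Only the phase changes between runs, yet the reading goes from zero to full scale.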
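The drift can be reproduced with a tone sampled slightly above 2x (the same thing happens just below it). With f/fs = 1/2 - g, the samples are (-1)^n sin(φ - 2πgn), so the apparent amplitude follows |sin(φ - 2πgn)| and repeats every 1/(fs - 2f) seconds. The sketch uses 1% above 2x rather than 99.99999%, purely so that a short record shows full beats; the numbers and helper names are hypothetical, and the per-block peak is only a crude stand-in for what a scope display draws.

```python
import math

def sampled_tone(f: float, fs: float, n_samples: int, phase: float = 0.0) -> list[float]:
    """Samples of sin(2*pi*f*t + phase) taken at rate fs."""
    return [math.sin(2 * math.pi * f * n / fs + phase) for n in range(n_samples)]

def block_peaks(x: list[float], block: int) -> list[float]:
    """Largest |sample| in each consecutive block: a crude stand-in for the on-screen envelope."""
    return [max(abs(v) for v in x[i:i + block]) for i in range(0, len(x), block)]

# Hypothetical numbers: a 1 Hz tone sampled at 2.02 Hz, i.e. 1% above 2x.
# The envelope repeats every 1 / (fs - 2f) = 50 s, about 101 samples,
# so 202 samples cover two full beats.
x = sampled_tone(f=1.0, fs=2.02, n_samples=202)
peaks = block_peaks(x, block=4)
print(" ".join(f"{p:.2f}" for p in peaks))
```

The printed peaks rise from near zero to near full scale and back, twice, even though the tone's amplitude never changes.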
The familiar version of the sampling theorem, as published by Shannon in his seminal 1948 and 1949 papers, assumes a function f that "contains no frequencies higher than W cps". At first sight this might seem to include your counter-example (a sine at frequency W); however, Shannon is writing for engineers, not mathematicians, so he glosses over some details. The proof implicitly assumes that f is square-integrable, which in fact excludes any signal containing pure sines: a sine never decays, so its energy over an infinite interval is infinite. Note that there are ways to generalize this using more advanced mathematical machinery (for example, Fourier transforms in the sense of tempered distributions), but you may or may not be content to just claim that there are no pure sines in the real world anyway.
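A short worked check of both points, with W the tone frequency as in Shannon's statement and φ an arbitrary phase. Sampling at exactly 2W reproduces the (a)/(b) cases from the question, and the energy of the sine over [-T, T] grows without bound as T grows, so it is not square-integrable:

```latex
% Sampling at exactly t = n / (2W):
\sin\!\left(2\pi W \frac{n}{2W} + \varphi\right) = \sin(\pi n + \varphi) = (-1)^n \sin\varphi

% Energy over a finite window [-T, T] (taking \varphi = 0):
\int_{-T}^{T} \sin^2(2\pi W t)\,dt
  = \int_{-T}^{T} \frac{1 - \cos(4\pi W t)}{2}\,dt
  = T - \frac{\sin(4\pi W T)}{4\pi W}
  \;\ge\; T - \frac{1}{4\pi W}
  \;\xrightarrow{\,T \to \infty\,}\; \infty
```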
On the practical side, it is also important to realize that the sampling theorem is inherently an infinite-time result: it assumes that you have sampled the signal for an infinite duration. If you take a finite portion of a band-limited signal, you are effectively applying a window function and thereby broadening the spectrum beyond its original bandwidth.¹ Together with any sampling jitter, this gives you a hard limit that is higher than 2x, but probably not by much.
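The broadening from a finite record is easy to see numerically. The sketch below is an illustration under stated assumptions: it uses a complex tone exp(j2πf0·n) so that the untruncated spectrum is a single line at f0, applies a plain rectangular window (just keeping n_samples), and evaluates the DTFT directly at chosen frequencies in cycles/sample. The function name and the test frequencies are hypothetical.

```python
import cmath
import math

def truncated_tone_spectrum(f0: float, f: float, n_samples: int) -> float:
    """Normalized |DTFT| at frequency f of n_samples of the tone exp(j*2*pi*f0*n).

    Frequencies are in cycles/sample. An infinitely long tone has all of its
    energy at f0; cutting it to n_samples (a rectangular window) spreads some
    of it to other frequencies.
    """
    total = sum(cmath.exp(2j * math.pi * (f0 - f) * n) for n in range(n_samples))
    return abs(total) / n_samples

for n in (64, 1024):
    at_tone = truncated_tone_spectrum(0.1, 0.1, n)   # the tone itself
    far_away = truncated_tone_spectrum(0.1, 0.3, n)  # well outside the original band
    print(f"N = {n:4d}: at f0 {at_tone:.4f}, at 0.3 {far_away:.6f}")
```

For these particular values the leakage works out to exactly 1/N (from the closed form |sin(πNδ)| / (N·|sin(πδ)|) with δ = f0 - f), so it shrinks as the record gets longer but does not vanish, which is what the footnote's noise-floor argument relies on.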
More importantly, the sampling theorem also assumes you are using all the samples for reconstruction all the time, but you probably aren't, because a) it would use a lot of processing power and b) it doesn't work for real-time applications (the ideal sinc filter is non-causal and has infinite delay). Depending on the number of taps you use, frequencies close to the Nyquist limit will still leak through as an image just above Nyquist, which beats with the signal at a frequency close to DC. This is what you observe. However, this is a trade-off, and the limit of 2.5x is an arbitrary choice of the implementation you are using. You can get a lot closer to 2x if you use more than a couple of samples for interpolation.
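The taps trade-off can be sketched with a plain truncated sinc interpolator (a rectangular-windowed sinc; real implementations usually use a smoother window, so treat this as an assumption, not the implementation you are using). The sample rate is normalized to 1, the tone frequency f is in cycles/sample, and the error is measured halfway between samples. The helper names and test values are hypothetical.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc, sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def truncated_sinc_error(f: float, taps_per_side: int,
                         n_points: int = 400, phase: float = 0.3) -> float:
    """Worst |error| of truncated-sinc interpolation halfway between samples.

    The sample rate is normalized to 1, so f is in cycles/sample (0 < f < 0.5).
    Each interpolated point uses taps_per_side samples on either side of it.
    """
    def x(n: int) -> float:
        # The tone is known analytically, so we never run off the end of a buffer.
        return math.cos(2 * math.pi * f * n + phase)

    worst = 0.0
    for m in range(n_points):
        t = m + 0.5  # halfway between samples m and m + 1
        taps = range(m - taps_per_side + 1, m + taps_per_side + 1)
        approx = sum(x(n) * sinc(t - n) for n in taps)
        exact = math.cos(2 * math.pi * f * t + phase)
        worst = max(worst, abs(approx - exact))
    return worst

# f = 0.25 is sampling at 4x the tone frequency; f = 0.45 is about 2.2x.
for f in (0.25, 0.45):
    for k in (2, 8, 32):
        print(f"f = {f:.2f}, {2 * k:2d} taps: max error {truncated_sinc_error(f, k):.3f}")
```

The error falls as the tap count grows, and at a fixed tap count the tone near Nyquist is reconstructed noticeably worse than the one at 4x, which is the trade-off described above.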
¹ Technically it will no longer be band-limited at all (a nonzero signal cannot be both time- and band-limited). But let's ignore that here: as long as the out-of-band content drops below our noise floor, we don't really care.