MP3 and similar lossy compression is, well, lossy. What does it remove? It converts the sampled data (which is just periodically sampled air pressure) to spectral data, i.e. how much intensity there is in any given frequency band at any given time. It then uses a psychoacoustic model of masking effects in the ear to figure out which frequency bands can be represented with lower precision. The masking model is based on the assumption that if the cochlea is subjected to stimulation in one frequency band, it gets desensitized to stimulation in nearby bands.
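In code terms, that time-to-frequency conversion is essentially a windowed short-time transform. Here's a minimal numpy sketch of the idea (the real MP3 filterbank is a polyphase filterbank plus MDCT, so treat this as an illustration, not the actual algorithm; 1152 samples happens to be the MP3 frame size, the hop size is my own choice):

import numpy as np

def spectral_frames(samples, frame_len=1152, hop=576):
    """Chop a mono signal into overlapping frames and return the magnitude
    spectrum of each frame (rows = time, columns = frequency bins)."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len, hop):
        frame = samples[start:start + frame_len] * window
        # rfft gives the intensity per frequency bin for this slice of time
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)

The encoder then decides, per frame and per band, how coarsely it can quantize based on what the masking model says you won't hear anyway.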
If you imagine a sawtooth or square wave, for example, it can be represented as a Fourier series of harmonics, i.e. a sum of sine waves of various frequencies, amplitudes and phases. This corresponds well to how the ear perceives the audio.
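As a quick illustration (a little Python/numpy sketch of my own, nothing to do with the demo's toolchain), you can build a bandlimited sawtooth by literally summing its harmonics, each with amplitude 1/n:

import numpy as np

def sawtooth(freq=440.0, sample_rate=44100, length=1.0):
    """Additive-synthesis sawtooth: sum sine harmonics with 1/n amplitudes,
    stopping below the Nyquist frequency."""
    t = np.arange(int(sample_rate * length)) / sample_rate
    out = np.zeros_like(t)
    n = 1
    while n * freq < sample_rate / 2:
        out += np.sin(2 * np.pi * n * freq * t) / n
        n += 1
    return out * (2 / np.pi)   # scales the peaks to roughly +/-1

A square wave is the same thing with only the odd harmonics.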
Now let's compare some signals:
This is a 440 Hz sawtooth wave generated in Adobe Audition. The ringing near the edges is caused by the waveform display's interpolation/bandlimiting (sinc, perhaps?). Disregard that and look at the dots: they are the actual sample values, and they line up to form a clean ramp. The spectral view shows a gradual decrease in intensity as frequency increases.
This could be a part of a straight line segment in the demo.
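You can verify that rolloff numerically (reusing the sawtooth() helper from the sketch above, again purely as an illustration); the harmonics of a sawtooth fall off at about 6 dB per octave:

import numpy as np

fs = 44100
x = sawtooth(440.0, fs, 1.0)
spectrum = np.abs(np.fft.rfft(x))
bin_hz = fs / len(x)                  # 1 Hz per FFT bin for a 1 second signal
ref = spectrum[int(round(440.0 / bin_hz))]
for n in range(1, 9):
    mag = spectrum[int(round(n * 440.0 / bin_hz))]
    print(f"harmonic {n}: {20 * np.log10(mag / ref):6.1f} dB")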
Now let's look at the same data after a round trip through 96 kbps MP3 compression. This is an intentionally low bitrate, chosen to show the effects more clearly, but the same principle applies to all lossy audio compression. The waveform is noticeably squiggly, and the spectral view shows that the harmonics no longer simply decrease as frequency increases but vary a lot. For example, the bands near 6000 and 9000 Hz are noticeably lower in amplitude, and everything above 17 kHz is simply filtered away. The individual harmonics are also likely phase shifted, i.e. shifted slightly back or forth in time.
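If you want to reproduce this at home, one way is to round-trip a test tone through an external encoder and compare spectra. A rough sketch, assuming the lame command-line encoder is on your PATH and numpy/scipy are installed (filenames, bitrate and the 16 kHz threshold are just my choices, and the exact numbers will depend on the encoder version):

import subprocess
import numpy as np
from scipy.io import wavfile

fs = 44100
t = np.arange(fs * 2) / fs
saw = 2 * ((440.0 * t) % 1.0) - 1.0                     # naive 440 Hz sawtooth
wavfile.write("saw.wav", fs, (saw * 0.5 * 32767).astype(np.int16))

# Encode at 96 kbps, then decode back to WAV
subprocess.run(["lame", "-b", "96", "saw.wav", "saw96.mp3"], check=True)
subprocess.run(["lame", "--decode", "saw96.mp3", "saw96.wav"], check=True)

_, decoded = wavfile.read("saw96.wav")
if decoded.ndim > 1:
    decoded = decoded[:, 0]

# Compare magnitude spectra of the original and the round-tripped signal
orig_spec = np.abs(np.fft.rfft(saw[:fs]))
dec_spec = np.abs(np.fft.rfft(decoded[:fs].astype(float) / 32768.0))
freqs = np.fft.rfftfreq(fs, 1 / fs)
print("energy above 16 kHz, original vs decoded:",
      orig_spec[freqs > 16000].sum(), dec_spec[freqs > 16000].sum())

Load saw96.wav back into Audition next to the original and you can see the squiggles and the missing top end for yourself.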
The human ear doesn't particularly care whether the samples line up in a straight line (or whatever other exact shape) as long as the signal stimulates the right portions of the cochlea, but the smallest squiggle on the line will show up as a similar squiggle on the oscilloscope.
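You can demonstrate that directly: randomize the phase of every frequency bin and the magnitude spectrum (roughly what the cochlea responds to) stays the same, while the waveform turns into something completely different. A small numpy sketch, purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
fs = 44100
t = np.arange(fs) / fs
saw = 2 * ((440.0 * t) % 1.0) - 1.0          # naive sawtooth, 1 second

spec = np.fft.rfft(saw)
phases = rng.uniform(0, 2 * np.pi, len(spec))
phases[0] = 0.0                              # keep the DC bin real
phases[-1] = 0.0                             # keep the Nyquist bin real
scrambled = np.abs(spec) * np.exp(1j * phases)
y = np.fft.irfft(scrambled, n=len(saw))

print("max magnitude-spectrum difference:",
      np.max(np.abs(np.abs(np.fft.rfft(y)) - np.abs(spec))))   # ~0
print("max waveform difference:", np.max(np.abs(y - saw)))     # large

Played back, the scrambled version sounds much like the original (not identical, the ear isn't completely phase-deaf), but on a scope the clean ramp is gone.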
However, it's not just the waveform of each channel that matters, but also the relation between the channels. MP3 assumes that the ear cannot locate sounds at higher frequencies and reduces the stereo imaging there in order to save space. This is also detrimental to the quality of the image shown on the oscilloscope, particularly for sharp corners.
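Roughly speaking (the real joint/intensity stereo machinery in MP3 is more involved, so this is a hypothetical stand-in, not the actual algorithm, and the 8 kHz cutoff is an arbitrary default), that kind of stereo reduction amounts to converting left/right into mid/side and throwing away the side information above some frequency:

import numpy as np

def reduce_stereo_imaging(left, right, sample_rate, cutoff_hz=8000.0):
    """Keep the mid (L+R) channel intact, but zero the side (L-R) spectrum
    above cutoff_hz, so both channels become identical at high frequencies."""
    mid = (left + right) / 2
    side = (left - right) / 2
    spec = np.fft.rfft(side)
    freqs = np.fft.rfftfreq(len(side), 1 / sample_rate)
    spec[freqs > cutoff_hz] = 0      # discard the high-frequency stereo difference
    side = np.fft.irfft(spec, n=len(side))
    return mid + side, mid - side    # back to left/right

For an XY oscilloscope drawing, the left and right channels are literally the X and Y coordinates, so collapsing their high-frequency difference rounds off exactly the sharp corners and fine detail of the picture.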
Another aspect is the update rate of the oscilloscope. Any analog scope beats cheapo digitals like the Rigol DS1052 by kilometers.
Yet another thing that matters is AC coupling, and at which frequency it kicks in. (AC coupling is really just a high-pass filter at some selected corner frequency.) Sound cards are typically AC coupled, probably in order not to set people's headphones on fire with a residual DC offset. However, a demo like this is most likely designed to work under that condition.
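AC coupling can be modeled as a simple first-order high-pass. A sketch (the 5 Hz corner is just a guess at a typical sound-card value, not a measured one):

import numpy as np

def ac_couple(x, sample_rate, corner_hz=5.0):
    """First-order RC high-pass, the digital equivalent of a series coupling
    capacitor: blocks DC, passes everything well above corner_hz."""
    rc = 1.0 / (2 * np.pi * corner_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    y = np.zeros_like(x)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = alpha * (y[i - 1] + x[i] - x[i - 1])
    return y

Feed it a constant and the output decays towards zero, which is presumably the kind of behaviour the demo has to be designed around.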