The analysis is largely correct.
The LEDs are lit sequentially, so that each camera can take its snapshot in turn. It can't be done simultaneously, as the cameras would see each other's LEDs, resulting in saturation and false peaks from surface reflections.
Three cameras is the minimum needed to resolve two touches unambiguously. One of the designs I worked on had six cameras and could handle more than five simultaneous touches. Most of our screens used four cameras.
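To illustrate why a third camera matters, here is a minimal sketch, not the actual firmware. It assumes each camera reports a bearing (angle) to every touch it sees. With two touches, two cameras produce four ray intersections, two real and two "ghosts", and the third camera confirms only the real ones. The camera positions, the tolerance, and all function names are made up for the example.

```python
import math
from itertools import product

def bearing(cam, point):
    """Angle (radians) from a camera position to a point on the screen."""
    return math.atan2(point[1] - cam[1], point[0] - cam[0])

def intersect(cam_a, ang_a, cam_b, ang_b):
    """Intersect two rays given by origin and bearing; None if parallel."""
    ax, ay = cam_a
    bx, by = cam_b
    dax, day = math.cos(ang_a), math.sin(ang_a)
    dbx, dby = math.cos(ang_b), math.sin(ang_b)
    denom = dax * dby - day * dbx
    if abs(denom) < 1e-12:
        return None
    t = ((bx - ax) * dby - (by - ay) * dbx) / denom
    return (ax + t * dax, ay + t * day)

def resolve_touches(cams, bearings, tol=1e-3):
    """Keep only the camera 0/1 intersections that camera 2 also sees.

    cams: three (x, y) camera positions (hypothetical layout).
    bearings: for each camera, the list of bearings it measured.
    """
    confirmed = []
    for a, b in product(bearings[0], bearings[1]):
        p = intersect(cams[0], a, cams[1], b)
        if p is None:
            continue
        expected = bearing(cams[2], p)
        # Compare angles modulo a full turn to avoid wrap-around issues.
        if any(abs(math.remainder(expected - c, math.tau)) < tol
               for c in bearings[2]):
            confirmed.append(p)
    return confirmed

# Hypothetical 160 x 90 screen: two corner cameras and one mid-edge camera.
cams = [(0.0, 0.0), (160.0, 0.0), (80.0, 90.0)]
touches = [(40.0, 30.0), (110.0, 60.0)]
bearings = [[bearing(c, t) for t in touches] for c in cams]
print(resolve_touches(cams, bearings))
```

In this layout cameras 0 and 1 alone give four candidates; the two ghost points are rejected because camera 2 sees nothing along their bearings.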
The cameras will be taking two images, one with the LED on and one with it off, so that ambient lighting can be subtracted out. Flashing the LED also lets it be overdriven beyond its DC current limit, to get more light out during image capture.
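The on/off trick can be sketched roughly as follows. This is only an illustration, assuming each capture is a list of per-pixel intensities along the linescan; the function name and the sample numbers are invented.

```python
def remove_ambient(lit_frame, dark_frame):
    """Subtract the LED-off frame from the LED-on frame, pixel by pixel.

    What remains is (ideally) only the light contributed by the camera's
    own LED. Negative results from noise are clamped to zero.
    """
    if len(lit_frame) != len(dark_frame):
        raise ValueError("frames must have the same number of pixels")
    return [max(lit - dark, 0) for lit, dark in zip(lit_frame, dark_frame)]

# Made-up example: ambient light plus an LED-only signal.
ambient = [30, 32, 35, 31]
led_only = [100, 90, 0, 95]
lit = [a + s for a, s in zip(ambient, led_only)]
print(remove_ambient(lit, ambient))  # [100, 90, 0, 95]
```

This only works if the ambient light changes little between the two captures, which is one reason to take them close together.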
The strange video waveform is the result of two factors: distance attenuation and angular attenuation. Retroreflective tape loses reflectivity as the light angle moves away from the normal, because the effective aperture of the microprisms shrinks.
I'm surprised this display style was not more popular; it seems incredibly cheap to produce.
Sure, touch displays on consumer gear are getting cheap now, but industrial displays are not. I think this display would get confused in the rain, though, so it likely wouldn't work outdoors.
For anything bigger than a small tablet it is far cheaper than other technologies, because the price scales with perimeter length rather than area. We managed to get the camera price down to ~2 USD with a custom camera chip and a single-element aspheric moulded lens. We would have made about 3 million screens by the time the company was shut down.
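The perimeter-versus-area point is just geometry, and a quick calculation shows it. The 16:9 aspect ratio and the diagonals below are arbitrary choices for the example, not figures from our products.

```python
import math

def screen_dims(diagonal, aspect=(16, 9)):
    """Width and height of a screen from its diagonal and aspect ratio."""
    w, h = aspect
    scale = diagonal / math.hypot(w, h)
    return w * scale, h * scale

def perimeter_and_area(diagonal, aspect=(16, 9)):
    """Perimeter (bezel length) and area (panel surface) for a diagonal."""
    width, height = screen_dims(diagonal, aspect)
    return 2 * (width + height), width * height

# Doubling the diagonal doubles the perimeter but quadruples the area.
p_small, a_small = perimeter_and_area(20)
p_large, a_large = perimeter_and_area(40)
print(p_large / p_small, a_large / a_small)
```

So a technology whose cost follows the bezel length falls further behind in cost per unit area as screens grow, while one that covers the whole panel does not.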
Touch isn't that useful on a desktop monitor though. It's ergonomically awkward to use. It's much better suited to whiteboards and information kiosks.
Although dust and dirt are obviously a problem for optical touch, the system we had was sufficiently adaptive to handle a significant amount of contamination.
There are a few hidden things that might be of interest, though I'm not entirely certain that's one of our touchscreens. The camera is a linescan CMOS camera, but internally each pixel had a number of subpixels that could be enabled or disabled, a bit like an area camera 1000 pixels wide and 10 pixels high. This allowed electronic alignment of the camera to the screen bezel, turning off those subpixels that were looking outside the border.
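A rough sketch of the idea, with some assumptions: the frame is indexed as frame[column][row], calibration has already found which rows of each column land on the bezel, and the enabled subpixels in a column are simply summed. How the real chip combined subpixels is not described here, so treat the summing and all names as illustrative only.

```python
def build_mask(bezel_rows, height=10):
    """Build a per-column enable mask from calibration data.

    bezel_rows: for each column, an inclusive (first_row, last_row) range
    in which the bezel is visible (hypothetical calibration output).
    """
    return [[lo <= row <= hi for row in range(height)]
            for lo, hi in bezel_rows]

def read_line(frame, mask):
    """Collapse the subpixel array to one value per column.

    Only enabled subpixels contribute; summing them is an assumption.
    """
    return [sum(value for value, enabled in zip(column, col_mask) if enabled)
            for column, col_mask in zip(frame, mask)]

# Tiny made-up example: 3 columns, 4 subpixel rows, a slightly tilted bezel.
frame = [
    [9, 50, 52, 8],
    [7, 8, 51, 49],
    [6, 7, 48, 50],
]
mask = build_mask([(1, 2), (2, 3), (2, 3)], height=4)
print(read_line(frame, mask))  # [102, 100, 98]
```

A camera mounted slightly crooked just ends up with a different set of enabled rows per column, rather than needing mechanical adjustment.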
There was also a fair bit of math going on to correct the lens distortion and get accurate touch locations.
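For a flavour of what that correction involves, here is a minimal sketch that turns a linescan pixel index into a viewing angle. It uses a generic pinhole model with a single radial distortion term, which is a common textbook approach and not necessarily what our system did; the centre pixel, focal length in pixels, and coefficient k1 are placeholder values that would come from calibration.

```python
import math

def pixel_to_angle(pixel, centre=500.0, focal_px=500.0, k1=0.0):
    """Convert a pixel index to a viewing angle in radians.

    centre, focal_px and k1 are hypothetical calibration parameters.
    With k1 = 0 this is a plain pinhole camera; a non-zero k1 applies a
    first-order radial correction before taking the arctangent.
    """
    u = (pixel - centre) / focal_px      # normalised image coordinate
    u_corrected = u * (1.0 + k1 * u * u)  # single-term radial model
    return math.atan(u_corrected)

print(math.degrees(pixel_to_angle(750)))
print(math.degrees(pixel_to_angle(750, k1=-0.1)))
```

The corrected angle from each camera is what then goes into the triangulation, so any residual distortion shows up directly as touch position error, worst towards the edges of the field of view.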