Phototransistors are usually relatively slow, especially if they operate all the way into saturation to give a clean logic signal. One may have to look at the signals with a scope first to see what is actually there. It may very well not be the full swing, and it may show delays and slow transitions. The way a phototransistor is operated (e.g. load resistor, signal level) affects the speed quite a bit. For high speed one may want extra amplification before converting to a logic level.
The 74HC595 shift registers and similar are good for adding extra outputs, especially those that don't change very fast or often. There are similar chips (e.g. 74HC165) for reading inputs via shift registers, but this is still relatively slow. With a 14 MHz SPI clock it should still be a bit faster than 9 µs for 24 bits: about 24/14 µs ≈ 1.7 µs for the bits themselves, plus maybe 1 µs overhead for the start and the gaps between bytes. So more like in the 3 µs range.
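As a rough check of that estimate, here is a small sketch of the arithmetic. The function name and the 1 µs default overhead are just illustrative assumptions; the real overhead depends on the MCU and its SPI driver, so it is worth measuring on a scope.

```python
def spi_transfer_time_us(bits, clock_mhz, overhead_us=1.0):
    """Estimate the time to clock `bits` through a shift register chain.

    bits / clock_mhz gives microseconds, since 1 MHz is one bit per µs.
    overhead_us is an assumed allowance for the start (latch/load pulse)
    and the gaps between bytes; it is a guess, not a measured value.
    """
    return bits / clock_mhz + overhead_us


if __name__ == "__main__":
    # 24 bits (e.g. three 74HC165 in a chain) at a 14 MHz SPI clock
    print(f"{spi_transfer_time_us(24, 14.0):.2f} us")
```

With 24 bits at 14 MHz this comes out a little under 3 µs, most of which is the shifting itself, so a faster read really needs fewer bits per pin rather than less overhead.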
For fast scanning of many channels it would be better to use many of the IO pins (e.g. 8 or 16) and then switch between only a few signals per IO pin.
So the 74xx257 (there are many logic series to choose from, like HC, LV, LVC, AC, AHC) is a good choice. If one does not need the output enable, there is also the 74xx157.
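To illustrate the idea, here is a small simulation of reading 16 signals through an 8-bit port with two quad 2:1 multiplexers. The wiring (which signal goes to which A/B input) and all function names are assumptions made up for this sketch. On real hardware read_port would be a write to the select pin followed by a port read, and one has to wait for the mux propagation delay in between.

```python
def mux_257(a, b, select):
    """Model one quad 2:1 mux (outputs enabled): 4-bit a if select is 0, else 4-bit b."""
    return (b if select else a) & 0xF


def read_port(signals, select):
    """Model an 8-bit port fed by two muxes sharing one select line.

    Assumed wiring: chip 0 gets signals 0-3 on A and 8-11 on B,
    chip 1 gets signals 4-7 on A and 12-15 on B.
    """
    chip0 = mux_257(signals & 0xF, (signals >> 8) & 0xF, select)
    chip1 = mux_257((signals >> 4) & 0xF, (signals >> 12) & 0xF, select)
    return chip0 | (chip1 << 4)


def scan_all(signals):
    """Read all 16 signals with two port reads instead of 16 serial bits."""
    low = read_port(signals, 0)   # select = 0: signals 0-7
    high = read_port(signals, 1)  # select = 1: signals 8-15
    return low | (high << 8)


if __name__ == "__main__":
    print(hex(scan_all(0xA5C3)))
```

The point is that a full scan costs only two port reads plus two select changes, regardless of the SPI clock.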