I don't have any references or links to hand, unfortunately, but I have read a few articles discussing old video codecs; maybe you can find them. Contemporary work is relevant, for obvious reasons.
Jon Burton's video comes to mind,
but that's a far more limited system, and obviously even doing full-screen redraws on such a system is challenging enough, let alone video: significant compromises to quality were required. A few PC games (in the 90s, maybe early 2000s, just before MPEG acceleration became widespread) attempted video, some using the popular QuickTime and its relatives (often limited by CD bandwidth), others using more obscure or bespoke codecs. That seems relevant, at least; I don't know your exact specs, but I'm guessing it's comparable to a Pentium of that era. Maybe a lower-end one, since POWER is RISCy (many fast, simple instructions rather than fewer slow, powerful ones).
Without a doubt you MUST compromise quality. Raw video is 70.7 MB/s. Matter of fact, do you even have a fat enough pipe to the graphics adapter? At 32 bits x 133 MHz (unless that has a wider FSB?) you get roughly 530 MB/s peak, so it's plausible, but it very much depends on the bus, the graphics chip, the RAM (like, is it dual-ported?), and whatever else you have to do in the background...
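(That raw figure is just width x height x bytes-per-pixel x frame rate; e.g. 1024x768 at 24 bpp and 30 fps works out to 1024 x 768 x 3 x 30 ≈ 70.8 MB/s. Adjust for whatever resolution and rate you're actually targeting.)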
Like, a contemporary PC (2MB SVGA, Pentium 66 or thereabouts) sure as hell ain't gonna do full resolution redraws at anywhere near real time frame rates. Maybe your system is more tightly integrated (more graphics bandwidth).
Anyways, lossless will only ever get you 2-4x compression on media data, or somewhere around there. You're asking for 11.8x; there's simply no bridge between those points. But a modest degradation of quality will do; MPEG-1-class quality would be perfectly fine here... the trouble is that MPEG decode on a CPU like that is piss.
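(For reference, 70.7 / 11.8 ≈ 6 MB/s, so I take it that's roughly the sustained bandwidth you have to hit.)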
Some basics ought to help. Toss away bits that aren't visually used (a couple of these are sketched in code below):

- consider mu-law coding of sample values;
- a YUV transform plus chroma downsampling;
- difference frames if you can (requires a back buffer!), though in the worst case you still need some way to update the whole screen;
- spatial awareness, used to your advantage but with care, since 2D data carries higher complexity than 1D (IIRC the discussion of QOI included a critique of PNG's row-aware filtering on exactly this point?).

Blocking probably isn't going to gain you very much, except on very flat patches, which won't show up very often (though the encoder could set a variable threshold for what counts as a "flat block", maintaining a fixed bitrate while prioritizing the most significant changes). Well, maybe; blocking certainly helps you prioritize by area. Maybe it can even be applied fractally, which in turn depends on the expected frequency content of the image: you wouldn't build a fully general binary space partition, but merging a few orders of magnitude of block sizes could still afford higher resolution in priority areas, i.e. variable block size on a powers-of-2 grid. None of these methods should hurt bulk CPU performance much (i.e. they avoid high per-pixel complexity and cache misses), and block size is in any case bounded by cache size, give or take how much has to be reserved for instructions and for the source and destination buffers.
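To make a few of those concrete, here's a rough sketch of the sort of encoder inner loop I mean. This is illustrative C I'm making up for this answer, not anybody's real API: the frame size, FLAT_THRESH, and the emit_block() callback are all placeholders. It does an integer RGB-to-YUV conversion, downsamples chroma 2x2, and only emits the 8x8 luma blocks whose difference from the previous frame exceeds a threshold; the bitstream, entropy coding, and any mu-law stage are left out.

    /* Hypothetical encoder inner loop -- illustrative only, not a real API.
     * Shows: integer RGB->YUV, 2x2 chroma downsample, and per-block
     * difference thresholding against the previous frame. */
    #include <stdint.h>
    #include <stdlib.h>

    #define W 320          /* assumed frame size -- pick whatever your pipe allows */
    #define H 240
    #define BLOCK 8        /* luma block size */
    #define FLAT_THRESH 6  /* mean per-pixel change a block must exceed to be sent */

    typedef struct {
        uint8_t y[W * H];
        uint8_t u[(W / 2) * (H / 2)];   /* chroma at quarter resolution */
        uint8_t v[(W / 2) * (H / 2)];
    } FrameYUV;

    /* Integer BT.601-style RGB -> YUV 4:2:0; close enough for a sketch. */
    static void rgb_to_yuv420(const uint8_t *rgb, FrameYUV *out)
    {
        for (int j = 0; j < H; j++)
            for (int i = 0; i < W; i++) {
                const uint8_t *p = rgb + 3 * (j * W + i);
                int r = p[0], g = p[1], b = p[2];
                out->y[j * W + i] = (uint8_t)((66 * r + 129 * g + 25 * b + 128) / 256 + 16);
            }

        /* Chroma: average each 2x2 patch first, then convert -- 4x fewer samples. */
        for (int j = 0; j < H; j += 2)
            for (int i = 0; i < W; i += 2) {
                int r = 0, g = 0, b = 0;
                for (int dj = 0; dj < 2; dj++)
                    for (int di = 0; di < 2; di++) {
                        const uint8_t *p = rgb + 3 * ((j + dj) * W + (i + di));
                        r += p[0]; g += p[1]; b += p[2];
                    }
                r /= 4; g /= 4; b /= 4;
                int idx = (j / 2) * (W / 2) + i / 2;
                out->u[idx] = (uint8_t)((-38 * r - 74 * g + 112 * b + 128) / 256 + 128);
                out->v[idx] = (uint8_t)((112 * r - 94 * g - 18 * b + 128) / 256 + 128);
            }
    }

    /* Sum of absolute differences for one luma block vs. the previous frame. */
    static int block_sad(const uint8_t *cur, const uint8_t *prev, int bx, int by)
    {
        int sad = 0;
        for (int j = 0; j < BLOCK; j++)
            for (int i = 0; i < BLOCK; i++) {
                int o = (by * BLOCK + j) * W + bx * BLOCK + i;
                sad += abs(cur[o] - prev[o]);
            }
        return sad;
    }

    /* Emit only the luma blocks that changed "enough"; unchanged blocks cost a
     * single skip flag in a real bitstream.  Chroma would be handled the same
     * way.  emit_block() is a placeholder for whatever coder you bolt on. */
    static void encode_frame(const FrameYUV *cur, const FrameYUV *prev,
                             void (*emit_block)(int bx, int by, const uint8_t *luma))
    {
        for (int by = 0; by < H / BLOCK; by++)
            for (int bx = 0; bx < W / BLOCK; bx++)
                if (block_sad(cur->y, prev->y, bx, by) > FLAT_THRESH * BLOCK * BLOCK)
                    emit_block(bx, by, cur->y);
    }

FLAT_THRESH is where the fixed-bitrate idea lives: raise or lower it per frame until the number of emitted blocks fits the bandwidth budget, so the bits go to the blocks that changed most.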
There may also be some transforms you can perform to precondition the image so a generic back-end compressor does better. PNG does forward differencing (among other filters), something along the lines of the sketch below. Maybe some periodicity tuning to improve LZ-esque methods?
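For illustration (my own toy code, not PNG's actual implementation): the simplest such filter stores each sample as the difference from its left neighbour. Smooth gradients become runs of near-zero values, which RLE/LZ stages eat up, and the decoder undoes it with one add per sample.

    /* "Sub"-style preconditioning filter: store horizontal deltas in place.
     * Reversible mod 256, and trivially cheap for the decoder to undo. */
    #include <stdint.h>
    #include <stddef.h>

    static void filter_row_sub(uint8_t *row, size_t n)
    {
        uint8_t prev = 0;                      /* first sample is stored raw */
        for (size_t i = 0; i < n; i++) {
            uint8_t cur = row[i];
            row[i] = (uint8_t)(cur - prev);    /* delta from left neighbour */
            prev = cur;
        }
    }

    static void unfilter_row_sub(uint8_t *row, size_t n)  /* decoder side */
    {
        uint8_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            row[i] = (uint8_t)(row[i] + prev);
            prev = row[i];
        }
    }

For interleaved RGB you'd difference against the same channel a pixel back, and previous-row prediction is the same idea turned 90 degrees.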
Also, assuming you have ~unlimited time to encode (or at least a much more powerful system, e.g. a modern PC or an FPGA, plus the dev time to implement it), you could study the decoder in excruciating detail to figure out subtle optimizations; and given even more time, refine the decoder to drop less frequently used codings and tweak things back and forth further. You'll need to do some of that initially anyway, but the point is there's a long tail of deep optimization available if you want it.
Tim