As for what I described above, the setup factors will also be stored in DDR3, and several of these programs can call each other in a larger loop. So one such program can even calculate, or modify through a LUT, the factors of another. That is enough to create a ray-caster engine equivalent to Doom, though I would stick to 640x480 for this; you may get away with 32-bit color. For this core, I chose something super simple where you only need 4x 32-bit adder counters for (A), the same for (B), and the same for (OUT), while keeping track of an 'OP-Code'. You will use 1 new read port to fetch the instructions, the next read port to fetch (A) as directed by its 4x 32-bit counters, another read port to fetch (B), then one read/write port to write (OUT) with its control counters.
The opcode controls which function is applied to produce the value written to (OUT).
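To make the data flow a bit more concrete, here is a rough software model of one pass of such a core. It is only a sketch under my own assumptions: I am guessing that the 4x 32-bit values per port are an integer position, a fractional accumulator, and an integer and fractional step, and the names `AddrGen`, `OPS` and `run_pass`, as well as the opcode names, are made up for illustration. The real hardware would be HDL with separate memory ports, not a Python loop.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass
class AddrGen:
    """Hypothetical address generator built from 4x 32-bit values."""
    pos: int        # integer address (assumed meaning)
    frac: int       # fractional accumulator (assumed meaning)
    step: int       # integer increment per element
    step_frac: int  # fractional increment per element

    def next(self) -> int:
        """Return the current address, then advance by step + step_frac."""
        addr = self.pos
        total = self.frac + self.step_frac
        carry = total >> 32                 # fractional overflow bumps the integer part
        self.frac = total & MASK32
        self.pos = (self.pos + self.step + carry) & MASK32
        return addr


# Hypothetical opcodes; the real set is still to be planned.
OPS = {
    "ADD": lambda a, b: (a + b) & MASK32,
    "SUB": lambda a, b: (a - b) & MASK32,
    "AND": lambda a, b: a & b,
    "COPY_A": lambda a, b: a,
}


def run_pass(mem, opcode, gen_a, gen_b, gen_out, count):
    """Fetch (A) and (B), apply the opcode, write (OUT), 'count' times."""
    op = OPS[opcode]
    for _ in range(count):
        a = mem[gen_a.next()]
        b = mem[gen_b.next()]
        mem[gen_out.next()] = op(a, b)
```

With a fractional step of 0x80000000 (one half), a generator repeats each address twice, which is the kind of thing you would use to stretch a source line 2x while (OUT) advances by one each step.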
The integer and fractional counters allow us to, for example, selectively render just the red, green, or blue channel, split the channels apart, or do contrast / brightness / RGB/YUV conversions. For audio, we can split or combine stereo channels and apply volume to individual channels. For geometry, well, we have already been discussing that.
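As a small illustration of the per-channel work mentioned above, here is a sketch of splitting a 32-bit pixel and applying a gain and offset (contrast / brightness style) to the color channels. The 0xAARRGGBB layout, the 8.8 fixed-point gain, and the function names are all my assumptions for the example, not something we have settled on.

```python
def split_argb(pixel):
    """Split an assumed 0xAARRGGBB pixel into (a, r, g, b) bytes."""
    return (
        (pixel >> 24) & 0xFF,
        (pixel >> 16) & 0xFF,
        (pixel >> 8) & 0xFF,
        pixel & 0xFF,
    )


def clamp8(value):
    """Saturate to the 0..255 range instead of wrapping around."""
    return max(0, min(255, value))


def adjust_channel(value, gain_8_8, offset):
    """Scale by an 8.8 fixed-point gain (256 = 1.0), then add an offset."""
    return clamp8(((value * gain_8_8) >> 8) + offset)


def adjust_pixel(pixel, gain_8_8=256, offset=0):
    """Apply the same gain/offset to R, G and B; alpha passes through."""
    a, r, g, b = split_argb(pixel)
    r, g, b = (adjust_channel(c, gain_8_8, offset) for c in (r, g, b))
    return (a << 24) | (r << 16) | (g << 8) | b
```

For example, `adjust_pixel(0xFF804020, gain_8_8=512)` doubles each color channel and saturates red at 255. The same gain-and-clamp pattern would apply to audio volume on individual channels, just with wider signed samples.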
However, the ALU with its opcode will need some planning. The issue is not the math functions themselves, as Altera has all the floating-point and integer functions, including trig, available for free in their IP store. It is the decision of how we wire the int/FP input and output conversions, since many things will need int and float conversion, and we want to do this on the fly instead of doing another FPU pass when it is not necessary.
Separate from the ALU, one last graphics function we never attempted is a raster graphics fill routine. All our line / ellipse / box / triangle functions can perform fills on themselves, but what we are missing is a means of drawing a geometric shape, or taking a ready-made picture on the screen, choosing a pixel, and flood filling all connected pixels of that same color with a new color.
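For reference, the flood fill behaviour described above looks roughly like this in software. This is a plain queue-based fill, assuming 4-way connectivity (up / down / left / right) and a frame buffer held as a list of rows; the function name and return value are just for the example. A hardware version would more likely work on horizontal spans to make better use of memory bursts, but the result should be the same.

```python
from collections import deque


def flood_fill(image, x, y, new_color):
    """Recolor every pixel 4-connected to (x, y) that shares its color.

    'image' is a list of rows (image[y][x]). Returns the number of
    pixels changed.
    """
    height = len(image)
    width = len(image[0])
    target = image[y][x]
    if target == new_color:
        return 0  # nothing to do, and avoids looping forever

    image[y][x] = new_color
    filled = 1
    queue = deque([(x, y)])

    while queue:
        cx, cy = queue.popleft()
        # Visit the four direct neighbours (assumed connectivity).
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            if 0 <= nx < width and 0 <= ny < height and image[ny][nx] == target:
                image[ny][nx] = new_color  # mark on enqueue so it is queued once
                filled += 1
                queue.append((nx, ny))
    return filled
```

Pixels are recolored as they are queued, so each one is visited only once and the queue never holds duplicates.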