Is the sc_FIFO up to the task then? Would using your zero-latency FIFO be overkill?
It's not slow; it still runs at 125MHz. This just means that when you send the first command, there is a delay of 1/41 millionth of a second before the geometry_xy_plotter sees it and begins to compute. However, if the geometry plotter is busy for a long time, you can keep adding up to 510 additional geometry commands, each of which gets executed as soon as the 'load_cmd' signal goes high. The FIFO size was chosen so that it uses only 1 Cyclone M9K block of ram to store these commands.
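To put that delay in terms of clocks, here is a quick back-of-envelope check. The 125MHz clock and the 1/41 millionth figure come from the text above; the 512 x 16-bit M9K layout is my assumption about how the block is configured, not something confirmed here.

```python
# Rough arithmetic only; the M9K layout below is an assumption.
CLOCK_HZ = 125_000_000           # core clock quoted above
PIPE_DELAY_S = 1 / 41_000_000    # "1/41 millionth of a second"

clock_period_ns = 1e9 / CLOCK_HZ                 # length of one clock in ns
pipe_delay_ns = PIPE_DELAY_S * 1e9               # the pipe delay in ns
pipe_delay_clocks = round(pipe_delay_ns / clock_period_ns)

# One Cyclone M9K block holds 9216 bits in total.
M9K_BITS = 9216
WORD_BITS = 16     # assumed command word width
FIFO_WORDS = 512   # assumed depth (512 x 16 configuration)
fits_in_one_block = FIFO_WORDS * WORD_BITS <= M9K_BITS

print(f"clock period : {clock_period_ns:.1f} ns")
print(f"pipe delay   : {pipe_delay_ns:.1f} ns (~{pipe_delay_clocks} clocks)")
print(f"fits in 1 M9K: {fits_in_one_block}")
```

So the pipe delay works out to roughly 3 clocks, paid once on the first command into an empty FIFO.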
My zero-latency FIFO may remove that 1/41 millionth of a second pipe delay, but it stores its four 16-bit words of data in logic cell registers, not M9K memory blocks. This doubles if you set it to 8-word mode.
Because of your GPU design, you cannot increase your current bulk graphics FPGA memory because of its width & depth unless you use a larger FPGA, but you still have a few odd free M9K blocks left. These are perfect for the geometry input FIFO.
You don't even strictly need a FIFO, but without one the Z80 might need to wait a few times when dumbly copying a precompiled list of commands which fill and copy the largest possible objects on a high color depth screen.
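To illustrate why the FIFO depth matters, here is a toy clock-by-clock model. Everything in it is hypothetical: the function name, the write interval, and the busy time per command are made-up numbers for illustration, not measurements of the real geometry unit or the Z80 bus.

```python
from collections import deque

def count_z80_waits(num_cmds, fifo_depth, write_interval, busy_clocks):
    """Count clocks the writer spends blocked because the FIFO is full.

    Toy model: the writer offers one command every `write_interval` clocks,
    and the plotter needs `busy_clocks` (>= 1) clocks per command.
    `fifo_depth` must be >= 1; depth 1 stands in for "no real FIFO".
    """
    fifo = deque()
    sent = done = waits = 0
    busy = 0          # clocks left on the command being drawn
    next_write = 0    # earliest clock the writer has its next command ready
    clock = 0
    while done < num_cmds:
        # Plotter side: finish the current command, then load the next one.
        if busy > 0:
            busy -= 1
            if busy == 0:
                done += 1
        if busy == 0 and fifo:
            fifo.popleft()
            busy = busy_clocks
        # Writer side: push if there is room, otherwise count a wait clock.
        if sent < num_cmds and clock >= next_write:
            if len(fifo) < fifo_depth:
                fifo.append(sent)
                sent += 1
                next_write = clock + write_interval
            else:
                waits += 1
        clock += 1
    return waits

# 20 slow commands (100 clocks each), one written every 10 clocks.
print("waits, depth 1  :", count_z80_waits(20, 1, 10, 100))
print("waits, depth 512:", count_z80_waits(20, 512, 10, 100))
```

With a deep FIFO the writer never blocks in this model; with a single holding slot it blocks as soon as the plotter falls behind, which is the "wait a few times" case.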
Now, within the next 3 weeks, do you think you can squeeze in, past the triangle fill, an ellipse fill and a copy & paste function to finish this geometry unit V1.0? That would mean you may even be able to paint (blit) software fonts on a normal graphics screen, and draw/copy/render multiple sprites all over the screen, replicating car racing games like 'Out Run' at full arcade speed and quality.
https://youtu.be/ELUl-cAtUIE?t=1142
Ok, you'll need either the Lattice FPGA with a little trickery to retain the full 320x240 res, and/or the added 512k ram chip, to get every single asset for this one running on a 256-color, full arcade quality 320x240 screen. With some added logic in the rectangle copy command, you could get a Z80 to play Doom at a reasonable framerate, as you would have a scaling feature inside the copy command, though the Z80 side would still need 8MB of memory.
I hope you have a good BGA Cyclone choice, plus at least 1 or 2 of those ZBT ram chips for 1MB. You will be able to do full audio as well.
Note that the sampling audio system will eat up the RS232 debugger port to the GPU ram controller.
Yes, if you want DDR ram with full efficient access, especially for the geometry unit, there will be a ton of leg work to do on your own. The ZBT ram would be a drop-in addition which still requires a little steering logic inside the 'vid_osd_generator.sv' to give access to its own 1 or 2 MAGGIE layers, as it will be 2x slower at 125MHz. If you are lucky you can run it at 250MHz, but it can still get the read data back in 3 clocks instead of 2. (Easy interleave trick to sync the core and external ram: add 1 dummy clock cycle of delay on the read of your FPGA core ram (changing all the read delay parameters) and it will match the external ram. Only if you want to learn about defining external IO constraints like tsu/tpd/th can you get the read down to 2 clocks with any certainty, but forget getting the external ram to run at 250MHz.) Also remember, MAGGIE currently can only address 1 megabyte, as much of the rest of your design also has only 20-bit addressing.
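As a rough sketch of the two numbers above: 20 address bits reach exactly 1 megabyte (assuming byte addressing), and padding the 2-clock core ram read with 1 dummy clock lines it up with a 3-clock external read. The helper name and the sample data are made up for illustration; this models only the timing idea, not the actual read delay parameters in the design.

```python
ADDR_BITS = 20
addressable_bytes = 1 << ADDR_BITS   # bytes reachable with 20 address bits

CORE_READ_CLOCKS = 2   # FPGA core ram read latency, per the text above
EXT_READ_CLOCKS = 3    # external ram read latency, per the text above
dummy_clocks = EXT_READ_CLOCKS - CORE_READ_CLOCKS

def delayed(stream, clocks, idle=None):
    """Model a read pipeline: data appears `clocks` cycles after the request."""
    return [idle] * clocks + list(stream)

requests = ["px0", "px1", "px2"]     # hypothetical read data, one per clock
core = delayed(requests, CORE_READ_CLOCKS + dummy_clocks)
ext = delayed(requests, EXT_READ_CLOCKS)

print("addressable bytes:", addressable_bytes)
print("dummy clocks     :", dummy_clocks)
print("streams aligned  :", core == ext)
```

Once both sources return data on the same clock, the downstream logic does not need to know which ram a given read came from.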