8- or 16-bit SDRAM and DDR1/2/3 RAM will be too slow for the MAGGIE system, other than for 1-2 dumber linear MAGGIE background graphics layers. All tile/font/sprite images will need to be stored in the FPGA, and you will need an extra 1-2kb FPGA cache to stream the DRAM video.
To replace the core and store everything in DRAM, you would need a 1GHz 32-bit DDR3 controller, which is beyond the CycloneIV. You would also have to use all of the core RAM to create a smart read-ahead cache, since DDR RAM reads in bursts where you only need single bytes. The development cycle to accomplish this is beyond our time and scope.
You've got to be kidding! 16-bit DDR3 running at 400 MHz gives you enough bandwidth to stream and double-buffer (meaning you read and write in parallel) a 1080p/32bpp/60Hz video stream in real time with some bandwidth to spare! And this video card has a resolution of what? 640x480/8bpp/60Hz? You should be able to push at least 50 of these streams in parallel with no problems whatsoever!
But if you for some bizarre reason need more, what stops you from implementing a wider bus, up to a 64-bit SODIMM?
Something doesn't sound right here.
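The streaming figures quoted above can be sanity-checked with a few lines of Python. This is only a rough sketch: it assumes "400 MHz" means the I/O clock with two transfers per clock, and it looks at peak bandwidth only, ignoring refresh, row activation and bus turnaround. The function names are made up for illustration.

```python
# Peak-bandwidth check for the streaming case only (sequential bursts).
# Assumption: 400 MHz is the I/O clock, so DDR gives 2 transfers per clock.

def ddr_peak_bytes_per_sec(clock_hz: float, bus_bits: int) -> float:
    """Theoretical peak of a DDR bus: two transfers per clock cycle."""
    return clock_hz * 2 * bus_bits / 8

def stream_bytes_per_sec(width: int, height: int, bits_per_pixel: int, fps: int) -> float:
    """Bytes per second needed to read (or write) one video stream."""
    return width * height * bits_per_pixel / 8 * fps

ddr3_peak = ddr_peak_bytes_per_sec(400e6, 16)
hd_stream = stream_bytes_per_sec(1920, 1080, 32, 60)
vga_stream = stream_bytes_per_sec(640, 480, 8, 60)

print(f"16-bit DDR3 peak:          {ddr3_peak / 1e6:8.1f} MB/s")
print(f"1080p/32bpp/60Hz, one way: {hd_stream / 1e6:8.1f} MB/s")
print(f"1080p read + write:        {2 * hd_stream / 1e6:8.1f} MB/s")
print(f"640x480/8bpp/60Hz:         {vga_stream / 1e6:8.1f} MB/s")
print(f"VGA streams at peak:       {int(ddr3_peak // vga_stream)}")
```

Note that these numbers only cover sequential, burst-friendly access; they say nothing about random single-word reads.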
Your figures are correct for 'bandwidth to stream', i.e. sequential, burst-efficient access of 2 pictures simultaneously...
You haven't followed the project. The MAGGIE system has 16 translucent video window layers (16 pictures), 16 bits each, at 25 million pixels a second.
HOWEVER, each MAGGIE layer can tell the next adjacent MAGGIE layer to address individual 16-bit pixels from anywhere else in RAM on a pixel-by-pixel basis, with no sequential multiple-pixel reads, hence the realtime pixel zooming and skipping feature.
All this means is exactly what I said about running a single 125MHz or 250MHz DDR2/3 RAM chip at 16 bits. You cannot address single bytes/words, and have those redirect to a point elsewhere in RAM, without waiting for a RAS/CAS/4-8 clock burst/terminate cycle. All said and done, that is a turnaround of over 35ns for each random byte access. At that rate you cannot achieve the 25MHz pixel speed * 15 pixels which sit on top of each pixel, each of which may randomly point to anywhere else in RAM for an associated font/tile, which can itself be 16 bits/pixel or a pointer to another font/tile.
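To put numbers on that, here is a back-of-envelope sketch using only the figures from this post (25MHz pixel clock, 15 dependent layers, roughly 35ns per random access). It assumes a deliberately simple worst-case model where every layer lookup is a fully random access served one at a time, with no bank interleaving or caching; the variable names are just for illustration.

```python
# Worst-case random-access budget for the layered pixel lookups.
# Assumption: accesses are strictly serial and each costs the full turnaround.

PIXEL_CLOCK_HZ = 25e6      # output pixel rate
DEPENDENT_LAYERS = 15      # pixels stacked on top of each output pixel
TURNAROUND_NS = 35.0       # approximate cost of one random DRAM access

required_per_sec = PIXEL_CLOCK_HZ * DEPENDENT_LAYERS
achievable_per_sec = 1e9 / TURNAROUND_NS
shortfall = required_per_sec / achievable_per_sec
budget_ns = 1e9 / required_per_sec

print(f"random accesses needed: {required_per_sec / 1e6:.1f} M/s")
print(f"random accesses served: {achievable_per_sec / 1e6:.1f} M/s")
print(f"shortfall factor:       {shortfall:.1f}x")
print(f"time budget per access: {budget_ns:.2f} ns (vs {TURNAROUND_NS:.0f} ns)")
```

Under these assumptions the DRAM falls short by more than an order of magnitude, which is why peak streaming bandwidth is the wrong yardstick here.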
However, just having 2 sequential 16-bit/pixel raster layers with a small pixel cache is doable with a simple RAM controller. Weaving in another random read/write access port for the graphics geometry engine, plus another port with 0 (ZERO) wait states at Z80 speed, will be feasible with some simple HDL code.
Squeezing out the DRAM's approximate full 750MB/sec is doable with 16 random access ports, only half of them somewhat sequential, all reading ahead in time to generate each finished 16-layer pixel. However, I want nockieboy to be able to finish the project.
Now, if 512kb is a limit on the Z80, maybe nockieboy should consider ZBT SRAM. It is accessed exactly like a static RAM chip, but when swapping from reading to writing and back, the RAM chip has 0 wait states, hence Zero Bus Turnaround. For example, a 250MHz ZBT SRAM, 256K x 18-bit (512 kilobytes). The RAM controller HDL code is non-existent other than a latch and IO pins set to the tightest timing tolerances. The memory density is lower, but it will perform almost identically to a core single-port RAM operating at 250MHz, and 0 coding experience is needed.
For some reason, the 256K x 18-bit 200MHz part is a cheap sweet spot for ZBT RAM at $2 each (maybe obsolete).
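As a quick check on the "256K x 18bit (512 kilobytes)" figure, the sketch below assumes only 16 of the 18 bits per word carry pixel data (on x18 parts the extra two bits are commonly used for parity, but that is an assumption here), and that one word is transferred per clock with no turnaround penalty.

```python
# Capacity and peak rate of a 256K x 18 ZBT SRAM used as 16-bit memory.
# Assumption: 2 of the 18 bits per word are left unused (or used for parity).

WORDS = 256 * 1024
DATA_BITS = 16
CLOCK_HZ = 250e6

capacity_bytes = WORDS * DATA_BITS // 8
capacity_kib = capacity_bytes // 1024
peak_bytes_per_sec = CLOCK_HZ * DATA_BITS / 8   # one word per clock

print(f"usable capacity: {capacity_kib} KiB")
print(f"peak rate:       {peak_bytes_per_sec / 1e6:.0f} MB/s")
```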
ZBT 250Mhz = $8.40.
Sync SRAM 250MHz = $5.53. I need to read more on this, as the interface is a little more complex: the first read/write can take 2 cycles.
I would say if the Z80 can access more than 2 megabytes, go for a single 16-bit DDR RAM chip. If it can only access 512kb, a more expensive but more HDL-friendly solution would be the ZBT RAM.