Author Topic: FPGA VGA Controller for 8-bit computer (Read 510852 times)

BrianHG · « **Reply #1400 on:** July 31, 2020, 09:04:46 pm »

Hold on, don't quit yet. I might still be available for another 2-3 weeks...
However, once I'm unavailable, I doubt I'll be able to get back to anything serious...

BrianHG · « **Reply #1401 on:** July 31, 2020, 09:56:32 pm »

Quote from: nockieboy on July 31, 2020, 08:03:47 pm

Is the sc_FIFO up to the task then? Would using your zero-latency FIFO be overkill?

It's not slow, it still runs at 125MHz. This just means that when you send the first command, there is a delay of 1/41 millionth of a second (about 24ns, or roughly 3 clocks at 125MHz) before the geometry_xy_plotter will see it and begin to compute. However, if the geometry plotter is busy for a long time, you can keep on adding up to 510 additional geometry commands, which will get executed as soon as the 'load_cmd' signal goes high. The fifo size was chosen so that it uses only one Cyclone M9K block of ram to store these commands.

My zero-latency FIFO may remove that 1/41 millionth of a second pipe delay, but it stores its four 16 bit words of data in logic cell registers, not M9K memory blocks. The register usage doubles if you set it to 8 word mode.

Because of your GPU design, you cannot increase your current bulk graphics FPGA memory because of its width & depth unless you use a larger FPGA, but you still have a few free odd M9K blocks left. These are perfect for the geometry input fifo.

You also don't even need a fifo, but the Z80 might need to wait a few times when dumbly copying a precompiled list of commands filling and copying the largest possible objects to a high color depth screen.

Now, with the next 3 weeks, do you think you can squeeze in getting past the triangle fill to the ellipse fill, plus a copy & paste function, to finish this geometry unit V1.0? That would mean you may even be able to paint (blit) software fonts on a normal graphics screen and draw/copy/render multiple sprites all over the screen, replicating car racing games like 'Out Run' at full arcade speed and quality.

https://youtu.be/ELUl-cAtUIE?t=1142

Ok, you'll need either the Lattice FPGA with a little trickery to retain the full 320x240 res, and/or add the 512k ram chip to get every single asset for this one running on a 256 color full arcade quality 320x240 screen. With some added logic to the rectangle copy command, you could get a Z80 to play Doom at a reasonable framerate, as you would have a scaling feature inside the copy command, though the Z80 side would still need 8MB of memory.

I hope you have a good BGA Cyclone choice + add at least 1 or 2 of those ZBT ram chips for 1MB. You will be able to do full audio as well.
Note that the sampling audio system will eat up the RS232 debugger port to the GPU ram controller.

Yes, if you want DDR ram with full efficient access, especially for the geometry unit, there will be a ton of leg work to do on your own. The ZBT ram would be a drop-in addition which still requires a little steering logic inside the 'vid_osd_generator.sv' to give access to its own 1 or 2 MAGGIE layers as it will be 2x slower at 125MHz, or if you are lucky you can run it at 250MHz, but it can still get the read data back in 3 clocks instead of 2. (Easy interleave trick to sync the core and external ram: add 1 dummy clock cycle of delay on the read of your FPGA core ram (changing all the read delay parameters) and it will match the external ram. Only if you want to learn about defining external IO constraints like tsu/tpd/th can you get the read down to 2 clocks with any certainty, but forget getting the external ram to run at 250MHz.) Also remember, Maggie currently can only address 1 megabyte, as much of the rest of your design also has only 20 bit addressing.

nockieboy · « **Reply #1402 on:** August 01, 2020, 09:27:49 am »

Quote from: BrianHG on July 31, 2020, 09:56:32 pm

Now, with the next 3 weeks, do you think you can squeeze in getting past the triangle fill to the ellipse fill, plus a copy & paste function, to finish this geometry unit V1.0? That would mean you may even be able to paint (blit) software fonts on a normal graphics screen and draw/copy/render multiple sprites all over the screen, replicating car racing games like 'Out Run' at full arcade speed and quality.

I don't know, is the honest answer. I know enough about the design to know what I don't know - and it feels like a lot. I don't have the large chunks of time I really need to sit down and process what's going on and work on the next steps effectively at the moment, so that's a cause for concern, but I will do my best.

Quote from: BrianHG on July 31, 2020, 09:56:32 pm

Ok, you'll need either the Lattice FPGA with a little trickery to retain the full 320x240 res, and/or add the 512k ram chip to get every single asset for this one running on a 256 color full arcade quality 320x240 screen.

I was considering the jump to an EP4CE22F17 for the BGA - would that be able to handle what we're trying to do, or do I need to look at Lattice FPGAs specifically?

Quote from: BrianHG on July 31, 2020, 09:56:32 pm

I hope you have a good BGA Cyclone choice + add at least 1 or 2 of those ZBT ram chips for 1MB. You will be able to do full audio as well.
Note that the sampling audio system will eat up the RS232 debugger port to the GPU ram controller.

Okay, so the ZBT RAM chips are a definite requirement then? I'll add them to the BOM for the BGA design and (when I get a chance) I'll work them into the design. The BGA design hasn't really progressed much at all, thanks to everything else I've had going on.

Quote from: BrianHG on July 31, 2020, 09:56:32 pm

Yes, if you want DDR ram with full efficient access, especially for the geometry unit, there will be a ton of leg work to do on your own. The ZBT ram would be a drop-in addition which still requires a little steering logic inside the 'vid_osd_generator.sv' to give access to its own 1 or 2 MAGGIE layers as it will be 2x slower at 125MHz, or if you are lucky you can run it at 250MHz, but it can still get the read data back in 3 clocks instead of 2.

I wouldn't have a clue how to connect up DDR ram. I mean, the actual physical connection should be easy enough (with a little learning about matching trace lengths and some other oddities of PCB design), but actually integrating it into the GPU HDL would be the big issue for me.

Having access to megabytes of RAM would be massively advantageous, though. Not just for the GPU, but for any application I'd decide to put the FPGA to. I could get rid of all the 'old' hardware (memory, CF card, CPU) and have it all on the one PCB, for example.

Quote from: BrianHG on July 31, 2020, 09:56:32 pm

Also remember, Maggie currently can only address 1 megabyte as much of the rest of your design also has only 20 bit addressing.

That'd be easy enough to improve, though - at least in theory.

Okay, I'm going to use any spare time I have today to try and focus on the Z80_bridge changes.

nockieboy · « **Reply #1403 on:** August 01, 2020, 09:54:03 am »

I've added an input to the pixel_address_generator called fifo_full which connects to the output of the same name (presumably) on the zero-latency fifo.

In the comb section of the pixel_address_generator, pixel_cmd_rdy is now the ANDed product of (what was called) pixel_cmd_rdy && !fifo_full, so pixel_cmd_rdy only goes high now when cmd_rdy is HIGH AND fifo_full is LOW.

~~Shouldn't draw_busy also be ANDed with !fifo_full, so the pixel_address_generator~~ (I'm calling it PAGET from now on) ~~stops working whilst the pixel_writer's input fifo is full?~~

Just read in your previous post(s) that it should do.

nockieboy · « **Reply #1404 on:** August 01, 2020, 10:32:02 am »

Z80_bridge_v2 attached, with inputs and outputs added as described.

Quote from: BrianHG on July 29, 2020, 07:11:59 pm

I also want to see a 16 bit output port added to the Z80 bus. Basically 2 adjacent 8 bit ports, with a write strobe output for each. (This is so you can select loading data into the geo-unit after the low byte is sent, or, after the high byte is sent since the geo unit needs to take in 16bit at a time) If you do not want 2 ports to write to the geometry_xy_plotter, but memory address bytes instead, you will need to do this on your own later. This 16 bit port will feed a 512x16 word ALT_FIFO megafunction into the geometry_xy_plotter module.

Won't I need a third IO port to tell the Z80_bridge to send the data to the geo-unit?

BrianHG · « **Reply #1405 on:** August 01, 2020, 02:57:03 pm »

Quote from: nockieboy on August 01, 2020, 09:54:03 am

I've added an input to the pixel_address_generator called fifo_full which connects to the output of the same name (presumably) on the zero-latency fifo.

In the comb section of the pixel_address_generator, pixel_cmd_rdy is now the ANDed product of (what was called) pixel_cmd_rdy && !fifo_full, so pixel_cmd_rdy only goes high now when cmd_rdy is HIGH AND fifo_full is LOW.

~~Shouldn't draw_busy also be ANDed with !fifo_full, so the pixel_address_generator~~ (I'm calling it PAGET from now on) ~~stops working whilst the pixel_writer's input fifo is full?~~

Just read in your previous post(s) that it should do.

NO. The existing 'draw_busy' input is already tied to the 3 word zero latency fifo's 'fifo_full' flag. Like I said, the address generator was finished.
The 'draw_busy' input of the geometry_xy_plotter also is tied to the same fifo_full output flag.
This is how they know to stop sending data to the pixel_writer unit since it isn't reading anything from the 3 word FIFO.

BrianHG · « **Reply #1406 on:** August 01, 2020, 03:14:26 pm »

Quote from: nockieboy on August 01, 2020, 10:32:02 am

Z80_bridge_v2 attached, with inputs and outputs added as described.

Quote from: BrianHG on July 29, 2020, 07:11:59 pm
I also want to see a 16 bit output port added to the Z80 bus. Basically 2 adjacent 8 bit ports, with a write strobe output for each. (This is so you can select loading data into the geo-unit after the low byte is sent, or, after the high byte is sent since the geo unit needs to take in 16bit at a time) If you do not want 2 ports to write to the geometry_xy_plotter, but memory address bytes instead, you will need to do this on your own later. This 16 bit port will feed a 512x16 word ALT_FIFO megafunction into the geometry_xy_plotter module.

Won't I need a third IO port to tell the Z80_bridge to send the data to the geo-unit?

Code:

   input logic WR_PX_CTR_STROBE, // HIGH to clear the WRITE PIXEL collision counter
   input logic [7:0] WR_PX_CTR,  // WRITE PIXEL collision counter from pixel_writer
   
   input logic WFR_PX_CTR_STROBE,// HIGH to clear the WRITE FROM READ PIXEL collision counter
   input logic [7:0] WFR_PX_CTR, // WRITE FROM READ PIXEL collision counter from pixel_writer
   
   input logic PAGET_FIFO_STROBE,// HIGH for valid data on PAGET_FIFO bus
   input logic [7:0] PAGET_FIFO, // pixel_address_generator FIFO flags
   
   input logic GEOFF_FIFO_STROBE,// HIGH for valid data on GEOFF_FIFO bus
   input logic [7:0] GEOFF_FIFO, // geometry_xy_plotter FIFO flags

The '_STROBE' are outputs, not inputs. This is how we are able to tell if the Z80 read each one of those ports so we may auto clear and update their values.

'WRITE FROM READ PIXEL collision counter' Should just be a COPY READ PIXEL collision counter. The 'WRITE PIXEL' collision counter is already doing what you describe.

'GEOFF_FIFO_STROBE' & 'PAGET_FIFO_STROBE', these are outside of your control. You do not have Z80 access to these.

What you want is:
GEO_STATUS_STROBE output and
GEO_STATUS_DATA_READ_IN 8 bit input read port where we may tie the 510 word scfifo's almost full flag to bit 0, and maybe some other status information to any of the other 7 bits, like a 1 bit VS frame counter or spare programmable H/V strobe output on bit 1 so you may sync animation to a new frame or chosen line of video.
GEO_STATUS_DATA_WRITE_OUT this output on the same port is optional as you may use a selected bit, say bit 7 to feed all the reset lines for the entire geometry section so it may be kept in sleep until you release the reset. Or, if the geometry unit crashes, IE 510 word scfifo's almost full flag stays full for a consecutive 65535 reads, you may assume a geo crash and reset the entire section and send a fault error to your OS graphics driver.
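
To make the bit assignments above concrete, here is a minimal sketch of how the status port could be wired on the geometry side. It is only an illustration under assumptions: the module name and the signals `geo_status_wr_strobe`, `fifo_almost_full`, `vs_frame_toggle` and `geo_reset` are hypothetical, and the unused bits are simply tied to 0.

```systemverilog
// Sketch only, not the project's actual code.
// Hypothetical names: geo_status_sketch, geo_status_wr_strobe,
// fifo_almost_full, vs_frame_toggle, geo_reset.
module geo_status_sketch (
    input  logic       clk,
    input  logic       reset,                      // system reset
    input  logic       fifo_almost_full,           // from the 510 word scfifo
    input  logic       vs_frame_toggle,            // 1 bit VS frame counter
    input  logic       geo_status_wr_strobe,       // HIGH for 1 clock on a Z80 write to the status port
    input  logic [7:0] GEO_STATUS_DATA_WRITE_OUT,  // byte written by the Z80
    output logic [7:0] GEO_STATUS_DATA_READ_IN,    // byte read back by the Z80
    output logic       geo_reset                   // feeds the geometry section's reset lines
);
    // bit 0 = almost full flag, bit 1 = frame toggle, bits 7:2 spare
    assign GEO_STATUS_DATA_READ_IN = {6'b000000, vs_frame_toggle, fifo_almost_full};

    // bit 7 of the written byte holds the geometry section in reset.
    // The section starts out held in reset until the Z80 writes bit 7 = 0.
    always_ff @(posedge clk) begin
        if (reset)                     geo_reset <= 1'b1;
        else if (geo_status_wr_strobe) geo_reset <= GEO_STATUS_DATA_WRITE_OUT[7];
    end
endmodule
```

With this arrangement the Z80 would poll bit 0 before dumping more commands, and the 65535-read crash check described above would stay in the Z80 driver software rather than in the FPGA.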

BrianHG · « **Reply #1407 on:** August 01, 2020, 03:28:09 pm »

Quote from: nockieboy on August 01, 2020, 09:27:49 am

I wouldn't have a clue how to connect up DDR ram. I mean, the actual physical connection should be easy enough (with a little learning about matching trace lengths and some other oddities of PCB design), but actually integrating it into the GPU HDL would be the big issue for me. Having access to megabytes of RAM would be massively advantageous, though. Not just for the GPU, but for any application I'd decide to put the FPGA to. I could get rid of all the 'old' hardware (memory, CF card, CPU) and have it all on the one PCB, for example.

The Cyclone IV operates at nowhere near a high enough speed that you would ever need to length-match traces, unless for some reason you have 1 trace going through circular hoops to get to its destination. The biggest time delay you need to worry about here is vias, as they have a slightly more discernible delay at the top 200MHz (400MTPS) speed you can operate the ram at. For the CLK, DATA & DQS strobes, if one of these signals goes through a via, then all of them need to go through at least 1 via to reach their destination. Otherwise, route everything on the top layer exclusively and you won't have a problem.

Note that the same goes for the ZBT ram.

BrianHG · « **Reply #1408 on:** August 01, 2020, 03:33:37 pm »

Quote from: nockieboy on August 01, 2020, 09:27:49 am

Quote from: BrianHG on July 31, 2020, 09:56:32 pm
Also remember, Maggie currently can only address 1 megabyte as much of the rest of your design also has only 20 bit addressing.

That'd be easy enough to improve, though - at least in theory.

I left space open to support 24 bits max all round the design. This means 16 megabytes is your absolute limit unless you are really prepared for some massive surgery or workarounds. Because of how the maggie is wired, only in 16bit color mode can you trick it to appear to address 32 megabytes of graphic data. But then, the entire geometry unit will need a little surgery.

nockieboy · « **Reply #1409 on:** August 01, 2020, 04:05:43 pm »

Quote from: BrianHG on August 01, 2020, 03:14:26 pm

The '_STROBE' are outputs, not inputs. This is how we are able to tell if the Z80 read each one of those ports so we may auto clear and update their values.

'WRITE FROM READ PIXEL collision counter' Should just be a COPY READ PIXEL collision counter. The 'WRITE PIXEL' collision counter is already doing what you describe.

'GEOFF_FIFO_STROBE' & 'PAGET_FIFO_STROBE', these are outside of your control. you do not have Z80 access to these.

Ah okay, I've updated the Z80_bridge code accordingly. I've called the 'COPY READ PIXEL' counter port simply 'RD_PX_CTR_STROBE'.

Quote from: BrianHG on August 01, 2020, 03:14:26 pm

What you want is:
GEO_STATUS_STROBE output and
GEO_STATUS_DATA_READ_IN 8 bit input read port where we may tie the 510 word scfifo's almost full flag to bit 0, and maybe some other status information to any of the other 7 bits, like a 1 bit VS frame counter or spare programmable H/V strobe output on bit 1 so you may sync animation to a new frame or chosen line of video.
GEO_STATUS_DATA_WRITE_OUT this output on the same port is optional as you may use a selected bit, say bit 7 to feed all the reset lines for the entire geometry section so it may be kept in sleep until you release the reset. Or, if the geometry unit crashes, IE 510 word scfifo's almost full flag stays full for a consecutive 65535 reads, you may assume a geo crash and reset the entire section and send a fault error to your OS graphics driver.

Okay, I think I've got the IO as you want it now. Z80_bridge attached. Obviously there's nothing driving or reading these new ports yet.

nockieboy · « **Reply #1410 on:** August 01, 2020, 04:10:34 pm »

I assume you want the zero-latency FIFO for the pixel_writer to be instantiated in the pixel_writer module, rather than as a separate entity on the schematic diagram?

BrianHG · « **Reply #1411 on:** August 01, 2020, 04:13:57 pm »

That's fine.
Remember, that fifo's 'fifo_full' output drives an output port called 'draw_busy' which tells the other 2 geometry units to wait before sending over additional write pixel commands.

BrianHG · « **Reply #1412 on:** August 01, 2020, 04:17:27 pm »

Quote from: nockieboy on August 01, 2020, 04:05:43 pm

Quote from: BrianHG on August 01, 2020, 03:14:26 pm
The '_STROBE' are outputs, not inputs. This is how we are able to tell if the Z80 read each one of those ports so we may auto clear and update their values.

'WRITE FROM READ PIXEL collision counter' Should just be a COPY READ PIXEL collision counter. The 'WRITE PIXEL' collision counter is already doing what you describe.

'GEOFF_FIFO_STROBE' & 'PAGET_FIFO_STROBE', these are outside of your control. you do not have Z80 access to these.

Ah okay, I've updated the Z80_bridge code accordingly. I've called the 'COPY READ PIXEL' counter port simply 'RD_PX_CTR_STROBE'.

Quote from: BrianHG on August 01, 2020, 03:14:26 pm
What you want is:
GEO_STATUS_STROBE output and
GEO_STATUS_DATA_READ_IN 8 bit input read port where we may tie the 510 word scfifo's almost full flag to bit 0, and maybe some other status information to any of the other 7 bits, like a 1 bit VS frame counter or spare programmable H/V strobe output on bit 1 so you may sync animation to a new frame or chosen line of video.
GEO_STATUS_DATA_WRITE_OUT this output on the same port is optional as you may use a selected bit, say bit 7 to feed all the reset lines for the entire geometry section so it may be kept in sleep until you release the reset. Or, if the geometry unit crashes, IE 510 word scfifo's almost full flag stays full for a consecutive 65535 reads, you may assume a geo crash and reset the entire section and send a fault error to your OS graphics driver.

Okay, I think I've got the IO as you want it now. Z80_bridge attached. Obviously there's nothing driving or reading these new ports yet.

// geo_unit inputs
input logic [7:0] WR_PX_CTR, // WRITE PIXEL collision counter from pixel_writer
input logic [7:0] RD_PX_CTR, // COPY READ PIXEL collision counter from pixel_writer
input logic [7:0] PAGET_FIFO, // pixel_address_generator FIFO flags NO - NOT NEEDED
input logic [7:0] GEOFF_FIFO, // geometry_xy_plotter FIFO flags NO - NOT NEEDED
input logic [7:0] GEO_STAT_RD,// bit 0 = scfifo's almost full flag, other bits free for other data

add 1 additional output:
output logic GEO_RD_STAT_STROBE, // HIGH when reading data from the GEO_STAT_RD bus
rename:
output logic GEO_WR_STAT_STROBE, // HIGH when sending data on GEO_STAT_WR bus

Let's see the simulation of a read and write port & what happens to the strobe signals and output data.

nockieboy · « **Reply #1413 on:** August 02, 2020, 10:06:43 am »

I'm looking at setting up the Z80_bridge code to send data to the FIFO currently, and can't see a way to do it without adding a third IO port to trigger the send. Is that right?

You're treating the 16-bit bus between the Z80_bridge and the geo_xy_plotter's FIFO as two independent 8-bit buses?

Also, is there any reason I can't just merge GEO_WR_LO and GEO_WR_HI into one 16-bit bus with one strobe? I can't see the advantage of being able to send one byte instead of one word, so I've probably missed something important.

(i.e. the Z80 could load the 16-bit register via two IO writes for the low and high byte, then write to the third port to send the 16-bit word to the geo_unit?)

nockieboy · « **Reply #1414 on:** August 02, 2020, 10:56:37 am »

Above image shows a simulation of the Z80_bridge as two bytes are written to GEOFF. The IOs shown as the Z80 writes the low byte first, then the high byte, then triggers the two strobes to let the FIFO know new data is on the bus.

I'm holding off on going further in case I go too far down the wrong path (with the third IO, buffering the low/high byte until ready to send both etc).

BrianHG · « **Reply #1415 on:** August 02, 2020, 12:43:21 pm »

Where are the:

output logic GEO_WR_LO_STROBE,// HIGH to write low byte to geo unit
output logic GEO_WR_HI_STROBE,// HIGH to write high byte to geo unit

outputs?

They seem dead when you write GEO_LO_BYTE and write GEO_HI_BYTE....

Why are they pulsing when you send something to port 248?

You have 2 strobes just to get used to the fact that each IO port should have its own associated strobe.
Separate ones for each read and separate ones for each write.
For the geo write data, you will only be using one of them, the one associated with the second half of the 16bit word you will be sending.

nockieboy · « **Reply #1416 on:** August 02, 2020, 03:23:10 pm »

Quote from: BrianHG on August 02, 2020, 12:43:21 pm

Where are the:

output logic GEO_WR_LO_STROBE,// HIGH to write low byte to geo unit
output logic GEO_WR_HI_STROBE,// HIGH to write high byte to geo unit

outputs?

They seem dead when you write GEO_LO_BYTE and write GEO_HI_BYTE....

Why are they pulsing when you send something to port 248?

You have 2 strobes just to get used to the fact that each IO port should have its own associated strobe.
Separate ones for each read and separate ones for each write.
For the geo write data, you will only be using one of them, the one associated with the second half of the 16bit word you will be sending.

They're there - I did explain previously that I was unsure how/when you wanted the GEO_LO_BYTE and GEO_HI_BYTE to be read by GEOFF's FIFO, so I thought a third IO port would be necessary to signal the FIFO that valid 16-bit data was available on the GEO_xx_BYTE bus(es). Both strobes go high together when IO port 248 is written to. If there's a better way to do it, I'm all ears..

Quote from: nockieboy on August 02, 2020, 10:06:43 am

I'm looking at setting up the Z80_bridge code to send data to the FIFO currently, and can't see a way to do it without adding a third IO port to trigger the send. Is that right?

Quote from: nockieboy on August 02, 2020, 10:06:43 am

Also, is there any reason I can't just merge GEO_WR_LO and GEO_WR_HI into one 16-bit bus with one strobe? I can't see the advantage of being able to send one byte instead of one word, so I've probably missed something important. (i.e. the Z80 could load the 16-bit register via two IO writes for the low and high byte, then write to the third port to send the 16-bit word to the geo_unit?)

As above, I questioned whether two strobes were necessary and the GEO_xx_BYTE buses can't just be treated as one 16-bit bus?

BrianHG · « **Reply #1417 on:** August 02, 2020, 04:28:35 pm »

Ok, write the low byte, then the low byte out is held in its 'GEO_WR_LO'.
Now write the high byte, and the high byte is held in the 'GEO_WR_HI'.
Now, what happens if you had the 'GEO_WR_HI_STROBE' tied to the geometry plotter's input 'fifo_cmd_ready'?

So long as you always send the LO first and HI second, won't the HI's strobe tell the geometry unit's 'fifo_cmd_ready' to take the stored 'GEO_WR_LO' with the now new 'GEO_WR_HI' together at that point?

Well?

Maybe you prefer sending 16 bit words in big-endian format (HI byte first). In this case, you would wire the LO's strobe instead of the HI's strobe to the 'fifo_cmd_ready' input. Then you must send the HI byte first, then the LO byte, and when sending the LO byte, its LO strobe would tell the 'fifo_cmd_ready' input to take both the previously sent 'GEO_WR_HI' with the new 'GEO_WR_LO'.

In either case, you are treating the final result as a 16 bit bus. You just need to decide when copying memory from somewhere else in your Z80 ram, which direction the results are written into the LO and HI port and select which one of those strobes signify that the full 16bit value has been transmitted to both. Unless you have hidden a 16 bit data bus entering the Z80_bridge from me, how else have all 8 bit MCUs/CPUs sent out 16 bit wide registers?
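
As a rough sketch of the LO-first wiring described above (an illustration only, not the actual top-level hookup), the two held bytes are simply concatenated and the HI strobe is passed straight through as the "word valid" signal. The module name and `fifo_cmd_data` are hypothetical; `GEO_WR_LO`, `GEO_WR_HI`, `GEO_WR_HI_STROBE` and `fifo_cmd_ready` are the names used in this thread.

```systemverilog
// Sketch only: joins the two 8 bit port registers into one 16 bit command word.
// Assumes the Z80 always writes the LO port first, then the HI port.
// Hypothetical names: geo_word_join_sketch, fifo_cmd_data.
module geo_word_join_sketch (
    input  logic [7:0]  GEO_WR_LO,         // held low byte from the Z80_bridge
    input  logic [7:0]  GEO_WR_HI,         // held high byte from the Z80_bridge
    input  logic        GEO_WR_HI_STROBE,  // HIGH for 1 clock when the high byte was written
    output logic [15:0] fifo_cmd_data,     // full 16 bit word for the geometry input
    output logic        fifo_cmd_ready     // word valid, 1 clock wide
);
    // The bridge registers GEO_WR_HI and GEO_WR_HI_STROBE on the same clock edge,
    // so during the strobe cycle both bytes are already stable on the bus.
    assign fifo_cmd_data  = {GEO_WR_HI, GEO_WR_LO};
    assign fifo_cmd_ready = GEO_WR_HI_STROBE;
endmodule
```

For the HI-first ordering, the only change in this sketch would be to drive `fifo_cmd_ready` from the LO strobe instead.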

The Z80 is a slow snail. You must shave off every cycle possible and make the GPU take control data in the least amount of clock cycles. Shaving off that silly third port write to take in data makes a ~35% speed improvement in sending data to the geometry unit: around 1 million writes per second vs 0.75 million (with a 4-8MHz clock port access cycle). Take the speed improvement.

It might even be faster to change the ports to 2 bytes of memory access, as they have a 3-8MHz clock cycle. I'll leave the lessons learned here and you may choose to swap to a memory ram address in place of reading and writing to ports to drive the geometry unit.

nockieboy · « **Reply #1418 on:** August 02, 2020, 05:33:00 pm »

Quote from: BrianHG on August 02, 2020, 04:28:35 pm

You just need to decide when copying memory from somewhere else in your Z80 ram, which direction the results are written into the LO and HI port and select which one of those strobes signify that the full 16bit value has been transmitted to both.

Well, all the time I'm using I/O to get the data into the geometry unit, it's a fairly moot point as I can only send one byte at a time. I'll go with little endian as that'll match up with my intended method of getting the data across via memory instead of IO later on.

Quote from: BrianHG on August 02, 2020, 04:28:35 pm

It might even be faster to change the ports to 2 bytes of memory access. I'll leave the lessons learned here and you may choose to swap to a memory ram address in place of reading and writing to ports to drive the GPU.

That's my intention - memory access will be a lot faster than IO access - but that's something I'll sort out later.

Here's the latest simulation with the third IO removed. Will the strobes be okay or are they a little early for the GEO_WR_HI and _LO lines to stabilise?

BrianHG · « **Reply #1419 on:** August 02, 2020, 05:47:17 pm »

Quote from: nockieboy on August 02, 2020, 05:33:00 pm

Here's the latest simulation with the third IO removed. Will the strobes be okay or are they a little early for the GEO_WR_HI and _LO lines to stabilise?

Try Zooming in to see.
There should be no stabilization time. The moment this is clocked:
out <= data_in ;

the data will be valid.
Now if you have:
out <= data_in ;
out_strobe <= 1 ;

Both become valid and acknowledgeable at the same time.
Remember, everything is tied to the 125MHz clock and it is all synchronous logic here inside the FPGA.
If it weren't so, ohhhh boy, everything we have written to date would completely fall apart and not function.

Your code looks fine.

Get on the pixel writer...

BrianHG · « **Reply #1420 on:** August 02, 2020, 06:21:26 pm »

One minor note:

Code:

   // **** Manage IO interface to GEOFF ****
   if ( z80_write_port_1s && Z80_addr_r[7:0]==GEO_LO ) begin     // Write to GEOFF low-byte register
      GEO_WR_LO     <= Z80_wData_r[7:0] ;
      GEO_WR_LO_STROBE <= 1'b1 ;                                 // Pulse both strobes HIGH to signal to FIFO new data on the bus
   end
   
   if ( z80_write_port_1s && Z80_addr_r[7:0]==GEO_HI ) begin     // Write to GEOFF high-byte register
      GEO_WR_HI     <= Z80_wData_r[7:0] ;
      GEO_WR_HI_STROBE <= 1'b1 ;
   end   
   // ***** End of GEOFFs IO interface *****
   
   // **** ONE-SHOTS ****
   if ( GEO_WR_HI_STROBE ) GEO_WR_HI_STROBE <= 1'b0 ; // 
   if ( GEO_WR_LO_STROBE ) GEO_WR_LO_STROBE <= 1'b0 ; //
   // *******************

We prefer to write it as:

Code:

   // **** Manage IO interface to GEOFF ****
   if ( z80_write_port_1s && Z80_addr_r[7:0]==GEO_LO ) begin     // Write to GEOFF low-byte register
      GEO_WR_LO     <= Z80_wData_r[7:0] ;
      GEO_WR_LO_STROBE <= 1'b1 ;                                 // Pulse the LO strobe HIGH for one clock
   end else GEO_WR_LO_STROBE <= 1'b0 ;
   
   if ( z80_write_port_1s && Z80_addr_r[7:0]==GEO_HI ) begin     // Write to GEOFF high-byte register
      GEO_WR_HI     <= Z80_wData_r[7:0] ;
      GEO_WR_HI_STROBE <= 1'b1 ;
   end else GEO_WR_HI_STROBE <= 1'b0 ;
   // ***** End of GEOFFs IO interface *****

What's going on here is that if somehow (I don't know how, but imagine it) the xxx_STROBE = 1 and a write to that same port comes in at the same time, you would be setting the xxx_STROBE to 1 while the 'one shots' below try to set that same xxx_STROBE to 0 on the same clock. If both sit in the same clocked block, the later assignment wins, so the one-shot would cancel the new strobe.

Using the 'else' means the xxx_STROBE will be continuously cleared to 0 unless that port write happens where it would be set to a '1' for that port write cycle, then back to clearing the strobe under any other circumstance.

However, as you can see, your simulation still worked anyways. You decide how you want to keep your code.
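
Another common way to get the same behaviour, shown here only as a sketch, is to assign the strobe defaults at the top of the clocked block and let the port writes override them. This relies on the rule that the last non-blocking assignment to a signal within one block is the one that takes effect. The module wrapper, the port numbers given to `GEO_LO` / `GEO_HI` and the 8 bit width of `Z80_addr_r` are placeholders, not values from the real Z80_bridge.

```systemverilog
// Sketch only: strobe generation with default assignments instead of 'else' branches.
// Hypothetical: module name, port numbers 8'hF0 / 8'hF1, 8 bit Z80_addr_r.
module geo_port_strobes_sketch #(
    parameter logic [7:0] GEO_LO = 8'hF0,   // placeholder IO port number
    parameter logic [7:0] GEO_HI = 8'hF1    // placeholder IO port number
)(
    input  logic       clk,
    input  logic       z80_write_port_1s,   // HIGH for 1 clock on a Z80 port write
    input  logic [7:0] Z80_addr_r,
    input  logic [7:0] Z80_wData_r,
    output logic [7:0] GEO_WR_LO,
    output logic [7:0] GEO_WR_HI,
    output logic       GEO_WR_LO_STROBE,
    output logic       GEO_WR_HI_STROBE
);
    always_ff @(posedge clk) begin
        // Defaults: both strobes are cleared on every clock...
        GEO_WR_LO_STROBE <= 1'b0 ;
        GEO_WR_HI_STROBE <= 1'b0 ;

        // ...unless a matching port write overrides the default below.
        if ( z80_write_port_1s && Z80_addr_r == GEO_LO ) begin
            GEO_WR_LO        <= Z80_wData_r ;
            GEO_WR_LO_STROBE <= 1'b1 ;
        end

        if ( z80_write_port_1s && Z80_addr_r == GEO_HI ) begin
            GEO_WR_HI        <= Z80_wData_r ;
            GEO_WR_HI_STROBE <= 1'b1 ;
        end
    end
endmodule
```

Functionally this should match the 'else' version. Which one reads better is a matter of taste, but the default-first form scales a little more easily when many strobes share one block.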

nockieboy · « **Reply #1421 on:** August 02, 2020, 07:10:17 pm »

Quote from: BrianHG on August 02, 2020, 06:21:26 pm

Using the 'else' means the xxx_STROBE will be continuously cleared to 0 unless that port write happens where it would be set to a '1' for that port write cycle, then back to clearing the strobe under any other circumstance.

However, as you can see, your simulation still worked anyways. You decide how you want to keep your code.

No, I like how you've done it - much neater and, as you say, it removes the edge-case situation too.

Latest pixel_writer below for confirmation that I'm on the right track. Not sure how to progress the case statement now - have included all the commands that could be sent from the pixel_writer (haven't added a default yet) but as far as what goes into each command statement, I'm going to need more of a steer. The read process seems like it's manageable, I just need to latch the address and value and return that instead of performing a second read to the same address - a bit like a cache, but much, much smaller.

The write process, however, seems like it's going to need some sort of pipeline as it involves a read, then some calculation to address the target bit/s, then a write...

I haven't created registers for the address, colour value, sub-pixel address and bpp to hold the data yet.

BrianHG · « **Reply #1422 on:** August 02, 2020, 09:17:16 pm »

So far so good on the pixel writer.
Step #1 hint, for always_comb (combinational logic section)

ram_addr = address input;

Yup, no clocked register delay.

Step #2, add a reset please.

Step #3, copy pixel function. Since this is a read pixel ram only function, let's begin here, as the pixel writer begins with the same logic with added functions.
a) Check if our copy pixel read cache has already read the same address and the read cache valid flag is set, if so, store the new read bits per pixel setting & latch the sub pixel bit position & color data. Do nothing else.
b) If not, stop the fifo read command (shift_out) and send a 'rd_req_a' pulse and clear the read cache valid flag to 0.
c) once the 'rd_data_rdy_a' pulse comes in, latch the read data & latch the read address & latch the read bits per pixel setting & latch the sub pixel bit position & color data and set the read cache valid flag to 1.

Done. (for now, it's a start)
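
Steps a) to c) could be sketched roughly as below. This is one possible reading of the steps, not the finished pixel_writer: apart from `rd_req_a` and `rd_data_rdy_a`, every name (the module, the `cmd_*` inputs, the `rc_*` cache registers, `rd_pending`, `stop_fifo_read`) and every bus width is an assumption made for the example. The address and the read request are kept combinational, in line with the Step #1 hint.

```systemverilog
// Sketch only: copy-pixel read cache following steps a) to c).
// All names except rd_req_a and rd_data_rdy_a are assumptions, as are the widths.
module copy_read_cache_sketch #(
    parameter int ADDR_W = 20
)(
    input  logic              clk,
    input  logic              reset,
    input  logic              copy_cmd,        // a copy-pixel command is waiting at the fifo output
    input  logic [ADDR_W-1:0] cmd_addr,
    input  logic [3:0]        cmd_bpp,
    input  logic [3:0]        cmd_bit_pos,
    input  logic [7:0]        cmd_color,
    input  logic              rd_data_rdy_a,   // pulse: read data is valid
    input  logic [15:0]       rd_data_a,
    output logic [ADDR_W-1:0] ram_addr_a,
    output logic              rd_req_a,        // 1 clock read request pulse
    output logic              stop_fifo_read,  // holds the command at the fifo output
    output logic              rc_valid,        // read cache valid flag
    output logic [ADDR_W-1:0] rc_addr,
    output logic [15:0]       rc_data,
    output logic [3:0]        rc_bpp,
    output logic [3:0]        rc_bit_pos,
    output logic [7:0]        rc_color
);
    logic rd_pending;   // a read has been requested and not yet answered
    logic cache_hit;

    always_comb begin
        ram_addr_a     = cmd_addr;                             // no clocked register delay
        cache_hit      = rc_valid && (rc_addr == cmd_addr);
        rd_req_a       = copy_cmd && !cache_hit && !rd_pending; // step b: request on a miss
        // Hold the command until it hits, or until its read data arrives.
        stop_fifo_read = copy_cmd && !cache_hit && !(rd_pending && rd_data_rdy_a);
    end

    always_ff @(posedge clk) begin
        if (reset) begin
            rc_valid   <= 1'b0;
            rd_pending <= 1'b0;
        end else if (rd_pending) begin
            if (rd_data_rdy_a) begin          // step c: data returned, fill the cache
                rc_data    <= rd_data_a;
                rc_addr    <= cmd_addr;
                rc_bpp     <= cmd_bpp;
                rc_bit_pos <= cmd_bit_pos;
                rc_color   <= cmd_color;
                rc_valid   <= 1'b1;
                rd_pending <= 1'b0;
            end
        end else if (copy_cmd) begin
            if (cache_hit) begin              // step a: same word, only update pixel settings
                rc_bpp     <= cmd_bpp;
                rc_bit_pos <= cmd_bit_pos;
                rc_color   <= cmd_color;
            end else begin                    // step b: miss, invalidate and wait for the read
                rc_valid   <= 1'b0;
                rd_pending <= 1'b1;
            end
        end
    end
endmodule
```

In this sketch a hit costs no extra clocks, while a miss holds the fifo for as long as the ram takes to answer.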

Let's see a simulation with a block of external 4096x16 altsyncram megafunction tied onto the top block diagram with dummy data. (Just generate 16 bit counting data with the .mif editor.) Also, pass the 2 rd_req_a/b outputs through 2 clocked DFFs back to the 2 rd_data_rdy_a/b inputs, simulating the read delay of the read cycle for the altsyncram megafunction.
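
A minimal sketch of that read-delay loopback, assuming "2 clocked DFF" means two flip-flop stages per request line so that each rd_data_rdy pulse arrives 2 clocks after its rd_req. The module name and the `_d1` intermediate registers are hypothetical.

```systemverilog
// Sketch only: delays each read request by 2 clocks to mimic the ram's read latency.
// Hypothetical names: rd_rdy_delay_sketch, rd_req_a_d1, rd_req_b_d1.
module rd_rdy_delay_sketch (
    input  logic clk,
    input  logic rd_req_a,
    input  logic rd_req_b,
    output logic rd_data_rdy_a,
    output logic rd_data_rdy_b
);
    logic rd_req_a_d1, rd_req_b_d1;   // first delay stage

    always_ff @(posedge clk) begin
        rd_req_a_d1   <= rd_req_a;      // stage 1
        rd_data_rdy_a <= rd_req_a_d1;   // stage 2
        rd_req_b_d1   <= rd_req_b;
        rd_data_rdy_b <= rd_req_b_d1;
    end
endmodule
```

If the ram block ends up being configured with a different read latency, the number of stages would need to change to match.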

Don't forget to decompose the cmd_in bus to multiple inputs making it easy to edit in the .vwf an address, pixel bits, bit position, color and command function all separately.

Then we will be able to issue commands and watch the pixel writer in action with an authentic block of ram.

nockieboy · « **Reply #1423 on:** August 03, 2020, 09:12:36 am »

Quote from: BrianHG on August 02, 2020, 09:17:16 pm

Step #3, copy pixel function. Since this is a read-only pixel ram function, let's begin here, as the pixel writer begins with the same logic with added functions.
a) Check if our copy pixel read cache has already read the same address and the read cache valid flag is set, if so, store the new read bits per pixel setting & latch the sub pixel bit position & color data. Do nothing else.
b) If not, stop the fifo read command (shift_out) and send a 'rd_req_a' pulse and clear the read cache valid flag to 0.
c) once the 'rd_data_rdy_a' pulse comes in, latch the read data & latch the read address & latch the read bits per pixel setting & latch the sub pixel bit position & color data and set the read cache valid flag to 1.

Okay, got as far as b) above before confusion set in.

How am I getting data out of the FIFO? Am I supposed to be checking each clock cycle if the pixel_writer is free, then strobing the FIFO's 'shift_out' line to get the next command? I'm a little fuzzy on the details here.

BrianHG · « **Reply #1424 on:** August 03, 2020, 01:02:39 pm »

If 'pixel_cmd_rdy' is set, you should always be reading the fifo's shift_out, except when busy. This should be a combinational function, to erase 1 clock cycle.

IE inside always_comb

load_next_cmd = pixel_cmd_rdy && !stop_fifo_read;

I know some things here appear to get ugly as we try to erase every wasted clock cycle. Each read from ram takes 2+ clock cycles while a write takes only 1, and every time you write a pixel you also need to do a read, plus 1 processing clock cycle. We are trying to get that processing down to 0 cycles and do the processing and the write in a single clock cycle.

Example: a copy pixel will take 2 clocks for the read, then the write pixel will take another 2 to read its memory position, then another 1 to edit the right bits within that 16 bit word & write the output. Without cache, this means each copy blitter function of four 16 color pixels to four 16 color pixels will take 2rc+2rw+1w, 2rc+2rw+1w, 2rc+2rw+1w, 2rc+2rw+1w clocks for a total of 20 clocks. (rc = read in pixel copy command, rw = read in pixel write command, w = write in pixel write data to ram.)

With two read word caches, one each for the read & write pixel commands, here is the new clock count: 2rc+2rw+1w, 1w, 1w, 1w for a total of 8 clocks. When copying filled boxes at least 4 pixels wide in 16 color graphics, 20:8 is a 2.5 fold speed increase. For filled triangles, boxes and ellipses, it is a 1.5x speed increase. The speed increase is even greater with 2 bit color and 1 bit color graphics.

(For DDR ram, since that ram has minimum fixed size read & write bursts (something like 4 or even 8 16 bit words, with an additional setup delay of 8 clocks making 16 clocks for 1 access), these caches would need to increase to that width in bits. You would then also want a write data cache which won't send out a write until it has filled a 4-8 word chunk, since our wait with such memory transactions will be huge compared to the on-chip static ram.)
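To make the "edit the right bits within that 16 bit word" step concrete, here is a rough sketch of the merge as a pure combinational function, which is what would let the processing and the write share one clock. The thread does not define the encodings, so everything here is an assumption: bpp is taken as the literal pixel width (1, 2, 4 or 8), target as the pixel index within the word, and pixel 0 as sitting in the most significant bits. The package and function names are made up.

```systemverilog
// Hypothetical helper: merge one pixel into a 16-bit ram word.
// Assumptions (not from the thread): bpp is the literal pixel width
// (1, 2, 4 or 8), target is the pixel index within the word, and
// pixel 0 occupies the most significant bits of the word.
package pixel_merge_pkg;

    function automatic logic [15:0] merge_pixel (
        input logic [15:0] word_in,   // word read from ram (or from the cache)
        input logic [3:0]  bpp,       // pixel width in bits: 1, 2, 4 or 8
        input logic [3:0]  target,    // pixel index within the word
        input logic [7:0]  colour     // new pixel value, right-aligned
    );
        logic [15:0] mask;
        logic [15:0] pix;
        int unsigned shift;

        mask  = 16'h00FF >> (8 - bpp);          // 'bpp' ones, right-aligned
        pix   = {8'h00, colour} & mask;         // drop unused colour bits
        shift = 16 - bpp - (target * bpp);      // distance of the pixel from bit 0

        // Clear the target pixel's bits, then OR in the new colour.
        return (word_in & ~(mask << shift)) | (pix << shift);
    endfunction

endpackage
```

Under those assumptions, merge_pixel(16'h1234, 4, 1, 8'h0F) gives 16'h1F34: only the second 4 bit pixel from the left changes. If the real hardware numbers pixels from the LSB end, or encodes bpp differently, the shift and mask lines would need to change to match.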

Code: [Select]

		cache_address           <= 20'b0 ;
		cache_colour            <= 8'b0  ;
		cache_bpp               <= 4'b0  ;
		cache_target            <= 4'b0  ;

There are two read caches, one for the pixel copy and one for the pixel write which also needs to read ram data before generating a word to write back to ram.

You also need a cache_data_valid bit for each, since after reset this bit should be cleared. Otherwise, after reset, if you read/write to address 0 without this extra bit, the cache contents will be assumed correct since the cache now has an address of 0. (We will also have to add a 'stale' timer later on which automatically clears this flag if no read happens within a set period of time. The geometry unit won't know if the Z80 manually edited the same memory word held in either cache, so the word should be re-read even if the same address is requested once again after doing nothing for something like 255 clock cycles.)
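A minimal sketch of such a valid flag with a stale timer might look like this. Only cache_data_valid and the 255 clock figure come from the post; the module name, the cache_fill / cache_inval / cache_used pulses, idle_count and the STALE_CLOCKS parameter are assumptions for illustration.

```systemverilog
// Hypothetical sketch: cache valid flag that clears itself when stale.
// Assumed inputs: cache_fill (a new word was latched), cache_inval (a miss
// started a new read), cache_used (a hit re-used the cached word).
module cache_valid_flag #(
    parameter int STALE_CLOCKS = 255   // must fit the 8-bit counter (<= 256)
) (
    input  logic clk,
    input  logic reset,
    input  logic cache_fill,
    input  logic cache_inval,
    input  logic cache_used,
    output logic cache_data_valid
);
    logic [7:0] idle_count;   // clocks since the cache was last filled or used

    always_ff @(posedge clk) begin
        if (reset) begin
            cache_data_valid <= 1'b0;          // cache contents unknown after reset
            idle_count       <= 8'd0;
        end else if (cache_fill) begin
            cache_data_valid <= 1'b1;
            idle_count       <= 8'd0;
        end else if (cache_inval) begin
            cache_data_valid <= 1'b0;
        end else if (cache_used) begin
            idle_count       <= 8'd0;          // activity restarts the timer
        end else if (cache_data_valid) begin
            if (idle_count == STALE_CLOCKS - 1)
                cache_data_valid <= 1'b0;      // stale: force a re-read next time
            else
                idle_count <= idle_count + 8'd1;
        end
    end
endmodule
```

With the default parameter the flag clears on the 255th consecutive idle clock after the last fill or hit, so the next access to the same address is treated as a miss and re-read from ram. One instance would be needed per cache.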

Yes, it is possible to chain and shave off one clock cycle on the required memory read when doing back-to-back read/copy and write pixel commands. We will tackle that one if you are up to it.

