Author Topic: FPGA VGA Controller for 8-bit computer (Read 510763 times)

BrianHG · « **Reply #300 on:** November 11, 2019, 10:51:26 am »

Quote from: nockieboy on November 11, 2019, 10:12:38 am

Quote from: BrianHG on November 10, 2019, 03:00:03 pm
After instantiating the "gpu_dual_port_ram_INTEL gpu_RAM(.....);", you need the:
---------------------------
defparam
gpu_RAM.MAX_ADDR_BITS = MAX_ADDR_BITS ;
---------------------------
This will pass the module multiport_gpu_ram's MAX_ADDR_BITS parameter into the gpu_dual_port_ram_INTEL's MAX_ADDR_BITS parameter. It may also be useful to pass the 'altsyncram_component.numwords_a' and 'numwords_b' settings, since the FPGA may have enough memory to allocate 24kb, yet not 32kb.

Okay, stupid question - altsyncram specifies altsyncram_component.numwords_a (and b) - I had 2 ** MAXSIZE in there, but if they're the number of words, I'll need to divide that by the word size, otherwise the RAM will be (or try to be) 8 times larger than what I think I'm specifying?

So, for example, this:

Code: [Select]
// define the memory size (number of words) - this allows RAM sizes other than multiples of 2
// but defaults to power-of-two sizing based on MAX_ADDR_BITS if not otherwise specified
parameter WORDS = 2 ** MAX_ADDR_BITS;

..needs to be this:

Code: [Select]
// define the memory size (number of words) - this allows RAM sizes other than multiples of 2
// but defaults to power-of-two sizing based on MAX_ADDR_BITS if not otherwise specified
parameter WORDS = (2 ** MAX_ADDR_BITS) / 8;

??

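Nope, use the first one, not the divide by 8. 'numwords' counts words, and each word is already 8 bits wide (width_a = 8), so 2 ** MAX_ADDR_BITS words is the size you want.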
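
A minimal sketch of why no division is needed, assuming a generic inferred RAM rather than the actual altsyncram instance. The module name byte_ram_sketch and its ports are hypothetical. The depth parameter counts words and the word width is declared separately, so 24 KB is simply 24576 words behind a 15-bit address:

```verilog
// Hypothetical illustration only: a generic inferred RAM, not the Intel altsyncram.
module byte_ram_sketch #(
    parameter ADDR_SIZE = 15,     // 15 address lines = [14:0]
    parameter NUM_WORDS = 24576   // depth in 8-bit words (24 KB), not in bits
) (
    input                      clk,
    input                      wr_en,
    input      [ADDR_SIZE-1:0] addr,
    input      [7:0]           data_in,
    output reg [7:0]           data_out
);
    // One array element per word; the word width (8) is part of the element type.
    reg [7:0] mem [0:NUM_WORDS-1];

    always @(posedge clk) begin
        if (wr_en)
            mem[addr] <= data_in;
        data_out <= mem[addr];    // registered read; returns the old data during a write
    end
endmodule
```

With ADDR_SIZE = 15 the address bus could reach 32768 locations, but only the first NUM_WORDS exist, which is the situation the non-power-of-two sizing is meant to cover.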

Quote

Quote from: BrianHG on November 10, 2019, 03:00:03 pm
   // address pass-thru bus (output)
   output reg [19:0] addr_out, There are 5 of these to match the read address inputs 0 through 4.

   // auxilliary read command buses (input)
   input [7:0] aux_read_0,
   input [7:0] aux_read_1,
   input [7:0] aux_read_2,
   input [7:0] aux_read_3,
   input [7:0] aux_read_4,
change all these to cmd_in[15:0]. (global search and replace)

   // auxilliary read command buses (pass-thru output)
   output reg [7:0] auxRdPT_0,
   output reg [7:0] auxRdPT_1,
   output reg [7:0] auxRdPT_2,
   output reg [7:0] auxRdPT_3,
   output reg [7:0] auxRdPT_4,
change these to cmd_out[15:0]

reg [MAX_ADDR_BITS - 1:0] address_mux;
change to reg [19:0] address_mux;

reg [7:0] aux_read_mux;
change to reg [15:0] cmd_read_mux (global search and replace)

These should all be present and correct now... I think. Got a little confused earlier with all the changes, so I'll be double-checking it all, but I think it would benefit from a close look.

Quote from: BrianHG on November 10, 2019, 03:00:03 pm
You're missing a few of the new ports for 'gpu_dual_port_ram_INTEL gpu_RAM(...);'

They should all be present and correct now.

Quote from: BrianHG on November 10, 2019, 03:00:03 pm
Almost done, next you will re-sort the read ram contents and the piped-through address & cmds into their output registers, and sync those to your new delayed 'pc_ena_out[3:0]' coming out of the Intel ram module.

Have made a bit of a start on this - the 5:1 mux code is modified according to my present understanding. The read address is passed through to the ram module, the pass-through address is passed out to the appropriate address bus according to the current mux step, as is the data read from memory.

I'm a little unsure about the command bus, though. It's piped into the memory via cmd_read_mux, but that seems like an unnecessary step as I only have one cmd_in bus (and one cmd_out bus) - should these be increased to 5 as well? It's possible I've misunderstood your instruction to 'change all these to cmd_in[15:0]'...

The 1 command bus is inside the INTEL dual port ram module. Just like the read addresses, it should be piped through in a single file fashion.
On the multiport GPU ram, there should be 5 groups going in, grouped with the 5 read addresses going in, and 5 grouped 16 bit cmd coming out, just like the 5 read datas, 5 read addresses, 5 cmd_outs, all in parallel...

Quote

Quote from: BrianHG on November 10, 2019, 03:00:03 pm
Note that we forgot to wire the 'pc_ena_out[3:0]' coming out of the Intel ram module through to the multiport_gpu_ram ( ...) ports, so that the rest of our graphics pipe heading to the output pins will incorporate the delay shift generated by the memory. (Though we could work around this by re-syncing all the ram outputs back to the next pc_ena_in==0 cycle, this ena signal in the FPGA is beginning to drive so much logic that it is limiting our FMAX, so this is an opportune point to D-clock pipe the signals for the second half of our graphics pipe.)

Okay, I think I understand - but pc_ena passes through the gpu_dual_port_ram_INTEL module via a register pipe, which will fulfil the need to D-clock the signal, right?

Pipe it just like a read address and the auxiliary 16 bit cmd, delayed by two 125MHz clocks. The difference is when it comes back through the GPU multiport ram module, there it is not muxed, it's just wired through without delay.
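
As a rough sketch of the "delayed by two 125MHz clocks" pipe described above, the same two-register pattern can be written once and reused for the address, the cmd and pc_ena. The module name delay2_sketch and its WIDTH parameter are hypothetical and not part of the project code:

```verilog
// Hypothetical helper: delays a bus by exactly two 'clk' cycles.
module delay2_sketch #(
    parameter WIDTH = 4            // 4 for pc_ena, 16 for cmd, 20 for the address
) (
    input                  clk,
    input      [WIDTH-1:0] d_in,
    output reg [WIDTH-1:0] d_out
);
    reg [WIDTH-1:0] stage1;

    always @(posedge clk) begin
        stage1 <= d_in;     // first clock: capture the input
        d_out  <= stage1;   // second clock: present it on the output
    end
endmodule
```

In the multiport module the delayed pc_ena would then connect straight to a wire output with no further register, which is what "wired through without delay" asks for.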

Quote

gpu_dual_port_ram_INTEL.v:

Code: [Select]
module gpu_dual_port_ram_INTEL (

   // inputs
   input clk,
   input [3:0] pc_ena_in,
   input clk_b,
   input wr_en_b,
   input [19:0] addr_a,
   input [19:0] addr_b,
   input [7:0] data_in_b,
   input [15:0] cmd_in,

   // registered outputs
   output reg [19:0] addr_out_a,
   output reg [3:0] pc_ena_out,
   output reg [15:0] cmd_out,

   // direct outputs
   output wire [7:0] data_out_a,
   output wire [7:0] data_out_b

);

// define the maximum address bit
parameter ADDR_SIZE = 14;   // **********************************************************

// define the memory size (number of words) - this allows RAM sizes other than multiples of 2
// but defaults to power-of-two sizing based on ADDR_SIZE if not otherwise specified
parameter NUM_WORDS = 2 ** ADDR_SIZE;   // **********************************************************

// define delay pipe registers
reg [19:0] rd_addr_pipe_a;
reg [15:0] cmd_pipe;
reg [3:0] pc_ena_pipe;

// ****************************************************************************************************************************
// Dual-port GPU RAM
//
// Port A - read only by GPU
// Port B - read/writeable by host system
// Data buses - 8 bits / 1 byte wide
// Address buses - MAX_ADDR_BITS wide (14 bits default)
// Memory word size - 2^MAX_ADDR_BITS (16384 bytes default)
// ****************************************************************************************************************************
altsyncram altsyncram_component (
   .clock0 (clk),
   .wren_a (1'b1),
   .address_b (addr_b[ADDR_SIZE-1:0]),   // ***************************************************************
   .clock1 (clk_b),
   .data_b (data_in_b),
   .wren_b (wr_en_b),
   .address_a (addr_a[ADDR_SIZE-1:0]),   // ***************************************************************
   .data_a (8'b00000000),
   .q_a (data_out_a),
   .q_b (data_out_b),
   .aclr0 (1'b0),
   .aclr1 (1'b0),
   .addressstall_a (1'b0),
   .addressstall_b (1'b0),
   .byteena_a (1'b1),
   .byteena_b (1'b1),
   .clocken0 (1'b1),
   .clocken1 (1'b1),
   .clocken2 (1'b1),
   .clocken3 (1'b1),
   .eccstatus (),
   .rden_a (1'b1),
   .rden_b (1'b1));

defparam
   altsyncram_component.address_reg_b = "CLOCK1",
   altsyncram_component.clock_enable_input_a = "BYPASS",
   altsyncram_component.clock_enable_input_b = "BYPASS",
   altsyncram_component.clock_enable_output_a = "BYPASS",
   altsyncram_component.clock_enable_output_b = "BYPASS",
   altsyncram_component.indata_reg_b = "CLOCK1",
   altsyncram_component.init_file = "../osd_mem.mif",
   altsyncram_component.intended_device_family = "Cyclone IV E",
   altsyncram_component.lpm_type = "altsyncram",
   altsyncram_component.numwords_a = NUM_WORDS,
   altsyncram_component.numwords_b = NUM_WORDS,
   altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
   altsyncram_component.outdata_aclr_a = "NONE",
   altsyncram_component.outdata_aclr_b = "NONE",
   altsyncram_component.outdata_reg_a = "CLOCK0",
   altsyncram_component.outdata_reg_b = "CLOCK1",
   altsyncram_component.power_up_uninitialized = "FALSE",
   altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
   altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
   altsyncram_component.widthad_a = ADDR_SIZE,   // ********************************************************************
   altsyncram_component.widthad_b = ADDR_SIZE,   // ********************************************************************
   altsyncram_component.width_a = 8,
   altsyncram_component.width_b = 8,
   altsyncram_component.width_byteena_a = 1,
   altsyncram_component.width_byteena_b = 1,
   altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

   // **************************************************************************************************************************
   // *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
   // **************************************************************************************************************************
   rd_addr_pipe_a <= addr_a;          // name matches the 'rd_addr_pipe_a' register declared above
   addr_out_a     <= rd_addr_pipe_a;

   cmd_pipe <= cmd_in;
   cmd_out  <= cmd_pipe;

   pc_ena_pipe <= pc_ena_in;
   pc_ena_out  <= pc_ena_pipe;
   // **************************************************************************************************************************

end

endmodule

multiport_gpu_ram.v:

Code: [Select]
module multiport_gpu_ram (

   input clk,              // Primary clk input (125 MHz)
   input [3:0] pc_ena_in,  // Pixel clock enable
   input clk_b,            // Host (Z80) clock input
   input write_ena_b,      // Host (Z80) clock enable

   // address buses (input)
   input [19:0] address_0,
   input [19:0] address_1,
   input [19:0] address_2,
   input [19:0] address_3,
   input [19:0] address_4,
   input [19:0] addr_host,

   // auxilliary read command buses (input)
   input [15:0] cmd_in,

   // outputs
   output wire [3:0] pc_ena_out,

   // address pass-thru bus (output)
   output reg [19:0] addr_passthru_0,
   output reg [19:0] addr_passthru_1,
   output reg [19:0] addr_passthru_2,
   output reg [19:0] addr_passthru_3,
   output reg [19:0] addr_passthru_4,
   output reg [19:0] addr_host_passthru,

   // auxilliary read command bus (pass-thru output)
   output reg [15:0] cmd_out,   // ************************************* NEED 5x cmd_out0/1/2/3/4 and we also need 5x cmd_in#

   // data buses (output)
   output reg [7:0] dataOUT_0,
   output reg [7:0] dataOUT_1,
   output reg [7:0] dataOUT_2,
   output reg [7:0] dataOUT_3,
   output reg [7:0] dataOUT_4,
   output [7:0] data_host

);

// dual-port GPU RAM handler

// define the maximum address bits - effectively the RAM size
parameter ADDR_SIZE = 14;               // *******************************************
parameter NUM_WORDS = 2 ** ADDR_SIZE ;  // *******************************************

reg [19:0] address_mux;
reg [15:0] cmd_read_mux;
wire [19:0] addr_passthru_mux;
wire [7:0] data_mux;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
   .clk(clk),
   .pc_ena_in(pc_ena_in),
   .clk_b(clk_b),
   .wr_en_b(write_ena_b),   // connects to this module's 'write_ena_b' port
   .addr_a(address_mux),
   .addr_b(),
   .data_in_b(),
   .cmd_in(cmd_read_mux),
   .addr_out_a(addr_passthru_mux),
   .pc_ena_out(pc_ena_out),
   .cmd_out(cmd_out),
   .data_out_a(data_mux),
   .data_out_b()
);

// pass MAX_ADDR_BITS into the gpu_RAM instance
defparam gpu_RAM.ADDR_SIZE = ADDR_SIZE,   // *************************************************************************
         gpu_RAM.NUM_WORDS = NUM_WORDS ;  // ************** Actual word count

always @(posedge clk) begin

   // route non-muxed pass-throughs
   cmd_read_mux <= cmd_in;

   // perform 5:1 mux for all inputs to the dual-port RAM
   case (pc_ena_in[2:0])   // uses this module's 'pc_ena_in' port
      3'b000 : begin
         address_mux <= address_0;
         addr_passthru_0 <= addr_passthru_mux;
         dataOUT_0 <= data_mux;
      end
      3'b001 : begin
         address_mux <= address_1;
         addr_passthru_1 <= addr_passthru_mux;
         dataOUT_1 <= data_mux;
      end
      3'b011 : begin
         address_mux <= address_2;
         addr_passthru_2 <= addr_passthru_mux;
         dataOUT_2 <= data_mux;
      end
      3'b100 : begin
         address_mux <= address_3;
         addr_passthru_3 <= addr_passthru_mux;
         dataOUT_3 <= data_mux;
      end
      3'b101 : begin
         address_mux <= address_4;
         addr_passthru_4 <= addr_passthru_mux;
         dataOUT_4 <= data_mux;
      end
   endcase

end // always @clk

endmodule

Read all my ********************************************** in the 2 codes above

To make 1 thing clear, I changed the 'MAX_ADDR_BITS' to 'ADDR_SIZE'. So, 14 = 14 address lines = [13:0]...
Studying the settings you set up in the Megawizard, and analyzing the example dualp...v file it generated, should confirm this.
Same for 'WORDS', I changed it to 'NUM_WORDS'.

Check all the new ***************************** as there were one or 2 other items...

Next, re-assemble all the outputs of the INTEL dualport ram into 5 addresses, 5 datas, 5 cmds.
Helpful hint:
Since we want all 5 outputs to appear in parallel, each with the right contents, when the input (pc_ena[2:0] == 0), and you have a bunch of delays through this module where you can easily lose count of clock cycles, especially if you need to make your mux take 2 or 3 clocks instead of 1 to help improve FMAX, make these local params and I'll leave it up to you to figure out how to implement them:

localparam CLK_CYCLES_MUX = 1; // Adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed inputs
localparam CLK_CYCLES_RAM = 2; // Adjust this figure to the number of clock cycles the DP_ram takes to retrieve a valid data from the read address in.

I'm not sure of this one, we will need to send this parameter back to the OSD generator so it knows how many pixels to delay the H&V ena, and OSD ena, to align the picture.
localparam CLK_CYCLES_PIXEL = ; // Adjust this figure to the number of PIXEL clock cycles it takes the demuxed output data to be ready.

BrianHG · « **Reply #301 on:** November 11, 2019, 11:17:02 am »

Use a single 'defparam' after each module is instantiated. To set multiple parameters, you use the " , " at the end of each parameter and the ' ; ' at the end of the final setting.

When you pass multiple parameters to a sub module, you type it like this:
--------------------------------------------------------------------
// pass MAX_ADDR_BITS into the gpu_RAM instance
defparam gpu_RAM.ADDR_SIZE = ADDR_SIZE,
gpu_RAM.NUM_WORDS = NUM_WORDS ;
-------------------------------------------------------------------
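
For comparison, the same two overrides can also be written with the Verilog-2001 named parameter syntax at the point of instantiation. This is only a sketch: ram_stub and parent_sketch are hypothetical stand-ins (with the real port lists left out) so that the fragment is self-contained.

```verilog
// Hypothetical stub standing in for gpu_dual_port_ram_INTEL (real ports omitted).
module ram_stub #(
    parameter ADDR_SIZE = 14,
    parameter NUM_WORDS = 2 ** ADDR_SIZE
) (
    input clk
);
endmodule

// Hypothetical parent standing in for multiport_gpu_ram.
module parent_sketch #(
    parameter ADDR_SIZE = 15,
    parameter NUM_WORDS = 24576
) (
    input clk
);
    // Equivalent to:
    //   defparam gpu_RAM.ADDR_SIZE = ADDR_SIZE,
    //            gpu_RAM.NUM_WORDS = NUM_WORDS ;
    ram_stub #(
        .ADDR_SIZE (ADDR_SIZE),
        .NUM_WORDS (NUM_WORDS)
    ) gpu_RAM (
        .clk (clk)
    );
endmodule
```

Either form passes the parent's parameters down; the named form just keeps the values next to the instance they apply to.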

nockieboy · « **Reply #302 on:** November 11, 2019, 11:39:18 am »

Quote from: BrianHG on November 11, 2019, 10:51:26 am

The 1 command bus is inside the INTEL dual port ram module. Just like the read addresses, it should be piped through in a single file fashion.
On the multiport GPU ram, there should be 5 groups going in, grouped with the 5 read addresses going in, and 5 grouped 16 bit cmd coming out, just like the 5 read datas, 5 read addresses, 5 cmd_outs, all in parallel...

All done - I've renamed some of the buses as well to make it clearer what's going on with all the pass-throughs etc.

Quote from: BrianHG on November 11, 2019, 10:51:26 am

Pipe it just like a read address and the auxiliary 16 bit cmd, delayed by two 125MHz clocks. The difference is when it comes back through the GPU multiport ram module, there it is not muxed, it's just wired through without delay.

Sorted. pc_ena is treated the same way as the other delayed signals in the memory module, but passed straight to the output in multiport_gpu_ram.

Quote from: BrianHG on November 11, 2019, 10:51:26 am

Read all my ********************************************** in the 2 codes above

Thanks - have updated the code accordingly.

Quote from: BrianHG on November 11, 2019, 10:51:26 am

Next, re-assemble all the outputs of the INTEL dualport ram into 5 addresses, 5 datas, 5 cmds.
Helpful hint:
Since we want all 5 outputs to appear in parallel, each with the right contents, when the input (pc_ena[2:0] == 0)...

Ooookay... so all five outputs should be valid when pc_ena[2:0] == 0? At the moment, addr_out_0, cmd_out_0, data_out_0 will all be valid two or three clock cycles (at least) before addr_out_1, cmd_out_1 and data_out_1, etc. with the delays compounding up to the 5th set of outputs? Not to mention pc_ena needing to be delayed as well until the 5th outputs of the mux are ready?

Would it work to just route the results of the first 4 mux cycles into registers and then assign all 5 sets of results to the outputs at the end of the 5th mux cycle? Am I even understanding the issue?

Currently, the mux code is just putting the results onto the multiport outputs as soon as they come in.
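
One possible shape for the "hold the first four, release all five together" idea, shown for the data bus only. This is a sketch under assumptions, not the thread's final code: the module name demux5_sketch and the 'slot' input are hypothetical, 'slot' is assumed to count 0 to 4, and it is assumed to be already delayed so that it lines up with the byte currently on data_mux.

```verilog
// Hypothetical sketch: collect five time-multiplexed bytes, then release them together.
// Assumes 'slot' counts 0..4 and is already aligned with 'data_mux'.
module demux5_sketch (
    input            clk,
    input      [2:0] slot,
    input      [7:0] data_mux,
    output reg [7:0] data_out_0,
    output reg [7:0] data_out_1,
    output reg [7:0] data_out_2,
    output reg [7:0] data_out_3,
    output reg [7:0] data_out_4
);
    // Holding registers for the first four results of each group of five.
    reg [7:0] hold_0, hold_1, hold_2, hold_3;

    always @(posedge clk) begin
        case (slot)
            3'd0 : hold_0 <= data_mux;
            3'd1 : hold_1 <= data_mux;
            3'd2 : hold_2 <= data_mux;
            3'd3 : hold_3 <= data_mux;
            3'd4 : begin
                // The fifth result has arrived: update all outputs on the same clock edge.
                data_out_0 <= hold_0;
                data_out_1 <= hold_1;
                data_out_2 <= hold_2;
                data_out_3 <= hold_3;
                data_out_4 <= data_mux;
            end
            default : ; // slot values 5..7 are not expected
        endcase
    end
endmodule
```

Under these assumptions, 'slot' could be a copy of pc_ena[2:0] delayed by CLK_CYCLES_MUX + CLK_CYCLES_RAM clocks, and the pass-through addresses and cmds would get the same treatment so that all fifteen outputs change together.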

Quote from: BrianHG on November 11, 2019, 10:51:26 am

...and you have a bunch of delays through this module where you can easily lose count of clock cycles, especially if you need to make your mux take 2 or 3 clocks instead of 1 to help improve FMAX, make these local params and I'll leave it up to you to figure out how to implement them:

localparam CLK_CYCLES_MUX = 1; // Adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed inputs
localparam CLK_CYCLES_RAM = 2; // Adjust this figure to the number of clock cycles the DP_ram takes to retrieve a valid data from the read address in.

I'm not sure of this one, we will need to send this parameter back to the OSD generator so it knows how many pixels to delay the H&V ena, and OSD ena, to align the picture.
localparam CLK_CYCLES_PIXEL = ; // Adjust this figure to the number of PIXEL clock cycles it takes the demuxed output data to be ready.

localparams added. CLK_CYCLES_PIXEL (CLK_CYCLES_PIX in code) needs to be added to the OSD generator code then?

gpu_dual_port_ram_INTEL.v:

