@BrianHG - do you have thoughts/preferences on these options at all for working best with your multi-port module? I've defaulted to bank-row-column, from what asmi is saying it'll be rank-bank-row-column (don't try saying that when you're drunk!), although the rank part is dealt with by the MIG interface.
I'm thinking that 'rank-bank-row-column' gives you the opportunity to create a 2 rank controller for when an 8gig stick is installed, and it may still function with a 4gig stick, except that everything above 4 gig will read and write as blank or error. It depends on how flexible Vivado's ram controller is in operation.
This is what is meant by the 'order'.
Say we have a 33bit address. The 'rank-bank-row-column' order means that the address being wired to the ram modules will be:
Desired_33bit_address [32:0] = address_assigned_in_ram_chip { rank, bank[2:0] , row[14:0], column[9:0], null[3:0] } ;
(For Vivado's ram controller, just ignore the bottom 4 null bits)
The 'rank' controls the 2 separate CS# pins (S1#/S0# chip select on the module) for the first 8 and second 8 ram chips, which are wired in parallel except for the CS and CKE pins. Basically it is wired as an upper address line in the 'rank-bank-row-column' order. The bank[2:0] selects 1 of 8 banks in each ram chip. The row[14:0] selects the row address. The column[9:0] selects which column to read. I placed the null[3:0] there because, even though the module is 64bit DDR / 128bit, I still point the address down to the theoretical byte even though those address lines are ignored in the command. (This assumes Vivado's ram controller places these dummy address lines in its command like mine does. My controller also ignores column[2:0] and forces 0s there, since its minimum read/write burst size of 8 (called a BL8) always aligns the read and write in a forward order.)
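To make the field split concrete, here is a small software model (Python, not HDL) of the layout described above. The helper name and the dictionary output are just for illustration; the field widths are the ones from the post, and nothing here is taken from Vivado's ram controller.

```python
# Field widths in bits, taken from the layout in the post:
# { rank, bank[2:0], row[14:0], column[9:0], null[3:0] }
NULL_BITS, COL_BITS, ROW_BITS, BANK_BITS, RANK_BITS = 4, 10, 15, 3, 1


def decode_rank_bank_row_column(addr):
    """Split a 33-bit byte address into its fields (hypothetical helper)."""
    assert 0 <= addr < (1 << 33), "address must fit in 33 bits"
    fields = {}
    # Peel the fields off from the least significant end upwards.
    for name, width in (("null", NULL_BITS), ("column", COL_BITS),
                        ("row", ROW_BITS), ("bank", BANK_BITS),
                        ("rank", RANK_BITS)):
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields


if __name__ == "__main__":
    # Bit 29 is the lowest bank bit, bit 32 is the rank bit.
    print(decode_rank_bank_row_column(1 << 29))
    print(decode_rank_bank_row_column(1 << 32))
```

The widths add up to 4 + 10 + 15 + 3 + 1 = 33 bits, which is why the rank ends up as address bit 32.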
Now, why would we do this? Say we went with 'rank-row-bank-column', ie:
Desired_33bit_address [32:0] = address_assigned_in_ram_chip { rank, row[14:0], bank[2:0], column[9:0], null[3:0] } ;
Now, every sequential 16384 bytes of ram, we will switch to a new bank.
Bytes 0-16383 are in bank 0.
Bytes 16384-32767 are in bank 1.
etc... until bank 7, which ends at byte 131071. After bank 7, we go to row 1, bank 0,... etc...
With my selected preference, 'rank-bank-row-column', ie:
Desired_33bit_address [32:0] = address_assigned_in_ram_chip { rank, bank[2:0] , row[14:0], column[9:0], null[3:0] } ;
Now the first sequential 536870912 bytes of ram are all in bank 0, though every 16384 bytes we will switch to a new row within that bank 0.
Then we will go to bank 1, row 0 and up.
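The difference between the two orders comes down to which address bits carry bank[2:0]. A quick software check of the byte ranges quoted above (Python model only; the function names are made up for this sketch):

```python
# Bits 13:0 are column[9:0] + null[3:0] in both orders (16384 bytes).

def bank_rank_row_bank_column(addr):
    """'rank-row-bank-column': bank[2:0] sits directly above the column bits."""
    return (addr >> 14) & 0x7


def row_rank_row_bank_column(addr):
    """'rank-row-bank-column': row[14:0] sits above the bank bits."""
    return (addr >> 17) & 0x7FFF


def bank_rank_bank_row_column(addr):
    """'rank-bank-row-column': bank[2:0] sits above row[14:0]."""
    return (addr >> 29) & 0x7


def row_rank_bank_row_column(addr):
    """'rank-bank-row-column': row[14:0] sits directly above the column bits."""
    return (addr >> 14) & 0x7FFF


if __name__ == "__main__":
    for addr in (0, 16383, 16384, 131071, 131072, 536870911, 536870912):
        print(addr,
              "r-r-b-c bank", bank_rank_row_bank_column(addr),
              "| r-b-r-c bank", bank_rank_bank_row_column(addr))
```

In the first order the bank changes every 16384 bytes and wraps after 131072 bytes; in the second it only changes every 536870912 bytes, while the row changes every 16384 bytes.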
If all your code and loops usually fit into a block or two of 131072 bytes, then it should be advantageous to use 'rank-row-bank-column' or even 'row-rank-bank-column'. However, with our large display buffer and future display textures, sound and network buffers, operating in 'rank-bank-row-column' can offer other future memory access optimizations if you design properly for it. IE: place display buffers 1,2,3,4 in banks 6 and 7; place textures in banks 5,4,3,2, line interleaved multiplexed (IE you can read alternate lines of a texture in parallel when filling, so you can do bi-linear filtering when zooming a texture without all the additional row precharge/activate/read/precharge/activate/read every time you read a new Y coordinate on a texture to acquire the pixel shading blend); and place code and audio in banks 0,1. In dual rank mode, make each bank size 1gb and potentially keep track of 16 banks.
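As a rough idea of what 'line interleaved' placement could look like in 'rank-bank-row-column' order, here is a Python sketch. The bank pair (2, 3), the pitch value and both function names are assumptions for illustration only, and it stays within rank 0.

```python
BANK_SHIFT = 29  # row[14:0] + column[9:0] + null[3:0] sit below bank[2:0]


def bank_base(bank):
    """First byte address of a bank in rank 0 (rank-bank-row-column order)."""
    assert 0 <= bank < 8
    return bank << BANK_SHIFT


def texture_line_address(y, pitch_bytes, banks=(2, 3)):
    """Start address of texture line y, with even lines in banks[0] and
    odd lines in banks[1] (hypothetical layout)."""
    bank = banks[y % 2]
    return bank_base(bank) + (y // 2) * pitch_bytes


if __name__ == "__main__":
    # Hypothetical 4096-byte line pitch.
    for y in range(4):
        print(y, hex(texture_line_address(y, 4096)))
```

With this kind of layout, lines Y and Y+1 always land in different banks, so both of their rows can stay open while blending between them.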
What about bank machines? More the merrier, or is 4 the sweet spot? I guess you'd want as many bank machines as you have ports (up to max 8 ), as each port could be using a different row? Depends on resource usage I guess?
I do not know what Vivado's ram controller's bank machines are all about. Ask asmi. In my controller, since the DDR3 has 8 banks, my controller keeps track of and keeps open all 8 banks. It will only close them as needed to optimize access. My guess, and it is only a guess, is that you should set Vivado's ram controller's bank machines to 8 so that all 8 individual banks can be kept open when memory access makes use of them.
What about transaction ordering? Any preference there for maximum compatibility/performance?
I do not know what this setting does.
For my multiport controller, I have:
// ************************************************************
// *** Controls are received from the BrianHG_DDR3_PHY_SEQ. ***
// ************************************************************
input SEQ_CAL_PASS , // Goes low after a reset, goes high if the read calibration passes.
input DDR3_READY , // Goes low after a reset, goes high when the DDR3 is ready to go.
input SEQ_BUSY_t , // (*** WARNING: THIS IS A TOGGLE INPUT when parameter 'USE_TOGGLE_OUTPUTS' is 1 ***) Commands will only be accepted when this output is equal to the SEQ_CMD_ENA_t toggle input.
input SEQ_RDATA_RDY_t , // (*** WARNING: THIS IS A TOGGLE INPUT when parameter 'USE_TOGGLE_OUTPUTS' is 1 ***) This output will toggle from low to high or high to low once new read data is valid.
input [PORT_CACHE_BITS-1:0] SEQ_RDATA , // 256 bit data read from ram, valid when SEQ_RDATA_RDY_t goes high.
input [DDR3_VECTOR_SIZE-1:0] SEQ_RDVEC_FROM_DDR3 , // A copy of the 'SEQ_RDVEC_TO_DDR3' vector sent during the read request. Valid when SEQ_RDATA_RDY_t goes high.
// ******************************************************
// *** Controls are sent to the BrianHG_DDR3_PHY_SEQ. ***
// ******************************************************
output logic SEQ_CMD_ENA_t , // (*** WARNING: THIS IS A TOGGLE CONTROL! when parameter 'USE_TOGGLE_OUTPUTS' is 1 *** ) Begin a read or write once this input toggles state from high to low, or low to high.
output logic SEQ_WRITE_ENA , // When high, a 256 bit write will be done, when low, a 256 bit read will be done.
output logic [PORT_ADDR_SIZE-1:0] SEQ_ADDR , // Address of read and write. Note that ADDR[4:0] are supposed to be hard wired to 0 or low, otherwise the bytes in the 256 bit word will be sorted incorrectly.
output logic [PORT_CACHE_BITS-1:0] SEQ_WDATA , // write data.
output logic [PORT_CACHE_BITS/8-1:0] SEQ_WMASK , // write data mask.
output logic [DDR3_VECTOR_SIZE-1:0] SEQ_RDVEC_TO_DDR3 , // Read destination vector input.
output logic SEQ_refresh_hold // Prevent refresh. Warning, if held too long, the SEQ_refresh_queue will max out.
);
(I have a parameter, 'USE_TOGGLE_OUTPUTS', to change the '_t' toggle controls to positive logic. IE: normal high = on, low = off)
After the DDR3 is ready...
While you keep my 'SEQ_BUSY_t' input low, I will set 'SEQ_CMD_ENA_t' high when I am sending out a command. Otherwise it is low.
My SEQ_ADDR output will be 33bit for an 8gb module. (For Vivado's ram controller, just ignore the bottom 4 bits)
The bottom 7bits will always be 0s since I am expecting 512bit data.
My 'SEQ_WMASK' output will be 64bits (512/8) and for every bit which is high, the 8 associated data bits are expected to be written.
(Warning, Vivado's ram controller may have this inverted as this is how it is on the DDR3 ram chips)
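If the polarity does turn out to be inverted, the fix is just a bitwise NOT across the 64 mask bits. A tiny Python model of that, assuming (not confirmed here) that the other side wants 1 = 'do not write this byte'; the function name is made up:

```python
MASK_BITS = 64  # 512 data bits / 8


def to_active_low_write_mask(wmask):
    """Convert a mask where 1 = write the byte into one where 1 = skip the byte."""
    assert 0 <= wmask < (1 << MASK_BITS)
    return ~wmask & ((1 << MASK_BITS) - 1)


if __name__ == "__main__":
    # Write only the lowest 8 bytes of the 64 byte word.
    print(hex(to_active_low_write_mask(0xFF)))
```

Check the controller's documentation for the actual polarity before wiring this in.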
When I send a read command, my 'SEQ_RDVEC_TO_DDR3' output will have a 4 bit ID number.
My multiport will accept a read data word every clock while the 'SEQ_RDATA_RDY_t' input is high. While it is high, it is expecting a 4 bit ID on input 'SEQ_RDVEC_FROM_DDR3' from the ram controller, together with the read data on input 'SEQ_RDATA'.
If Vivado's ram controller doesn't support such a feature, you will need to create your own. So long as the reads come back in the same order as the read requests, it is nothing more than a FIFO tied to my read request and to Vivado's ram controller's read ready. This FIFO just needs to be long enough to support the maximum number of queued read commands Vivado's ram controller allows before an actual read is returned. It is preferred that Vivado's ram controller supports some sort of read target address/pointer function, as this removes any possible synchronization error or bug.
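A behavioural model of that ID FIFO, in Python rather than HDL, just to show the bookkeeping. The class name, method names and the depth value are all hypothetical; the real depth depends on how many reads Vivado's ram controller will queue, which is not stated here. It only works if read data returns in request order.

```python
from collections import deque


class ReadVectorFifo:
    """Remembers the 4 bit ID of each outstanding read, oldest first."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = deque()

    def can_issue(self):
        # Hold off new reads once the FIFO is full.
        return len(self.pending) < self.depth

    def on_read_request(self, vec_id):
        # Called when a read command (with SEQ_RDVEC_TO_DDR3) is sent out.
        if not self.can_issue():
            raise RuntimeError("too many outstanding reads for this FIFO depth")
        self.pending.append(vec_id & 0xF)

    def on_read_data(self, data):
        # Called when the ram controller signals read data ready.
        # Returns (ID for SEQ_RDVEC_FROM_DDR3, data for SEQ_RDATA).
        return self.pending.popleft(), data


if __name__ == "__main__":
    fifo = ReadVectorFifo(depth=8)
    for vec_id in (3, 7, 1):
        fifo.on_read_request(vec_id)
    print(fifo.on_read_data("word A"))
    print(fifo.on_read_data("word B"))
```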
As for the dual rank settings in my multiport, we will just change my parameter 'DDR3_WIDTH_BANK' from 3 to 4, effectively treating the rank as another 8 banks. Basically we will set the ram chips to 8x 4gig, 8bit, but add an extra bit on the bank, as the 'rank-bank-row-column' order will just operate as a 16 bank memory even though it is really spread across 2 groups of ram chips tied in parallel.
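In other words, with the rank bit sitting directly above bank[2:0], a 4 bit bank field is simply { rank, bank[2:0] }. A short Python model of that view (function names are made up; the bit positions follow the 33bit layout given earlier):

```python
BANK_SHIFT = 29  # row[14:0] + column[9:0] + null[3:0] sit below the bank field


def wide_bank(addr, ddr3_width_bank=4):
    """Bank field of a 33-bit address when the bank width is 3 or 4 bits."""
    return (addr >> BANK_SHIFT) & ((1 << ddr3_width_bank) - 1)


def split_wide_bank(bank16):
    """Map a 0-15 bank number back to (rank, bank within the chip)."""
    return bank16 >> 3, bank16 & 0x7


if __name__ == "__main__":
    top_of_ram = (1 << 33) - 1
    print(wide_bank(top_of_ram), split_wide_bank(wide_bank(top_of_ram)))
```

Banks 0-7 are then the first group of chips (rank 0) and banks 8-15 are the second group (rank 1).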