Author Topic: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.  (Read 45045 times)

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8139
  • Country: ca
    • LinkedIn
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #175 on: January 26, 2023, 01:02:37 pm »
It's up to Nockieboy what he wants to do.  All I offered was a way of backwards compatibility to his existing GPU code as all it took were a few cut and pastes on my side.  I no longer have time to help him adapt to an entirely new layout and architecture.  That will now become your job asmi.

There I was thinking switching FPGA wouldn't be that much of an upheaval to the project...  :-\

I'm going to have to go the path of least resistance, which for me at the moment is to retain the existing architecture as much as possible (as I understand it) and keep the interconnects within the GPU project close to the metal, rather than using something like AXI to make it all plug 'n' play.  I guess this means I'll have to write an AXI interface to use a MicroBlaze with the GPU, but I did that with Wishbone so perhaps it's possible for me to do it with AXI too.  It still means any other soft-core CPU (I'm looking at you, 68000 etc.) will be straightforward to interface to the GPU via a bridge like the Z80_Bridge module.

In terms of 'what's next', aside from having a play with Vivado and the simulation software to get familiar with it and send some commands to the DDR3, I need to start looking at the MIG's HDL itself and working out what signals, ports, buses are exposed by it to look into wiring it to BrianHG_CONTROLLER_v16_Xilinx_MIG_DDR3_top.sv.
Once again, my multiport has a 'wait/busy' input to tell it to wait and not send any commands.
It has an 'enable command' output to tell you it is sending a command.
It has a 'read/write' flag for the type of command.
It provides an address.
If it wants to write, it provides the write data + write byte enable which specifies which bytes in the 512bits should be written into DDR3.
If reading, it provides an ID code to instruct where that read command's data belongs.

It also has a 'read data ready' input with a 512bit data input port plus the expected read ID code input associated with that read ready data.
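
To make the handshake above a little more concrete, here is a minimal behavioural sketch in Python (not HDL, and not the actual multiport code). The names `Command`, `ToyController`, `send` and `read_data_ready` are made up for illustration; the only things taken from the description are the busy input, the read/write flag, the address, the 512-bit write data with byte enables, and the read ID that comes back with the read data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

LINE_BYTES = 64  # 512-bit data path = 64 bytes per command


@dataclass
class Command:
    """One multiport command (hypothetical field names)."""
    write: bool                       # 'read/write' flag
    address: int                      # byte address, aligned to 64 bytes
    data: bytes = bytes(LINE_BYTES)   # write data, ignored for reads
    byte_enable: int = 0              # bit i set -> byte i is written
    read_id: Optional[int] = None     # ID code sent along with a read


class ToyController:
    """Stand-in for the memory side; a toy model, not the MIG."""

    def __init__(self) -> None:
        self.busy = False                        # 'wait/busy' input
        self.lines: Dict[int, bytearray] = {}    # 64-byte lines by address
        self.pending: List[Tuple[int, bytes]] = []  # queued read results

    def send(self, cmd: Command) -> bool:
        """Accept a command unless busy. Returns True if accepted."""
        if self.busy:
            return False
        assert cmd.address % LINE_BYTES == 0, "address must be 64-byte aligned"
        line = self.lines.setdefault(cmd.address, bytearray(LINE_BYTES))
        if cmd.write:
            # Only bytes whose enable bit is set are modified.
            for i in range(LINE_BYTES):
                if (cmd.byte_enable >> i) & 1:
                    line[i] = cmd.data[i]
        else:
            # Read data arrives later, tagged with the ID from the request.
            self.pending.append((cmd.read_id, bytes(line)))
        return True

    def read_data_ready(self) -> Optional[Tuple[int, bytes]]:
        """Return (read_id, 64 bytes) when read data is ready, else None."""
        return self.pending.pop(0) if self.pending else None
```

The point of the ID is that the requester never has to count cycles: whenever `read_data_ready()` hands back a tuple, the ID says which port the 64 bytes belong to.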

I'm sure you can wire this to Vivado's existing AXI standard if you like by adding some simple glue control logic, or, you also might be able to use Vivado MIG's lower level user interface if you like to achieve the same communication with the DDR3 MIG.  It is up to you Nockieboy.

If AXI operates with separate read and write data paths or runs the data at 2x clock internally for bidir communication, then there is no hindrance to using it.  Otherwise if the data path is shared as a single bidirectional read/write bus, note that my multiport will not be able to send out write data commands while read data is being returned, slowing down mixed/bidirectional DDR3 transactions.  If you want this added speed, then you will need to use a lower level direct interface to Vivado's MIG to allow write data posting while read data is still being received in the data pipeline.

Asmi should have a much better understanding of AXI's capabilities.  I'm also assuming if you tie my multiport to the AXI bus, you should be able to tie additional AXI compliant clients on the AXI side as well.
« Last Edit: January 26, 2023, 01:23:54 pm by BrianHG »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #176 on: January 26, 2023, 04:13:56 pm »
I'm sure you can wire this to Vivado's existing AXI standard if you like by adding some simple glue control logic, or, you also might be able to use Vivado MIG's lower level user interface if you like to achieve the same communication with the DDR3 MIG.  It is up to you Nockieboy.

If AXI operates with separate read and write data paths or runs the data at 2x clock internally for bidir communication, then there is no hindrance to using it.  Otherwise if the data path is shared as a single bidirectional read/write bus, note that my multiport will not be able to send out write data commands while read data is being returned, slowing down mixed/bidirectional DDR3 transactions.  If you want this added speed, then you will need to use a lower level direct interface to Vivado's MIG to allow write data posting while read data is still being received in the data pipeline.

Asmi should have a much better understanding of AXI's capabilities.  I'm also assuming if you tie my multiport to the AXI bus, you should be able to tie additional AXI compliant clients on the AXI side as well.
AXI is not a Xilinx standard, but an ARM standard, and it's widely used in ARM SoCs - I guess that's why they decided to adopt it all those years ago. Currently the AXI4 variant of the bus is used by Xilinx IPs. The full specification is publicly accessible here: https://developer.arm.com/documentation/ihi0022/e/?lang=en

There are three flavors of AXI bus - AXI4 memory-mapped Full, AXI4 memory-mapped lite (it's a simplified version of a full bus, it lacks burst capability and is used where high bandwidth is not required - like in a control registers interface), and AXI stream - which is a simple parallel interface with ready/valid handshake so that it can support throttling from both sides of the stream (source and sink), it's mostly used for non-memory-mapped stream-like data transfer, so I only mention it here for the sake of completeness.

AXI4 memory-mapped bus is a point-to-point connection (so if you want to have more devices connected you will need to use interconnect) and consists of 5 separate channels - read address, read data, write address, write data and write response. Each channel is semi-independent (in a sense that transactions on each channel can happen independently of other channels), but of course they are logically related - transfer over "read data" channel (or a series of them in case of a burst) is a response for earlier read request over "read address" channel, "write data" channel provides data to write for a write request over "write address" channel, and "write response" channel is used to communicate result of a write request back to a requestor. Here is a diagram from the specification:
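
As a quick reference for the five channels described above, here is a small Python table of the channels with their direction and a few of their main signals. The signal names are the standard AXI4 ones; the dictionary layout and the `transfer` helper are just an illustration, and the signal lists are not exhaustive.

```python
# Each AXI4 memory-mapped channel: who drives it, and a few key signals.
# Every channel uses the same VALID/READY handshake.
AXI4_CHANNELS = {
    "write address":  {"dir": "master->slave",
                       "signals": ["AWADDR", "AWLEN", "AWSIZE", "AWBURST",
                                   "AWVALID", "AWREADY"]},
    "write data":     {"dir": "master->slave",
                       "signals": ["WDATA", "WSTRB", "WLAST",
                                   "WVALID", "WREADY"]},
    "write response": {"dir": "slave->master",
                       "signals": ["BRESP", "BVALID", "BREADY"]},
    "read address":   {"dir": "master->slave",
                       "signals": ["ARADDR", "ARLEN", "ARSIZE", "ARBURST",
                                   "ARVALID", "ARREADY"]},
    "read data":      {"dir": "slave->master",
                       "signals": ["RDATA", "RRESP", "RLAST",
                                   "RVALID", "RREADY"]},
}


def transfer(valid: bool, ready: bool) -> bool:
    """A beat moves on a channel only in a cycle where VALID and READY are both high."""
    return valid and ready


for name, info in AXI4_CHANNELS.items():
    print(f"{name:15s} {info['dir']:14s} {', '.join(info['signals'])}")
```

Because each channel has its own VALID/READY pair, either side can throttle any one channel without stalling the others, which is what makes the channels semi-independent.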


It's very easy to use on a master side (because you get to initiate all transactions). It's also relatively easy to implement an AXI4-lite slave, like a control registers interface (because it doesn't support the most complex things which the full one does). An AXI4 full slave is the most complex because it needs to support quite a wide range of transactions.

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8139
  • Country: ca
    • LinkedIn
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #177 on: January 26, 2023, 04:28:52 pm »
AXI4 memory-mapped bus is a point-to-point connection (so if you want to have more devices connected you will need to use interconnect) and consists of 5 separate channels - read address, read data, write address, write data and write response. Each channel is semi-independent (in a sense that transactions on each channel can happen independently of other channels)
That's perfect for Nockieboy.  He can familiarize himself with AXI4 and it offers the independent read and write channel allowing for any read or write transactions to happen even as the huge pipeline delayed read data stream comes in at a delayed time which my multiport relies on to maintain the top speed.

The only advantage of directly connecting to the DDR3 MIG set to a low level interface is fewer gates, less work, or maybe saving ~1 clock cycle when requesting a transaction.

Quote
write data
Since the data bus is 512bit and we might need to only write a selected number of bytes within, we also require a write byte mask.

Quote
write response
Other than if you need to certify writes, I'm guessing you may ignore this.
« Last Edit: January 26, 2023, 04:41:07 pm by BrianHG »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #178 on: January 27, 2023, 03:47:02 pm »
That's perfect for Nockieboy.  He can familiarize himself with AXI4 and it offers the independent read and write channel allowing for any read or write transactions to happen even as the huge pipeline delayed read data stream comes in at a delayed time which my multiport relies on to maintain the top speed.
How do you deal with variable latency (say due to ongoing refresh)? Do you have some sort of elastic FIFO to sort it out?
I will create a testbench similar to what I created for UI some time during weekend.

The only advantage of directly connecting to the DDR3 MIG set to a low level interface is fewer gates, less work, or maybe saving ~1 clock cycle when requesting a transaction.
Advantage of using AXI is that you can connect your AXI master port to a crossbar interconnect instead of directly to MIG and this will make MIG available to other AXI masters alongside your component.
Also with AXI you won't have to use individual requests, for example, you can command up to 4KB-long burst with a single command, instead of making a whole bunch of 64 byte requests.
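
For the 512-bit (64-byte) data bus discussed here, the burst arithmetic can be sketched as below. This is only an illustration in Python: the function names are made up, and it assumes an INCR burst using the full bus width. The AXI4 rules it relies on are that AxLEN is the number of beats minus one, AxSIZE is log2 of the bytes per beat, an INCR burst is at most 256 beats, and a burst must not cross a 4KB address boundary.

```python
def axi_burst_fields(total_bytes: int, bus_bytes: int = 64):
    """Return (AxLEN, AxSIZE) for one full-width INCR burst of total_bytes."""
    if total_bytes % bus_bytes:
        raise ValueError("length must be a whole number of beats")
    beats = total_bytes // bus_bytes
    if not 1 <= beats <= 256:
        raise ValueError("AXI4 INCR bursts are 1 to 256 beats")
    axlen = beats - 1                     # AxLEN encodes beats - 1
    axsize = bus_bytes.bit_length() - 1   # log2(bytes per beat)
    return axlen, axsize


def crosses_4k(start_addr: int, total_bytes: int) -> bool:
    """True if the burst would cross a 4KB boundary (not allowed in AXI)."""
    return (start_addr // 4096) != ((start_addr + total_bytes - 1) // 4096)


print(axi_burst_fields(64))     # one 64-byte request
print(axi_burst_fields(4096))   # one 4KB burst = 64 beats of 64 bytes
print(crosses_4k(0x0000, 4096), crosses_4k(0x0040, 4096))
```

So on a 64-byte bus the 4KB boundary rule, not the 256-beat limit, is what caps a single burst at 64 beats, and the start address has to be 4KB-aligned to get the full 4KB in one command.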

One thing to keep in mind is that AXI addresses are true byte addresses, unlike MIG UI, which uses addresses derived from rank/bank/row/column.

Since the data bus is 512bit and we might need to only write a selected number of bytes within, we also require a write byte mask.
There is write byte mask support via the WSTRB signal. Its logic is the opposite of DDR3 DM signals in that only bytes for which the corresponding bit of WSTRB is "high" are written.
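
The polarity difference is easy to get backwards, so here is a tiny Python sketch of it. The helper names are made up; the only facts used are that a WSTRB bit set to 1 means "write this byte", while a DM bit set to 1 means "mask (do not write) this byte".

```python
def dm_to_wstrb(dm: int, n_bytes: int = 64) -> int:
    """Convert a DM-style mask (1 = byte masked) to WSTRB (1 = byte written)."""
    all_bytes = (1 << n_bytes) - 1
    return ~dm & all_bytes


def wstrb_for_range(first_byte: int, count: int) -> int:
    """WSTRB with 'count' consecutive byte lanes enabled, starting at first_byte."""
    return ((1 << count) - 1) << first_byte


# Write only bytes 4 and 5 of a 64-byte beat.
strobe = wstrb_for_range(4, 2)
print(f"WSTRB = {strobe:#018x}")
print(f"DM    = {dm_to_wstrb(strobe):#018x}")  # the inversion works both ways
```

A byte-enable output that is already active-high per byte, as described for the multiport, would map onto WSTRB directly with no inversion.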

Other than if you need to certify writes, I'm guessing you may ignore this.
If you don't need these, you can simply hardwire BREADY signal to "high" and leave other signals of that channel unconnected.

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8139
  • Country: ca
    • LinkedIn
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #179 on: January 27, 2023, 05:10:52 pm »
That's perfect for Nockieboy.  He can familiarize himself with AXI4 and it offers the independent read and write channel allowing for any read or write transactions to happen even as the huge pipeline delayed read data stream comes in at a delayed time which my multiport relies on to maintain the top speed.
How do you deal with variable latency (say due to ongoing refresh)? Do you have some sort of elastic FIFO to sort it out?
I will create a testbench similar to what I created for UI some time during weekend.
Yes, my user command requests have a variable self adjusting Queue.
When controlling the DDR3, or in this case talking to the AXI4, when I send a write, it is expected to eventually be written.
When I send a read, I know that the read data comes in way in the future, so I transmit an ID with the read request.

On my read data bus input, when a read data is ready, I expect to see that ID I transmitted with the read so I know where that read data belongs.
Quote
The only advantage of directly connecting to the DDR3 MIG set to a low level interface is fewer gates, less work, or maybe saving ~1 clock cycle when requesting a transaction.
Advantage of using AXI is that you can connect your AXI master port to a crossbar interconnect instead of directly to MIG and this will make MIG available to other AXI masters alongside your component.
Also with AXI you won't have to use individual requests, for example, you can command up to 4KB-long burst with a single command, instead of making a whole bunch of 64 byte requests.

One thing to keep in mind is that AXI addresses are true byte addresses, unlike MIG UI, which uses addresses derived from rank/bank/row/column.
My multiport will send a 64byte request each time.  It was designed like this because if 2 read/write ports which have identical priority have a small max burst size and access the same bank and column, or different banks which have already been activated with the same row, my multiport will automatically perform an interleaved access knowing that the DDR3 bandwidth will still be maximum, but 2 simultaneous ports will at least be moving data at the same time.  My multiport also shaves out any unnecessary accesses and caches the 64bytes for each user IO port locally.  (IE: if a port for the Z80 is set to 8bit, then the first read will send a read command; reading and writing the next 63 bytes, or repeated accesses, will not send out any commands to the DDR3 until the Z80 needs a new 64byte chunk.  The read and write 64byte chunks are separate caches and are automatically aware of each other if they have the same memory address.)

My multiport address output is a true address, down to the byte, but it has been sanitized for the MIG at the other end to evenly land on 512bit/64byte blocks every time.  IE: the lower address bits are effectively tied to GND.
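
As a rough illustration of the address handling described above (a Python sketch with made-up names, not the multiport's actual logic): the byte address is split into a 64-byte-aligned line address, which is what goes out to the memory side, and a 6-bit offset that stays local to the port's cache. A new command is only needed when the line address changes.

```python
LINE_BYTES = 64  # 512 bits


def split_address(byte_addr: int):
    """Return (line_addr, offset): the low 6 bits are 'tied to GND' on the way out."""
    line_addr = byte_addr & ~(LINE_BYTES - 1)
    offset = byte_addr & (LINE_BYTES - 1)
    return line_addr, offset


def needs_new_command(cached_line, byte_addr: int) -> bool:
    """True if this access falls outside the 64-byte chunk already cached."""
    return cached_line != split_address(byte_addr)[0]


line, off = split_address(0x1234)
print(hex(line), hex(off))
print(needs_new_command(line, 0x1235))  # same chunk, no DDR3 command
print(needs_new_command(line, 0x1240))  # next chunk, a new read is needed
```

This is why an 8-bit port such as the Z80's generates at most one 64-byte read for every 64 sequential byte accesses.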
Quote
Since the data bus is 512bit and we might need to only write a selected number of bytes within, we also require a write byte mask.
There is write byte mask support via the WSTRB signal. Its logic is the opposite of DDR3 DM signals in that only bytes for which the corresponding bit of WSTRB is "high" are written.
So, 100% compatible with my multiport.
Quote
Other than if you need to certify writes, I'm guessing you may ignore this.
If you don't need these, you can simply hardwire BREADY signal to "high" and leave other signals of that channel unconnected.
AOK.
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8139
  • Country: ca
    • LinkedIn
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #180 on: January 27, 2023, 05:15:04 pm »
Also, my multiport has parameters for the width of the DDR3 column, bank, row, and bit width sizes.  It needs these set to match the DDR3 module to know when to prioritize which read/write commands go out when.  It also has a parameter for your DDR3 controller's 'BANK-ROW-COLUMN' as this needs to be known to best prioritize DDR3 access.

(Though if another AXI device, through its own access, changes the state of, say, DDR3 bank #7 compared to what my multiport was expecting, the DDR3 MIG controller will need to do additional gymnastics.)
« Last Edit: January 27, 2023, 05:16:46 pm by BrianHG »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #181 on: February 06, 2023, 03:47:51 pm »
Sorry about delays, some urgent stuff came up that I had to deal with.
I've added another project into the source control for AXI version of a MIG. There are two simulation sets - one (sim_1) is a test bench for talking to MIG via AXI, and another one (sim_2) is (almost) the same client code, but instead of MIG I have an AXI Verification IP instance set up as an AXI slave memory. Simulation with the latter takes just a few seconds, while AXI interface is the same. And as a bonus, AXI VIP also performs AXI protocol compliance checks and will report an error if something is wrong with AXI signalling.
Also, I finally received a response from MPS about MPM3683-7 layout, and they basically said to do as their eval board is doing. So that settles that question.

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #182 on: February 06, 2023, 04:11:53 pm »
Sorry about delays, some urgent stuff came up that I had to deal with.

No need to apologise - I haven't had a day off in nearly two weeks it's been so busy here, so zero progress made on the project at all. :'(

Change of role at work coming up in next week, so not likely to be making much headway for a while.  What time I do have, I'll be trying to get my head around the Vivado simulation of the SODIMM and get more confidence using it in general.  From scanning (literally) the conversation up to this point, I should be looking to create an AXI4 interface to BrianHG's multiport interface?
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #183 on: February 06, 2023, 04:29:42 pm »
No need to apologise - I haven't had a day off in nearly two weeks it's been so busy here, so zero progress made on the project at all. :'(
That's OK - life takes a priority.
I've been also quite busy lately, but since I work from home most of the time, I've been using whatever breaks in work I had to draw up schematics in Altium. They are still far from being completed, but I will get there eventually.

Change of role at work coming up in next week, so not likely to be making much headway for a while.
Good luck in your new role, you will need the money to make this project a reality ;)

What time I do have, I'll be trying to get my head around the Vivado simulation of the SODIMM and get more confidence using it in general.  From scanning (literally) the conversation up to this point, I should be looking to create an AXI4 interface to BrianHG's multiport interface?
I would think so. But in any case now you have both versions, and they aren't all too different from each other to be honest as far as interface goes.

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #184 on: February 07, 2023, 04:42:06 am »
I've been reading a datasheet for that memory module I bought (MT16KTF1G64HZ-1G6E1, which uses revision E dies), and found that it can consume up to a bit over 2 Amps of current during refresh (this is the mode in which it consumes the most)! For comparison, that same module with revision N dies consumes ~1.5 Amps of current in that mode, and the one with revision P dies (this is the most recent revision) takes up only 1.3 Amps! What a difference between die revisions!
In addition to that, the module can also sink or source up to 0.6 Amps of current via the termination rail (VTT), which will ultimately come from the same VDDR rail (via the DDRx termination regulator). With that, the 3 Amp DC-DC converter we've chosen is just about enough to provide all of that current. I never really thought about this until today, because the most I used was a pair of 4G x16 chips, which consume like 240 mA MAX each, and so the choice of a converter for that rail has never been something I needed to think about - I typically used the same converter that is used elsewhere in a design (like in a Vccio rail) to help consolidate BOM.
In case you are curious, you can get a datasheet for the module here: https://www.micron.com/products/dram-modules/sodimm/part-catalog/mt16ktf1g64hz-1g9 There is also a printout of SPD contents in case you want to know what's stored there.
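
The budget behind "just about enough" can be written out as a quick Python check. The figures are the ones quoted in this post (the revision E value is "a bit over 2 Amps", so 2000 mA here is a rounded assumption and the real margin is slightly smaller), and the sum simply treats the worst-case VTT current as coming straight off the same rail.

```python
# Worst-case refresh current per die revision, in mA (figures from the post).
REFRESH_MA = {"E": 2000, "N": 1500, "P": 1300}  # rev E is "a bit over" 2000
VTT_MA = 600          # max sink/source via the termination rail
CONVERTER_MA = 3000   # rating of the chosen DC-DC converter


def margin_ma(revision: str) -> int:
    """Converter headroom left after module refresh current plus VTT current."""
    return CONVERTER_MA - (REFRESH_MA[revision] + VTT_MA)


for rev in REFRESH_MA:
    print(f"rev {rev}: total {REFRESH_MA[rev] + VTT_MA} mA, "
          f"margin {margin_ma(rev)} mA")
```

With revision E dies the headroom comes out at roughly 0.4 A at best, against about 0.9 A and 1.1 A for revisions N and P.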

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #185 on: February 07, 2023, 11:00:24 am »
I've been reading a datasheet for that memory module I bought (MT16KTF1G64HZ-1G6E1, which uses revision E dies), and found that it can consume up to a bit over 2 Amps of current during refresh (this is the mode in which it consumes the most)! For comparison, that same module with revision N dies consumes ~1.5 Amps of current in that mode, and the one with revision P dies (this is the most recent revision) takes up only 1.3 Amps! What a difference between die revisions!
In addition to that, the module can also sink or source up to 0.6 Amps of current via the termination rail (VTT), which will ultimately come from the same VDDR rail (via the DDRx termination regulator). With that, the 3 Amp DC-DC converter we've chosen is just about enough to provide all of that current. I never really thought about this until today, because the most I used was a pair of 4G x16 chips, which consume like 240 mA MAX each, and so the choice of a converter for that rail has never been something I needed to think about - I typically used the same converter that is used elsewhere in a design (like in a Vccio rail) to help consolidate BOM.
In case you are curious, you can get a datasheet for the module here: https://www.micron.com/products/dram-modules/sodimm/part-catalog/mt16ktf1g64hz-1g9 There is also a printout of SPD contents in case you want to know what's stored there.

I have the 1G6E1 variant; I didn't think we'd need to be running the RAM at 1866MHz, thought 1600 would be more than enough. :-DD

I've scanned the datasheet - peak current draw doesn't seem to drop off too much with lower frequencies unfortunately, but I guess a 3A supply provides a little margin - I'll have to be careful to remember this when I'm routing the power supplies.

Had 10 minutes to look at the simulation project (not the latest version with AXI, admittedly), and it looks like mig_tb.sv is the file that generates the signals to the SODIMM in the simulation?  Did you write that yourself or was it generated?
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #186 on: February 07, 2023, 02:05:35 pm »
I've scanned the datasheet - peak current draw doesn't seem to drop off too much with lower frequencies unfortunately, but I guess a 3A supply provides a little margin - I'll have to be careful to remember this when I'm routing the power supplies.
We will have to ensure good decoupling to make sure the rail voltage does not dip during such current spikes.

Had 10 minutes to look at the simulation project (not the latest version with AXI, admittedly), and it looks like mig_tb.sv is the file that generates the signals to the SODIMM in the simulation?  Did you write that yourself or was it generated?
That's the testbench I've written to demonstrate how to send commands to MIG. MIG generates an example design as well, but I think it's a bit too complex because its primary goal is not to demonstrate how to work with MIG, but rather to provide a platform for a hardware checkout (that example design includes a traffic generator and a validation logic to check for memory errors).

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #187 on: February 07, 2023, 07:52:54 pm »
Quick question regarding SODIMM power supply:

VTT, VREFCA and VREFDQ all reference VDD/2.   What's the best way to create this 0.75V supply?  Do I need to add another discrete supply, or can I use a voltage divider from the 1.5V rail? ???
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #188 on: February 07, 2023, 08:27:46 pm »
Quick question regarding SODIMM power supply:

VTT, VREFCA and VREFDQ all reference VDD/2.   What's the best way to create this 0.75V supply?  Do I need to add another discrete supply, or can I use a voltage divider from the 1.5V rail? ???
You will need to use a tracking regulator (such that VTT and VREF would track closely VDDR rail), often called DDRx termination regulator, something like MP20075, TPS51206 or TPS51200, you connect VREFCA and VREFDQ to VTTREF output of those regulators, and VTT to main (VTT) output of those regs - see attached screenshot from Xilinx's AC701 devboard schematics for reference. You will also need a bunch of decoupling caps, again, you can refer to my second screenshot from the same schematics. And before you ask - CPx devices on the second screenshot are four 0.1 uF caps in a single physical package.
« Last Edit: February 07, 2023, 08:29:32 pm by asmi »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #189 on: February 09, 2023, 05:45:18 pm »
Have been thinking about the power supplies today whilst working.  I was originally going to power the board via the USB connector, with the option of a DC jack providing the juice if the design (whatever it may be) gets too hungry.  I guess a diode on the USB 5V rail will prevent power going back up the programming/serial USB cable, but there's probably better solutions that don't have such a voltage drop.  Ideally I want something that will automatically switch between the USB input and a 5V rail generated from the DC jack, if it's plugged in.  That DC jack needs to also generate a 12V rail - if someone wants to use (full) PCIE, they'll need to supply some extra juice via the DC input.  So something like this:



The blue boxes are unknowns at this time.  The smaller blue box I guess will be a power switch of some kind (LTC4412?), cutting the USB 5V supply off if there's a feed from the DC jack.

I've had a quick look on Mouser for suitable power ICs, it seems switching voltage regulators offer the best performance specs to my inexperienced eye, but they aren't cheap for a decent 12V one.  Dual output would allow me to get 12V and 5V rails from a single IC, but it's going to require some meaty supporting components.  Any suggestions or ideas?
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #190 on: February 09, 2023, 05:51:02 pm »
Forget about USB - it can't provide enough power. Here is how I'm thinking to implement the power (note that DDR3 termination regulator is not there yet):


The Vccint rail is going to be powered by MPM3683-7, which is powered by the 12 V input; other rails will be powered by 5 V created by MP8772. I will wire up enable pins and "power good" outputs such that the 12 V -> 5 V converter is going to start first, and once its output is stabilized, its "power good" output will allow all other rails to start up. The reason I've chosen MP8772 is because I happen to have them in my stock, so I won't have to buy them.
« Last Edit: February 09, 2023, 05:58:01 pm by asmi »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #191 on: February 09, 2023, 10:11:28 pm »
Yeah, I've got no issue creating the <12V rails - it's the 12V rail itself I'm thinking about.  Where are you getting that from?  I was considering using a DC jack which would take a 'wall-wart' input, or a laptop power supply, for example.  Can't guarantee it'll be 12V, probably more like 12-18V, so some form of regulator is going to be required.



NOTE:  The above schematic is NOT finished and there are a couple of misnamed nets (power good signals, specifically) and it's missing the DC INPUT -> 12V output regulator.
« Last Edit: February 09, 2023, 10:17:03 pm by nockieboy »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #192 on: February 10, 2023, 12:54:26 am »
Yeah, I've got no issue creating the <12V rails - it's the 12V rail itself I'm thinking about.  Where are you getting that from?  I was considering using a DC jack which would take a 'wall-wart' input, or a laptop power supply, for example.  Can't guarantee it'll be 12V, probably more like 12-18V, so some form of regulator is going to be required.

You can get a power supply which provides more-or-less precise 12 V power. I happen to have this one, but there are plenty of other supplies on the market which will provide good regulation. As for PCI Express power, the +12V rail allows ±8% voltage tolerance (which means anything from ~11 to ~13 V is going to be within spec), and as per spec, an x4 card is allowed to draw up to 2.1 Amps from that rail (for total power from that rail of 25 W nominal), which is enough to power the entire devboard (it will probably consume around 15 W - 2-3 W for the RAM, 5-7 W for the FPGA itself, and another 5 watts for other random bits and bobs on a board). So if you connect your power input directly to the +12 V rail of PCIE and use one of those jumper cables, you can power the second board directly from the PCIE slot (provided that the power supply you use can output enough current for both boards).
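
The numbers above can be checked with a few lines of Python. This is only the arithmetic from this post (12 V ±8%, 2.1 A from the slot, and the rough per-block board estimate), with made-up variable names; it does not account for converter efficiency.

```python
def rail_limits_mv(nominal_mv: int, tol_percent: int):
    """Min and max rail voltage in mV for a +/- tolerance given in percent."""
    delta = nominal_mv * tol_percent // 100
    return nominal_mv - delta, nominal_mv + delta


SLOT_12V_MA = 2100                       # x4 slot limit on the 12 V rail
slot_power_w = 12 * SLOT_12V_MA / 1000   # nominal power available from 12 V

# Rough board estimate from the post, as (low, high) watts per block.
BOARD_ESTIMATE_W = {"RAM": (2, 3), "FPGA": (5, 7), "other": (5, 5)}
board_low_w = sum(low for low, _ in BOARD_ESTIMATE_W.values())
board_high_w = sum(high for _, high in BOARD_ESTIMATE_W.values())

print("12 V rail limits (mV):", rail_limits_mv(12000, 8))
print("slot 12 V power (W):", slot_power_w)
print("board estimate (W):", board_low_w, "to", board_high_w)
print("headroom at high estimate (W):", slot_power_w - board_high_w)
```

So the "~11 to ~13 V" window is 11.04 V to 12.96 V, and the roughly 25 W available from the slot's 12 V rail leaves about 10 W over the upper board estimate before conversion losses are considered.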

The PCI Express power specification gives quite a bit of freedom for addon cards as to how they can power themselves, but this creates some headaches for the motherboard/host designers (and it kind of makes sense if you think about it - as there are many more addon cards than there are hosts, so addons are more price-sensitive). For example, PCI Express Electromechanical Specification Revision 2.0 (that's the most recent revision I have access to) requires all hosts to provide a 3.3 V ± 9% rail with 3 Amps of current, and also a 12 V ± 8% rail with 2.1 Amps of current (for an x4 or x8 PCIE slot), but an x4/x8 addon card can only draw up to 25 W of power from these rails, so some cards will draw power from the 3.3 V rail only, others from the 12 V rail only, yet others some combination of both. Which means a compliant implementation of a PCIE x4 slot will require another converter just for the 3.3 V PCIE rail, or beefing up the one we have for the 3.3 V rail so that it can feed all peripherals on a board AND provide enough power for the PCIE slot. Of course there is also an option to just connect the PCIE 3.3V rail to the existing converter and pray that no addon card connected to that board will ever consume a significant current off that rail, which is what some Chinese boards do (for example MYD-C7Z015 which I have), but I don't think that's the right approach. But I didn't get to the PCIE slot yet, so I didn't think about that part of the design.

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8139
  • Country: ca
    • LinkedIn
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #193 on: February 10, 2023, 01:11:51 am »
Use a USB-C PD power supply with a power delivery profile setting #3 which offers 12v, 3 amp.

Now you can use any 36 watt capable USB wall wart or battery bank.

Even profile #2 may work, 12v 1.5amp, 18 watts, but you probably won't have any breathing room.
« Last Edit: February 10, 2023, 01:28:24 am by BrianHG »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #194 on: February 10, 2023, 01:30:21 am »
Use a USB-C PD power supply with a power delivery profile setting #3 which offers 12v, 3 amp.

Now you can use any 36 watt capable USB wall wart or battery bank.

Even profile #2 may work, 12v 1.5amp, 18 watts, but you probably won't have any breathing room.
You will need a PD controller for that to work, not worth it for this board IMHO.

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #195 on: February 10, 2023, 10:33:39 am »
You can get a power supply which provides more-or-less precise 12 V power.

Yeah, I was just thinking of making the board as flexible as possible.  I guess a 12V supply isn't too big a hurdle for people to jump if they want to use this board.

I happen to have this one, but there are plenty of other supplies on a market which will provide for a good regulation.

This is why I was thinking of flexibility - there's no guarantees someone will plug a good quality supply in, or even one in spec.  Maybe I'm worrying too much about that - it takes a pretty big regulator/inductor off the board by having the 12V rail supplied externally.  I had a look for something a little cheaper than what you've got and found this - T5994ST on Mouser for a little over £12.  Only 5A / 60W, but that should be more than sufficient if you're saying the board will need around 25W.

As for PCI Express power, the +12 V rail allows ±8% voltage tolerance (so anything from ~11 to ~13 V is within spec), and per the spec an x4 card is allowed to draw up to 2.1 A from that rail (about 25 W nominal), which is enough to power the entire devboard (it will probably consume around 15 W: 2-3 W for the RAM, 5-7 W for the FPGA itself, and another 5 W for other random bits and bobs on the board). So if you connect your power input directly to the +12 V rail of the PCIE slot and use one of those jumper cables, you can power the second board directly from the PCIE slot (provided that the power supply you use can output enough current for both boards).

Jumper cable / PCIE edge connector.  I could make it so the board has a PCIE slave connector on a board edge for some real bizarre board-ception. :o

It's a shame those MPM3833's input voltages max out at 6V, otherwise I'd use the 12V rail to directly power all the switching regulators and could get rid of the 5V rail entirely.   What if I replace the MPM3833's with these MPM3632s instead?  MPM3632 datasheet.  I can just run them straight from the 12V rail, get rid of the 5V regulator and they're slightly cheaper than the MPM3833s they're replacing, unless I've missed something obvious (or not so obvious) in the part selection?

The PCI Express power specification gives add-on cards quite a bit of freedom in how they power themselves, but this creates some headaches for motherboard/host designers (which makes sense if you think about it: there are many more add-on cards than there are hosts, so add-ons are more price-sensitive). For example, PCI Express Electromechanical Specification Revision 2.0 (the most recent revision I have access to) requires all hosts to provide a 3.3 V ± 9% rail with 3 A of current, and also a 12 V ± 8% rail with 2.1 A of current (for an x4 or x8 PCIE slot), but an x4/x8 add-on card can only draw up to 25 W in total from these rails. Some cards will draw power from the 3.3 V rail only, others from the 12 V rail only, and others from some combination of both. This means a compliant implementation of a PCIE x4 slot will require another converter just for the 3.3 V PCIE rail, or beefing up the one we have for the 3.3 V rail so that it can feed all peripherals on the board AND provide enough power for the PCIE slot. Of course there is also the option of just connecting the PCIE 3.3 V rail to the existing converter and praying that no add-on card connected to the board will ever draw significant current from that rail, which is what some Chinese boards do (for example the MYD-C7Z015 which I have), but I don't think that's the right approach. I haven't got to the PCIE slot yet, though, so I haven't thought about that part of the design.

Okay, so I had a look for 3.3V/6A+ regulators (MPM ones, anyway, as I like the non-inductor solution for saving space, if not cost).  I'd be looking to use one costing nearly four-times the MPM3632 for a combined 3.3V rail that will supply all the peripherals, including PCIE.  I think the logical solution is to add another MPM3632 (if they're an appropriate replacement for the MPM3833s) for a couple of GBP.
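
For anyone wanting to check the numbers being thrown around above, here is a minimal sketch that recomputes the PCIE slot rail windows and the rough board budget. The voltages, tolerances, currents, the 25 W cap and the per-block wattages are the ones quoted in this thread; the dictionary and function names are made up for illustration.

```python
# Figures quoted in the thread for an x4/x8 slot (PCIe CEM rev 2.0).
# All names below are illustrative, not taken from any tool or library.
RAILS = {
    "12V": {"nominal_v": 12.0, "tolerance": 0.08, "max_a": 2.1},
    "3V3": {"nominal_v": 3.3, "tolerance": 0.09, "max_a": 3.0},
}
CARD_POWER_CAP_W = 25.0  # total an x4/x8 add-on card may draw from both rails

# Rough per-block board estimate from the thread, as (min, max) watts.
BOARD_ESTIMATE_W = {"ram": (2, 3), "fpga": (5, 7), "other": (5, 5)}


def rail_window(nominal_v, tolerance):
    """Return the (min, max) voltage allowed on a rail."""
    return nominal_v * (1 - tolerance), nominal_v * (1 + tolerance)


def rail_power(nominal_v, max_a):
    """Nominal power the host has to be able to deliver on a rail."""
    return nominal_v * max_a


def board_total(estimate):
    """Sum the per-block estimates into an overall (min, max) in watts."""
    low = sum(lo for lo, _ in estimate.values())
    high = sum(hi for _, hi in estimate.values())
    return low, high


if __name__ == "__main__":
    for name, rail in RAILS.items():
        lo, hi = rail_window(rail["nominal_v"], rail["tolerance"])
        watts = rail_power(rail["nominal_v"], rail["max_a"])
        print(f"{name}: {lo:.2f} V to {hi:.2f} V, host must supply {watts:.1f} W")
    low, high = board_total(BOARD_ESTIMATE_W)
    print(f"Board estimate: {low} to {high} W (card cap is {CARD_POWER_CAP_W} W)")
```

The host has to be able to deliver both rails in full even though a single card is capped at 25 W overall, which is why the 3.3 V slot rail needs its own converter (or a beefier shared one).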
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #196 on: February 10, 2023, 05:21:32 pm »
Yeah, I was just thinking of making the board as flexible as possible.  I guess a 12V supply isn't too big a hurdle for people to jump if they want to use this board.

This is why I was thinking of flexibility - there's no guarantee someone will plug a good quality supply in, or even one in spec.  Maybe I'm worrying too much about that - it takes a pretty big regulator/inductor off the board by having the 12V rail supplied externally.  I had a look for something a little cheaper than what you've got and found this - T5994ST on Mouser for a little over £12.  Only 5A / 60W, but that should be more than sufficient if you're saying the board will need around 25W.
If someone wants to power a $400+ board with a crappy $5 wallwart from the nearest dumpsurplus store - more power to them, but I personally think it's a bad idea, which is why I presumed that a person who buys/makes such a board can afford to spend some extra dosh for a quality power supply.

Jumper cable / PCIE edge connector.  I could make it so the board has a PCIE slave connector on a board edge for some real bizarre board-ception. :o
Having both a PCIE slot and an edge connector will require a high-speed switch for the PCIE lanes and a switch for the reference clock line (because PCIE add-on cards are supposed to use a clock provided by the host). I don't think that's a great idea, as a PCIE add-on design has some limitations on form factor as well as on connector placement, which is why I would prefer to design a separate PCB with an edge connector. Incidentally, the PCIE connector has optional JTAG lines which, if connected properly, allow programming both the host and an add-on at the same time using a single JTAG connection (this is called a JTAG chain). The AC701 devboard has such a connection for its FMC connector, with a switch which is tripped automatically when something is plugged into that connector, so we can use the same idea for a PCIE port. That will require an add-on card designed specifically to support such a scenario, as JTAG from the FPGA will need to be wired to the edge connector, but I like that idea nonetheless.

It's a shame those MPM3833's input voltages max out at 6V, otherwise I'd use the 12V rail to directly power all the switching regulators and could get rid of the 5V rail entirely.   What if I replace the MPM3833's with these MPM3632s instead?  MPM3632 datasheet.  I can just run them straight from the 12V rail, get rid of the 5V regulator and they're slightly cheaper than the MPM3833s they're replacing, unless I've missed something obvious (or not so obvious) in the part selection?
That's a good point and a good suggestion. I think MPM3632C would be a better idea than MPM3833, just make sure you pick an MPM3632C and not an MPM3632S, as these are different parts in different packages and absolutely not compatible with each other. I was under the impression that you would need a 5V rail anyway to power your existing Z80 sandwich, but if it's not required, then we can eliminate that rail.

Okay, so I had a look for 3.3V/6A+ regulators (MPM ones, anyway, as I like the non-inductor solution for saving space, if not cost).  I'd be looking to use one costing nearly four-times the MPM3632 for a combined 3.3V rail that will supply all the peripherals, including PCIE.  I think the logical solution is to add another MPM3632 (if they're an appropriate replacement for the MPM3833s) for a couple of GBP.
That's what I'm thinking too. I would also connect 12V and 3.3V pins of a PCIE connector through fat (something like 0805) zero ohm resistors to allow disconnecting those rails in case it's required for a connected card (thinking about scenario of a PCIE jumper cable which would connect 3.3V PCIE regulator on one board with the same regulator on another one which will cause problems, or if you want to power a second board with a separate power supply - for example if the one you use isn't powerful enough to power both).

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #197 on: February 10, 2023, 08:07:26 pm »
Having both a PCIE slot and an edge connector will require a high-speed switch for the PCIE lanes and a switch for the reference clock line (because PCIE add-on cards are supposed to use a clock provided by the host). I don't think that's a great idea, as a PCIE add-on design has some limitations on form factor as well as on connector placement, which is why I would prefer to design a separate PCB with an edge connector. Incidentally, the PCIE connector has optional JTAG lines which, if connected properly, allow programming both the host and an add-on at the same time using a single JTAG connection (this is called a JTAG chain). The AC701 devboard has such a connection for its FMC connector, with a switch which is tripped automatically when something is plugged into that connector, so we can use the same idea for a PCIE port. That will require an add-on card designed specifically to support such a scenario, as JTAG from the FPGA will need to be wired to the edge connector, but I like that idea nonetheless.

Yes, I've noticed the JTAG lines and wondered about the possibilities there.  So they'd need to be connected to the FPGA following the rules for daisy-chained JTAG devices (TDI being the chained link, everything else is a parallel bus.)

I've been looking at the AC701 design and intend to use a 74LV541A buffer to (presumably) prevent the stubbing issue you've mentioned previously if there's going to be more than two endpoints on the JTAG bus.

I like the automatic switch idea for connecting/disconnecting the PCIE JTAG to the board's FPGA programming bus - is there a purpose-built IC I can use for that task, or is it a case of OR-ing the PRSNT#2's together into a transistor to short TDI/TDO across the PCIE connector?

Speaking of which, I'm not 100% sure how the PRSNT lines are used by the PCIE host; does the host have a weak pullup on PRSNT#1 and the remaining PRSNT#2s are connected to IOs so the FPGA can detect which one is high and determine if a card is connected and, if so, whether it's a x1 or x4 card, or does something else go on there?

I was under the impression that you would need a 5V rail anyway to power your existing Z80 sandwich, but if it's not required, then we can eliminate that rail.

Originally yes, but now I've decided to ditch the old uCOM stack and go all-in for making this board a full soft-core CPU system, I don't see the need for a 5V rail any more.  100k LEs should be enough to emulate most systems people would want to run, including up to Linux-running 32-bit systems.  It shouldn't take much effort for me to get a Z80 core running and emulate the ROM.

In fact, ROM is going to be something I need to think about.  There needs to be some form of permanent storage on the board for ROM software/data.  i.e., my uCOM boots from ROM (as do most computers, I suspect), so I'd need space on an EEPROM or other form of storage (I'm open to suggestions) to hold this data and allow the soft-core CPU to boot up without having to rely on using the FPGA's internal memory.  Should be simple enough to connect a FRAM or serial EEPROM to the FPGA and map its memory into wherever it would need to go for the soft-core CPU?  Would it be possible to use spare room on the FPGA's config flash chip for this?

That's what I'm thinking too. I would also connect 12V and 3.3V pins of a PCIE connector through fat (something like 0805) zero ohm resistors to allow disconnecting those rails in case it's required for a connected card (thinking about scenario of a PCIE jumper cable which would connect 3.3V PCIE regulator on one board with the same regulator on another one which will cause problems, or if you want to power a second board with a separate power supply - for example if the one you use isn't powerful enough to power both).

Good point.  What about instead of 0805 links, would DIP switches be okay/suitable?  Would make it all much more easily configurable.
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #198 on: February 10, 2023, 08:43:00 pm »
Something else just sprang to mind for a peripheral - as well as the 10/100/1000 Ethernet interface, what about wifi?  What would be the best way to implement that?  I guess a wifi module, like the ESP32, would work but is there a way to do it more simply, without the middle-man microcontroller?
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2839
  • Country: ca
Re: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.
« Reply #199 on: February 10, 2023, 09:15:06 pm »
I like the automatic switch idea for connecting/disconnecting the PCIE JTAG to the board's FPGA programming bus - is there a purpose-built IC I can use for that task, or is it a case of OR-ing the PRSNT#2's together into a transistor to short TDI/TDO across the PCIE connector?
Come to think about it, it might not be the best idea to do the switch automatically because if you insert a card which doesn't have JTAG pins connected at all (which is what most consumer addon cards do), you won't be able to program FPGA at all as the JTAG chain will not be completed. So some kind of jumper or manual switch might be a better idea.
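
To illustrate why an unconnected card breaks programming, here is a toy model of a JTAG daisy chain. It assumes every device is in BYPASS (a single flip-flop between TDI and TDO) and that TCK/TMS are bussed to all devices in parallel; the class and function names are invented for this sketch and have nothing to do with any real JTAG tool.

```python
class BypassDevice:
    """A JTAG device in BYPASS: one flip-flop between TDI and TDO."""

    def __init__(self):
        self.bit = 0

    def clock(self, tdi):
        # On each TCK the stored bit goes out on TDO and TDI is captured.
        tdo = self.bit
        self.bit = tdi
        return tdo


class OpenLink:
    """A slot whose card leaves the JTAG pins unconnected."""

    def clock(self, tdi):
        return None  # nothing comes out the far side


def shift(chain, bits):
    """Shift bits in at the head of the chain, return what the host reads back."""
    observed = []
    for bit in bits:
        signal = bit
        for element in chain:
            signal = element.clock(signal)
            if signal is None:  # chain is broken, host sees no data
                break
        observed.append(signal)
    return observed


if __name__ == "__main__":
    pattern = [1, 0, 1, 1, 0, 0]
    # Host FPGA plus a JTAG-capable add-on: pattern returns delayed by 2 clocks.
    print(shift([BypassDevice(), BypassDevice()], pattern))
    # Host FPGA plus a card with no JTAG wiring: nothing ever returns.
    print(shift([BypassDevice(), OpenLink()], pattern))
```

With two devices the pattern comes back two clocks late; with an open link in the path nothing comes back at all, so even the on-board FPGA becomes unreachable. A jumper that bridges the slot's TDI to TDO restores the single-device chain.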

Speaking of which, I'm not 100% sure how the PRSNT lines are used by the PCIE host; does the host have a weak pullup on PRSNT#1 and the remaining PRSNT#2s are connected to IOs so the FPGA can detect which one is high and determine if a card is connected and, if so, whether it's a x1 or x4 card, or does something else go on there?
Here is a diagram from the spec:


Basically an add-in card is required to connect PRSNT1# to the farthest PRSNT2# pin it's got (there are multiple of them depending on the number of lanes). That stuff is only really required if you want to implement hotplug, but in that case you will need to add power switches to the power lines and only enable power once you detect that PRSNT1# and PRSNT2# are shorted. These pins are supposed to be shorter on the edge connector so that they will be the last to mate and first to unmate. This prevents arcs, sparks and other issues which can occur as power pins are being mated/unmated.
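
A minimal sketch of the presence-detect logic described above, assuming one possible host wiring: PRSNT1# grounded on the host, each PRSNT2# pin pulled up and read by an FPGA input, so a pin reads low only when a card loops it back to PRSNT1#. That wiring, the per-pin monitoring and all names here are assumptions for illustration, not something the spec or this thread mandates.

```python
# Assumed x4 slot: one PRSNT2# pin at the x1 position and one at the x4 position.
SLOT_PRSNT2_POSITIONS = (1, 4)


def prsnt2_levels(card_width):
    """Logic levels the host inputs would see; card_width=None is an empty slot.

    The card is modelled as shorting PRSNT1# only to its farthest PRSNT2# pin.
    """
    return {pos: 0 if pos == card_width else 1 for pos in SLOT_PRSNT2_POSITIONS}


def card_present(levels):
    """A card is present if any PRSNT2# input has been pulled low."""
    return any(level == 0 for level in levels.values())


def power_enable(levels):
    """Hotplug-style gating: enable the slot rails only once presence is seen."""
    return card_present(levels)


def widest_detected(levels):
    """Farthest PRSNT2# position read low, or None for an empty slot."""
    low = [pos for pos, level in levels.items() if level == 0]
    return max(low) if low else None


if __name__ == "__main__":
    for width in (None, 1, 4):
        levels = prsnt2_levels(width)
        print(width, levels, power_enable(levels), widest_detected(levels))
```

Treat the width hint as a curiosity only: the actual link width is negotiated during link training, so presence detect is really just about knowing when it is safe to switch the power on.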

In fact, ROM is going to be something I need to think about.  There needs to be some form of permanent storage on the board for ROM software/data.  i.e., my uCOM boots from ROM (as do most computers, I suspect), so I'd need space on an EEPROM or other form of storage (I'm open to suggestions) to hold this data and allow the soft-core CPU to boot up without having to rely on using the FPGA's internal memory.  Should be simple enough to connect a FRAM or serial EEPROM to the FPGA and map its memory into wherever it would need to go for the soft-core CPU?  Would it be possible to use spare room on the FPGA's config flash chip for this?
You can use whatever leftover space there is in the QSPI flash for that purpose. That's actually how a lot of my designs worked - the bitstream includes a small bootloader which resides inside BRAM, copies the main application code from QSPI into RAM and then launches it.
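
As a rough illustration of that boot flow, here is a sketch that models the flash as a byte array and does the "find image, verify, copy to RAM" step. The header layout (magic, length, CRC32), the `APP0` marker and the offsets are all hypothetical; a real bootloader would be C or assembly running on the soft core and would read the flash over SPI.

```python
import struct
import zlib

MAGIC = b"APP0"                   # hypothetical marker for an application image
HEADER = struct.Struct("<4sII")   # magic, payload length, CRC32 of payload


def pack_image(payload):
    """Build the blob that would be written to flash after the bitstream."""
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload


def load_app(flash, offset):
    """What the BRAM bootloader does: validate the image and copy it out."""
    magic, length, crc = HEADER.unpack_from(flash, offset)
    if magic != MAGIC:
        raise ValueError("no application image at this offset")
    start = offset + HEADER.size
    payload = bytes(flash[start:start + length])
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("application image is truncated or corrupt")
    return payload  # on hardware this copy would land in RAM, then jump to it


if __name__ == "__main__":
    flash = bytearray(b"\xff" * 4096)   # erased flash reads back as 0xFF
    app_offset = 1024                   # hypothetical: first free byte after the bitstream
    image = pack_image(b"Z80 ROM contents go here")
    flash[app_offset:app_offset + len(image)] = image
    ram = load_app(flash, app_offset)
    print(len(ram), "bytes copied to RAM")
```

The only design decision that really matters is picking an application offset that is safely past the end of the largest bitstream you expect to use, so reprogramming the FPGA image never overwrites the ROM data.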

Good point.  What about instead of 0805 links, would DIP switches be okay/suitable?  Would make it all much more easily configurable.
You will need one hell of a switch as it will need to be rated for at least 3 Amps. Jumpers are much easier IMHO. Especially since I don't expect that you will need to flip it often.
 
The following users thanked this post: nockieboy

