Author Topic: ACPI Power saving circuitry for 150+ W PCI devices (ie GPUs) (Read 4100 times)

axero · « **on:** October 28, 2013, 04:54:17 pm »

Background
Devices that need more power than the PCIe slot can supply draw the extra power directly from the PSU. PCI devices are controlled by power states D0-D3, where D0 is fully on and D3 is off; in the deepest form of D3 (D3cold) power is removed from the slot entirely.

Devices that also draw power from the PSU may not turn off in state D3, because the auxiliary supply stays live. The intention of this design (on which I'd like your input) is a breaker circuit, switching within milliseconds, that can interrupt up to 25 Amps of current from the 6- and 8-pin ATX terminals. Each terminal in an ATX connector is rated for up to 6.2 Amps.

The purpose of this project
Ultimately, this circuitry is intended to reset hardware for virtualization purposes (PCI passthrough). We want to be able to reset hardware that has auxiliary power, as described above, by doing a D3-to-D0 transition via the ACPI framework.

The concepts
According to the PCI-SIG specifications, power is managed through several sets of power states. The states are as follows:

* B0...B3 power state of the system bus/bridge
* D0...D3 power states of a PCI device (subset of system bus)
* L0...L3 power states of the PCI Express link (subset of PCI device)

These states are summarized from the following sources:
http://www.intel.com/content/www/us/en/io/pci-express/pci-express-architecture-power-management-rev-1-1-paper.html
http://www.pcisig.com/specifications/conventional/pcipm1.2.pdf

The different reset types that the hypervisor tries to trigger come from altering the above-mentioned states. The reset procedure for Xen is as follows (quoted from Ian Campbell at Citrix):

With a pvops dom0 Xen resets devices by writing to its "reset" node in
sysfs so it will reset the device using whatever method the dom0 kernel
supports for that device.

The version of Linux I have to hand has, in __pci_dev_reset, calls to
the following in this order and stops after the first one which
succeeds:
* pci_dev_specific_reset (AKA per device quirks)
* pcie_flr
* pci_af_flr
* pci_pm_reset
* pci_parent_bus_reset

(end quote)
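
The "try each method in order and stop at the first one that succeeds" behaviour in the quote can be modelled in a few lines. This is only a Python illustration of the ordering, not the kernel code: the method names are taken from the quote, while the function first_successful_reset and the stand-in callables (True meaning "supported and worked") are assumptions made up for the example.

```python
from typing import Callable, Iterable, Optional, Tuple

# A reset method is a (name, callable) pair; the callable returns True on success.
ResetMethod = Tuple[str, Callable[[], bool]]


def first_successful_reset(methods: Iterable[ResetMethod]) -> Optional[str]:
    """Try each reset method in order; return the name of the first that succeeds."""
    for name, attempt in methods:
        if attempt():
            return name
    return None  # no method worked


# Hypothetical device: no quirk and no FLR, but a PM reset is possible.
example_device = [
    ("pci_dev_specific_reset", lambda: False),
    ("pcie_flr", lambda: False),
    ("pci_af_flr", lambda: False),
    ("pci_pm_reset", lambda: True),
    ("pci_parent_bus_reset", lambda: True),  # never reached in this example
]

print(first_successful_reset(example_device))  # prints pci_pm_reset
```

For a GPU without FLR, this is why the D3/D0 power-management reset ends up being the one that matters.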

These reset functions can be found in drivers/pci/pci.c in the kernel. So if FLR (Function Level Reset, also written FLReset: a reset invoked by a software function) is not available, the kernel moves on to the function "pci_pm_reset" (I take the pm to stand for "power management"). A look into the source code at, for example:

http://lxr.free-electrons.com/source/drivers/pci/pci.c#L3188

reveals the characteristics of this reset function. Lines 3188-3216 contain the code for the function pci_pm_reset. The comment above it says that the duration of this reset is 10ms but is configurable for longer durations. If we settle for this duration, we need a relay or switch circuit that can handle such short durations.

So if we want to break the auxiliary power to the GPU for the same duration as the power state D3 is in effect, we need a breaker that is faster than 10 milliseconds. On the other hand, the breaker should be able to handle quite a bit of current. A power supply "reference" article at Tom's Hardware has the information we need:

http://www.tomshardware.com/reviews/power-supply-specifications-atx-reference,3061-8.html

As you probably know (directed at gordan), the yellow terminals of the 6- or 8-pin connectors are +12V whereas the black terminals are ground. According to the article above, each yellow terminal is rated for 75W, which at 12V means it is rated to handle up to about 6.2 Amps.

If we can design a PMOS-based breaker circuit that can handle the high current and switch in under 10 ms then we're probably good to go. If you have a lot of auxiliary connectors, building a breaker circuit for each terminal (up to 6 terminals / two 6-pin ATX connectors may be present for the most power hungry GPUs) may lead to a lot of circuits.

So another question that arises is whether we really need separate circuits or whether we can merge the yellow terminals into one. That probably depends on whether each of these yellow terminals is on a separate rail of the power supply. I'm not sure whether it is a good thing to merge the rails like that. It would be good to have someone more versed in electronics answer this. I think I'll ask that question in a separate thread. If it is bad, then the yellow terminals can be put behind power diodes (that can handle 6.2 Amps or more).

The following weblink gives good suggestions as to how to design such a circuit:

http://electronics.stackexchange.com/questions/41729/why-is-this-mosfets-pullup-resistor-necessary

If it turns out that only one circuit is necessary, it doesn't hurt to have another one for redundancy in case one would fail.

The next step is to decide which source this circuit should be controlled from. The PCIe pinout reveals that pins 2 and 3 on both sides supply +12V of power, so in state D3 these pins should have 0V (it would be good if someone could verify this). This voltage should be the control (gate) signal of the breaker circuit above.

My questions (TL;DR)
* Is it OK to merge the output terminals of several DC power supplies with the same voltage, or should one use diodes/rectifier circuits to separate them? In this case the power supplies are possibly separate rails in a single ATX PSU. One check: measure the resistance between the +12V terminals of the 6- or 8-pin ATX connectors on the graphics card; if they are shorted, merging should be no problem.
* Can a MOSFET circuit handle the high current with a 10 ms response time without a voltage drop that disturbs the GPU? A voltage drop as low as 0.1V should be achievable and acceptable to the voltage controllers inside the graphics card.
* Is 10 milliseconds really a long enough duration to reset the GPU for virtualization when it also takes power from the auxiliary input? The capacitors and coils of the voltage controller circuitry might need more time to discharge.
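
As a rough check on the 0.1V question above, Ohm's law gives the on-resistance budget for the switch. A small Python sketch, assuming the full 25 Amps flows through a single MOSFET and that the 0.1V drop is the whole budget (wiring and connector resistance are ignored); the function names are made up for illustration:

```python
def max_on_resistance_ohm(max_drop_v: float, current_a: float) -> float:
    """Largest on-resistance that keeps the drop within budget: R = V / I."""
    return max_drop_v / current_a


def conduction_loss_w(on_resistance_ohm: float, current_a: float) -> float:
    """Heat dissipated in the switch while it conducts: P = I^2 * R."""
    return current_a ** 2 * on_resistance_ohm


r_max = max_on_resistance_ohm(0.1, 25.0)   # about 0.004 ohm
loss = conduction_loss_w(r_max, 25.0)      # about 2.5 W

print(f"R_on <= {r_max * 1000:.1f} mOhm, loss about {loss:.2f} W")
```

So under these assumptions the switch needs an on-resistance of roughly 4 milliohms or less, and it would dissipate around 2.5 W at full load, which is worth keeping in mind when choosing a package and heatsinking.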

axero · « **Reply #1 on:** October 29, 2013, 12:17:45 pm »

I'm pretty sure that if you cut all power (including the auxiliary ATX power) to the graphics adapter for a long enough duration it will trigger a reset of the GPU.

I find it reasonable to believe that the GPU draws power from whatever source is available. So if there's an auxiliary input, cutting the power to the PCIe slot will likely have no effect, or at most force the graphics adapter into a 'hibernation mode' without actually turning the GPU off or resetting it. That is, at least on GPUs that are known to be troublesome for PCI passthrough.

I'm not sure how the power states L3, D3, B3 affect the power supply pins 1-11: whether these states turn the voltage completely off on all of them or whether a floating voltage of 1-4 V remains on some of them. If they aren't pulled to ground, that would affect the requirements of a MOSFET-based switch. The breaker circuit must 'understand' the difference between 'on' and 'off' on the control voltage (D0 and D3 if you will, or L0 and L3, or B0 and B3).

Also, power supplies rated for higher currents tend to have higher capacitances and inductances, which could maintain the supply voltage at low load (typical during startup of a VM) even if the input voltage is cut briefly. So a 10 ms cut of the auxiliary input may not be long enough.
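
To put a number on that worry, here is a back-of-the-envelope Python estimate of how long the capacitance behind the switch could hold the rail up. All values are hypothetical (2000 µF, a 0.5 A idle draw, and 1 V as the level that counts as 'off'), and the load is treated as a constant current, which a real voltage regulator is not:

```python
def holdup_time_s(capacitance_f: float, v_start: float, v_end: float,
                  load_current_a: float) -> float:
    """Time for a capacitor to sag from v_start to v_end at constant current.

    From I = C * dV/dt, so t = C * (v_start - v_end) / I.
    """
    return capacitance_f * (v_start - v_end) / load_current_a


# Hypothetical values, not measured on any real card.
t = holdup_time_s(capacitance_f=2000e-6, v_start=12.0, v_end=1.0,
                  load_current_a=0.5)

print(f"rail stays up for roughly {t * 1000:.0f} ms")
```

With these made-up numbers the rail would take about 44 ms to collapse, several times longer than the 10 ms reset window, so the cut may indeed need to be longer or the rail actively discharged.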

Edit:

Another option: instead of monitoring the voltages on pins 1-11 of the PCIe slot, the auxiliary power switch could be driven by a controller circuit connected through, for example, a USB controller. Examples of such solutions are mentioned here:

http://stackoverflow.com/questions/3246077/controlling-simple-relay-switch-via-usb

So invoking a software function in the Dom0 that switches the auxiliary power off a few seconds before the hypervisor invokes __pci_dev_reset, right after priming the device for PCI passthrough (and then, of course, invoking a function that turns it back on), might yield the same effect. This might even be a better solution as it won't potentially mess up the power management routines in the VM.
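
A minimal Python sketch of that sequence, under stated assumptions. The sysfs "reset" node under /sys/bus/pci/devices/ is the standard Linux interface mentioned in the quote in the first post; everything about the relay is hypothetical, so set_aux_power is just a callable you would supply for whatever USB relay you end up with. The order used here (off, wait, reset, back on) is one reading of the idea and is untested on real hardware.

```python
import time
from pathlib import Path
from typing import Callable


def power_cycle_and_reset(bdf: str,
                          set_aux_power: Callable[[bool], None],
                          off_seconds: float = 3.0,
                          sysfs_root: str = "/sys",
                          sleep: Callable[[float], None] = time.sleep) -> None:
    """Cut auxiliary power, wait, ask the kernel to reset the device, restore power.

    bdf           -- PCI address such as "0000:01:00.0" (example value only)
    set_aux_power -- hypothetical relay driver: True = power on, False = off
    """
    node = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "reset"
    # Check first, so we never cut power to a device we cannot reset.
    if not node.exists():
        raise FileNotFoundError(f"{node} not found")

    set_aux_power(False)
    try:
        sleep(off_seconds)    # give the card's capacitors time to discharge
        node.write_text("1")  # kernel picks the reset method; needs root
    finally:
        set_aux_power(True)   # restore power even if the reset write fails
```

The sleep and sysfs_root parameters exist only so the function can be dry-run against a scratch directory before being pointed at a real card.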

Ton · « **Reply #2 on:** October 29, 2013, 09:16:43 pm »

ATX power supplies use the same power rail to feed all the pins of the same voltage in a connector, so you can safely assume that the 2 or 4 yellow wires come from the same 12 V rail.
So one P-MOSFET for all the wires would be fine. You should take great care when switching capacitive loads with a fast MOSFET: the inrush current (initial charge current) can be very large (100s of amps), and PCIe-based GPUs are sure to have a few 100s or 1000s of µF worth of capacitance on the 12 V rail.

To keep your P-MOSFET safe you most probably need to turn it on again in a 'slow' and controlled way to keep the switch current manageable (read: to prevent the smoke from escaping).

For more detail read this app note: www.onsemi.com/pub/Collateral/AND9093-D.PDF?
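
To see why the turn-on speed matters, the charging current of a capacitor follows I = C * dV/dt. A rough Python illustration, assuming the output voltage ramps linearly from 0 to 12 V and using a hypothetical 1000 µF of capacitance on the card (in practice wiring and MOSFET resistance also limit the peak):

```python
def inrush_current_a(capacitance_f: float, supply_v: float,
                     ramp_time_s: float) -> float:
    """Average charging current for a linear voltage ramp: I = C * V / t."""
    return capacitance_f * supply_v / ramp_time_s


C_LOAD = 1000e-6  # hypothetical 1000 uF on the card's 12 V input

fast = inrush_current_a(C_LOAD, 12.0, 10e-6)  # hard switching, 10 us ramp
slow = inrush_current_a(C_LOAD, 12.0, 5e-3)   # controlled 5 ms ramp

print(f"10 us ramp: about {fast:.0f} A")
print(f"5 ms ramp:  about {slow:.1f} A")
```

Under these assumptions a 10 µs edge asks for on the order of 1200 A, while stretching the ramp to 5 ms brings it down to a few amps.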

NiHaoMike · « **Reply #3 on:** October 29, 2013, 09:24:40 pm »

Some multi rail PSUs (some of the larger ones) are true multi rail designs.

I remember that the PCIe slot has a reset signal, so try just tapping into that.

