Author Topic: Yet another GPU repair - dead 1080 Ti (and a ME at the end of my abilities)  (Read 7097 times)


Offline yjgfiklTopic starter

  • Newbie
  • Posts: 5
  • Country: us
Hi smart people of EEVBlog, I've reached the point where I'm stumped by what to do next here. I accidentally purchased a dead GPU off eBay (I mixed up the listings) and figured I could at least use it as a learning opportunity and attempt to repair it. I'm not an electrical engineer by any means and as a mechanical guy, only have a basic knowledge of electrical engineering principles.

So a quick rundown: When slotted into a motherboard the card does power on and the fan spins, but there is no video output. If left running for a little while (~1-2 min), the card will get rather hot as the fan spins up and the heat exhausted out the back is pretty substantial. This is a Dell OEM version of an Nvidia GTX 1080 Ti, manufactured by MSI (very similar to the MSI 1080 Ti aero, but without branding). So it's close to the Nvidia reference / FE design, but probably with cheaper components hence the failure.

Here is a hi-res link of the front of the Founder's Edition PCB:
https://cdn.wccftech.com/wp-content/uploads/2017/03/NVIDIA-GeForce-GTX-1080-Ti-Founders-Edition-PCB_Front.jpg

Here's a link to the back side:
https://cdn.wccftech.com/wp-content/uploads/2017/03/NVIDIA-GeForce-GTX-1080-Ti-Founders-Edition-PCB_Back.jpg

I've probed around a little bit and found a few things, and I'll start with the "simpler" one on the back side. Here's an album of photos that I can eventually add to, as requested: https://imgur.com/a/MfsNS86

Q563 appears to have a short between Base and Collector (1st pic). I had it de-soldered at work and found unfortunately that even without it installed, those 2 pads are shorted together so not sure what's going on there or if it's intentional. I of course don't have a schematic for the GPU, so not much I can do there. I've since had a replacement soldered back on. Datasheet here: https://my.centralsemi.com/datasheets/CMPT2222A.PDF

Not sure if there are any surrounding components worth investigating there, or components on the opposite side of the board that may be part of the same circuit.
According to this video: the side opposite Q563 appears to include the VRMs responsible for the GPU PLL management, and 1.8V supply to the vBIOS.

The 2nd (and probably bigger) issue is an apparent heat problem (probably a short) in one of the GPU power delivery phases. I used a thermal camera and found that upon powering up, the L6 inductor and the Q26 and Q30 MOSFETs heat up significantly faster and hotter than the surrounding components and the other 6 phases. Additionally, capacitors C190 and C191 run hotter than similar capacitors nearby (2nd pic). All these capacitors, however, measure OL and then switch to ever-increasing resistance values as they charge up, which I understand is normal behavior.

The offending MOSFETs are marked ON DJ27BZ (pic #3), but that marking doesn't appear to exist anywhere, while the Founder's Edition cards appear to use Fairchild FDPC8016S FETs. It turns out ON makes that part number too, and the datasheet is identical to Fairchild's, provided here: https://datasheetspdf.com/pdf-file/1090214/FairchildSemiconductor/FDPC8016S/1 and here for ON: https://www.onsemi.com/pub/Collateral/FDPC8016S-D.PDF

The TI 53603A VRM controller (position U11) does not appear to be an offender, or at least it doesn't appear to heat up abnormally on startup. I've not been able to identify the inductors labeled as R22, but they all measure ~0.2Ohm with my DMM which probably doesn't go down to whatever value they actually measure.

My assumption is that this Q26 is the offending part since it runs the hottest, but I'm at a loss on how to actually test it and/or replace it. What's a possible common failure for these chips?

On all these DJ27BZ FETs on the board, GR (pin 2) and SW (pins 5-7) have continuity with each other and with PCB ground. Looking at the schematic, this seems normal? But this is where my knowledge ends, so I'm hoping you fine people can shed some light on this and recommend further troubleshooting.

I've attached some general photos of my board, for your reference.

Thanks so much in advance for the help! Test equipment I readily have access to at home are a Fluke 114 and a Fluke 279FC, but if needed I can use current/voltage limited lab power supplies and other measurement devices at work.


« Last Edit: December 20, 2020, 06:47:45 am by yjgfikl »
 

Offline PKTKS

  • Super Contributor
  • ***
  • Posts: 1766
  • Country: br
Well... 

first a necessary safety briefing...
- Diagnosing high power rails like this is not trivial
- you **MAY** easily damage your motherboard without precautions
- I always power the HIGH POWER RAIL of the GPU via a BENCH PSU
- the bench PSU needs CV/CC modes and a rating of 30A at a stable 12V
- protections need to match the test at hand
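As a rough sanity check on the bench-supply rating above, here is a minimal Python sketch of the arithmetic. It assumes the nominal PCIe limits (75 W from the slot, 75 W per 6-pin, 150 W per 8-pin); the function and constant names are illustrative only and nothing here was measured on this card.

```python
# Nominal PCIe power limits in watts (spec figures, not measurements).
SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150


def supply_power_w(volts: float, amps: float) -> float:
    """Power a bench supply can deliver at a given voltage and current limit."""
    return volts * amps


def connector_budget_w(use_six_pin: bool, use_eight_pin: bool) -> int:
    """Nominal power available to the card from the slot plus any aux connectors."""
    total = SLOT_W
    if use_six_pin:
        total += SIX_PIN_W
    if use_eight_pin:
        total += EIGHT_PIN_W
    return total


if __name__ == "__main__":
    print(supply_power_w(12, 30))           # 360 (watts from a 12 V / 30 A supply)
    print(connector_budget_w(True, True))   # 300 (slot + 6-pin + 8-pin)
    print(connector_budget_w(False, False)) # 75  (slot only)
```

On these nominal figures, a 12 V / 30 A supply covers the full connector budget with some margin, while a card fed from the slot alone is limited to a fraction of that.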

Poking around without a minimal idea will not help


You may need a schematic or a board view - ideally both

I have a fairly close board view for another GTX1080 in which
Q564 should closely match your BJT...

Such a fault requires extensive OHM probing of all MOSFETs
- first the low sides
- then the high sides..
- after that all REGULATORS (which contain MOSFETs) should
be ohm probed for INPUT to OUTPUT shorts..

The schematic is really handy, although some experienced folks
can achieve results without it..

Some level of experience is required to reach conclusive results

Paul
 
The following users thanked this post: yjgfikl

Offline yjgfiklTopic starter

  • Newbie
  • Posts: 5
  • Country: us
Hi Paul, thanks so much for the response.

So far I haven't actually probed the card with any power applied, as I'm a little nervous to do so; everything I've measured/checked so far has been with the card on the bench, unpowered. I only powered it up to get the thermal images, and just used a standard ATX power supply to give the GPU power via the PCIe slot (I didn't connect the 6 and 8 pin supplemental power connectors).

I think the 1080 Ti board is based on the reference Titan design (also a GP102 chip) as opposed to the GP104 used on the GTX 1080. Still, it could serve as a good reference if the board designs are similar. Unfortunately it seems impossible to find board schematics online, I imagine those are pretty locked down for IP reasons.

Do you have some insight as to what I'd be looking for by ohm probing the MOSFETs? When you say High and Low sides, are you referring to pins 1 (HSG) and 8 (LSG) on the DJ27BZ dualFETs? If so, am I just probing between the two and looking for some resistance value (that hopefully is in the datasheet)? My confusion lies with knowing that whatever I probe isn't affected by the component being installed in the circuit(s).

Thanks again for the response!

 

Offline PKTKS

  • Super Contributor
  • ***
  • Posts: 1766
  • Country: br
Quote from: yjgfikl
"Hi Paul, thanks so much for the response."

"So far I haven't actually probed the card with any power to it as I'm a little nervous to do so, everything I've measured/checked so far has just been with the card on the bench without any sort of power. I only powered it up to get the thermal images, and just used a standard ATX power supply to give the GPU power via the PCIe slot (didn't connect the 6 and 8 pin supplemental power connectors)."

This is the best thing to do while you are not 100% sure whether there
is an obvious dead short. Caution very well applied

Do not power up such a power-hungry piece without a proper sanity check


Quote from: yjgfikl
"I think the 1080 Ti board is based on the reference Titan design (also a GP102 chip) as opposed to the GP104 used on the GTX 1080. Still, it could serve as a good reference if the board designs are similar. Unfortunately it seems impossible to find board schematics online, I imagine those are pretty locked down for IP reasons."

Not impossible, but very hard unless the manufacturer is willing
to release them.  You see ...  they want to release a "NEW" board every 15 days
or so with "NEW"  specs, new hardware, new software ...

Sad truth is that boards are very likely the same.. with crippled
"features" and firmware they manage to play when cash flow needed

So every 15 days a new "graphic thing" appears..

THEY ARE VERY MUCH ALIKE...
Schematics for some other boards can be found  on the net..
With some effort you can just map such a schematic onto the "new" shit...


Quote from: yjgfikl
"Do you have some insight as to what I'd be looking for by ohm probing the MOSFETs? When you say High and Low sides, are you referring to pins 1 (HSG) and 8 (LSG) on the DJ27BZ dualFETs? If so, am I just probing between the two and looking for some resistance value (that hopefully is in the datasheet)? My confusion lies with knowing that whatever I probe isn't affected by the component being installed in the circuit(s)."

"Thanks again for the response!"


Yes, my tips are very straightforward.
- OHM probing is a required skill for this job.
- you need to probe all MOSFETs and regulators.
- YOU ARE LOOKING FOR DEAD SHORTS
- the best tool for the job is an ANALOG multimeter with an x1 scale
- it is 10x faster than a digital one
- numeric values are not relevant for this test
- you need to tell 3 things apart fast:  OPEN or NORMAL or DEAD SHORT.

The needle spots that in millisecs...

Attached is a helper diagram of a typical regulator for GTX boards..
- low side MOSFETs are  GROUNDED
- high side MOSFETs are not (you need to probe them individually)
- probing requires BOTH DIRECTIONS  (test.. invert probes... test again)
- REGULATORS ALSO HAVE MOSFETs inside...

THE COILS WILL IDENTIFY EASILY THE MAIN TEST POINT
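
The three-way verdict described above (OPEN, NORMAL or DEAD SHORT, checked in both probe directions) can be sketched in Python for anyone logging readings from a digital meter instead of watching a needle. The two thresholds are assumptions chosen purely for illustration, not values from this thread, and the names are made up. On a rail that reads very low even when healthy, the short threshold would need to be set against a known-good reference.

```python
from enum import Enum


class Verdict(Enum):
    OPEN = "open"
    NORMAL = "normal"
    DEAD_SHORT = "dead short"


# Assumed thresholds, for illustration only.
SHORT_BELOW_OHMS = 1.0
OPEN_ABOVE_OHMS = 1_000_000.0


def classify(ohms: float) -> Verdict:
    """Classify one reading; use float('inf') for an OL display."""
    if ohms < SHORT_BELOW_OHMS:
        return Verdict.DEAD_SHORT
    if ohms > OPEN_ABOVE_OHMS:
        return Verdict.OPEN
    return Verdict.NORMAL


def classify_both_directions(forward_ohms: float, reverse_ohms: float) -> Verdict:
    """Only call it a dead short (or open) if both probe directions agree."""
    verdicts = {classify(forward_ohms), classify(reverse_ohms)}
    if verdicts == {Verdict.DEAD_SHORT}:
        return Verdict.DEAD_SHORT
    if verdicts == {Verdict.OPEN}:
        return Verdict.OPEN
    return Verdict.NORMAL


if __name__ == "__main__":
    print(classify_both_directions(0.1, 0.1))            # Verdict.DEAD_SHORT
    print(classify_both_directions(15.0, float("inf")))  # Verdict.NORMAL
```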

Last word - this is not a trivial thing... but .. with luck you may spot
an easily fixable dead short..

 :-+
Paul
 

Offline PKTKS

  • Super Contributor
  • ***
  • Posts: 1766
  • Country: br
... My bench AMM, which I made myself for this purpose ...

In a dozen seconds it lets me check a dozen coils..
just by watching the needle - very fast...

Paul

 

Offline yjgfiklTopic starter

  • Newbie
  • Posts: 5
  • Country: us
Hi Paul,

I'm starting up a spreadsheet with the results of my pin to pin testing of all the MOSFETs in the power stages. I struggled a bit with the schematic you posted, as I believe the 1080 Ti has two dual channel/integrated MOSFETs per phase, as opposed to the layout in your schematic (and I'm just not well versed in reading electrical schematics to begin with).

Preliminary results lead me to question what I'm seeing with respect to Pins 2, 5, 6, 7 and Pad 9 as they all have continuity with ground. Pin 8 on each of them, identified as Low Side Gate, does not have continuity with ground.
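
A spreadsheet like this could also be screened with a few lines of Python. The sketch below flags any part whose reading differs from the median of its peers by more than a chosen fraction. Q26 and Q30 are designators from this thread; the other designators, every resistance value and the 50% tolerance are hypothetical placeholders, not measurements.

```python
from statistics import median


def flag_outliers(readings: dict[str, float], tolerance: float = 0.5) -> list[str]:
    """Return designators whose reading deviates from the median by more
    than `tolerance`, expressed as a fraction of the median."""
    mid = median(readings.values())
    return [
        name
        for name, ohms in readings.items()
        if abs(ohms - mid) > tolerance * mid
    ]


if __name__ == "__main__":
    # Hypothetical pin-to-ground readings in ohms, one entry per FET.
    sw_to_ground = {
        "Q26": 0.05,
        "Q30": 0.20,
        "Qa": 0.21,  # placeholder designator
        "Qb": 0.20,  # placeholder designator
        "Qc": 0.19,  # placeholder designator
    }
    print(flag_outliers(sw_to_ground))  # ['Q26']
```

Comparing like with like (the same pin on every phase) keeps the check useful even when the absolute values are too low to interpret on their own.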

The inductors all have a dead short to ground (~0.2ohm) as do the high and low sides of the output capacitors (to the left of the inductors). I have a functional GTX 780 Ti that I am using for reference despite the different architectures, and find that the inductors and high sides of the output capacitors measure 3ohm to ground, with the low side of the output caps being another short to ground. Again the latter is a functional card, so I expect that this is normal behavior.

Is it possible that a dead output cap could make Pins 2, 5, 6, 7 on all the MOSFETs appear shorted to Pad 9 / ground? It appears that all the output phases are in parallel, so this could happen. I tested the theory on the 780 Ti by shorting 1 cap with a wire across its high/low sides, and found that the high side of a DIFFERENT cap was then shorted to ground.
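
The wire test above matches what the parallel-resistance formula predicts: every branch on a shared output node is measured at once, so one shorted branch pulls the whole node to roughly zero. A minimal Python sketch with made-up branch values (none of them measured on either card):

```python
def parallel(*ohms: float) -> float:
    """Equivalent resistance of branches connected in parallel.

    A single 0-ohm branch (a dead short) makes the whole node read 0.
    """
    if not ohms:
        raise ValueError("need at least one branch")
    if any(r == 0 for r in ohms):
        return 0.0
    return 1.0 / sum(1.0 / r for r in ohms)


if __name__ == "__main__":
    print(parallel(10.0, 10.0))       # 5.0
    print(parallel(10.0, 10.0, 0.0))  # 0.0, one shorted branch dominates
```

This is also why a meter on any one output cap cannot say which branch holds the short.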

My other question would be that if 1 phase (or components within the phase) is shorted, causing all the other phases to appear to be shorted, why would only the 1 phase heat up under power and not all of them?
« Last Edit: December 22, 2020, 07:09:15 am by yjgfikl »
 

Offline aqibi2000

  • Regular Contributor
  • *
  • Posts: 211
  • Country: gb
Everything is possible.

Use a 4 wire kelvin measurement to accurately determine whether there is still a short when dealing with such low ohm phases.
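
For reference, the arithmetic behind a 4 wire measurement is just Ohm's law: force a known current through one pair of leads, read the voltage drop across the part with a second pair, and divide. The sketch below assumes the current comes from a current-limited supply and the voltage from a separate meter; the example numbers are hypothetical.

```python
def four_wire_resistance(sense_volts: float, forced_amps: float) -> float:
    """Resistance from a Kelvin measurement: R = V / I.

    sense_volts: drop measured by the sense leads directly at the part.
    forced_amps: known current pushed through the separate force leads.
    """
    if forced_amps <= 0:
        raise ValueError("forced current must be positive")
    return sense_volts / forced_amps


if __name__ == "__main__":
    # Hypothetical: 1 A forced, 2.1 mV read across the rail.
    milliohms = four_wire_resistance(0.0021, 1.0) * 1000
    print(f"{milliohms:.2f} mOhm")
```

Because the sense leads carry almost no current, their own resistance drops out of the result, which is what makes the method useful at these values.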
Tinkerer’
 

Offline PKTKS

  • Super Contributor
  • ***
  • Posts: 1766
  • Country: br
Quote from: yjgfikl
"Hi Paul,"

"I'm starting up a spreadsheet with the results of my pin to pin testing of all the MOSFETs in the power stages. I struggled a bit with the schematic you posted as I believe the 1080 ti has two dual channel/integrated MOSFETs per phase, as opposed to the layout in your schematic (and I'm just not well versed on electrical schematic reading to begin with)."

The posted figure is just a small part of the schema
which can be found on the net easily.. GTX7xx family
is well posted in several places - attached some more parts..

When testing multi phase regulators you need to remember
that low side MOSFETs are all grounded and "in parallel"
on that output - being multi phased.

but several other important ones are stand alone..


Quote from: yjgfikl
"The inductors all have a dead short to ground (~0.2ohm) as do the high and low sides of the output capacitors (to the left of the inductors). I have a functional GTX 780 Ti that I am using for reference despite the different architectures,
(..)"

If you suspect that you have found a dead short...
REMOVE THE MOSFETs and test them outside the board.

At that point, test the board pads to see
whether the short persists or vanishes with the component removed..

Organize yourself, as this is an intense information gathering stage

This is the way..
Paul
 

Offline yjgfiklTopic starter

  • Newbie
  • Posts: 5
  • Country: us
Thanks for the help! I was absent for a while due to the holidays. I did more searching around though and found a few surprising videos indicating that measuring resistance or continuity with my multimeter won't really work here. Although I believe that the power phases have a short somewhere in the circuit, it's hard to tell, as the video below indicated that on the GTX 1xxx series the board resistances are very low in comparison to past GPUs.

Here's a video, which funnily enough compares resistance between the same 2 GPUs that I was comparing. After seeing this, I figured I'd have to cut my losses, as I won't have access to a 4 wire micro-ohmmeter regularly enough to continue troubleshooting. Thanks again for the help.

 

Online wraper

  • Supporter
  • ****
  • Posts: 17635
  • Country: lv
You don't need a kelvin connection to measure GPU resistance precisely enough for diagnostic needs. If the multimeter has at least 0.01 Ohm resolution, it's possible to distinguish a shorted GPU from an OKish one. You just need to press the probes hard to get a stable connection with minimal resistance, and subtract the resistance of the shorted probes (poking the same pad).
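
The subtraction described above is simple enough to sketch in Python. The readings in the example are hypothetical, and the 0.01 ohm rounding simply mirrors the meter resolution mentioned in the post.

```python
def corrected_resistance(reading_ohms: float, shorted_probe_ohms: float) -> float:
    """Subtract the probe/lead resistance (both probes on the same pad)
    from a 2 wire reading, rounded to 0.01 ohm and clamped at zero."""
    return max(round(reading_ohms - shorted_probe_ohms, 2), 0.0)


if __name__ == "__main__":
    # Hypothetical: 0.25 ohm across the rail, 0.12 ohm with probes on one pad.
    print(corrected_resistance(0.25, 0.12))  # 0.13
```

Taking the shorted-probe reading again before each measurement helps, since contact pressure changes it.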
 
The following users thanked this post: SilverSolder

Offline yjgfiklTopic starter

  • Newbie
  • Posts: 5
  • Country: us
Yeah, I think my main issue is that my multimeter doesn't quite have that resolution. Because of the virus, I wasn't really able to go into work outside of work hours to use better equipment and continue troubleshooting.
 

