Author Topic: Have certain Nvidia AIBs made a mistake with RTX 3000 series board components?


Offline EsxPaul (Topic starter)

  • Newbie
  • Posts: 2
  • Country: gb
Hello to all  :)

I've brought myself here in the hope of garnering thoughts and opinions from those who actually know their stuff, rather than relying on hordes of self-appointed YouTube and Reddit "experts".

My question centers on the very recent furore around the choice of board components being used by Nvidia's aftermarket GPU vendors.

In short, the newly released RTX 3080 has made its way to a small number of customers, some of whom have experienced crashing when the cards boost up and over the 2 GHz mark.

Some early investigation/speculation has placed the fault on the choice of POSCAP/SP-Cap polymer capacitors rather than MLCCs to filter the GPU's supply rails.

Link to article and video which will explain the situation far better than I can: https://www.igorslab.de/en/what-real-what-can-be-investigative-within-the-crashes-and-instabilities-of-the-force-rtx-3080-andrtx-3090/

https://youtu.be/x6bUUEEe-X8

This has led many people who are certainly not electronics experts (such as myself) to hunt the Internet for pictures showing which type has been used on the particular vendor-specific boards we have chosen.

Jumping to conclusions such as "Vendor A uses POSCAPs/SP-Caps, therefore rubbish" or "Vendor B uses MLCCs, therefore good" seems very simplistic to me, and I would feel much more confident listening to the insight that many of you can offer.

Many thanks in advance for any insight you can offer.
« Last Edit: September 26, 2020, 11:01:42 am by EsxPaul »
 

Offline Rerouter

  • Super Contributor
  • ***
  • Posts: 4700
  • Country: au
  • Question Everything... Except This Statement
The usual phrase to start off with: "It's a harder question than it appears."

Things that can be taken as fairly concrete:
smaller packages generally have lower inductance
smaller packages / capacitances generally have higher self-resonance frequencies (a rough numeric example follows this list)
placing capacitors in parallel reduces both the ESR and the ESL of the area they are decoupling vs just one of the same type
the lower the ESR / ESL at the noise frequencies of interest, the more that noise is attenuated
MLCC capacitors generally have lower ESR than electrolytics of the same size
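
As a rough numeric example of the self-resonance point above (all values here are assumed, ballpark figures, not taken from any particular datasheet), the self-resonant frequency is f = 1 / (2*pi*sqrt(ESL*C)):

Code: [Select]
# Back-of-the-envelope only: assumed, typical-ish values, not from any datasheet.
import math

def srf_hz(esl, cap):
    """Self-resonant frequency of a capacitor modelled as a series L-C."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl * cap))

# A small ceramic vs a large polymer "can", illustrative values only:
print("100 nF MLCC, 0.4 nH ESL   : %5.1f MHz" % (srf_hz(0.4e-9, 100e-9) / 1e6))  # ~25 MHz
print("330 uF polymer, 2.0 nH ESL: %5.0f kHz" % (srf_hz(2.0e-9, 330e-6) / 1e3))  # ~196 kHz

The small ceramic is still behaving like a capacitor well into the tens of MHz, while the big bulk part has gone inductive by a couple of hundred kHz.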

Things that make it complex...:
Parallel capacitors do not give purely additive improvements; adding more in parallel can sometimes make circuits less stable, as it can add more resonance points and sometimes shift them down in frequency (a rough numeric sketch of this follows the list),
Some circuits prefer some amount of ESR, as below a certain point parasitics in the circuit can form oscillators if not damped by that ESR,
Different AIBs have populated a different number and placement of power supply phases, with a few putting the memory phases in nonsensical locations,
Nvidia themselves specified only one array of MLCCs to meet their spec; however, on their own boards they used two, implying that their testing of the reference design revealed an improvement large enough to make it worth the cost of fitting not one but two arrays. If they could have got away without any, they likely would not have specified them.
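
To put a rough number on the "not purely additive" point above, here is a small sketch (all component values are assumed, illustrative figures, not taken from any actual 3080 board) that sweeps the impedance of one bulk polymer cap against the same cap with an MLCC array in parallel:

Code: [Select]
# Illustration only: assumed values, series R-L-C model per capacitor.
import numpy as np

def z_cap(f, C, esr, esl):
    """One capacitor modelled as a series R-L-C."""
    w = 2 * np.pi * f
    return esr + 1j * (w * esl - 1.0 / (w * C))

def z_bank(f, caps):
    """Parallel combination of several capacitors (admittances add)."""
    return 1.0 / sum(1.0 / z_cap(f, **c) for c in caps)

f = np.logspace(4, 8, 4000)                          # 10 kHz .. 100 MHz
bulk = {"C": 330e-6, "esr": 6e-3, "esl": 2.0e-9}     # one polymer-style bulk cap
mlcc = {"C": 10e-6,  "esr": 3e-3, "esl": 0.5e-9}     # one small ceramic

for name, caps in (("bulk only", [bulk]), ("bulk + 10x MLCC", [bulk] + [mlcc] * 10)):
    zmag = np.abs(z_bank(f, caps))
    print("%-16s min |Z| = %.2f mOhm at %.2f MHz"
          % (name, zmag.min() * 1e3, f[zmag.argmin()] / 1e6))

Plot |Z| across the sweep and you can watch the resonance points appear and move as parts are added; whether the resulting peaks and dips land somewhere harmless depends on how much ESR is there to damp them, which is exactly the part the simple "MLCC good, POSCAP bad" argument skips over.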

This dives into power plane decoupling and other complex 3D EM modelling to really dig down to exactly what the cause is. Nvidia, as a company, has enough resources and the raw design files to run these kinds of simulations. For those of us familiar with the topic, it's usually more time-effective to just remove capacitors until it starts failing, put the last one back on, then start pulling them off elsewhere until it happens again; others just live by the rule of thumb that every chip gets a decoupling cap, usually around 100 nF.

In my own opinion, the fastest way to resolve the problem would be for ten of the affected card owners to swap the capacitors and see; however, as that plays into warranty, I doubt that is going to happen, leaving this impasse. It's not as if they are particularly hard devices to purchase.
« Last Edit: September 26, 2020, 10:12:13 am by Rerouter »
 
The following users thanked this post: EsxPaul

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3885
  • Country: de
My bet would be that since this happens only when the clock is boosted up, it likely has nothing to do with caps but much more likely with some voltage regulator browning out under the extra load or something getting hot enough to start failing.

That wouldn't be the first time with Nvidia's hardware.
 
The following users thanked this post: EsxPaul

Offline EsxPaul (Topic starter)

  • Newbie
  • Posts: 2
  • Country: gb
Many thanks to you both for taking the time to reply. It's much appreciated  :-+

Being a simple layman with no electronics expertise, I won't pretend that I understand the technicalities, but this does make me aware that the situation really isn't as clear-cut as the gamer community in general would like to think.
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3885
  • Country: de
It certainly isn't, and the gamer community (and the YouTubers who cater to them, e.g. Linus) is a terrible source of technical information on anything.

Most of them have zero clue about what they are talking about, and then you get persistent urban legends about things like "fixing" GPUs by "reflowing" them or, the opposite, warming them up to just below the soldering temperature. And then people are frying their stuff in ovens or attacking their hardware with hardware-store heat guns, destroying even perfectly fixable gear ...

E.g. in the case of the GPUs crashing at the higher clock, it could also simply be due to some component (e.g. RAM on the card) which can't run at the higher speed reliably without increased voltage, or some other issue - the same as with overclocking. And there it will depend a lot on how the card is constructed, which components were used, etc. Almost certainly nothing to do with whether some decoupling is done using tantalum or ceramic caps.
 

Offline Tohr21

  • Newbie
  • Posts: 1
  • Country: ie
I'm curious to see what the official statement is going to be from both Nvidia and the board partners, if they will release any statements at all, and whether the board partners will be kept in line by Nvidia.

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3885
  • Country: de
I'm curious to see what the official statement is going to be from both Nvidia and the board partners, if they will release any statements at all, and whether the board partners will be kept in line by Nvidia.

Not holding my breath. The 9xx series had some bad issues with overheating voltage regulators (VRMs) causing the cards to crash under load, and nothing ever came of it. Nvidia doesn't care - as long as it isn't a fault with the GPU itself, it is the problem of the OEMs.
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 17202
  • Country: us
  • DavidH
I would consider the power supply before the card.
 

Offline LeonR

  • Regular Contributor
  • *
  • Posts: 159
  • Country: br
  • PC hardware enthusiast
I've read that partners got proper drivers less than two weeks before launch... and the cap solutions were all standardized by Nvidia. If there's someone to blame, it's the green company.
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3885
  • Country: de
I've read that partners got proper drivers less than two weeks from launch... and the caps solutions were all standardized by nVidia. If there's someone to blame, it's the green company.

Nuts if true, but it wouldn't surprise me. Any fool buying a brand new series of expensive hardware like this (the 3080 retails for 850-1000 EUR!) right after launch is just that - a fool. And an unpaid beta tester with more money than brains.

Still, the 3080s are sold out and likely will be in short supply at least until spring, according to Nvidia.  :palm:
« Last Edit: October 07, 2020, 09:10:22 am by janoc »
 

Offline exe

  • Supporter
  • ****
  • Posts: 2622
  • Country: nl
  • self-educated hobbyist
Nuts if true, but it wouldn't surprise me. Any fool buying a brand new series of expensive hardware like this (the 3080 retails for 850-1000 EUR!) right after launch is just that - a fool. And an unpaid beta tester with more money than brains.

There are more feelings in this statement than truth. At the end of the day, "fools" can return their graphics cards within two weeks without any explanation. Not to mention that "expensive" is a very relative and personal term.
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3885
  • Country: de
Nuts if true, but it wouldn't surprise me. Any fool buying a brand new series of expensive hardware like this (the 3080 retails for 850-1000 EUR!) right after launch is just that - a fool. And an unpaid beta tester with more money than brains.

There are more feelings in this statement than truth. At the end of the day, "fools" can return their graphics cards within two weeks without any explanation. Not to mention that "expensive" is a very relative and personal term.

I am not quite sure what your point is here. I am describing what is obviously my experience and my view of the situation.

If you have the money and want the aggravation of an unusable system and of getting a part this expensive RMA'd or returned (not every retailer will take it back no questions asked and without any fee - it certainly isn't standard everywhere!), be my guest. Also, not every problem appears within the first two weeks - e.g. the older 9xx series VRM problems manifested themselves over a longer period. Often still within warranty, but outside of any hassle-free return window.

It is well worth not being an early adopter, because new versions of gear often ship with teething issues.

Whether it was the poorly performing VIA SATA controllers on some motherboards in the past, Radeon cards that had power management problems and were unstable when the driver turned their clock down, Nvidia cards with the bad VRMs, the infamous "Deathstar" disk drives, the bad MacBook keyboards, etc. Or, case in point, this thread itself.

I got bitten enough times by this in the past not to make the same mistake again. I will give the new hardware a few months until the dust settles, the initial warts are sorted out and some reliable reviews are out before considering buying it. I really don't need the newest and shiniest stuff so badly that it couldn't wait.

Maybe for you a thousand bucks is pocket change; for me it certainly isn't, even though I can afford it.
« Last Edit: October 07, 2020, 07:55:51 pm by janoc »
 

Offline LeonR

  • Regular Contributor
  • *
  • Posts: 159
  • Country: br
  • PC hardware enthusiast
I will give the new hardware a few months until the dust settles, the initial warts are sorted out and some reliable reviews are out before considering buying it.

That's my golden rule about buying any new tech. I've seen an increase in companies relying on customers as their QA/QC over the last decade or so. Even if it's covered by warranty, it is annoying to deal with problems that shouldn't exist in the first place.
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3885
  • Country: de
... it is annoying to deal with problems that shouldn't exist in the first place.

Exactly. I have better things to do with my time than to chase RMAs.
 

