News EVGA Says RTX 3080 Cap Issues Caused Crashes, Confirms Stability Issues


nofanneeded

Respectable
Sep 29, 2019
1,541
251
2,090
I would have some concerns about the long-term stability and overall longevity of a card that needed a downgrade to be stable at launch: a design that is already marginally stable at launch (assuming AiBs don't down-clock their cards more than absolutely necessary to achieve stability over the warranty period) has that much less headroom to accommodate normal wear.

If the problem is just in the spec sheets, that is, the maximum clock numbers, I don't think it would be much of a problem. It's just like overclocking a CPU until it crashes.

But if it has something to do with the GPU itself internally and Samsung manufacturing it, then it is a huge concern.

I am 100% sure that someone at Nvidia will be fired for this ... The specs for GPU maximum boost come from Nvidia.
 

InvalidError

Titan
Moderator
I am 100% sure that someone at Nvidia will be fired for this ... The specs for GPU maximum boost come from Nvidia.
Nvidia tells AiBs what the STOCK base and boost frequencies are, AiBs decide their own stock OC. The cards that have problems usually crash at frequencies higher than Nvidia's reference boost spec.

The biggest problem here is that Nvidia did not give AiBs drivers until launch to prevent leaks, so they couldn't QA against real-world loads and tweak boost tables accordingly before it was too late.
 
  • Like
Reactions: JarredWaltonGPU

nofanneeded

Respectable
Sep 29, 2019
1,541
251
2,090
Nvidia tells AiBs what the STOCK base and boost frequencies are, AiBs decide their own stock OC. The cards that have problems usually crash at frequencies higher than Nvidia's reference boost spec.

The biggest problem here is that Nvidia did not give AiBs drivers until launch to prevent leaks, so they couldn't QA against real-world loads and tweak boost tables accordingly before it was too late.

Still, I think Nvidia knows their GPU's maximum potential, simply because almost all the time they release the same GPU with higher clocks and "Ti/Super" it. I don't think that Nvidia hides these specs from AiBs...

I even think Nvidia gives AiBs OC intervals to stay within, so that they don't compete against future "Super" cards. It is not that honest, btw.
 
POSCAPs are solid polymer caps.

They are neither superior nor inferior, they just cover a different range (DC-10MHz) of the power supply filtering spectrum.

Different size capacitors are most effective across different ranges of the spectrum due to their packaging's intrinsic ESL. Once you pass a given capacitor's self-resonance frequency, it becomes more of an inductor than capacitor and stops contributing to supply filtering, which is why you see clusters of 3-5 different size caps to provide uniform coverage where smaller caps pick up at frequencies where the large caps become less effective.

The POSCAPs likely had too much total ESL to maintain sufficiently flat filtering until the next cap size down could pick up, so the GPU voltage got too noisy for stable operation beyond 2GHz on cards with no intermediate-size MLCCs between the POSCAPs and the rest.

Beginners' mistake as far as high speed circuit design is concerned.
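
To put rough numbers on the self-resonance point above, here is a minimal sketch. It uses the standard series-resonance relation f = 1/(2π√(L·C)) and a simple series R-L-C model for the capacitor's impedance. The capacitance, ESL, and ESR values are illustrative assumptions only, not measurements from any RTX 3080 board, and the function names are made up for this example.

```python
import math


def self_resonant_frequency_hz(capacitance_f, esl_h):
    """Self-resonant frequency of a capacitor with parasitic series inductance (ESL)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * capacitance_f))


def impedance_magnitude_ohm(freq_hz, capacitance_f, esl_h, esr_ohm):
    """|Z| of a series R-L-C capacitor model at the given frequency."""
    omega = 2.0 * math.pi * freq_hz
    # Below resonance the capacitive term dominates, above it the inductive term does.
    reactance = omega * esl_h - 1.0 / (omega * capacitance_f)
    return math.hypot(esr_ohm, reactance)


# Hypothetical parts: (label, capacitance in F, ESL in H, ESR in ohm).
# These numbers are assumptions chosen only to show the trend.
EXAMPLE_CAPS = [
    ("large polymer cap", 330e-6, 2.0e-9, 0.006),
    ("mid-size MLCC", 47e-6, 0.8e-9, 0.003),
    ("small MLCC", 1e-6, 0.4e-9, 0.010),
]

if __name__ == "__main__":
    for label, cap, esl, esr in EXAMPLE_CAPS:
        f0 = self_resonant_frequency_hz(cap, esl)
        z_10mhz = impedance_magnitude_ohm(10e6, cap, esl, esr)
        print(f"{label}: self-resonance ~{f0 / 1e6:.2f} MHz, "
              f"|Z| at 10 MHz ~{z_10mhz * 1000:.1f} mOhm")
```

With values like these, the larger part resonates at a much lower frequency than the smaller ones, which is the reason for mixing cap sizes: each size covers the band where the next size up has already turned inductive.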

Very well stated.
 
If you include boosting as stock, yes, some people do have problems at stock and some GPU manufacturers have issued vBIOS updates to lower clocks in boost tables as a work-around at least while they are weighing their options.

Option says $10 rebate check while the lawyers for the class action walk away with millions.

It's in their best interest to take the cards back. But just like the nForce chipset option and motherboard solder failure, I see another "sweep under the rug and pretend it doesn't exist" issue.
 
What do you have to say about the following though? Steve from Hardware Unboxed has a 3080 TUF Gaming, and it's doing it too.
So no, Asus isn't safe - none of the AIBs are.

View: https://twitter.com/i/web/status/1309659834468298753


As I said in a post above, Igor said the caps issue is something that is quickly seen BUT probably not the reason for all crashes; driver instability is more than likely also a problem, and that will affect all makers.
 
What card did you get?

I have tested RTX 3080/3090 FE
Asus TUF 3080/3090
MSI 3080 Gaming X Trio
Gigabyte 3090 Eagle

Those last two had some stability issues, though the MSI seemed to get fixed with the public 456.38 drivers. The GB card couldn't finish my benchmarking in Shadow of the Tomb Raider or Forza Horizon 4 at factory stock settings, but a 15MHz downclock did the trick. But it might also become unstable during a longer play session.

Also, the FE cards definitely exceed 1710MHz -- that's just the official boost speed, but the 3080 FE hits about 2000 peak, and averages 1850-1950 depending on the game. The 3090 FE is a bit lower (30-50 MHz lower).

EVGA NVIDIA GeForce RTX 3090 24GB XC3 ULTRA GAMING Ampere
 
There is another perspective on this, and that is that the AIBs have pushed the boost clocks too far for what is an irrelevant gain for anything other than benchmarks. The boost in question only seems to operate for a few seconds anyway, in the right circumstances. I’ve seen it on my 3080 Gigabyte Gaming OC, which at stock boosts to just under 2000MHz and so far has been awesome. I have tested a +65MHz OC and the peak boost goes up to 2055MHz and seems stable in the benchmarks and games I have run. However, the performance gain in games is undetectable; a 5% higher boost during gaming that lasts 2 seconds adds nothing. If I push the OC higher, the peak boost jumps to 2100MHz and it would crash in Gears 5 when it hit 2100MHz, but it seems stable in my other games and benchmarks. Again, the performance boost is undetectable in the games that run, and I only see a boost in benchmarks. The boost that is causing the problem is not the speed the GPU is running at 99% of the time, and it adds nothing to gaming. If they hadn’t pushed the boosts so hard we wouldn’t have this problem popping up, and it was probably only done so their GPUs performed fractionally higher in benchmarks while adding nothing to gaming.
 

Phaaze88

Titan
Ambassador
Buildzoid isn't everyone's cup of tea, but he does shed some insight as to what's going on:
View: https://www.youtube.com/watch?v=THMukcOzB8g


Ampere appears to be even more sensitive to overclocks, unlike the last 2 gens. The AIBs got away with cheaper PCB designs before - higher profit margins, yo - but not this time around.
I said it before, but Nvidia's FE model is REALLY competitive.

There is another perspective on this, and that is that the AIBs have pushed the boost clocks too far for what is an irrelevant gain for anything other than benchmarks. The boost in question only seems to operate for a few seconds anyway, in the right circumstances. I’ve seen it on my 3080 Gigabyte Gaming OC, which at stock boosts to just under 2000MHz and so far has been awesome. I have tested a +65MHz OC and the peak boost goes up to 2055MHz and seems stable in the benchmarks and games I have run. However, the performance gain in games is undetectable; a 5% higher boost during gaming that lasts 2 seconds adds nothing. If I push the OC higher, the peak boost jumps to 2100MHz and it would crash in Gears 5 when it hit 2100MHz, but it seems stable in my other games and benchmarks. Again, the performance boost is undetectable in the games that run, and I only see a boost in benchmarks. The boost that is causing the problem is not the speed the GPU is running at 99% of the time, and it adds nothing to gaming. If they hadn’t pushed the boosts so hard we wouldn’t have this problem popping up, and it was probably only done so their GPUs performed fractionally higher in benchmarks while adding nothing to gaming.
~So, their usual shenanigans, like with their motherboards.
 
  • Like
Reactions: sizzling
Buildzoid isn't everyone's cup of tea, but he does shed some insight as to what's going on:
View: https://www.youtube.com/watch?v=THMukcOzB8g


Ampere appears to be even more sensitive to overclocks, unlike the last 2 gens. The AIBs got away with cheaper PCB designs before - higher profit margins, yo - but not this time around.
I said it before, but Nvidia's FE model is REALLY competitive.


~So, their usual shenanigans, like with their motherboards.
Thanks for the video. He states this capacitor choice makes a 10-50MHz difference to max GPU speed. This really means the AIBs are just pushing the boost too hard; no one will notice a 50MHz boost difference.
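
As a rough sanity check on that point, the arithmetic can be sketched as below. The 2000MHz baseline is an assumption taken from the peak boost figures mentioned in this thread, and the function name is made up for this example. A relative clock increase is at best an upper bound on any frame-rate gain, since games rarely scale perfectly with core clock.

```python
def relative_clock_gain_percent(baseline_mhz, delta_mhz):
    """Clock increase expressed as a percentage of the baseline clock."""
    return 100.0 * delta_mhz / baseline_mhz


# Assumed baseline: roughly the peak boost reported for these cards in the thread.
BASELINE_MHZ = 2000.0

if __name__ == "__main__":
    for delta_mhz in (10, 50):
        gain = relative_clock_gain_percent(BASELINE_MHZ, delta_mhz)
        print(f"+{delta_mhz} MHz on {BASELINE_MHZ:.0f} MHz is a {gain:.1f}% clock increase")
```

On that baseline, 10MHz works out to 0.5% and 50MHz to 2.5%, which is within the run-to-run variation of most game benchmarks.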
 
  • Like
Reactions: JarredWaltonGPU

ajr1775

Distinguished
Jun 1, 2014
55
18
18,535

So only third party cards are having issues?

Yes. The AIB cards that are factory overclocked are having issues because the AIBs went with a capacitor configuration that is the minimum spec. Instead of using all MLCC caps, they used a mix to keep the price down. So, this is on the AIBs as much as it is on Nvidia. Perhaps even more on the AIBs, by the looks of it.
 

hannibal

Distinguished
Some new info.
Paul's Hardware tested the issue.
Every GPU he had was stable.
But every time he overclocked a card over 2000MHz, it crashed.
No matter if it was Asus or FE or anything else. The problem is that some cards boost to 2000MHz at stock. So the caps are definitely not the only problem, or a problem at all...

View: https://youtu.be/BkTgYMlCl4E
 

nofanneeded

Respectable
Sep 29, 2019
1,541
251
2,090
Some new info.
Paul's Hardware tested the issue.
Every GPU he had was stable.
But every time he overclocked a card over 2000MHz, it crashed.
No matter if it was Asus or FE or anything else. The problem is that some cards boost to 2000MHz at stock. So the caps are definitely not the only problem, or a problem at all...

View: https://youtu.be/BkTgYMlCl4E

I find it sad that Tom's Hardware is now not testing anything and doesn't care that other sites are beating them to everything.
 
  • Like
Reactions: hannibal
Some new info.
Paul's Hardware tested the issue.
Every GPU he had was stable.
But every time he overclocked a card over 2000MHz, it crashed.
No matter if it was Asus or FE or anything else. The problem is that some cards boost to 2000MHz at stock. So the caps are definitely not the only problem, or a problem at all...

View: https://youtu.be/BkTgYMlCl4E

To be honest, if my card does crash in games due to an OC, then I will turn it down to its normal clock speed.
 
So, here's my current data. It's limited sample size (as in, one of each GPU), so take it for what you will.

  1. 3080 FE: No repeatable crashing during testing at stock. Overclocking did encounter instability at +75MHz in a couple of games, but passed at +60MHz.
  2. 3090 FE: No repeatable crashing during testing at stock. No OC testing yet.
  3. Asus 3080 TUF Gaming OC: No repeatable crashing during testing at 'default' clock (didn't test OC mode yet).
  4. Asus 3090 TUF Gaming OC: No repeatable crashing during testing at 'default' clock (didn't test OC mode yet).
  5. MSI 3080 Gaming X Trio: Initial pre-release driver (456.16) crashed in Metro Exodus consistently without a -20MHz underclock. 456.38 driver appears to have corrected this. More testing is needed to confirm.
  6. Gigabyte 3090 Eagle: Consistent crashing at factory stock in multiple games (Shadow of the Tomb Raider, Metro Exodus, Forza Horizon 4, and The Division 2). -20MHz underclock fixed. Updating to 456.55 drivers to see if that changes anything.

So far, any card that maintains more than 2000MHz during a test has been questionable, but I haven't closely looked at the Asus card data. The 20MHz underclock on the MSI and Gigabyte cards is enough to push them to 1980-2000MHz in the tests I've checked. I can say that at default clocks on the Asus 3090 (1740MHz), things appear quite stable. I've just applied the OC mode profile, which bumps the clocks to 1770MHz, so now I'll see how that goes.

A minor tweak to the drivers to just limit boost a bit, or maybe tune some other aspect, might be enough to correct the instability. Yes, it's a bad showing for Nvidia's new Ampere GPUs and drivers at launch, but we have seen early drivers on new GPUs with issues before. What will really matter is how things are going forward. If the 456.55 and later drivers take care of the instability, the number of people really affected is going to be very small, for less than ten days.

There's the other aspect to consider, though: Even if the updated drivers fix the crashing problems, there's a very good chance manual overclocking is going to be quite limited on the GPU core. (VRAM overclocks could still hit 20.5Gbps, though -- we'll see.)
 
  • Like
Reactions: sizzling
I am no expert, but to me, for such a new architecture, the level of consistency on max frequency seems incredibly tight. Is it possible that the AIBs have struggled to differentiate themselves from the excellent FE and therefore add value to their higher tier cards, and as a result they pushed the boost too hard?
 
I am no expert, but to me, for such a new architecture, the level of consistency on max frequency seems incredibly tight. Is it possible that the AIBs have struggled to differentiate themselves from the excellent FE and therefore add value to their higher tier cards, and as a result they pushed the boost too hard?
I think the lack of access to early drivers (which is rumored) is a big part of the problem. AIBs maybe had some test drivers that only worked for certain things, and those things didn't allow them to fully suss out maximum clocks and stability settings.

This is not new, though. Over the past 15 years of testing hardware, I have encountered numerous factory overclocked GPUs that were pushed too far. They would be stable in 90-95% of games, but a few would crash at 'factory stock' settings. A 25-50MHz drop on the GPU clocks would almost always fix the problem. The less time an AIB has with 'final' drivers, the more likely they are to push too far. Combined with a new architecture and perhaps higher boost behavior, we're getting a rash of problems. It's going to be interesting to see what the 'solution' entails. Will we see recalls, or driver tweaks, or new designs with even higher quality power circuitry? Probably all of the above to varying degrees.
 
  • Like
Reactions: sizzling