RTX 2080 Ti Owners Complain of Defects, Nvidia Responds

As for "how widespread is it?":
* Hardware Unboxed spoke to a computer builder who had purchased nine FE cards, three of which were defective. One third in a sample of nine does seem a bit alarming.
* A poll on Sweclockers.se currently shows that out of 51 RTX cards...
... 12 are known defective,
... 8 are less than a week old and still running fine,
... 29 are still running fine after more than a week.
That's 24% known defective with a large margin of polling error.
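Just as a sanity check on those numbers, here's a quick Python sketch of the poll math. It assumes the 51 responses are an honest, independent sample (a self-selected forum poll really isn't), so treat the interval purely as illustration:

```python
# Rough sanity check of the SweClockers poll numbers quoted above.
# Assumes the 51 responses are independent and representative, which a
# self-selected forum poll almost certainly is not.
import math

defective = 12
total = 51

p = defective / total                     # point estimate: ~0.235 (24%)
se = math.sqrt(p * (1 - p) / total)       # binomial standard error
low, high = p - 1.96 * se, p + 1.96 * se  # crude 95% normal-approx interval

print(f"Observed defect rate: {p:.1%}")
print(f"Rough 95% interval:   {low:.1%} .. {high:.1%}")
```

With only 51 cards that interval runs from roughly 12% to 35%, which is what "a large margin of polling error" means in practice.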

When it comes to why the problem occurs, one suspect is the GDDR6 memory modules from Micron.
 


IMHO, if possible, nVidia should have just fabbed two dies, with the RTX portion being its own chip even within the same package. That way they could be binned and paired appropriately, with overall higher yields in production of the end product.
 

Well, it adds cost to the packaging, and the extra data movement between the two chips would likely decrease performance and increase energy consumption.

Just tell me: where would you even put the memory controllers? Are you going to route the RTX portion's memory transactions all through the main GPU die? What if the RT "cores" are like the Tensor cores, which aren't really cores at all (i.e. they're very closely coupled to the CUDA cores)?

Sure, you could build a separate ray tracing chip, but how much would it have to duplicate from the base GPU to attain the same performance levels?

Basically, there are significant design efficiencies to bundling it all together. If you wanted to go multi-chip, then I'd sooner cut down the entire GPU into multiple dies than try to functionally partition them. AMD will probably try this soon, using Infinity Fabric to string together two (or more) GPU dies.
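To put very rough numbers on the data-movement point, here's a back-of-envelope sketch in Python. The energy-per-bit figures and the traffic estimate are my own assumptions for illustration, not anything from NVIDIA:

```python
# Back-of-envelope on the data-movement point: moving a bit across a
# package substrate costs roughly an order of magnitude more energy than
# moving it over on-die wires. Every number below is an illustrative
# assumption, including how much traffic the shader cores and RT units
# would exchange if they sat on separate dies.
ON_DIE_PJ_PER_BIT = 0.1     # assumed energy per bit over on-die interconnect
OFF_DIE_PJ_PER_BIT = 1.0    # assumed energy per bit over a die-to-die link

traffic_bytes_per_s = 2e12  # hypothetical 2 TB/s of shader <-> RT traffic
bits_per_s = traffic_bytes_per_s * 8

def interconnect_watts(pj_per_bit: float) -> float:
    """Power spent just moving that traffic at the given cost per bit."""
    return bits_per_s * pj_per_bit * 1e-12

print(f"Same die:      {interconnect_watts(ON_DIE_PJ_PER_BIT):.1f} W")
print(f"Separate dies: {interconnect_watts(OFF_DIE_PJ_PER_BIT):.1f} W")
```

Under those made-up numbers the split design burns several extra watts just shuffling data between dies, and that's before you account for the added latency.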
 
The comment section @ toms is weird. Talk negatively about INTEL, NVIDIA, and TomsHardware... you get instant upvotes! Even the voice of the guy who reported both of his 2080 Tis working w/o issues was buried!!!!
 

Well, there's some history with this product. Certain negative sentiment finds widespread agreement.

As for people reporting failed RTX 2080 Ti's, I do wish they'd add more details, like the brand, model, symptoms, & state whether they received a replacement and whether that is working as specified.


That adds less information than someone with a failure. It's not as if the failure rate is > 50%, so someone with no problems doesn't tell us a whole lot. And it also lacked the kind of details I mentioned above.
 
This is what happened with my nVidia RTX 2080 TI - ASUS DUAL-RTX2080TI-O11G
https://www.youtube.com/watch?v=FiRbkXoBPmA
After waiting nearly 2 weeks for a replacement card and then being told there is no stock available I gave up on the RTX and bought 2x GTX 1080 Ti's. Not happy with the support from ASUS & nVidia. S/N: J9YVCM00A728296
 
NVIDIA will get the bugs worked out!!!! The new memory could possibly be the culprit. I'm sticking with my EVGA 1080 Superclocked for now.... Can't believe how long I've been using GPUs, going back to the 4MB cards. Anyone remember 3dfx Voodoo cards? Phew. Usually I buy the mid-range cards; they always seem more stable... Memory bandwidth is what I look forward to on newer cards..... Still, that ray tracing looks sweet!!!!!!!!
 
Well, no bugs here, just improper cooling mixed with a bad PCB design and quality control flaws. Kudos to Tom's for finding the issue...

Funny, it is the same issue I was having with my EVGA 1080 FTW. The card died twice due to memory overheating.

Lesson learned: if you want a 2080 Ti, get the best cooling solution for it and forget OC.
 


I'm not going to claim to be smarter than the engineers who put all this together, which is why I believe this to be a management issue. If they knew there were going to be yield issues ahead of time, perhaps they should have had engineering design the layout so it could be partitioned into separate dies.

And yes, I agree, I think AMD will use Infinity Fabric to link up GPU dies.

One idea would be to stack dies vertically within the same package so as to keep IO and latency on par (or nearly so) with a single unified die. The problems, of course, are complexity and thermal management. Not sure if it's doable, but perhaps put the memory controller at the bottom and the most active die at the top, so the HSF can pull heat away from it as efficiently as possible.


 
There might be OCP or voltage regulation problems with certain PSUs as well, because of the insane power/voltage spikes with the 2080 Ti series. Buildzoid from Actually Hardcore Overclocking thinks aggressive OCP settings on some PSUs are not playing nice with the cards.
 


That's what capacitors are for. If the card is yanking that hard on current and voltage, then IMHO the cards should have larger surface-mounted caps.
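For a sense of scale, here's a tiny worked example of the usual C = I·Δt/ΔV bulk-cap sizing rule. Every value in it is a made-up assumption, just to show how input capacitance can ride out a short spike before the PSU's OCP ever reacts:

```python
# Quick worked example of the "that's what capacitors are for" point.
# All numbers are hypothetical, just to show the C = I * dt / dV
# relationship for riding out a load transient on the 12 V input rail.
transient_current_a = 25.0   # assumed sudden extra draw from the GPU
transient_time_s = 5e-6      # assumed duration before the PSU loop catches up
allowed_droop_v = 0.25       # assumed acceptable dip on the 12 V rail

# Bulk capacitance needed to supply that charge with no more than the
# allowed voltage droop: C = I * dt / dV
required_farads = transient_current_a * transient_time_s / allowed_droop_v
print(f"Required bulk capacitance: {required_farads * 1e6:.0f} uF")
```

That comes out to about 500 µF for this hypothetical spike, which is in the ballpark of the bulk input capacitance cards actually carry; bigger or longer spikes scale the requirement linearly.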
 
TomsHardware... going down the drain since, ever: "Tom's Hardware Germany analyzed its infrared images of the GTX 2080 Ti reference card" — but the picture says MSI GTX 1080 Ti. No wonder GamersNexus will bury you. Shame...
 
"Under certain conditions." What were the conditions? Seems like it would be useful to know if they were trying to push the thing until it dies, or testing normally. Why does the FLIR image say 1080Ti?
 


That was the wrong image, correction made and noted in the article. Thanks for the heads up!
 

Idk, I do remember another dude on Reddit having problems with multi-rail PSUs and a Vega 64 because of insane spikes.
Forgot to add the link to the thread with the Buildzoid comment, where the dude switched PSUs and the GPU stopped causing problems:

https://www.reddit.com/r/nvidia/comments/9su4da/nvidia_2080_ti_failures_vs_power_supply/

 
And how long have you worked for Nvidia, comrade?