Question Is my graphics card dying? (video included)

Nov 29, 2022
5
1
15
0
Hello everyone, thanks in advance.



Since yesterday, my PC either reboots completely or crashes the current program whenever the GPU is under heavy load. In both cases a lot of artifacts appear on screen (see the video provided). I can reproduce the issue consistently, and it occurs within minutes of heavy load (video games or 3D rendering). Lower-intensity tasks don't seem to trigger it; I can have the PC on for hours.

I tried rolling back to two different older Nvidia drivers, but it made no difference. The card was bought in 2018. I've been careful with temperatures and have used MSI Afterburner to make sure the GPU stays well under 80 degrees Celsius at all times. I've also made it a habit to clean my case weekly, so I would assume dust buildup isn't the issue. I have not, however, changed the thermal paste or done much else to the GPU.

My system specs are as follows:

Motherboard: Asus Prime Z590-P
PSU: Corsair RM750x 80 Plus Gold 750W
RAM: Kingston HyperX Fury DDR4 3200MHz
CPU: Intel i5-11600K @ 3.90GHz (Noctua NH-D15)
GPU: RTX 2080 Ti
Case: Fractal Design Meshify C (4 case fans)

Again, thanks for your time!
 
Nov 18, 2022
36
0
30
0
Try lowering only the memory clock and running the fans a bit faster; it may help.

I think one or more VRAM chips are dying. Try running an artifact test in MSI Kombustor or something similar.
 
Nov 29, 2022
Hey, thanks for the suggestion.

I ran a few tests with both memory and core clock at -502 MHz in MSI Afterburner. There is a definite correlation there, but I still couldn't run the system under heavy load for more than a few minutes. I had the fans on auto, and the temperature didn't rise above 71 degrees Celsius. This time the crash led to a blue screen with the following error message:

Stop code: VIDEO_TDR_FAILURE
What failed: nvlddmkm.sys

Do you think it's safe to say that the card is done for?
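
For anyone who wants a record of what the card was doing right before a crash like this, here is a rough sketch of a logger built on `nvidia-smi`, which ships with the Nvidia driver. The query fields and `--format` options are standard `nvidia-smi` ones; the function names and the log layout are just assumptions for illustration, and it only reads the first GPU. A field reported as `[N/A]` would make the parser raise, which this sketch does not handle.

```python
import os
import subprocess
import time
from typing import NamedTuple

# Standard nvidia-smi query fields: core temperature, GPU utilization,
# current graphics clock, current memory clock.
QUERY = "temperature.gpu,utilization.gpu,clocks.gr,clocks.mem"


class GpuSample(NamedTuple):
    temp_c: int
    util_pct: int
    core_mhz: int
    mem_mhz: int


def parse_sample(line: str) -> GpuSample:
    """Parse one CSV line such as '71, 98, 1350, 6800'."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return GpuSample(*(int(field) for field in fields))


def read_sample() -> GpuSample:
    """Query the first GPU once via nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])


def log_samples(path: str, interval_s: float = 1.0) -> None:
    """Append one sample per interval to `path` until interrupted."""
    with open(path, "a", encoding="utf-8") as log:
        while True:
            s = read_sample()
            stamp = time.strftime("%H:%M:%S")
            log.write(f"{stamp},{s.temp_c},{s.util_pct},{s.core_mhz},{s.mem_mhz}\n")
            log.flush()
            os.fsync(log.fileno())  # push to disk so the line survives a hard reboot
            time.sleep(interval_s)
```

Calling `log_samples("gpu_log.csv")` (the file name is arbitrary) before starting the game or render should leave the last reading before the reboot at the end of the file.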
 
Nov 18, 2022
I was thinking maybe the memory thermal pads are dry or not making good contact.

It may be worth changing your thermal pads and thermal paste.

You may be able to see the VRAM temps in GPU-Z; they should be under 100 °C.
 
Nov 29, 2022
Sorry I couldn't get back to you sooner.

I wasn't able to find the GPU memory junction temperature in either GPU-Z or HWiNFO. After some googling, it seems like it might not be available for the RTX 2080 Ti?
I did, however, check the hot spot temperature in GPU-Z, which went all the way up to 94 degrees Celsius, while the 'normal' GPU temperature stayed at 74 degrees.
Is that of any help?

Thanks again for the help, I appreciate it immensely.
 

Phaaze88

RAM overheating, or throwing errors out the wazoo - on its way out.
I think there was a rumor floating around that 2080 Tis were prone to memory failures.
That dropping the memory clock all the way down didn't stop it makes me suspect it's not heat related...

"I did however check the hot spot temperature in gpu-z, which went up all the way to 94 degrees celsius, while the 'normal' GPU temperature stayed at 74 degrees."
That's within expectations; hot spot being 10-20 °C higher than the GPU core.
The memory thermals can't be read via software on most RTX 20 series cards. That really started up with RTX 30, RX 6000, and above.
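
To put numbers on that, here is a tiny sketch of the check using the readings from this thread. The 20-degree ceiling is only the rule of thumb quoted above, not a vendor spec, and the function name is made up for illustration.

```python
def hotspot_delta(core_c: int, hotspot_c: int, max_expected: int = 20) -> tuple[int, bool]:
    """Return the hot spot minus core delta and whether it is within the expected gap."""
    delta = hotspot_c - core_c
    return delta, delta <= max_expected


# Readings reported earlier in the thread: 74 C core, 94 C hot spot.
delta, ok = hotspot_delta(74, 94)
print(delta, ok)  # prints: 20 True
```

A gap well beyond that ceiling would point more toward poor cooler contact or dried-out paste than toward failing memory.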
 
Nov 29, 2022
Thank you both so much for the help. I had someone more experienced than me come and give their view on the matter, and it does seem like the memory chips (?) are permanently damaged.

It seems this is a relatively common problem with 2080 Ti cards: their weight puts strain on the board near the point where the card connects to the motherboard. Perhaps that would explain why the crashing and artifacts don’t seem to be directly related to temperature. The card also started making a distinct ‘buzzing’ noise under increasing load that didn’t seem to come from the fan.

Since I’m unfortunately in a great rush to get a working GPU again, I’ll conclude that chances are this one is beyond repair and settle for getting a new one.

Once more, I can’t thank you enough for your help. All the best to you!
 