Question EVGA GTX 1080Ti FTW3 Display driver stops responding

LordMustang

Distinguished
Jun 29, 2015
26
0
18,530
Hi all,

For a few weeks now, the GPU has been crashing after a game has been running for a while. At a random point, the game freezes and then force-closes. Sometimes the game reports that the D3D device/driver was lost.

I'm unsure how to figure out what causes this issue. It didn't happen up until a few weeks ago. I haven't overclocked the card; it's still on EVGA's base clock speeds and voltage curves. Temperatures have remained well below the limits at all times.

Things I've tried:
  • Switching the Master/Slave (Normal/OC) BIOS switch on the GPU
  • Cleaning and replugging the 8-pin connector
  • Plugging the GPU into a different PCIe slot (PCIe x4 rather than the x16)
  • Cleaning drivers with DDU and reinstalling
  • Installing an older driver version

Does anyone know other steps I could take to troubleshoot this? I've read that adjusting the TDR (Timeout Detection and Recovery) timing may be an option, but from what I've read, that doesn't seem to fix the underlying issue.

Event Viewer error:
Display driver nvlddmkm stopped responding and has successfully recovered.
EventID: 4101
Name: Display
Level: Warning
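
To see how often this warning fires, the recent occurrences can be pulled from the System log. This is only a rough sketch, assuming Windows' built-in wevtutil tool is on the PATH; the function names are made up for illustration, and the query itself only runs on Windows.

```python
import subprocess


def build_tdr_event_query(event_id: int = 4101, count: int = 10) -> list[str]:
    """Build a wevtutil command that lists the newest System events with this ID."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",   # XPath filter on the event ID
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # reverse direction: newest events first
        "/f:text",       # plain-text output instead of XML
    ]


def recent_tdr_events(count: int = 10) -> str:
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(
        build_tdr_event_query(count=count),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling recent_tdr_events(5) from a Python prompt on the affected machine should print the five most recent entries with their timestamps, which can then be lined up against when the game froze.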

Build:
AMD Ryzen 5 5600X
Corsair RM750 PSU
Gigabyte B550 AORUS PRO AC
Corsair DDR4 Vengeance LPX 2x16GB 3600
NZXT H510
 

Phaaze88

Titan
Ambassador
Add this to the list:
Go into Precision X and unlink the Power and Temperature Limits (they're linked by default in Afterburner, so maybe the same is true for Precision). Once they're unlinked, drag the Power Limit slider all the way down. Click Apply and play away.
Ignore any performance impact - you just want to see if things will stop freezing/crashing.
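
If you want to watch what the card actually draws while you test, nvidia-smi (installed with the NVIDIA driver) can report power draw and limits. A rough sketch, assuming nvidia-smi is on the PATH and the card exposes these fields; the helper names are hypothetical, and a card that reports a field as unsupported would make the float conversion fail.

```python
import subprocess

# Query fields understood by nvidia-smi's --query-gpu option.
FIELDS = [
    "power.draw",
    "power.limit",
    "power.default_limit",
    "power.min_limit",
    "power.max_limit",
]


def build_query(fields=FIELDS):
    """Build the nvidia-smi command that prints one CSV line of values in watts."""
    return [
        "nvidia-smi",
        f"--query-gpu={','.join(fields)}",
        "--format=csv,noheader,nounits",
    ]


def parse_power_line(line, fields=FIELDS):
    """Turn one CSV output line into a {field: watts} dictionary."""
    values = [float(v) for v in line.split(",")]
    return dict(zip(fields, values))


def limit_range_percent(reading):
    """Express the min and max power limits as a percentage of the default limit."""
    default = reading["power.default_limit"]
    return (
        round(100 * reading["power.min_limit"] / default),
        round(100 * reading["power.max_limit"] / default),
    )


def read_power():
    """Run nvidia-smi and parse the first GPU's line."""
    out = subprocess.run(build_query(), capture_output=True, text=True, check=True).stdout
    return parse_power_line(out.strip().splitlines()[0])
```

Running read_power() once at the default setting and again after dragging the slider down is a simple way to confirm that the lower limit was really applied before starting a game.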
 

LordMustang

Using HWiNFO, I monitored the 12V rail. With the regular power target (100 in Precision X1), the reading kept dropping while the GPU was under load, until the GPU eventually reset. Here's the graph:
Rut3Dbc.png


When I decreased the power limit all the way to 45, the power curve was a lot more stable, without a considerable drop. The GPU/driver/D3D device did not reset.
q7qkVss.png


Going by EVGA's FAQ, the fluctuation at a power limit of 100 is likely beyond the acceptable range.
https://www.evga.com/support/faq/FAQdetails.aspx?faqid=59025

Do you think this could indicate that the PSU's 12V rail is faulty? And if not, would that mean the card is faulty, given that it can't run at its factory power limit?
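
For reference, the ATX spec allows ±5% on the +12 V rail, i.e. roughly 11.4 V to 12.6 V. Below is a minimal sketch of checking a logged sensor column against that window. The column name and the sample values in the usage are hypothetical, and software sensor readings are only approximate, so treat any hit as a hint rather than proof.

```python
import csv
import io

NOMINAL_12V = 12.0
TOLERANCE = 0.05  # ATX allows +/-5% on the +12 V rail


def read_column(csv_text, column):
    """Extract one numeric column from CSV text (e.g. an exported sensor log)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader if row[column]]


def out_of_spec(samples, nominal=NOMINAL_12V, tolerance=TOLERANCE):
    """Return the samples that fall outside nominal +/- tolerance."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    return [v for v in samples if not (low <= v <= high)]
```

Usage, with made-up readings and an assumed column header:

```python
log = "Time,+12V [V]\n0,12.096\n1,11.904\n2,11.2\n"
print(out_of_spec(read_column(log, "+12V [V]")))
```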
 

Phaaze88

Umm, I don't think screen shots like those can be used in place of a multimeter...

How old is that power supply?
Did you test overclocks (CPU or GPU, it doesn't matter) often while you had it? A bit of personal experience here: I wore down what is regarded as a great PSU in about 5 years, screwing around with many different tweaks on a 7820X. It became impossible to play games or run GPU-heavy apps without freezing, though I could use the PC just fine for anything else. I later found that games were playable with the 1080 Ti's power limit dropped all the way down. So I got a new PSU, and I haven't had any hiccups for the last 5 months.


BUT... in my scenario, the PC always froze, forcing a power-down and restart. There was never a 'D3D device/driver was lost' error, even though I did try troubleshooting that at one point with DDU and a mix of recent 400-series and ancient 300-series drivers.
I have one other idea. 'D3D device/driver was lost' is usually driver-related, but you've already tinkered with DDU and other drivers...
So I'm thinking the factory overclock might not be stable anymore? The built-in GPU Boost 3.0 is software. When you open Precision again, do not lower the Power Limit. Instead, lower the core clock by 100, 200, and 300 MHz, then see what happens.
 

LordMustang

Phaaze88 said:
How old is that power supply?
Not that old, roughly 1 year.

Phaaze88 said:
Did you test overclocks (CPU or GPU, it doesn't matter) often while you had it? A bit of personal experience here: I wore down what is regarded as a great PSU in about 5 years, screwing around with many different tweaks on a 7820X. It became impossible to play games or run GPU-heavy apps without freezing, though I could use the PC just fine for anything else. I later found that games were playable with the 1080 Ti's power limit dropped all the way down. So I got a new PSU, and I haven't had any hiccups for the last 5 months.
No, I have never overclocked the card myself. It's always run at factory settings.

Phaaze88 said:
So I'm thinking the factory overclock might not be stable anymore? The built-in GPU Boost 3.0 is software. When you open Precision again, do not lower the Power Limit. Instead, lower the core clock by 100, 200, and 300 MHz, then see what happens.
Would that just deteriorate over time? It would be sad to always have to run the GPU at reduced clock speeds.

I set the power target to its maximum (128), tried several combinations of the other values, and ran the Unigine Superposition benchmark each time. These were the results:

GPU Clock speed: 0
Voltage (?): 0
Display driver loss after ~20 seconds

GPU Clock speed: -100 MHz
Voltage (?): 0
Display driver loss didn't occur

GPU Clock speed: +50 MHz
Voltage (?): 100
Display driver loss instantly

I'll try to place the GPU into a different build and run the same tests and check whether I get the same results. If the same crashes occur, would it be safe to assume the GPU is faulty?
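
To keep track of which settings survive, the three runs above can be written down as data, and a small helper can suggest the next offset to try. This is just an illustrative sketch; the Run class and the function names are made up, and the midpoint rule is one simple way to narrow the range, not anything Precision X1 does itself.

```python
from dataclasses import dataclass


@dataclass
class Run:
    core_offset_mhz: int   # core clock offset set in Precision X1
    voltage_setting: int   # value of the voltage slider
    crashed: bool          # True if the display driver was lost


# The three benchmark runs reported above (power target at 128 for all of them).
RUNS = [
    Run(core_offset_mhz=0, voltage_setting=0, crashed=True),
    Run(core_offset_mhz=-100, voltage_setting=0, crashed=False),
    Run(core_offset_mhz=50, voltage_setting=100, crashed=True),
]


def highest_stable_offset(runs):
    """Highest core offset that completed without a driver loss, or None."""
    stable = [r.core_offset_mhz for r in runs if not r.crashed]
    return max(stable) if stable else None


def next_offset_to_try(runs):
    """Midpoint between the best stable offset and the nearest crashing one above it."""
    best = highest_stable_offset(runs)
    if best is None:
        return None
    crashing_above = [r.core_offset_mhz for r in runs if r.crashed and r.core_offset_mhz > best]
    if not crashing_above:
        return None
    return (best + min(crashing_above)) // 2


print(highest_stable_offset(RUNS), next_offset_to_try(RUNS))  # prints: -100 -50
```

Note that the +50 MHz run also changed the voltage setting, so it isn't a like-for-like comparison with the other two.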
 

Phaaze88

LordMustang said:
Would that just deteriorate over time? It would be sad to always have to run the GPU at reduced clock speeds.
Actually, overclocks do degrade over time - that includes CPU and memory/XMP ones - though it usually takes some years to happen. We can delay that process with good management of temperatures and voltage - if we're allowed some degree of control.
GPU Boost already does most of the GPU overclocking, around +300 MHz. The vendor, EVGA, pushed it a bit further on their FTW cards; it looks like an extra 100 MHz.


LordMustang said:
I'll try to place the GPU into a different build and run the same tests and check whether I get the same results. If the same crashes occur, would it be safe to assume the GPU is faulty?
Let us know how that goes.
As for the question, I can't say yet, but I have reason to believe the factory OC isn't stable:
-in the 2 tests that didn't crash (dropped power limit and -100 MHz), the GPU kept running - I'm sure performance wasn't ideal in the first one, but in both tests, it still ran.
^In the power-limited test, the GPU couldn't run at the EVGA OC because there wasn't enough power to do so. In the -100 MHz test, the GPU was practically running at Founders Edition clocks.
-the RM750 is a good PSU, and you haven't had it long.
-you've not been tinkering around with overclocks.

How long have you had this card?
 

LordMustang

I finally got around to plugging it into a different build, and the same behaviour occurs. So it's definitely the GPU.

Phaaze88 said:
How long have you had this card?
I've had it for about a year, but I bought it from someone who had owned it since 2017-2018.

I guess there's a chance the card is on its way out then.
 

Phaaze88

Aye. A bit hard to deny that it's the GPU at this point. You can keep using it if you downclock it a little, which isn't nearly as bad as dropping the power limit to 50%.

No telling what the previous owner did with it...