Question GPU unstable unless at 100% load / voltage instability

AntaresSQ01

Reputable
Oct 10, 2016
Hi,
I've had an issue detailed in these posts:
https://forums.tomshardware.com/threads/instability-gpu-output-cutting-off.3554915/#post-21467140
https://forums.tomshardware.com/threads/display-randomly-resetting.3531070/#post-21368123

Anyway, the long and short of it: the GPU keeps crashing and resetting even at stock speeds (every 15-30 minutes, depending on the game).

Things that I have ruled out 100%:
PSU
Processor
RAM
Dust (card has been cleaned)
Card thermals
Software (drivers, Windows, etc.)

Here is the weird part:

Under normal circumstances the card in question,
EVGA GTX 980 SC ACX 2.0,
is rated at
1266 MHz base clock
1367 MHz boost clock

When I play, the card sits around 1200 MHz and the voltage varies between 1.010 and 1.070 V.

Underclocking the card seems to have solved the issue, as I could go 2-3 hours without a crash (I haven't tested longer), but that's a bad way to go about it.

For some reason the card does not seem to react to EVGA Precision XOC or MSI Afterburner at all. Clock adjustments, the voltage slider and the power target appear to do nothing (even with voltage control enabled or constant voltage forced).
So I started playing around with settings and testing with Furmark. Furmark would pass at stock speeds just fine. With an overclock of only 50 MHz (without adjusting voltage), Furmark would crash and recover within 10 seconds (as you can see in the video in the first linked post). As far as I'm aware, pretty much any card should manage +100 MHz without touching anything else. So my next step was to flash a newer factory BIOS. This seemed to help: the card now passed at +50 MHz but crashed at +100 MHz. Upping the voltage somehow made it pass at +100 MHz, but in HWiNFO the reported voltage was still between 1.010 and 1.070 V despite bumping it by 0.087 V (the XOC and Afterburner maximum).
I read that the stock BIOS on these cards is pretty bad, so I got myself a modded BIOS. Now the card boosts properly and the voltage seems fine (when not under load), and the card reacts to clock and voltage changes (again, when not under load). It boosts up to 1404 MHz at 1.200 V.

Now here comes the fun part.
Furmark, 1440x900 windowed, no MSAA: card usage peaks at 96%, voltage drops to about 1.080-1.120 V and clocks fall to about 1300 MHz, but it still achieves a better score than ever without crashing.
At this point I was curious what would happen if I overclocked. Again, dead at +100 MHz even with max vcore (same voltage and clocks as when the test started running).

Then I tried a different Furmark setting on a whim.

1440x900 windowed, but at 2x MSAA. On these settings the card was 98% utilised, with voltage around 1.140-1.180 V and clocks at 1370 MHz. It passed the test at stock, crashed at +100 MHz with no added voltage, and passed at +100 MHz with max vcore (even though HWiNFO still reported exactly the same clocks and voltage as without the overclock).

Now the part where this becomes confusing.

1440x900 windowed, at 16x MSAA. Here the card peaked at 100% usage and, would you look at that, vcore was steady as a rock at 1.200 V and clocks held at 1404 MHz, just as under light activity. It passes without a crash. At this point I started pushing up the numbers: +50 MHz pass, +100 MHz pass, +150 MHz pass, +200 MHz crash. I upped the voltage, and this time it actually changed: at +0.075 V, HWiNFO displays 1.275 V. So here I'm running stable at 1.275 V, 1630 MHz and 67°C.

I thought I had accidentally "clicked" something, so I tried 1440x900 windowed with no MSAA again: immediate hard crash, and I had to restart the PC.
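To put the three Furmark runs above side by side, here is a rough Python sketch. The numbers are just the readings quoted in this post (the ranges are approximate, and "about 1300 MHz" is treated as 1300); the names `RUNS` and `summarize` are made up for illustration. It takes the midpoint of each voltage range and compares it with the 1.200 V / 1404 MHz the modded BIOS holds under light load.

```python
# Readings copied from the three Furmark runs described in this post.
# Voltage is a (low, high) range in volts; a steady reading repeats the value.
RUNS = [
    {"msaa": "none", "usage_pct": 96, "vcore": (1.080, 1.120), "clock_mhz": 1300},
    {"msaa": "2x", "usage_pct": 98, "vcore": (1.140, 1.180), "clock_mhz": 1370},
    {"msaa": "16x", "usage_pct": 100, "vcore": (1.200, 1.200), "clock_mhz": 1404},
]

LIGHT_LOAD_VCORE = 1.200  # volts, modded BIOS when not under load
LIGHT_LOAD_CLOCK = 1404   # MHz, modded BIOS when not under load


def summarize(runs):
    """Return one row per run with voltage droop and clock loss vs. light load."""
    rows = []
    for run in runs:
        low, high = run["vcore"]
        mid = (low + high) / 2  # midpoint of the reported voltage range
        rows.append({
            "msaa": run["msaa"],
            "usage_pct": run["usage_pct"],
            "vcore_mid": round(mid, 3),
            "droop_mv": round((LIGHT_LOAD_VCORE - mid) * 1000),
            "clock_loss_mhz": LIGHT_LOAD_CLOCK - run["clock_mhz"],
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(row)
```

With these readings, the droop and the clock loss both shrink as usage rises from 96% to 100%, which is the opposite of what I would expect from a card that simply cannot hold voltage under load.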

So my question to all is: why is the card only stable at 100% utilisation? I thought it might not be able to hold voltage under load due to faulty power connectors on the card or something similar, but strangely it does just fine at 100% and not at 95%. Most games won't max out the card, so in that regard the instability makes sense. I also don't see why a software overclock doesn't actually change the numbers (except at this 100% load). I'll try using the card at stock settings with the flashed BIOS, because it seemed relatively more stable, and if I have to underclock it I would lose less performance than I did in its original state.
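One way to check whether the voltage really tracks utilisation outside Furmark would be to log sensors to CSV while gaming (HWiNFO can do this) and group the voltage readings by usage. Below is a minimal Python sketch of that idea. The column names `GPU Core Load [%]` and `GPU Core Voltage [V]` are assumptions and need to be matched to the header row of the actual log; `vcore_by_usage` is a hypothetical helper, not part of any tool.

```python
import csv
from collections import defaultdict

# Assumed column names: compare with the header row of your own log and adjust.
USAGE_COL = "GPU Core Load [%]"
VCORE_COL = "GPU Core Voltage [V]"


def vcore_by_usage(csv_file, bucket_pct=5):
    """Group voltage readings into usage buckets.

    Returns {bucket_start: (min_vcore, max_vcore, sample_count)}, where a
    bucket_start of 95 covers 95% up to (but not including) 100%, and
    100 holds only the 100% samples.
    """
    buckets = defaultdict(list)
    for row in csv.DictReader(csv_file):
        try:
            usage = float(row[USAGE_COL])
            vcore = float(row[VCORE_COL])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        start = int(usage // bucket_pct) * bucket_pct
        buckets[start].append(vcore)
    return {
        start: (min(values), max(values), len(values))
        for start, values in sorted(buckets.items())
    }
```

Usage would be something like `with open("log.csv", newline="") as f: print(vcore_by_usage(f))`, where the file name is a placeholder and the log's text encoding may need to be specified. If the 95-99% bucket consistently shows lower voltage than the 100% bucket, that would match what Furmark showed.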

I would appreciate any help. I will check the solder joints on the power connectors, but on the stock BIOS the card drew 200 W as reported by HWiNFO, and on this one it has gone up to 250 W and seems just as happy. As I mentioned at the start, the things listed there are 100% ruled out: I have either done or replaced everything on that list. The card IS probably faulty, but it's past its warranty, so I would rather try to fix it than just throw it away.

Thank you guys for any help!

Specs:
CPU: i7 7700K @ 4.80 GHz
Motherboard: MSI Z270-A Pro
RAM: 2x8GB Crucial Ballistix Sport LT DDR4 @ 3333 MHz
SSD/HDD: Samsung 850 EVO 250GB SATA
Samsung 860 EVO 500GB SATA
Samsung HD204UI 2TB
GPU: EVGA GTX 980 SC ACX 2.0
PSU: Seasonic Prime Ultra Gold 1000W
Chassis: Corsair Graphite Series 230T
OS: Windows 10 Pro x64 Build 1903
 
