Major issue with GTX 980 Ti K|NGP|N

jrmelsha

Commendable
Jun 12, 2016
3
0
1,510
Hello,

I have had the GTX 980 Ti K|NGP|N since December of 2015 and I have a major issue that has now happened twice. I had the card RMA'd about 30 days ago, which solved the problem, but it has recurred.

The problem starts out like this: if I put the computer to sleep and attempted to wake it up, it would just sit on a black screen. I would be forced to hold the power button to get my system back (which is very dangerous...). I would then get a notification from the BIOS that there had been an "Overclocking Failure, press F1 to enter the BIOS." After about a week of that, it finally just stopped turning back on. What do I mean by not turning on? I could get into the BIOS and the Windows loading screen would appear, but as soon as it got to the login screen (AKA when the NVIDIA drivers are loaded), every screen would go black. The only display output that currently works is my DVI port!

When it happened last time, I went out to Best Buy and purchased a GTX 970 from some random company and it worked perfectly, dropped right in. What does this mean? It is an isolated problem with these 2 GTX 980 Ti K|NGP|N cards I have used in my system. I personally measured the power supply rails with a scope 10 consecutive times during the boot sequence and saw no variations. I tested the current load and everything appears to be normal. Even my UPS is reporting sub-200 W usage during the entire boot sequence (which is outstanding for what I have hooked up to it).

My specs are as follows:

[EVERYTHING IS AT STOCK RUNNING SPEEDS, TO ISOLATE THIS PROBLEM]
Windows 10 x64 Enterprise
Intel i7-5960X
64 GB of G.SKILL 2133 MHz (8 DIMMs)
ASUS X99 Deluxe U3.1 with the latest BIOS (as of 6/12/2016)
Seasonic Platinum 1000 W SS-1000XP Active PFC
2x Samsung M.2 950 PRO 512GB (one running with the ASUS expansion card and one running natively on the motherboard.)
ASUS PCIE Thunderbolt card (single)

This is all powered by a 2500 VA / 1500 W sustained battery UPS with Active PFC.

One of my monitors is a 34" widescreen that can only use DisplayPort to run at native resolution/speed, so that one is of course using a DisplayPort connection.

One of my 27" monitors is using a DVI->HDMI cable on the single DVI port on the graphics card.
The other 27" is using the single HDMI port on the graphics card.

TLDR my problem:
When I turn the system on, the Windows loading screen appears and then, right as it goes to the login screen, there is just a black screen. The monitors are still on but show nothing except black pixels. The computer is still running and the motherboard Q-code reports 99 (which means that it has handed off everything to the OS). The only way to get out of the black screen is to hold down the power button. There is no mouse, sound, or any interaction. I have tried entering my password to log in and pressing key combos to do stuff, but clearly nothing is happening.

I was able to get into the OS by entering Safe Mode and uninstalling every trace of the NVIDIA driver and using Base Video mode. I am currently on the system on a single monitor running directly off of the DVI port with like...1024x768 resolution.

When the computer is force-shutdown after the black screen, the motherboard ALWAYS displays an error saying "Failed to overclock, press F1 to enter setup..."

This issue is identical to the previous incident, which was solved by RMAing the card and, in the meantime, by buying a temporary GTX 980 from Best Buy. That isolates the issue to the graphics cards. Moreover, it is clear that the problem is triggered when the NVIDIA driver runs. I have tried a portable Windows 10 Enterprise x64 install off of a flash drive, and before the NVIDIA driver is installed there is no issue. The moment you run the NVIDIA driver installation and it does its usual screen flashing, the screen stays black and never recovers.

I have tried these driver versions: 368.39, 361.91. I do not fancy trying more :'( These versions have always worked previously for me, and it is a huge hassle to install, see if it works, go through the long removal process, and get my system back to operational at 1024x768 haha.

I have already started the RMA process for the graphics card and I will be purchasing ANOTHER temporary GTX 980 today when Best Buy opens. I would like to be able to fix this issue without going to all of that trouble.

Also, I have tried the other BIOS modes on the card (LN, O.C., and Normal).

Anyway... I hope I gave enough information, if anyone has ANY input or questions please feel free to post :)
 
Solution
CPU clock problems don't explain the display issues though. The only other thing I can think of would be the PCI bus clock screwing up but that would screw up the functionality of every PCI card in the system, not just the GPU (on top of which other GPUs didn't have issues).

jrmelsha

Jun 12, 2016


Indeed. I used to have 2 GTX 780 Ti Classy's running on this power supply. This single 980 Ti is probably a walk in the park for it, lol. I would probably guess that it might be the culprit, but none of my other components have had any hiccups. Besides that, I don't even game... I use this computer as a workstation. I don't even know why I own this graphics card, I am probably going to sell it and buy a 960 or something.
 

jrmelsha

Jun 12, 2016


Thanks for the reply! I also tried older BIOS versions and I am now running the latest version; nothing seems to change the outcome in that realm. I reset the BIOS completely a couple of times, once by removing the battery for 24 hours, too. Flashing a BIOS on an ASUS motherboard also resets the BIOS settings. I am using basically all stock settings, other than some fan profile stuff.
 


If whatever applications you use support CUDA acceleration then it is well worth it to have a high-end consumer GPU.
 

Ryan_78

Honorable


or 1080 or 1070
 

Ryan_78

Honorable


yea honestly you're right. the 1080 is a gaming card. VR and 4K ready

workstation with the Titan or the 980ti though, is stronger.
 