GTX 970 giving me multiple Blue Screen of Death issues

James27

Reputable
Feb 14, 2015
39
0
4,530
So I recently bought a GTX 970. It installed fine, drivers installed and it re-started with no problems. I only get issues when I start up a game and get in to actual gameplay. I've tried two games so far. The first is Lords of the Fallen. Within a few seconds of loading a game, everything freezes, and then the whole screen goes a bright, lime green. After a short while my PC restarts. When it gets back on to Windows, an error report says it's recovered from a Blue Screen. The BCcode is 116.

The other game I've tried is Wolfenstein: The New Order. This game does slightly better, but it's still very unstable. It freezes for about a minute every few seconds, and when I open the menu. After a few minutes I actually do get a Blue Screen, and when it restarts the BCcode this time is f4.

Again, when I'm on the desktop, and in in-game menus, everything's fine. It's only when I get in to actual gameplay that there are problems. Graphics drivers are definitely up to date. I did have an AMD card before but I used DDU to clean out the drivers. Probably also worth mentioning, once when it restarted from LotF's lime screen, I opened the case and the GPU felt super hot, but according to GPUTemp it never went above 60 degrees.

Specs:

OS: Windows 7 64-bit
CPU: Core i7 3770
GPU: GTX 970
RAM: DDR3 16GB
Motherboard: P8H77-V LE
Hard Drive: wdc wd5000aakx
PSU: 600 watts

Edit:

More details on the problem, according to the Reliability Monitor.

Description
A problem with your video hardware caused Windows to stop working correctly.

Problem signature
Problem Event Name: LiveKernelEvent
OS Version: 6.1.7601.2.1.0.768.3
Locale ID: 2057

Files that help describe the problem
WD-20150304-1837.dmp
sysdata.xml
WERInternalMetadata.xml

Extra information about the problem
BCCode: 117
BCP1: FFFFFA8008B5F010
BCP2: FFFFF8800423645C
BCP3: 0000000000000000
BCP4: 0000000000000000
OS Version: 6_1_7601
Service Pack: 1_0
Product: 768_1
 


Bugcheck 0x116 indicates that the graphics system did not respond within its allowed timeout and could not be recovered. The default timeout is only a couple of seconds (2, as far as I know), and that timeout period can be changed via a registry setting.

I have looked at many of these bugchecks and they have had various causes.
Most people will do the basic step and update the device driver for the graphics card.

Here are a few other things from other systems that produced the same error, and how they were fixed:

- When using a newer Nvidia card, be sure to upgrade your Ethernet driver or turn off ShadowPlay streaming. Bugs in the Ethernet driver can bog down the graphics driver to the point where it does not respond quickly enough.

- If you don't use sound from your video cable to your monitor speakers, disable the graphics card's high definition audio support.
It removes a lot of processing that the GPU would otherwise have to do and can help the GPU respond within the timeout.

- Update your BIOS or reset it to defaults. People often turn off their machine and plug in a new graphics card without going into the BIOS and forcing it to redetect all the hardware. The BIOS builds a database of hardware and settings and passes it to Windows. If you don't go into the BIOS and redetect, the database will lie to Windows and say you have hardware that is not in the machine. Windows will then detect the new hardware and try to avoid any of the hardware settings that the BIOS said were already in use.

- Update the BIOS for another reason: it sets the default clock rates on the PCI/e bus. Some BIOS versions will overclock the defaults automatically, which is bad if you already have a factory overclocked card. Some overclocked cards have the timing too far advanced, and you need to overclock the PCI/e bus to get the timing signals in the electronics to match the card.

- Update the BIOS and the USB chipset drivers. I have seen Microsoft DirectX chained behind an interrupt for a broken version of a USB driver. The USB driver just took too long to pass the interrupt on, or did not pass the interrupt to DirectX at all. The DirectX timer still expires and bugchecks the system.

- It has been a few years, but some of the older SSDs would get behind on their garbage collection and just pause the system for 30 or 40 seconds, and you would get a bugcheck in DirectX if you were running a game. These turned out to be firmware bugs in the SSD. You could get the same effect from a failing hard drive, or from a solid state drive that was nearly full (you want 20% free space on an SSD). If you run a spinning hard drive, Windows 10 or maybe Windows 8.x lets you take advantage of the data integrity repairs the newer OS does on the drive. Windows 7 requires a wipe of the drive and a full format (not a quick format) to fix these problems. (People rarely bother to do the full format; they do a quick format and just end up with the same problem after a Windows 7 reinstall.)

- I have seen some deadlocks in graphics driver code related to configuration problems between the motherboard's outdated audio drivers and the graphics card's audio support. Once the motherboard audio drivers were updated, the graphics card started to work OK. The graphics card has to support sound over HDMI and DisplayPort cables. Bad motherboard audio drivers can mess up the audio support on the GPU even if you don't have a speaker connected to it.
The graphics card does not know whether you have a speaker, so you have to disable the support if you don't need it. (Don't remove the support, just disable it in Device Manager, or the plug and play system will reinstall it.)

- There are other common bugchecks that people also see related to the graphics card and its power supply.
Bugcheck 0x124 is common with an underpowered GPU, or if someone does not connect all of the GPU power connectors.

Black screen problems are often caused by underrated power supplies, OR by not rebooting your system after Microsoft updates a graphics driver and then running the OEM driver update software. (Microsoft graphics updates require a reboot, the OEM version does not. Just never run them both at the same time.)

well, that is all I can think of off the top of my head.
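
For reference, the stop codes that come up in this thread can be turned into their documented Windows names with a small lookup. This is only a rough sketch: the table is limited to the codes mentioned here, and the helper name `describe_bccode` is made up for illustration.

```python
# Stop codes mentioned in this thread, with their documented Windows names.
# The table is deliberately tiny; it is not a complete bugcheck reference.
BUGCHECK_NAMES = {
    0x116: "VIDEO_TDR_FAILURE",
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
    0x124: "WHEA_UNCORRECTABLE_ERROR",
    0xF4: "CRITICAL_OBJECT_TERMINATION",
    0x7A: "KERNEL_DATA_INPAGE_ERROR",
}


def describe_bccode(bccode: str) -> str:
    """Turn a BCCode string such as '116' or '0x000000F4' into a label.

    describe_bccode is a hypothetical helper name. BCCode values in
    Windows error reports are hexadecimal, with or without a 0x prefix.
    """
    value = int(bccode, 16)
    name = BUGCHECK_NAMES.get(value, "not in this table")
    return f"0x{value:X}: {name}"


if __name__ == "__main__":
    for code in ("116", "117", "f4"):
        print(describe_bccode(code))
```

The 0x116 and 0x117 codes both point at the display driver timeout path, which is why most of the suggestions above are about things that can stall the graphics driver.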

 

James27



Thanks for the reply. Shadowplay is turned off and, according to Driver Identifier, all my drivers are pretty much up to date. I disabled the high definition sound support, but it didn't help. As for updating the BIOS, I updated that a few weeks ago when I got a new CPU, and it seems I got the latest version then (4601). I can't find anything newer, anyway. I reset to defaults, that didn't work. Also, I'm PRETTY sure my hard drive isn't an SSD. Doesn't say it is in Device Manager, anyway. I shouldn't have to get Windows 8 or 10 to get it to run properly. I've had this hard drive since 2011/2012 and it's been working fine.

The only other thing that catches my eye is what you said about "re detecting the database". What did you mean by that? Not sure how to do that. Or is that what you meant by updating the BIOS? Sorry, still a bit of a computer noob lol. It's an ASUS BIOS, by the way. Thanks again for the help.
 
When you update the BIOS, reset it to defaults, or change any hardware setting in the BIOS, it should rescan the hardware and reassign interrupts and DMA channels. It puts all of these settings in a little database that it then passes on to Windows. Often people turn off a machine, add new hardware, then boot to Windows. Windows plug and play detects the new hardware and people think things are good. Windows ends up thinking you have two sets of hardware: one that it detected via plug and play, and another that the BIOS said was there and that Windows thinks is just not working.
But Windows has to configure the new hardware without using the settings of the "not working" hardware.

Best to go into the BIOS and reset it after adding new hardware to avoid this problem.

As for Windows 10 or 8, they do nice things for old hard drives.
Spinning hard drives have known failure (error) rates. (Drive bearings wear, and sectors become misaligned with the sector markers placed by the heads and controlled by the servo motors.)

There is a very high failure rate within the first 30 hours of operation, then about 15% a year, cumulative, until the drive "dies".
Something like:
25% first year
15% second year
15% third year
15% fourth year,...

until the drive is non functional.
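
To see how those rough percentages add up over time, here is a small worked calculation. It is only a sketch under an assumption that is not stated above: each percentage is read as the share of still-working drives that fail during that year. The figures themselves are the ballpark numbers from this post, not measured data, and `survival_after` is a made-up name.

```python
def survival_after(years: int,
                   first_year_rate: float = 0.25,
                   later_rate: float = 0.15) -> float:
    """Fraction of drives still working after `years` years.

    Assumes (hypothetically) that each rate applies to the drives that
    survived up to the start of that year.
    """
    if years <= 0:
        return 1.0
    return (1 - first_year_rate) * (1 - later_rate) ** (years - 1)


if __name__ == "__main__":
    for year in range(1, 5):
        print(f"after year {year}: {survival_after(year):.1%} still working")
```

Read this way, a bit under half of the drives would still be working after four years, which fits the point that a drive bought in 2011/2012 is worth checking.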

Windows 8 and 10 will attempt to extend the life of the drive by locating and moving data from sectors that are starting to produce read errors. They do this in a background process that reads the data over and over until it gets a clean read, then marks the area as bad. Windows 7 requires you to do a full format (not a quick format) to fix this problem. Often people get errors and just do a quick format and reinstall. Different files then land in the weak sectors, and the problem might go away or it might not.


 

James27



So basically what you're saying is, I just have to reset BIOS to defaults, which I already have done? :\

I suppose the only other thing to try is a full format and reinstall of Windows. I did do a reinstall of Windows recently, but I guess it was a quick format like you said. How do you do a full format?
 

James27

So I jumped ahead and did some searching online. Seems all you need to do is press the "format" button when you reinstall. Actually I think it took less time than before. Last time I did it it took a few hours, this time it seemed to only take an hour. Maybe because the hard drive wasn't as full. Anyway, it hasn't worked. Seems to be worse, in fact. I tried playing Wolfenstein. Almost immediately it went to a black screen, then restarted. Same BCcode, 116. F**k sake, I'm beginning to think I've been sold a lemon.

Could it be a power supply issue? I mean, according to this it only uses 150w, so I thought my 600w PSU would be more than enough. But I dunno, maybe not.
 
Post a link to the actual GTX 970 card you have. Various manufacturers have made factory overclocked versions that can hit timing problems with the default device driver or with the PCI/e bus timing settings in your BIOS. Also, some GPUs have GPU BIOS updates, or you can use a utility to set the GPU timings to match the stock timings of the reference card.

- You might list the actual PSU name and model number.

- You would not have to install Windows 10 or 8. I just mention them because they will do extra data integrity checking of your hard drive for you.


 
Solution

James27

Do you mean list my PSU name and number here? When I open the case the only thing I see is the serial number, which is kb110600566, and a label with the number 110527. Not really sure about everything else you said there lol. As I said, bit of a noob when it comes to this.

Edit: Thought you should know, I downloaded and ran BlueScreenView. It says it's caused by these files: dxgkrnl.sys, dxgmms1.sys, nvlddmkm.sys.
 

James27

Sorry for bump and double post, but I've just discovered what the problem is. And that is, my GPU fans aren't spinning. Turns out, they spin when I first turn on my PC, then when Windows is about to load they stop. So, does anyone have any suggestions? I've tried switching the power connectors in both sockets... if that makes sense... which didn't make a difference.
 

James27



Yeah but that's the thing. They don't. Otherwise I'm pretty sure I wouldn't be getting blue screens from overheating.

Edit: Actually, looks like you might have been right. I downloaded MSI Afterburner and was able to get the fans spinning manually. I tried playing Wolfenstein and still got the same problem. So, looks like the fans aren't the problem.

Anyway, I'm thinking of replacing my Mobo. Maybe the problem is that my current one can't handle these super modern specs!
 

James27

Bit of a major update. I bought a new case (HAF 912 plus) and motherboard (P8H77-V LE), and pretty much put everything from my old case in to my new one. Things are better, but by no means perfect. First I tried Wolfenstein: The New Order. It didn't immediately crash, which was a good sign, but there were faint lines going across the screen, like this. Worse, it kept freezing every few minutes, usually for about a minute at a time. Pretty concerning, but thankfully it never crashed. Next I tried Lords of the Fallen. Here, the lines were a lot more noticeable, and the freezing a lot more frequent. So I decided to check temperatures while the game was still open. (RealTemp for the CPU and msi Afterburner for GPU). CPU seems fine. Temperature stayed at around 50/60°. With the GPU, however, I literally watched it rise. When it got to around 70° it actually did blue screen and crash.

This has led me to believe it's almost definitely a GPU overheating problem. But I find this baffling because, as I've already said, this is a brand new GPU. How can it be doing this already? Is it worth taking the fan off and re-applying thermal paste? If not, what do you guys suggest? (And if I do, when it comes to applying the paste... I use the pea method for CPUs. Would that work for GPUs?)

Also, I've set the GPU fans back to auto. I figure it's best to just leave them to it. As popatim said, they do kick in when needed.

Edit: I think I've solved it! It seems it was factory overclocked, as johnbl suggested, so I simply lowered the clock setting using Afterburner. And, yeah, everything seems fine! Wolfenstein and LOTF are running silky smoothly on highest settings, with no freezes whatsoever. Temperature while playing LOTF hovers at around 70°, which I hear is pretty standard. The only problem is the lines I mentioned before still remain, though not as intense. But I hear getting an HD monitor should sort that out... yeah I'm still using VGA lol. There's that, and I'm not entirely happy with some of the sounds coming from my case, but I might just be paranoid.

So I suppose you could say the solution here was getting a new Motherboard and/or lowering the clock settings. I've chosen the post where johnbl suggested it could be overclocked as the solution, since it seems that was the main culprit here.
 

James27

Really sorry for the triple post, but I'm still having major problems. Since my last post I've been getting another Blue Screen with 7a as the BCcode. I didn't mention this before, but just before I was about to get a blue screen (the original and this one) my PC would make a noise. It's kind of hard to explain, kinda like a high "shew". Anyway, I always assumed it was coming from my graphics card, but I opened the case and realized it was actually coming from my hard drive. So these two things made me think part of the problem could be that my hard drive was simply on the way out. After a while it actually died. So I got a new one (HDD, not an SSD sadly), hoping it would solve all my problems. Well... not quite.

I've tried playing Lords of the Fallen, got a blue screen, and according to the Reliability Monitor this is what I'm now dealing with:

Description
Faulting Application Path: C:\Program Files (x86)\Steam\steamapps\common\Lords Of The Fallen\bin\LordsOfTheFallen.exe

Problem signature
Problem Event Name: APPCRASH
Application Name: LordsOfTheFallen.exe
Application Version: 0.0.0.0
Application Timestamp: 54f449cc
Fault Module Name: LordsOfTheFallen.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 54f449cc
Exception Code: c0000005
Exception Offset: 000000000d36daa0
OS Version: 6.1.7601.2.1.0.768.3
Locale ID: 2057
Additional Information 1: 60f7
Additional Information 2: 60f78930c47d809951bb61d2e31b8c88
Additional Information 3: 594f
Additional Information 4: 594f94a1eb45f76eac3c65d50c3f2e3a



Got two for Steam as well, which happened the next day at the same time:

Description
Faulting Application Path: C:\Program Files (x86)\Steam\Steam.exe

Problem signature
Problem Event Name: APPCRASH
Application Name: Steam.exe
Application Version: 2.70.82.9
Application Timestamp: 552c4097
Fault Module Name: ntdll.dll
Fault Module Version: 6.1.7601.18798
Fault Module Timestamp: 5507b3e0
Exception Code: c0000005
Exception Offset: 0002e066
OS Version: 6.1.7601.2.1.0.768.3
Locale ID: 2057
Additional Information 1: b9c9
Additional Information 2: b9c94e5f46fb4195ab5642e2690ef06c
Additional Information 3: 3028
Additional Information 4: 3028920251af47656d9c1c8d9b92026e

Extra information about the problem
Bucket ID: 1057553070



Description
Faulting Application Path: C:\Program Files (x86)\Steam\Steam.exe

Problem signature
Problem Event Name: BEX
Application Name: Steam.exe
Application Version: 2.70.82.9
Application Timestamp: 552c4097
Fault Module Name: tier0_s.dll_unloaded
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 552c4038
Exception Offset: 72b7d640
Exception Code: c0000005
Exception Data: 00000008
OS Version: 6.1.7601.2.1.0.768.3
Locale ID: 2057

Files that help describe the problem
WERInternalMetadata.xml
AppCompat.txt
WERDataCollectionFailure.txt
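
Reports like the ones above are plain "Name: value" lines, so they are easy to pull apart when comparing several crashes. The sketch below is illustrative only: `parse_report` and `exception_name` are made-up helper names, and the status table holds just the one code seen in these reports (c0000005, which Windows documents as STATUS_ACCESS_VIOLATION).

```python
# Only the exception code that appears in the reports above.
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def parse_report(text: str) -> dict:
    """Collect 'Name: value' lines from a pasted Reliability Monitor report.

    Lines without a colon (section titles such as 'Problem signature')
    and lines with an empty value are skipped. Splitting on the first
    colon keeps paths like 'C:\\Program Files' intact in the value.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def exception_name(fields: dict) -> str:
    """Look up the hexadecimal 'Exception Code' field in the small table."""
    code = int(fields["Exception Code"], 16)
    return NTSTATUS_NAMES.get(code, "not in this table")


if __name__ == "__main__":
    sample = (
        "Problem signature\n"
        "Problem Event Name: APPCRASH\n"
        "Application Name: Steam.exe\n"
        "Fault Module Name: ntdll.dll\n"
        "Exception Code: c0000005\n"
    )
    report = parse_report(sample)
    print(report["Fault Module Name"], exception_name(report))
```

All three reports carry the same exception code in different modules, which tends to point away from any one program and toward something shared underneath them.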
 

James27

Damn, I've already tried that. I'll do it again and see if it helps.

I'm not using Avast.

Edit: It hasn't worked. Also, my network drivers appear to be all up to date. Also, forgot to say this before, I ran a full Disk Check and it didn't find any problems.
 

James27

OK, so I'm going to need some opinions. First of all, is it possible for this to be caused by a faulty power supply? Apart from the RAM (which I've tested, didn't find any problems), it's the only thing I haven't replaced. So it COULD be on the way out, but... I dunno. Could it be causing my hard drive to fail, and this APPCRASH?

If not, is it worth taking my PC to some sort of repair shop, and hopefully get this sorted once and for all?
 

James27

Sorry for dragging up this old thread yet again.

So, I still get the 116 bluescreen, but there's another, more serious bluescreen I get when I lower the clock speed that says: Stop: 0x000000F4. It still only occurs in actual gameplay, and I also get periodic freezing and sound loops. The more demanding the game, the more likely these things happen.

I've tried everything suggested here (Second post). All the tests have shown no errors, including the memtest86+, which I ran for 10 passes. The only interesting thing is, when I run the extended version of the Memory Diagnostics Tool, it always gets stuck at 21%. It seems to be a pretty common problem, but I wanted to check is this a sign that my RAM is busted, or doesn't it matter since, as I already said, the memtest showed no errors for 10 passes?

Does anyone have any other suggestions, or tests for other parts I could try?
 
You can edit the registry and change your TdrDelay timeout to 8 or 10 seconds. If the card is just lagging, this can prevent a bugcheck.
https://msdn.microsoft.com/en-us/library/windows/hardware/ff569918(v=vs.85).aspx
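
As a rough illustration of that registry change, the snippet below only builds the text of a .reg file and does not touch the registry itself. The key path and the `TdrDelay` DWORD value (in seconds, documented default 2) come from Microsoft's TDR registry documentation. The function name `tdr_delay_reg` and the 1 to 60 second sanity range are assumptions made up for this sketch.

```python
def tdr_delay_reg(seconds: int) -> str:
    """Return .reg file text that sets TdrDelay to `seconds`.

    tdr_delay_reg is a hypothetical helper. The 1..60 bound is an
    arbitrary sanity check for this sketch, not a documented limit.
    """
    if not 1 <= seconds <= 60:
        raise ValueError("pick a delay between 1 and 60 seconds")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]",
        # .reg files store DWORD values as eight hexadecimal digits.
        f'"TdrDelay"=dword:{seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Print the text for a 10 second timeout; save it as a .reg file,
    # review it, then import it and reboot for the change to apply.
    print(tdr_delay_reg(10))
```

Deleting the `TdrDelay` value again puts the timeout back to the default.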

- If you have a factory overclocked graphics card, you might just increase the clock on the PCI/e bus by a few percent and see if that helps.
The default PCI/e clock should be 100 MHz.

- Google how to force a memory dump via the keyboard, and force a kernel memory dump. I can take a look at the hardware settings. Sometimes I have seen some really strange interrupt settings where the graphics driver was chained on the same interrupt as something like a USB device.
(And the USB driver was old and very slow and caused the graphics system to time out, because the USB driver looked at the interrupt first.)

- If you think you might have a software driver related problem, run cmd.exe as an admin, then run
verifier.exe /standard /all
reboot
and wait for the next bugcheck.

Note: use verifier.exe /reset
to turn off the verifier functions after you are done testing, or your system will run slowly.
Use the verifier with the kernel memory dump for best results. If you use it with a mini memory dump, all you will get is the name of the bad driver and a basic description of the problem. The kernel memory dump will have full details of the problem.



 

James27

Thanks for the quick reply. I'll definitely check out the PCI/e. I did have a feeling my GPU might be factory overclocked, so it would be good to finally know for sure. I assume I go to BIOS for that? Anyway, I'll try those other things and get back to you.

Edit: Well, now I feel kinda silly lol. I don't know how to check the PCI/e. I went to the BIOS but didn't see anything about the PCI/e MHz. Where exactly should I be looking?

2nd edit: So I forced a crash using the keyboard. How do I post memory dumps?