Various BSODs, Especially Under Heavy GPU Load (Not Always)

Water-T

Jan 12, 2017
Hello Tom's Hardware community! :)

I have been having problems with my first PC build for a couple of weeks now, and it is really concerning me.

Please note that before I downloaded or installed any of my third-party software, I installed all the latest drivers from my motherboard manufacturer's webpage and flashed to the latest BIOS. (Yes, I did this for the replacement board as well, from the correct page for the new model.)

I started out getting the occasional IRQL_NOT_LESS_OR_EQUAL BSOD while playing games like The Witcher 3, usually after 10 to 30 minutes. The driver blamed was always ntoskrnl.exe, which I have found is a cryptic way of saying that some other driver isn't working right. So I rebooted into safe mode, used DDU to uninstall my graphics drivers, and rebooted again. Then I downloaded the manufacturer's latest driver for my GPU and installed it. This did not fix my problem. I dug around to the best of my ability to find the culprit driver, but nothing I did made any difference. I also put Memtest on a flash drive and tested each of my RAM sticks individually with 10 passes each, with NO errors. I scanned my HDD and both SSDs as well, again with no errors. I finally resorted to backing up my important files and doing a clean install of Windows. Unfortunately that wasn't a fix either, so I clean installed a couple more times and was still getting BSODs... I did the same with my new motherboard too, with the same results.

Since my first bit of troubleshooting, I have replaced a good bit of hardware: the PSU twice, the motherboard, the RAM, and even the CPU cooler, going from a Hyper 212 EVO to a Corsair H110i v2 AIO! Ey, at least the temps are on the low side 😉

All of my current hardware:
Asus z170-AR Motherboard (Old one was Asus z170-A)
Intel Core i7 6700k
Crucial Ballistix 16gb 2400 (2x8) RAM
EVGA GTX 1070 SC GPU
EVGA SuperNOVA 850 P2 80+ PLATINUM 850W PSU
Two Samsung 850 Evo 500GB SSDs (one has Windows on it, the other has my games)
One Seagate 2TB HDD (I made this drive the place where downloads, music, and photos go)
Corsair H110i V2 Cooler
Two Corsair 140 Fans on top of pc
One Corsair 120 fan on back of pc
Phanteks SP400 Eclipse ATX Case
Peripherals and such: Razer Naga Chroma Mouse, Razer Chroma Black Widow X, Razer Kraken 7.1 headset, Logitech HD Pro Webcam C920


Even after all of the replacing I have done, I am still getting the IRQL_NOT_LESS_OR_EQUAL errors described above, but now I am also getting a few other random ones that don't always occur in a game. I'll copy a few as examples (I will also include the latest dump files at the end of this post):

KMODE_EXCEPTION_NOT_HANDLED
DRIVER_PAGE_FAULT_IN_FREED_SPECIAL_POOL
One had a blank bug check name, but showed this bug check code: 0x0000011d
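
For anyone trying to put a name to a raw code like the one above, here is a minimal Python sketch. The table is a small hand-picked subset and the helper name `bug_check_name` is hypothetical; the names are taken from Microsoft's bug check code reference as best I know it, so double check there before relying on it.

```python
# Small hand-picked subset of Windows bug check codes (assumed to match
# Microsoft's bug check code reference; not an exhaustive list).
BUG_CHECKS = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x000000D5: "DRIVER_PAGE_FAULT_IN_FREED_SPECIAL_POOL",
    0x0000011D: "EVENT_TRACING_FATAL_ERROR",
    0x00000139: "KERNEL_SECURITY_CHECK_FAILURE",
}


def bug_check_name(code):
    """Return the bug check name for a code given as an int or a hex string."""
    value = int(code, 16) if isinstance(code, str) else code
    return BUG_CHECKS.get(value, "UNKNOWN (0x%08X)" % value)


if __name__ == "__main__":
    print(bug_check_name("0x0000011d"))
```

A code that is missing from the table just comes back as UNKNOWN with the hex value, so nothing is guessed.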

I was also running the Windows Driver Verifier for a while, trying to force a crash so I could better identify the driver, but I couldn't make sense of the results. The only BSODs I kept getting with it happened right when I clicked to shut down the PC. I have since turned Driver Verifier off.

Here are the 3 most recent BSOD Dump Files (Newest on top):
1) http://www.mediafire.com/file/jhjtakmqi80ct6h/011617-3671-01.dmp
2) http://www.mediafire.com/file/h2viyziqtb8yb5r/011617-3593-02.dmp
3) http://www.mediafire.com/file/y7kho4jqmh34d7y/011617-3578-03.dmp

4) (This is a dump file from a BSOD that the Verifier triggered. Thought it may have different info) http://www.mediafire.com/file/wome1vx1v9zv6d5/011517-6796-01.dmp
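
The dump file names above look like they start with the crash date as MMDDYY. That reading is an assumption (it matches the dates in this thread), and the sketch below does not rely on the meaning of the two trailing numbers. The helper name `parse_dump_name` is made up for illustration.

```python
import re
from datetime import datetime

# Matches names such as 011617-3671-01.dmp at the end of a path or URL.
DUMP_NAME = re.compile(r"(\d{6})-(\d+)-(\d+)\.dmp$")


def parse_dump_name(path_or_url):
    """Return (date, second_field, third_field) parsed from a minidump name.

    Assumes the first six digits are MMDDYY; the other two fields are
    returned as plain integers without interpreting them.
    """
    match = DUMP_NAME.search(path_or_url)
    if match is None:
        raise ValueError("not a recognised minidump name: %r" % path_or_url)
    crash_date = datetime.strptime(match.group(1), "%m%d%y").date()
    return crash_date, int(match.group(2)), int(match.group(3))


if __name__ == "__main__":
    url = "http://www.mediafire.com/file/wome1vx1v9zv6d5/011517-6796-01.dmp"
    print(parse_dump_name(url)[0].isoformat())
```

This only helps with grouping dumps by day; the actual crash details still have to come from opening the dump in a debugger or a dump viewer.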



Thank you x100000000 in advance to anyone who thinks they can help me! :)







 
Solution
Hopefully someone else can give you a better answer from reading those dumps, but given the kinds of BSODs you're seeing and the fact that you've reinstalled Windows, changed the mobo and PSU, and tested the RAM with memtest86, I'd say your CPU is the faulty part. Stress test it with Prime95 (small FFT option); if the CPU is bad, it shouldn't take long for it to fail a test or crash the system.


Thank you for that suggestion, because I normally use AIDA64 to stability test my CPU. I ran 8 passes with the small FFT option and it passed on all cores, with no crash. One thing to note, though: the temps on my cores were way hotter than I've seen in AIDA64, sitting in the 70s and 80s on all 4 cores. My BIOS is set to defaults and my voltage reads 1.344v, but under load it sometimes jumps up to 1.366v.

 


I only installed the Corsair Link software to change the RGB color; it stayed on the last color I set when I uninstalled it :)
The WinRing0x64.sys driver is located in both the PrecisionX OC software files and the RealTemp software files. I have never heard that either of these should be avoided, or that you need to contact the manufacturer to be able to use them. Do you think it could be my CPU, as Radikal_ was suggesting?
 


I uninstalled RealTemp & Precision X OC.
I really need some software to control my GPU's fan curve; without Precision X OC the fans don't spin.

Would you recommend MSI Afterburner?
 


My ACX fans won't spin without a program like that though 🙁

I don't want my GPU to overheat under load!

And also, I still can't rule out a faulty CPU, can I? I mean, it's one of the last pieces of hardware I haven't swapped.
 



Last night I uninstalled Precision X OC and RealTemp, and installed only MSI Afterburner to control my GPU. I also removed and reinstalled the water block of the CPU cooler with new thermal paste.

I have barely had a BSOD since these changes. There was one IRQL_NOT_LESS_OR_EQUAL error involving USBXHCI.SYS, but I seem to have fixed that myself by updating a USB driver.

The only other BSOD I have gotten is one of the original culprits:

DRIVER_IRQL_NOT_LESS_OR_EQUAL
0x000000d1
ntoskrnl.exe
ntoskrnl.exe+14a6f0
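
The last line above reads as a module name plus a hexadecimal offset into that module. A tiny Python sketch of splitting it apart, assuming that "module+offset" layout (the helper name `split_location` is hypothetical):

```python
def split_location(text):
    """Split a 'module+hexoffset' string into (module, offset as int)."""
    module, sep, offset = text.rpartition("+")
    if not sep or not module:
        raise ValueError("expected 'module+offset', got %r" % text)
    return module, int(offset, 16)


if __name__ == "__main__":
    module, offset = split_location("ntoskrnl.exe+14a6f0")
    print(module, hex(offset))
```

Note that the module here is the Windows kernel itself, so the offset only says where inside the kernel the crash surfaced, not which third-party driver led to it.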

Can anyone fill me in on which driver this is pointing to? I can't tell. I have already used DDU again to fully remove all of my graphics drivers and then installed my GPU's latest driver. I'm going to see if that was the cause, since I also reseated my GPU last night. If I'm wrong let me know!

Dump file: http://www.mediafire.com/file/oq14113g3op3u1v/011717-3406-01.dmp

Thank you! Seems I'm making progress
 


Had another BSOD happen, but this time I had been in The Witcher 3 for over 30 mins. There was also a loud buzz coming from the speakers when it blue screened.

Dump file: http://www.mediafire.com/file/zdjluqbzfdxx8sc/011817-3625-01.dmp

I'm uninstalling my network driver and then installing the latest one again.

After getting the newer network driver, I got another BSOD, this time a Kernel Security Check error.

Dump File: http://www.mediafire.com/file/12y18wh83yhutz3/011817-3390-01.dmp
 


I actually agree with this, because I just got another BSOD and the minidump pointed at nvlddmkm.sys again.
I have thoroughly looked up how to properly uninstall and reinstall my latest drivers and have done so more than 5 times, and I still get the same or similar errors.

Recently opened a ticket with EVGA, will see what they do to help.

Thank you for sticking with me in this troubleshooting journey Paul!

Here is the newest dump file: http://www.mediafire.com/file/ihlo7bst638r6fi/011817-3578-01.dmp

I'll continue to update this thread

 


Although I RMA'd my EVGA GTX 1070 SC for the second time, I did some more troubleshooting with my CPU and the potentially faulty GPU before swapping cards. I went into my BIOS and double checked that everything was at manufacturer defaults. I then ran an earlier version of Prime95 (safer for Skylake chips) all day for 12 hours. None of my physical or hyperthreaded cores failed, and considering that the default vcore is set to a roomy 1.345v, the core temps stayed in the high 60s and low 70s Celsius. Before swapping GPUs I also ran The Witcher 3 to put stress on the GPU and see if it would cause any crashes. To my surprise, there were none (this was with the default BIOS settings still in place, and the GPU not OC'd). I used DDU to uninstall my video drivers before installing the replacement GPU, and of course installed the newest drivers for the new card. After replacing the GPU with the RMA card EVGA sent me, I ran the same tests with the same settings as before. Other than some improvements from the new GPU, nothing changed: The Witcher 3 still didn't crash the entire time I had it running.

So for a few days I kept everything running at its default settings. Since I had no issues, I began looking for a stable OC for my CPU. Right now I am running my i7 6700k at 4.5GHz with the vcore at 1.296v. I made sure it was stable by running Prime95 (small FFTs) for 12 hours; every core passed smoothly, with temperatures staying no higher than 68c. I was still able to run The Witcher 3 with no crashing or resetting. (Since my CPU OC seems stable, I set the vcore mode to adaptive to save power and lower temps while idle.) At this point I became confident that I could OC my GPU with MSI Afterburner, so I used Heaven and Fire Strike to test everything. I managed to get a stable OC of +79MHz on the core clock and +400MHz on the memory frequency. With my fan curve and this extra offset, the GPU never goes above 70c under load during a long night of gaming. The CPU cores never go above 55c while under stress from The Witcher 3; they tend to stay in the 40s and even dip into the mid to high 30s.

In conclusion, I'd have to say my issues were being caused by a combination of an unstable OC on my CPU and drivers that were having problems because of that instability.

I would like to thank the guys here who helped me along the way. I have been BSOD-free for over a week and counting!

Here is my latest Fire Strike score: http://www.3dmark.com/3dm/17630950