[SOLVED] Question: Windows 10 - Likely culprit of BSOD - NMI Hardware Failure?

Jason3022

Distinguished
Apr 16, 2012
50
1
18,535
0
Windows 10/64-bit
Motherboard: Gigabyte P55A-UD4P
CPU: i7-860 @ 3.8 GHz
16 GB RAM
GPU: EVGA GTX 970 (latest Nvidia driver, running @ stock clocks, no aftermarket software)
Dell 25" 2560x1440

1 week ago, I started getting BSOD - NMI Hardware Failures, while playing games.
I have not made any hardware changes.
I figured it might be a Nvidia driver, so I downloaded the latest one.
The BSODs continue while playing games - 3 different games so far.

I have never analyzed a memory dump before, so I have set up the system to create a Full memory dump - is that what I want to create? The default appeared to be a Kernel memory dump.

I have also downloaded WinDbg Preview to analyze the minidump the next time it happens. Hopefully I'll know what I'm looking for; otherwise, are there some folks here who are good at analyzing these things?
-----------------------------------------
Edit: Just ran the Windows Memory Diagnostic app - no errors found.
 

Colif

Win 10 Master
Moderator
It is better to set it up for minidumps; the main reason is size.
Full dumps can get really big, several GB. Minidumps are mostly around 1 MB or less, which makes them much easier for other people to look at.
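
For anyone who wants to check which dump type is currently selected without clicking through the dialogs, here is a minimal Python sketch. It assumes the setting is stored in the CrashDumpEnabled value under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl, which is how Microsoft documents it. The function names are made up for illustration, and the registry read only works on Windows.

```python
import sys

# Meaning of the CrashDumpEnabled registry value, as documented by Microsoft.
DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump (minidump)",
    7: "Automatic memory dump",
}


def describe_dump_setting(value):
    """Translate a CrashDumpEnabled number into a readable name."""
    return DUMP_TYPES.get(value, f"Unknown setting ({value})")


def read_dump_setting():
    """Read the current setting from the registry (Windows only)."""
    import winreg  # imported here so the rest of the file loads on any OS

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if __name__ == "__main__":
    if sys.platform == "win32":
        print(describe_dump_setting(read_dump_setting()))
    else:
        print("This check only works on Windows.")
```

If it reports anything other than the small memory dump, the setting has not been changed yet.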

Can you follow option one on the following link - here - and then do this step below: Small memory dumps - Have Windows Create a Small Memory Dump (Minidump) on BSOD - that creates a file in C:\Windows\Minidump after the next BSOD

copy that file to documents
upload the copy from documents to a file sharing web site, and share the link in your thread so we can help fix the problem
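
The copy step can also be scripted. The sketch below is only an illustration of the idea: it picks the most recently modified .dmp file in a folder and copies it to another folder, leaving the original in place. The function name is hypothetical, and reading C:\Windows\Minidump may require running the script as administrator.

```python
import shutil
from pathlib import Path


def copy_latest_minidump(minidump_dir, dest_dir):
    """Copy the newest .dmp file from minidump_dir into dest_dir.

    Returns the path of the copy, or None if there are no dump files.
    """
    # Sort by modification time so the last entry is the newest dump.
    dumps = sorted(Path(minidump_dir).glob("*.dmp"), key=lambda p: p.stat().st_mtime)
    if not dumps:
        return None
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the original timestamps and leaves the source file untouched.
    return Path(shutil.copy2(dumps[-1], dest))
```

On the PC in question it would be called as copy_latest_minidump(r"C:\Windows\Minidump", Path.home() / "Documents"), and the returned file is the one to upload.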

I haven't seen that BSOD before.

Do you have the latest BIOS on the motherboard?
Are both RAM modules made by the same company? Mismatched RAM can cause this.
Having the latest Nvidia driver can be a bad idea, but until we look at the dumps we really don't know what the cause might be.
 

Jason3022

Thanks for the instructions - I will change dump size to Small memory dumps.

My motherboard BIOS hasn't been updated since 2010 (Gigabyte P55A-UD4P), but it does have the last version available.
The RAM modules are from the same company - G.Skill

Since the problem started happening prior to installing the latest Nvidia driver, I'm beginning to wonder if it really is hardware related (failing GPU/motherboard/PSU), rather than software. The problem only happens with playing a video game though. I won't speculate further until the minidump is created

I'll post here again the next time the small memory dump file is created.
 

Colif


Jason3022


Colif

I have sent the link to a friend; he will convert the dumps.

Both are hardware errors, and we can't ignore that. WHEA = Windows Hardware Error Architecture
The dumps will show us (maybe), but what are the rest of the specs? What motherboard?

WHEA errors can be caused by any hardware and sometimes by drivers.
Are you running MSI Afterburner? If so, remove it.
If the GPU is from Asus, remove GPU Tweak 2.

I should wait for dumps before guessing further.
 

gardenman

Admirable
Herald
Hi, I ran the dump file through the debugger and got the following information: https://pste.eu/p/TyVe.html

File information: 092019-14046-01.dmp (Sep 20 2019 - 18:05:16)
Bugcheck: WHEA_UNCORRECTABLE_ERROR (124)
Probably caused by: GenuineIntel (Process: GameOverlayUI.)
Uptime: 2 Day(s), 7 Hour(s), 02 Min(s), and 13 Sec(s)

BIOS info was not included in the dump file.

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.
 

Colif

Uninstall Acronis
Uninstall Comodo
Uninstall EaseUS Todo Backup
Their drivers were all made before 2013, so they are too old for Windows 10.

Try this as I suggested, and run Windows Update after DDU to use the drivers that come with Windows - they are just older Nvidia drivers - https://forums.tomshardware.com/faq/how-to-perform-a-clean-install-of-your-video-card-drivers.2402269/

GameOverlayUI is part of Steam; it is the victim. That perhaps points at the GPU being the cause.

Maybe run some benchmarks on the GPU and see if you can make the PC BSOD.
You could try either of these -
https://geeks3d.com/furmark/

https://benchmark.unigine.com/heaven
 

Jason3022

What is Comodo?
I don't use Acronis anymore (if it's on my computer, it's not installed anymore under 'Add/Remove')
I still use ToDo Backup (but it's not running in the background, unless I start it). That could be an issue?

------------------------------------------------------

I will use DDU (haven't used that since I had an AMD 5770, 8+ years ago?).
Then I'll run the windows update
Then I'll run benchmarks.

Maybe the GPU is dying...
-------------------------------------------------------
Will report back
 

Colif

Aug 29 2007 - tifsfilt.sys - Acronis True Image File System Filter driver
Aug 29 2007 - timntr.sys - Acronis True Image Backup Archive Explorer driver
Dec 02 2010 - vdbus.sys - Comodo Backup driver
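
The same "too old" check can be done with a few lines of Python. This is only a sketch: the three entries are typed in by hand from the list above, the 2013 cutoff is the rule of thumb mentioned earlier in the thread rather than an official limit, and the function name is invented. The date parsing assumes English month abbreviations.

```python
from datetime import datetime

CUTOFF_YEAR = 2013  # rule of thumb from this thread, not a hard limit

# (driver date, file name, description) copied from the list above.
drivers = [
    ("Aug 29 2007", "tifsfilt.sys", "Acronis True Image File System Filter driver"),
    ("Aug 29 2007", "timntr.sys", "Acronis True Image Backup Archive Explorer driver"),
    ("Dec 02 2010", "vdbus.sys", "Comodo Backup driver"),
]


def drivers_older_than(entries, cutoff_year):
    """Return (file name, year) for every driver dated before cutoff_year."""
    old = []
    for date_text, filename, _description in entries:
        built = datetime.strptime(date_text, "%b %d %Y")
        if built.year < cutoff_year:
            old.append((filename, built.year))
    return old


for filename, year in drivers_older_than(drivers, CUTOFF_YEAR):
    print(f"{filename} dates from {year}: candidate for disabling or updating")
```

All three files fall before the cutoff, which is why they are the ones singled out.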

I would download autoruns and use it to stop those 3 running at startup. It only stops them at startup, if any program needs the drivers it can start them - https://docs.microsoft.com/en-us/sysinternals/downloads/autoruns

I am not sure about the GPU. I find that cards that won't accept drivers at all are the dying variety; your card will let you install drivers, which makes me wonder.

If you still use Todo, maybe see if there is a newer version, as yours is from 2012.
 

Jason3022

Autoruns - what a fantastic piece of software - thanks. I filtered, then disabled the 3 you suggested.

I do like Todo, (or Acronis - whichever is less expensive), so I should get a newer version you're right.

-----------------------------------

  • Have run DDU
  • Ran Windows update - allowed it to install windows driver (from October 2017?).
----------------------------------
Will run GPU stress tests when I get a chance to see what happens. Should I run these tests all night, or a few hours etc?
 

Jason3022

Update:

-ran DDU.
-allowed Windows to install display driver, Oct.2017.
-disabled a few .sys files as suggested above, via Autoruns software.
-ran FurMark for 30 minutes (2560x1440, 4xAA) - the fps were terrible as expected, but no crashes.

So, I suppose I can continue playing a Steam game and wait for the next bsod, if it's going to happen.
 

gardenman

I ran the dump files through the debugger and got the following information: https://pste.eu/p/rAPB.html

File information: 092319-10812-01.dmp (Sep 23 2019 - 22:25:48)
Bugcheck: WHEA_UNCORRECTABLE_ERROR (124)
Probably caused by: GenuineIntel (Process: dwm.exe)
Uptime: 0 Day(s), 13 Hour(s), 39 Min(s), and 00 Sec(s)

File information: 092019-14046-01.dmp (Sep 20 2019 - 18:05:16)
Bugcheck: WHEA_UNCORRECTABLE_ERROR (124)
Probably caused by: GenuineIntel (Process: GameOverlayUI.)
Uptime: 2 Day(s), 7 Hour(s), 02 Min(s), and 13 Sec(s)

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.
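
For comparing several dumps side by side, the summary text can be pulled apart with a short script. This is a rough sketch that assumes exactly the four-line layout shown above; the field names in the result and the function name are my own choice. The number in brackets after the bugcheck name is treated as hexadecimal, since WHEA_UNCORRECTABLE_ERROR is bug check 0x124.

```python
import re
from datetime import timedelta

SUMMARY = """\
File information:092319-10812-01.dmp (Sep 23 2019 - 22:25:48)
Bugcheck:WHEA_UNCORRECTABLE_ERROR (124)
Probably caused by:GenuineIntel (Process: dwm.exe)
Uptime:0 Day(s), 13 Hour(s), 39 Min(s), and 00 Sec(s)

File information:092019-14046-01.dmp (Sep 20 2019 - 18:05:16)
Bugcheck:WHEA_UNCORRECTABLE_ERROR (124)
Probably caused by:GenuineIntel (Process: GameOverlayUI.)
Uptime:2 Day(s), 7 Hour(s), 02 Min(s), and 13 Sec(s)
"""

BUGCHECK_RE = re.compile(r"(\w+) \(([0-9A-Fa-f]+)\)")
PROCESS_RE = re.compile(r"\(Process: ([^)]*)\)")
UPTIME_RE = re.compile(r"(\d+) Day\(s\), (\d+) Hour\(s\), (\d+) Min\(s\), and (\d+) Sec\(s\)")


def parse_summary(text):
    """Turn blank-line-separated dump summaries into a list of dicts."""
    records = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Split on the first colon only; later colons belong to the value.
            label, _, value = line.partition(":")
            fields[label.strip()] = value.strip()
        name, code_text = BUGCHECK_RE.match(fields["Bugcheck"]).groups()
        days, hours, minutes, seconds = map(int, UPTIME_RE.search(fields["Uptime"]).groups())
        records.append({
            "file": fields["File information"].split(" ")[0],
            "bugcheck": name,
            "bugcheck_code": int(code_text, 16),  # debugger prints the code in hex
            "process": PROCESS_RE.search(fields["Probably caused by"]).group(1),
            "uptime": timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds),
        })
    return records


for record in parse_summary(SUMMARY):
    print(record["file"], record["bugcheck"], hex(record["bugcheck_code"]),
          record["process"], record["uptime"])
```

Lining the records up this way makes it easy to see that both crashes share the same bugcheck while the process and uptime differ.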
 

Colif

Every BSOD is pointing at GPU and yet it survives benchmarks

DWM = Desktop Window Manager (dwm.exe)
GameOverlayUI is part of Steam

Maybe it is dying but it just feels like a trap, very few BSOD are that obvious. I don't see anything else on PC that would be setting GPU off.

Have you cleaned card/PC recently? Could be caused by heat as uptime on last 2 BSOD wasn't immediate so could be heat?

You need to test your GPU in another PC, and also test another GPU in your PC. So find a friend or a repair store
 

Jason3022

I could run a longer GPU stress test session?
When I ran FurMark for 30 minutes, the GPU hit 75 degrees celsius which didn't seem bad at all. (2560x1440 @ 4xAA)

I will take a can of compressed air and blow around the GPU?

Failing that, I suppose I could buy another GPU and try that. If it fails too, then return it for refund and back to the drawing board.
 

Jason3022

Update: I believe my cpu fan is not working at all. I tested the cpu header with another fan and it spun up, so I'm inclined to believe the cpu fan is shot, rather than the motherboard fan header. Or maybe the psu is on its way out.

I will buy another cpu fan and find out.
I was starting to get bsods, almost immediately upon entering windows. This is now at computer not working period stage. Is it possible the cpu temp can blast up that high without a cpu fan spinning? (Hyper 212+ heat sink).


Should a cpu fan spin up right away when you turn on the computer?

Typing this on my phone lol.
 

Colif

That is an unexpected turn. Nothing until this stage was suggesting it was anything other than gpu (even though I said it felt too obvious). Once you get a new fan, try running this on CPU - https://www.infopackets.com/news/10113/how-fix-bootable-prime95-stress-test-hardware

What PSU do you have?

I was about to suggest running without a GPU until I saw that the motherboard doesn't allow that option.

"Should a cpu fan spin up right away when you turn on the computer?"
I believe so, all the fans in my pc spin up at startup until windows takes over
 

Jason3022

Update:

(1) I will receive the cpu fan tomorrow in the mail - same Coolermaster 120mm that came with the Hyper 212.

(2) I have gone back in the BIOS and removed the overclock, CPU is back at default - i7-860 (2.8 ghz). It was running fine for 7+ years at that overclock, but I'm wondering if the PSU is degrading and couldn't handle it? Or because the cpu fan was not working for so long and I wasn't aware it was broken.

I meant to answer your question before and forgot at the time:
PSU = Antec Earthwatts 650 (model: EA-650)

(3) In the meantime - although I have 3 drives (2 ssd,1 hd), I have disconnected 2 and just have the o/s drive running. Used compressed air to blow dust bunnies off gpu and cpu heatsink.

(4) I am running the computer right now with the case open and a small desktop fan blowing on it .
Using 'CoreTemp' software, would you believe the temp dropped 20 degrees celsius with the use of a desktop fan? I don't know if CPU temperature thresholds were what was potentially causing the BSOD; I'm just guessing.

(5) So far - although I've not tested games, no BSOD as I've been typing this message.

----------------------------

Will update again when cpu fan arrives tomorrow.
 

Colif

2) I didn't know you were overclocked. 7 years is a good run, it could just be wear and tear

I had an Antec PSU in my last PC; it went bang after 7 years. I think that age is probably a good time to start thinking about a new one, even if it works fine.

I doubt the PC would run for long without a working CPU fan, not without all the case fans running much harder, especially with the CPU overclocked.

4) WHEA errors can be caused by heat
 

Jason3022

Update - Saturday:

I installed the new CPU fan to the heatsink (using 12 volt fan header). It works fine.

Here's the problem now - 2 of my drives (1 ssd, 1 hd) are not being detected by the operating system.
(I logged into Steam and my installed games were nowhere to be found lol).

I don't know a great deal about computers, but I'm inclined to believe the PSU is not supplying enough juice to operate the other 2 drives after reconnecting the 12v fan header. Would that be a good guess?

If so - apparently it's time to buy a new PSU and am open to suggestions for a quality one if you think that's the case.
PSU age: April 2010. Brand: Antec Earthwatts 650w (EA650)
---------------------------
Note: I will turn off computer and disconnect 12v header to test that theory.
 

Jason3022

Ok thanks - I think I'm understanding what was happening at this point.

2 Issues:
-PSU failing
-CPU fan failed

Leading to:
-Overclocked cpu, causing crashes because of PSU no longer supplying the power needed to sustain, as well as overheating due to failed cpu fan.

I disconnected the new cpu fan (12v header) and sure enough all 3 drives appear in Windows.
----------------------------------
I was & still am really contemplating buying new computer parts prior to these recent bsod issues, this coming Black Friday/Cyber Monday.
Yet in the meantime, clearly I need a new PSU before I'm not even able to power the machine on.

-----------------------------------

I've only ever changed ram, gpu and cpu, hd/ssd,optical drives. I have never set up a new psu so this will be interesting. I'll need to watch Youtube videos and take some photographs of the current setup before pulling it apart.

Thanks again for the help.
 
