Question Have Been Having Lots of BSODs and Hard Freezes on a New Computer, Finally Narrowed It Down, Need Help Solving It

Feb 12, 2021
  • 64 bit
  • New System started with Windows 10
  • Picked up the prebuilt from cyberpowerpc here

Bought November 30th 2020

It's been having some odd crash issues for a while now. I updated Windows 10, updated video drivers, and have checked for driver updates on other parts to no avail, though I may be missing something.

I've noticed a few blue screens saying WHEA Uncorrectable Error. I've had the computer just freeze randomly, even while doing nothing (mouse can't move, no audio comes through, ctrl+alt+del does nothing, and I have to manually power cycle the PC to get it working). And, in the most common instance, I've walked away from the computer while it was on, and when I get back to it the PC has power cycled on its own due to some problem (whether it's the blue screen, the freezing, or some unknown third issue, I don't know). These issues seem to happen indiscriminately, whether I'm actively using the PC (though that's uncommon) or barely anything is open and running. The PC isn't overclocked as far as I know, and the temps seem fine and sit at around 40°C, and considering I'm in a hot and humid environment I'm not complaining.

They all share the same keywords, task category, event id, etc.


I've been having crashes pointing to ntoskrnl.exe for months now.

I have done memtest86, 8 passes, no failures. (Technically, two 4-passes back to back)

I've checked both my OS SSD and my HDD for disk errors - no problems.

I've run an sfc scan to check for windows corrupted files - none found.

The last step, running driver verifier, has crashed my pc before getting to windows 3 times in a row. Using BlueScreenView to look at the DMP file points all of them to RzDev_0226.sys.

Great, I have a culprit. Unfortunately, is there no way to roll this driver back, or update it? There are only two razer devices plugged in to my PC, the huntsman elite keyboard, and a firefly cloth edition mousepad. This is a brand new PC (purchased in December) with a 10850k and an rtx 3090, so I'm doubtful it's anything due to having an old PC, and like I said, I ran those previous tests expecting it to be a hardware issue already, as this crash never occurred on my old PC.

So... what can I do? Beg Razer to update their drivers causing my PC to crash, or do I have to buy a different keyboard? Is there a quicker solution?

I'm legitimately desperate to get this fixed. I've been struggling with this problem for months. I've been searching for the file and have found no solutions.
 
Still working on solving this. I was unable to test moving the M.2 to another slot, because they glued the standoff in place, and I don't have any other standoffs of my own to use, and I don't presently have good tools available to maneuver inside the tight space in the case to force it out. I've tried to get in contact with them about them replacing the SSD/MOBO, but I haven't heard back yet. I assume they'd do a file transfer for me in that instance, but if not, I'll need to buy a backup drive to store files on. As of now, I still get blue screens, though seemingly not as often lately, not that it doesn't still happen.
 
I wouldn't assume they will back up your files, I would get a 2nd drive and backup the files. Better safe than sorry.

Glued standoff down. I see cyberpower aren't completely finished with doing stupid things.

keep giving us dumps in case it is something else.
 
Alright, finally back with an update!

Got it sent in and just got the PC back. I imaged my drives, sent it in, it at least looks like, based on the receipt, they replaced the SSD and motherboard as well as the network card, maybe, and then sent it back. I restored the images so I could get my PC otherwise back to where it was.

I got a blue screen relatively soon after. I was trying to stress test through non-stress tests, so I played a game that tends to crash it (Total War: Warhammer II), streamed my screen on Discord, watched someone else's streamed game on Discord, and had video playing on another monitor.

KMODE_EXCEPTION_NOT_HANDLED

The dump file is here:

https://www.dropbox.com/s/7hu4ryvmp3exec4/061621-10000-01.dmp?dl=0

There are other dump files from before I sent the PC in that I haven't uploaded, but I figure that won't be as helpful.

I'd really like to nail this down and fix my pc at some point. It's been 8 months.

Edit: After that blue screen, I did a sfc /scannow, it found some stuff and supposedly fixed some things. Also did a Windows Update and ran a chkdsk /f. Haven't had a blue screen since, at least not yet, but if there's anything I know about this issue, it likes to go dormant for some periods of time and then crop back up in full force. Curious what this last blue screen was really caused by, but I'll keep trying to force a blue screen if possible.
 
I ran the dump file through the debugger and got the following information: https://jsfiddle.net/a13s2fdm/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.

File information: 061621-10000-01.dmp (Jun 16 2021 - 21:43:49)
Bugcheck: KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by: ntkrnlmp.exe (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 43 Min(s), and 14 Sec(s)

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
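
For anyone collecting these debugger summaries across several crashes, here is a minimal sketch of turning one four-line summary into structured data. It assumes every summary uses the same "File information / Bugcheck / Probably caused by / Uptime" layout shown above; `parse_summary` and the dictionary keys are illustrative names, not part of any debugger tool.

```python
import re
from datetime import timedelta

# Patterns for the two fields that carry more than plain text.
BUGCHECK_RE = re.compile(r"(\w+) \(([0-9A-Fa-f]+)\)")
UPTIME_RE = re.compile(
    r"(\d+) Day\(s\), (\d+) Hour\(s\), (\d+) Min\(s\), and (\d+) Sec\(s\)"
)

# Summary text as posted in the thread.
SAMPLE = """\
File information:061621-10000-01.dmp (Jun 16 2021 - 21:43:49)
Bugcheck:KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by:ntkrnlmp.exe (Process: System)
Uptime:0 Day(s), 0 Hour(s), 43 Min(s), and 14 Sec(s)
"""


def parse_summary(text):
    """Parse one summary block (hypothetical helper) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first colon only; values contain colons too.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()

    name, code = BUGCHECK_RE.fullmatch(fields["Bugcheck"]).groups()
    days, hours, mins, secs = map(
        int, UPTIME_RE.fullmatch(fields["Uptime"]).groups()
    )
    return {
        "file": fields["File information"].split()[0],
        "bugcheck": name,
        "code": int(code, 16),  # bugcheck codes are hexadecimal
        "cause": fields["Probably caused by"],
        "uptime": timedelta(days=days, hours=hours, minutes=mins, seconds=secs),
    }


report = parse_summary(SAMPLE)
print(report["file"], hex(report["code"]), report["uptime"])
```

Keeping the uptime as a `timedelta` makes it easy to compare how long the machine ran before each crash.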

Edit: Results from the dump file below: https://jsfiddle.net/xdo1h8jL/show
 
bios on new board is newer than one on old.

I would have assumed there would be a newer version of Intel Management Engine interface to match the bios, but maybe not. You can check on here - https://www.intel.com.au/content/www/au/en/support/intel-driver-support-assistant.html

I wonder what WIFI card they are using. Not sure why they just don't use the Wi-Fi variant of the motherboard, must save money, probably got a warehouse full of the WIFI cards

You don't see SoundBlasters a lot anymore
 
I attempted to do a clean install of windows from a freshly created recovery USB last night, and still ended up getting a blue screen today after a bit of relatively light use:
https://www.dropbox.com/s/i3dwpvx76rqzt80/061921-7343-01.dmp?dl=0

I downloaded the Intel DSA, it shows there are no new drivers for my system. I seriously have no idea what is wrong with this PC, even after they replaced the SSD and MOBO, and with a fresh Windows install, I'm still getting blue screens. I've run out of ideas, so I am completely at your mercy at this point as far as trying to solve it goes.

As for the SoundBlaster, I purchased an external audio card because the headphones I used with my old PC utilized digital audio, and this new PC didn't have a connector for digital audio, so I ended up just buying a sound card that had one.

Also, as a side question, if this most recent blue screen is still the same problem as the others, and the problem has not been solved, is there any harm in me rolling back to my imaged copy of my PC from before I reset everything? Easier on me to not have to set stuff up again. Of course, if this problem is different and unrelated to the old problem, then I'll stick with it. I also know it's probably nearly impossible to truly tell if this one is related to the old ones, so I'll stick with the reset for now until instructed otherwise.
 
I ran the dump file through the debugger and got the following information: https://jsfiddle.net/jb37vfar/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.

File information: 061921-7343-01.dmp (Jun 19 2021 - 15:17:47)
Bugcheck: KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by: memory_corruption (Process: SNKRX.exe)
Uptime: 0 Day(s), 3 Hour(s), 43 Min(s), and 38 Sec(s)

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
 
the last bsod in Gardenmans report mentions DirectX

dxgkrnl!DXGDXGIKEYEDMUTEX::CloseFromDevice+0x254

99% of the time it mentions DirectX, it's GPU drivers to blame

you crashing connecting to plex could also be gpu drivers

try running ddu in safe mode and reinstall gpu drivers - https://forums.tomshardware.com/faq...n-install-of-your-video-card-drivers.2402269/

the 1% of the time it's not gpu drivers, it's sound. just thought i'd mention that as it could be the usb thing. really small chance but worth mentioning.
 
I have gone into safe mode, clean uninstalled drivers with said tool, rebooted into safe mode again, installed the downloaded, newest drivers, and am now back on. Will report back if I get another BSOD.


CyberPower is notorious for using crappy PSUs. Considering that this rig isn't likely to run properly on anything less than a tier A unit, the question I have is: what PSU model did they put in?


It claims it's "1,000 Watts - Standard 80 Plus Gold Power Supply (Included)", but I'm not sure of the specific model. I'll get the computer unplugged and take a look in a bit and will post it when I do!
 
I ran the dump file through the debugger and got the following information: https://jsfiddle.net/2pqox9f8/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.

File information: 061921-7359-01.dmp (Jun 19 2021 - 17:09:40)
Bugcheck: UNEXPECTED_KERNEL_MODE_TRAP (7F)
Probably caused by: memory_corruption (Process: Spotify.exe)
Uptime: 0 Day(s), 1 Hour(s), 51 Min(s), and 02 Sec(s)

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.

Edit: Results for dump below: https://jsfiddle.net/689shtuq/show
 
I did indeed crash again. The dump file is here: https://www.dropbox.com/s/gcwokxypsw8mmwa/062021-6781-01.dmp?dl=0

This would be after having done the graphics drivers clean installation.

In case it isn't said enough, thanks to all of you for helping me out still. I think I'd have gone crazy without all of your help in debugging and trying to find the cause of these issues.

Edit: Of note, I was trying out a new TRPG (Wildermyth), and had just opened it when it crashed. Plex was running in the background, but I was not streaming or using it or anything of the sort, if that matters. (In response to the fiddle.) Also of note, Windows Update did update some drivers yesterday, and I think some audio drivers were on the list. My Creative Sound BlasterX G5 doesn't seem to have had an official driver update since 2018, at least per their support website here: https://support.creative.com/downloads/welcome.aspx?nDriverType=1#type_1
 
It claims its "1,000 Watts - Standard 80 Plus Gold Power Supply (Included) ", but the specific model I'm not sure.
People on the Cyberpower forums even tell users to avoid the standard PSU so it could well be the cause of your problems. Better off with a Corsair, EVGA or Seasonic unit, they deliver what they say they do.

okay, that last bsod was caused by a page fault. Page faults are normal operations of windows: if the CPU touches a memory page that isn't currently mapped in ram, it raises a page fault and windows brings the page in (from the page file, for example). Normally you see 1 page fault, your error is about... at least 58 of them, all on the same part of ram, over and over until it crashed. Never seen that many before.

i guess that is what a kernel mode trap looks like. stack overflow error.

no drivers mentioned in error.

have i suggested memtest before?
Try running memtest86 on each of your ram sticks, one stick at a time, up to 4 passes. The only error count you want is 0, any higher could be the cause of the BSOD. Remove/replace ram sticks with errors. Memtest is created as a bootable USB so that you don’t need windows to run it.
 
I've run memtest86 before, 8 passes (4 at a time), with both sticks still plugged in, and there were 0 errors. Is it necessary to unplug one of the ram sticks and test them individually? If it requires one at a time to be definitive, I'll run them tomorrow.

This wouldn't be the first time we've had this many page faults however. See here:


Error 1 text is the same line over and over. Not seen it before.
This is just a piece of it; the entire dump text is the same page fault at a particular address on C:
fffff884788a1000 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1190 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1320 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a14b0 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1640 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a17d0 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1960 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1af0 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1c80 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1e10 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
Page faults are normal operations of windows, regardless of what they sound like. If Windows cannot find what it needs in ram, it goes to Page file. That is what a page fault is.
It's normal and fine, but not for it to try to access the same location about 50 times in a row before giving up. So the drive wasn't responding at all.
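
The repetition in the quoted frames can be checked mechanically. This is a minimal sketch, assuming the first hex column of each line is the stack pointer for that frame (the usual layout of a debugger stack listing); it counts the frames and measures the spacing between consecutive stack addresses. A constant spacing would be consistent with the same handler being re-entered again and again on one stack, but that reading is an interpretation, not something the dump itself states.

```python
import re

# Frames copied from the dump excerpt quoted above.
TRACE = """\
fffff884788a1000 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1190 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1320 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a14b0 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1640 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a17d0 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1960 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1af0 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1c80 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
fffff884788a1e10 fffff80537c0393c : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x3c
"""

# stack pointer, return address, four argument words, then the symbol
FRAME_RE = re.compile(r"^([0-9a-f]{16}) ([0-9a-f]{16}) : .* : (\S+)$")


def frame_strides(trace):
    """Return (frame count, set of symbols, set of stack-pointer gaps)."""
    matches = [FRAME_RE.match(line.strip()) for line in trace.strip().splitlines()]
    matches = [m for m in matches if m]
    stack_pointers = [int(m.group(1), 16) for m in matches]
    symbols = {m.group(3) for m in matches}
    strides = {b - a for a, b in zip(stack_pointers, stack_pointers[1:])}
    return len(matches), symbols, strides


count, symbols, strides = frame_strides(TRACE)
print(count, symbols, [hex(s) for s in strides])
```

For the ten lines above, every frame is the same symbol and the stack addresses step by one fixed amount.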

Error 2 is the Nvidia driver that Verifier pointed at
Error 3 is a USB item, HIDCLASS also covers any USB
Error 4 is also HIDCLASS (starts wondering about the Soundblaster since it's USB)
Error 5 victim - audiodg.exe = audio device graph isolation.
used by audio drivers.
Error 6 is the 2nd time I've ever seen something like Error 1. It's the same address again.

since error 2 blamed a part of Nvidia drivers, which no longer exists on crash 5, lets see what other audio drivers there are
Mar 14 2019 - nvvad64v.sys - Nvidia Virtual Audio driver - http://www.nvidia.com/
Oct 08 2020 - ksUSBa64.sys - Sound BlasterX Katana USB driver (Creative Technology Ltd.)
Oct 16 2020 - nvhda64v.sys - Nvidia HDMI Audio Device - http://www.nvidia.com/
Dec 22 2020 - RTKVHD64.sys - Realtek Audio System driver - https://www.realtek.com/en/
Do you use realtek at all?
the Nvidia drivers only work over HDMI


Error 1 & 6 make me think we need to look at the ssd
what are specs of the PC?

But they replaced the SSD and MOBO and this error is still continuing.

As for the PSU, all I can readily do for that at the moment is contact them and send it back to them (again) to get it replaced, and just getting in contact with them can take weeks, so there's not much I can do on that front immediately. I'll begin my phone tag with their support line and try to get in contact again, but I don't expect any response anytime soon. I had asked them to replace the PSU this last time I sent it back, along with the SSD and MOBO, because, with USB devices plugged into the PC, there is noticeable coil whine while the PC is powered off, but, according to the receipt of sorts, and the fact the coil whine still exists, it doesn't look like they did that.

Do you happen to have any insights into the secondary fiddle run there as well? It only has a single page fault, but it also says:
"
BugCheck A, {0, ff, 0, fffff8032600399f}
*** WARNING: Unable to verify timestamp for win32k.sys

"
which is new as far as I know.

Are we thinking it's not graphics/audio driver related anymore? I suppose graphics drivers have been mostly cleared since the clean installation, but I can't particularly say the same for the audio drivers, though I don't know if they could cause all these blue screens.
 
There are just too many different errors. You squash one "faulty" driver only to have the next one crash. In my book that's almost certainly a hardware problem, with the top 2 suspects being RAM and drive. RAM you tested, drive got replaced. At this point it's really desperate mode.
As for the PSU, all I can readily do for that at the moment is contact them and send it back to them (again) to get it replaced
First thing is, we really need to know what PSU model it is. If it's crap, calling them for replacement is not going to achieve anything - because they will replace it with another crap unit. Or most likely they'll do nothing because PSU "works".
 
So far the only dump I haven't commented on is the last one, as Gardenman hasn't converted it for me yet.

win32k.sys is part of windows. If its corrupted, something is corrupting parts of windows. Power influences everything, so I don't want to keep pointing to it... but if ram is okay, and they replaced the storage drives... there are only so many things that can corrupt data. Bad power is one of them.

yeah, playing whack a driver just to have another appear is hardware normally.

If it's crap, calling them for replacement is not going to achieve anything - because they will replace it with another crap unit. Or most likely they'll do nothing because PSU "works".
yep. especially when they make the crap psu. or use a 3rd party one.

I would buy a Corsair or EVGA or Seasonic PSU and put it in yourself. Or if that breaks warranty, pay them the difference and ask them to install one. They sell them after all. You can generally ask them to change it before you buy.
 
So for another status update, they are paying for the shipping label to send it back. I requested they put a better power supply in, and they stated they aren't allowed to "upgrade" the part, and that realistically all they can do is replace it with the same model of part.

With that in mind, and them fronting the bill, I will send it to them to replace it. They will be replacing the PSU, and I told them to also replace the RAM, or at least thoroughly test it, while they are at it, in the off chance that it is an issue. I also told them to actually do their job and test the damn thing before sending it back, considering they sent it back still broken the last time as well, even though I told them about the PSU problems then as well. With this, the SSD, MOBO, PSU, and potentially RAM will have been swapped.

I understand the PSU model is crappy, so I am expecting to have to replace the PSU myself once it returns. But for now, I'll be throwing it back to them, if for nothing else, out of spite and to make them do their jobs and work on fixing it themselves, considering the past 7 months I've put into debugging it myself with the help of all of you. So thanks for all the continued support so far, I sincerely can't thank you enough. I'll be back in around a month or so with news, hopefully positive, but potentially instead with a new dump file to be scanned.
 
oh sneaky. 2 on the same post...

that one occurred because of a Double Fault. (tennis anyone?)
A double fault can only occur following an interrupt or exception, which are signals that tell a computer’s CPU to halt any currently running tasks in order to deal with important system events, such as the addition of new hardware or a program making an invalid memory request. Interrupts and exceptions are normal functions of modern computers and are accomplished by running a special type of software known as an interrupt handler or exception handler. The CPU will attempt to run one of these highly specialized programs and then resume normal operation. When a handler encounters an error or cannot correct the condition that led to the exception or interrupt, a double fault has taken place.

Unlike interrupts and exceptions, a double fault is a serious error that is not expected during normal operation. The system will attempt to run a special double fault handler, but in contrast to other types of handlers, it only collects diagnostic information and does not fix the problem. In many cases, unsaved work will be lost. A “stop error,” more infamously known as the "blue screen of death," may be displayed. It is also possible for a third error to occur when the system tries to run the double fault handler, something known as a triple fault.

Common causes of double faults include physical problems in the computer’s memory, CPU, or video card as well as bugs in a device driver or other system software.
https://www.easytechjunkie.com/what-is-a-double-fault.htm

i rarely see these, i thought it was something else at first.

this happened on CPU, it hit a breakpoint and then had 2 doublefaults, before it accessed ram and created the error. the doublefaults were the errors

before you return it, try running this and see if it can identify any drivers - https://forums.tomshardware.com/threads/driver-verifier-instructions.3686888/
 
Had another crash (Before I saw your reply), so here it is: https://www.dropbox.com/s/uv2e5wffhnxq9z1/062121-10031-01.dmp?dl=0

As for driver verifier, I did that back before I sent the PC to get fixed the first time, and it pinged my razer drivers, which I then removed, and my nvidia rtx voice drivers (which I no longer have, but at the time I also then removed them), and this process continued for a while until it was just crashing from some Windows stuff, if I remember correctly. Right now, it's storming really bad out, so I will not be on said PC for the time being, and I'll be sending the PC to them on Wednesday, so if I have time after work tomorrow I'll try to run driver verifier again and get back to you on that one.

Edit: Had a second crash: https://www.dropbox.com/s/pejjxcx9xs6ykxt/062121-9703-01.dmp?dl=0

This one looked like my resolution dropped on the blue screen, and my second monitor stopped displaying like it was unconnected while it was happening.

Edit: And a third one. Interesting error. Think it mentioned missed clock count or something. https://www.dropbox.com/s/p1aythc1tf3dpb7/062121-10000-01.dmp?dl=0
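
The minidump names in this thread appear to start with the crash date as MMDDYY (the debugger summary for 061621-10000-01.dmp reports Jun 16 2021). A small sketch, assuming that naming holds for all of them, that groups the dumps mentioned so far by day. `crash_date` is an illustrative helper name, and nothing is assumed about the other two fields of the filename.

```python
from collections import Counter
from datetime import datetime

# Dump filenames mentioned in this thread since the PC came back from repair.
DUMPS = [
    "061621-10000-01.dmp",
    "061921-7343-01.dmp",
    "061921-7359-01.dmp",
    "062021-6781-01.dmp",
    "062121-10031-01.dmp",
    "062121-9703-01.dmp",
    "062121-10000-01.dmp",
]


def crash_date(filename):
    """Read the leading MMDDYY field of a minidump name as a date."""
    return datetime.strptime(filename.split("-")[0], "%m%d%y").date()


per_day = Counter(crash_date(name) for name in DUMPS)
for day in sorted(per_day):
    print(day.isoformat(), per_day[day])
```

Sorting crashes by day like this makes it easier to see whether they cluster after a particular change, such as a driver update or a reinstall.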
 
I ran the dump files through the debugger and got the following information: https://jsfiddle.net/gp2hLxre/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.
File information: 062121-9703-01.dmp (Jun 21 2021 - 19:43:48)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Probably caused by: memory_corruption (Process: ChromaVisualizer.exe)
Uptime: 0 Day(s), 2 Hour(s), 41 Min(s), and 25 Sec(s)

File information: 062121-10031-01.dmp (Jun 21 2021 - 17:01:59)
Bugcheck: KERNEL_SECURITY_CHECK_FAILURE (139)
Probably caused by: memory_corruption (Process: System)
Uptime: 1 Day(s), 16 Hour(s), 06 Min(s), and 04 Sec(s)
This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
 
errors aren't consistent. that is one reason to look at PSU.
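
To put a rough number on that inconsistency, here is a minimal sketch that tallies the debugger summaries posted in this thread. The tuples are copied from those summaries; `summarize` is just an illustrative helper name.

```python
from collections import Counter

# (dump file, bugcheck code, "probably caused by", process) from the summaries.
REPORTS = [
    ("061621-10000-01.dmp", "1E", "ntkrnlmp.exe", "System"),
    ("061921-7343-01.dmp", "1E", "memory_corruption", "SNKRX.exe"),
    ("061921-7359-01.dmp", "7F", "memory_corruption", "Spotify.exe"),
    ("062121-9703-01.dmp", "3B", "memory_corruption", "ChromaVisualizer.exe"),
    ("062121-10031-01.dmp", "139", "memory_corruption", "System"),
]


def summarize(reports):
    """Count how often each bugcheck code and each process appears."""
    codes = Counter(code for _, code, _, _ in reports)
    processes = Counter(process for *_, process in reports)
    return codes, processes


codes, processes = summarize(REPORTS)
print(f"{len(REPORTS)} crashes, {len(codes)} distinct bugchecks, "
      f"{len(processes)} distinct processes")
print(codes.most_common())
```

Several different bugchecks hitting unrelated processes is the pattern being described here: no single driver or program shows up often enough to blame.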

I helped someone else with a cyberpower who had same psu & GPU, and I am curious if the pcie power cables for 8 pins have individual cables running to them from the PSU or if they used the same cable with 2 connectors on the end. They should have used 2 cables but I have seen them use the same cable and that isn't ideal on a 3090. It isn't good for the cable either.

I don't know if that would be enough to cause the errors but maybe, if the cable is trying to draw more than PSU can give, it will cause instability in rest of system
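
As a rough back-of-the-envelope check on the cable question, the sketch below compares one shared cable against two separate ones. All three figures are assumptions that do not come from this thread: 150 W is the PCIe rating for one 8-pin connector, 75 W is the rating for the x16 slot, and 350 W is NVIDIA's reference board power for the RTX 3090 (partner cards can differ). It only looks at average sustained load and ignores transient spikes.

```python
# Assumed figures, not taken from the thread.
PCIE_8PIN_W = 150        # PCIe rating for a single 8-pin connector
SLOT_W = 75              # PCIe x16 slot rating
GPU_BOARD_POWER_W = 350  # reference RTX 3090 board power; partner cards vary


def watts_per_cable(board_power, cables):
    """Average load on each PSU cable if the slot supplies its full share."""
    from_cables = max(board_power - SLOT_W, 0)
    return from_cables / cables


for cables in (2, 1):
    load = watts_per_cable(GPU_BOARD_POWER_W, cables)
    status = "within" if load <= PCIE_8PIN_W else "above"
    print(f"{cables} cable(s): {load:.1f} W each, {status} the 8-pin rating")
```

A cable with two connectors is normally built by the PSU maker to carry both, so this does not prove a fault; it only shows why separate cables leave more margin on a card like this.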