[SOLVED] Multiple BSOD. Can't find the source of the problem.

juanfropro

Aug 25, 2018
Hi.

Not long ago I started getting multiple random BSODs:

IRQL NOT LESS OR EQUAL
KERNEL SECURITY CHECK FAILURE
SYSTEM SERVICE EXCEPTION
SYSTEM THREAD EXCEPTION NOT HANDLED
PAGE FAULT IN NONPAGED AREA
MEMORY MANAGEMENT
KMODE_EXCEPTION_NOT_HANDLED
TIMER OR DPC INVALID
DPC_WATCHDOG_VIOLATION


Usually while gaming, other times when the computer was idling, when powering on, when powering off...

After looking into the issue I was advised to reinstall Windows. After doing so it got better, but I still get a BSOD every now and then while gaming: sometimes just as I launch a game, sometimes after a few minutes or an hour. It's not consistent.

Usually I get the IRQL_NOT_LESS_OR_EQUAL (a)

I reinstalled Windows a couple of times.
I checked my drives and one HDD had some bad/corrupt sectors. I replaced it with a new one and the BSODs still happen.
I've run Memtest multiple times. I have two identical 16 GB kits (2x8GB) and I've tried swapping them out and switching the RAM slots, but it still found no errors.
System File Checker doesn't find any issues.
Windows memory diagnostic doesn't find errors.
GPU Drivers are updated.
I went through all the items in device manager using the "update drivers" and two of them got updated.
I tried Driver Verifier before and it caught one of the drivers of my USB Wi-Fi adapter. I changed to a wired connection and the BSODs still happen, but Verifier doesn't seem to find anything now. Around that time, I remember that after a BIOS update I had to reset the CMOS because my monitors didn't show anything when turning on the PC.
I updated bios to the latest version.
I replaced my PSU.

I have some random ideas but nothing solid.
Maybe it is the GPU/GPU slot? I tried with my previous gpu and still got the error
Could it be that all the RAM slots became faulty for some reason?
Recently I moved my PC, and currently everything is powered from the same wall socket via a power strip. Maybe that's too much? I already had some BSODs before moving the PC, though, when each device had its own socket.
I replaced the stock CPU cooler with a liquid cooling kit, and I remember swapping the headers the pump and the fans were connected to because the fans would always run at max speed. Now it runs cool and usually quiet. The timing might correlate, but I couldn't say for sure; the BSODs got really bad maybe a month after.
A faulty motherboard or CPU?


Here are my PC components:
CPU: AMD Ryzen 5 3600 3.6GHz
RAM: Corsair Vengeance LPX DDR4 3200 PC4-25600 16GB 2x8GB CL16 (I expanded with other 16 GB but currently I have only 2 8gb installed)
Motherboard: MSI B450 Tomahawk
GPU: Sapphire Pulse Radeon RX 580 8GB GDDR5
SSD (OS W10): Samsung 850 Evo SSD Series 120GB SATA3 (5 years old)
SSD 2 (Programs): Crucial MX500 SSD 500GB SATA
HDD (Data): Seagate Barracuda ST4000DM004 4000GB Serial ATA II
Other: 4 port USB PCIe slot. I have a Valve Index Headset, a HOTAS and a webcam but currently only the keyboard, mouse and a USB wireless adapter are plugged in.

No Overclocking. Temps seem fine.

Here are some of the latest minidumps.


Any pointers?
 
I have two equal sets of 16 gb (2X8GB) and I've tried swaping them out and switching the ram slots but still found no errors.
Have you tried running with just one set installed? Even if the sets are the exact same model, the fact that they weren't sold to work together can be enough to cause oddities. Are you sure the 2 x 8GB sticks installed now are from the same set? Maybe run with a single 8GB stick and see if it still BSODs.

TIMER OR DPC INVALID isn't a BSOD I normally see. Odd errors always make me look at RAM first.

I will ask a friend to convert the dumps, but he is asleep for a few more hours.
 
Hi, I ran the dump files through the debugger and got the following information: https://jsfiddle.net/6ycrn3kd/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.
File information: 103020-6406-01.dmp (Oct 30 2020 - 14:22:16)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 32 Min(s), and 56 Sec(s)

File information: 103020-6390-01.dmp (Oct 30 2020 - 13:23:51)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 38 Min(s), and 31 Sec(s)

File information: 102720-6421-01.dmp (Oct 27 2020 - 11:47:51)
Bugcheck: PAGE_FAULT_IN_NONPAGED_AREA (50)
Probably caused by: memory_corruption (Process: chrome.exe)
Uptime: 0 Day(s), 10 Hour(s), 37 Min(s), and 15 Sec(s)

File information: 102620-6796-01.dmp (Oct 26 2020 - 14:18:32)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 3 Hour(s), 09 Min(s), and 13 Sec(s)

File information: 102620-6421-01.dmp (Oct 26 2020 - 14:49:33)
Bugcheck: THREAD_STUCK_IN_DEVICE_DRIVER_M (100000EA)
Driver warnings: *** WARNING: Unable to verify timestamp for amdkmdag.sys
Probably caused by: dxgkrnl.sys (Process: Taskmgr.exe)
Uptime: 0 Day(s), 0 Hour(s), 28 Min(s), and 43 Sec(s)

File information: 102420-7250-01.dmp (Oct 24 2020 - 09:06:28)
Bugcheck: DRIVER_IRQL_NOT_LESS_OR_EQUAL (D1)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 2 Hour(s), 35 Min(s), and 07 Sec(s)

File information: 102420-6906-01.dmp (Oct 24 2020 - 06:29:57)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 4 Hour(s), 28 Min(s), and 08 Sec(s)

File information: 101820-7093-01.dmp (Oct 18 2020 - 01:58:15)
Bugcheck: KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by: memory_corruption (Process: svchost.exe)
Uptime: 0 Day(s), 16 Hour(s), 08 Min(s), and 32 Sec(s)

File information: 101820-6015-01.dmp (Oct 18 2020 - 05:17:03)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 12 Min(s), and 37 Sec(s)

File information: 101820-4703-01.dmp (Oct 18 2020 - 05:32:07)
Bugcheck: DRIVER_VERIFIER_DETECTED_VIOLATION (C4)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 00 Min(s), and 10 Sec(s)

Possible Motherboard page: https://www.msi.com/Motherboard/B450-TOMAHAWK
There is a BIOS update available for your system. You are using version 1.D and the latest is 1.E. Wait for additional information before deciding to update or not. Important: Verify that I have linked to the correct motherboard. Updating your BIOS can be risky. Never try it when you might lose power (lightning storms, recent power outages, etc).

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
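
For anyone who wants to see the pattern in these summaries at a glance, here is a minimal Python sketch that tallies the bugcheck names and the "Probably caused by" values. The two sample records are copied from the list above; the function name `tally` and the regular expressions are illustrative only and assume the report keeps this exact label layout.

```python
import re
from collections import Counter

# Two records copied from the summary above; paste the full list to tally everything.
SUMMARY = """\
File information:103020-6406-01.dmp (Oct 30 2020 - 14:22:16)
Bugcheck:IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by:memory_corruption (Process: System)
Uptime:0 Day(s), 0 Hour(s), 32 Min(s), and 56 Sec(s)

File information:102620-6421-01.dmp (Oct 26 2020 - 14:49:33)
Bugcheck:THREAD_STUCK_IN_DEVICE_DRIVER_M (100000EA)
Driver warnings:*** WARNING: Unable to verify timestamp for amdkmdag.sys
Probably caused by:dxgkrnl.sys (Process: Taskmgr.exe)
Uptime:0 Day(s), 0 Hour(s), 28 Min(s), and 43 Sec(s)
"""

# Assumed layout: "Bugcheck:<NAME> (<hex code>)", with or without a space after the colon.
BUGCHECK = re.compile(r"^Bugcheck:\s*(\w+) \((\w+)\)", re.MULTILINE)
CAUSE = re.compile(r"^Probably caused by:\s*(\S+)", re.MULTILINE)


def tally(text):
    """Count how often each bugcheck name and each suspected cause appears."""
    bugchecks = Counter(name for name, _code in BUGCHECK.findall(text))
    causes = Counter(CAUSE.findall(text))
    return bugchecks, causes


bugchecks, causes = tally(SUMMARY)
for name, count in bugchecks.most_common():
    print(count, name)
for name, count in causes.most_common():
    print(count, name)
```

A tally dominated by memory_corruption across unrelated processes, as in the list above, is the kind of pattern that points away from a single driver, though it cannot say which component is at fault.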
 
Error 5 is the AMD drivers. That error (Thread stuck in device driver) is almost exclusively AMD, so I'm not sure I could blame RAM for it. I see you have installed newer drivers since this error.

Error 6 is most likely the LAN/Wi-Fi drivers. It mentions tcpip.sys, which is a file Windows uses to talk to the internet. It also mentions ndis, which is the Network Driver Interface Specification.

Error 10 was caused by netr28ux, which you clearly worked out, as you updated the driver to a 2016 version, but it may still be the cause of error 6.

Jul 15 2016  netr28ux.sys  Ralink Wireless Adapter driver  https://www.mediatek.com/

No, I didn't miss the first 4 errors; they just aren't telling me anything about what the cause might be. Since that driver is still on the PC, it could be it.

My guess is you need a new Wi-Fi card. It's the theme of 2020: old Wi-Fi cards just stop working. Get one that supports 802.11ax, as it's not going to be an old card then.

These could be problems as well; anything older than June 2015 is suspect:
Dec 10 2013  ViaHub3.sys  VIA Labs Superspeed USB Host Controller Driver  http://www.via-labs.com/
Dec 10 2013  xhcdrv.sys  VIA Labs eXtensible Host Controller driver  http://www.via-labs.com/
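
The "older than June 2015" rule of thumb is easy to apply mechanically. Below is a small Python sketch using the three driver dates quoted in this post. Treating the cutoff as 1 June 2015 is an assumption, and `older_than` is a hypothetical helper name. The same check could be run over the dates in a full driver listing, but the date format there would need to match the `strptime` pattern.

```python
from datetime import datetime

# Assumption: "older than June 2015" is read as "built before 1 June 2015".
CUTOFF = datetime(2015, 6, 1)

# Driver names and dates as quoted in this thread.
DRIVERS = [
    ("netr28ux.sys", "Jul 15 2016"),
    ("ViaHub3.sys", "Dec 10 2013"),
    ("xhcdrv.sys", "Dec 10 2013"),
]


def older_than(drivers, cutoff=CUTOFF):
    """Return the names of drivers whose date is earlier than the cutoff."""
    flagged = []
    for name, date_text in drivers:
        built = datetime.strptime(date_text, "%b %d %Y")
        if built < cutoff:
            flagged.append(name)
    return flagged


print(older_than(DRIVERS))  # ['ViaHub3.sys', 'xhcdrv.sys']
```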
 
Hi,

About the RAM Sticks:
When I tested them it was always the matching pair.

About the BIOS:
I did update the BIOS this morning before making this post, and I got another BSOD, but it got stuck at 0% and didn't generate a dump. I think it was SYSTEM SERVICE EXCEPTION.

About the errors:
netr28ux.sys was (I think) one of the drivers of the USB wifi adapter. I'm currently using a wired connection and the adapter is unplugged but the driver shows up in the list generated by the command driverquery.

I do have a PCIe USB card (installed at the beginning of the year), and judging by the installer disc it seems to be the one using the ViaHub3.sys and xhcdrv.sys drivers. In Device Manager there is an "eXtensible host VIA 3.0 - 1.0 (Microsoft) Controller", but it doesn't show those drivers. Also, ViaHub3.sys doesn't show up in the output of the driverquery command.
It is supposed to support W10, but the drivers on the disc seem to be aimed at XP, W7 and W8. I could try to remove the card and uninstall those drivers, but I don't know how to find them.
 
I got a SYSTEM_SERVICE_EXCEPTION (3b) again (while trying to run a game as usual) but this time it generated a dump. Here it is.

EDIT: I ran the dump through WinDbg Preview and it showed a lot of repeated "Kernel Symbols are Wrong" messages. Is that part of the crash or something to do with WinDbg?
 
The driver is still on the PC, but if the Wi-Fi adapter isn't there, it won't be used.

I do have have a PCIe USB (Installed at the beginning of the year) and looking in the installer disk it seems to be the one using the ViaHub3.sys and xhcdrv.sys drivers. In the device manager there is a eXtensib
The same applies to this: you don't need to remove drivers if the hardware isn't there.

EDIT: I ran the dump through winDbg preview and it had a lot of repeating "Kernel Symbols are Wrong". Is that part of the crash or something to do with winDbg?
Um, @gardenman might know that answer better than I do. I think it's the debugger.
 
There's some way you can fix them sometimes. It's been suggested to me to try different things but for me it seems to be a bad dump file most of the time. The symbols are downloaded from a Microsoft server. Sometimes it can't be reached too. So basically it's a Windbg issue, or a dump issue, depending on the dump.

This time it was a dump file issue. I had the same issue, with my older debugger. I edited out most of the errors.

Results: https://jsfiddle.net/98t0jvmr/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.

File information: 110220-7187-01.dmp (Nov 2 2020 - 11:54:10)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Probably caused by: memory_corruption (Process: ?)
Uptime: 0 Day(s), 0 Hour(s), 04 Min(s), and 47 Sec(s)

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
 
Not enough left of the dump to even know what crashed.

wonder why there are 3 versions of this driver

I don't mean on your PC, I just mean there is amdgpio1 & amdgpio2 as well. I have number 2 installed, it might be related to chipsets.
yours is dated 2016, it seems to be newest version too, I was just checking
 
Hi,

I have 2 questions:

This one is a mix of curiosity and ignorance: Is there a way to "force" the BSOD to happen? I've seen people suggest using benchmark/stress-test programs like FurMark or OCCT to isolate possible causes, but from what I've seen that is done to check stability while overclocking (which I'm not doing at the moment). Also, is that healthy for the PC?

The other one is that I found out about the Reliability History, and it lists a bunch of critical errors, including "Hardware errors". Would that be useful for troubleshooting?
 
This one is a mix of curiosity and ignorance: Is there a way to "force" the BSOD to happen?

Well, those hardware testers can be used to check the stability of non-overclocked systems as well. They can find problem hardware. I have heard that FurMark can be bad for the GPU, so I'm not sure I would suggest it.

There is also a way to check drivers to see if they are the cause, but I try not to suggest it as it can also break Windows. It's part of Windows, so it's ironic that it also breaks it. I try to use every other tool first.

Reliability history
There is a way to save the history: can you click the link in the bottom left? This lets you create a file that includes the errors... I haven't ever used it before, but it may help here. If you upload the file to a file sharing website I will see what it shows...
Some of those critical hardware errors are likely to just be restarts, as it counts an unexpected restart after a BSOD as a critical error. I know as I have seen it on my system before.

can you also run this https://www.sysnative.com/forums/pages/bsodcollectionapp/
 
Here is the history, but most of the readable text is in Spanish. I don't know if that will be a big impediment or if there is a way to change it.

I tried bsodcollectionapp but it gets stuck writing "Waiting for SystemInfo" over and over.

In other news, I just had a freeze (in a game, as usual). I lost the game's sound (only the game's; I had YouTube in the background and it was going fine), and a moment later everything just froze: both screens frozen and no sound. I waited a while, and then the PC shut down when I pressed the power button without holding it. It didn't leave dumps or any error besides the "Windows didn't close correctly" event after booting up again.

EDIT: I have a long weekend coming, and since currently I can do pretty much nothing with my PC that I can't do with a tablet, I could try a clean Windows install. Would that help troubleshooting?
 
This is weird:

I ran sfc /scannow and now it says that it found and repaired files. It generated a log, I can link it if it's useful.

The only time it had shown anything was when I started trying to work out the current problems a few months ago and I've run it several times since.
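
For anyone reading that log: System File Checker writes its details into the CBS log, which also contains a lot of unrelated servicing entries, and the SFC entries are tagged `[SR]`. A minimal Python sketch for pulling out just those lines follows. The path is the default location on a standard install, and `sfc_lines` is a made-up helper name; it is a plain text filter and does not interpret the entries.

```python
from pathlib import Path

# Default CBS log location; adjust if Windows is installed elsewhere.
CBS_LOG = Path(r"C:\Windows\Logs\CBS\CBS.log")


def sfc_lines(lines):
    """Keep only the entries written by System File Checker (tagged [SR])."""
    return [line.rstrip("\n") for line in lines if "[SR]" in line]


# Only attempt to read the log when it is actually present (i.e. on the affected PC).
if CBS_LOG.exists():
    with CBS_LOG.open(encoding="utf-8", errors="replace") as log:
        for entry in sfc_lines(log):
            print(entry)
```

Reading the log may require an elevated prompt, and the filtered output is usually short enough to paste into a post.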
 
More news (not good). I had another crash (not a BSOD).
This time it happened while watching Netflix (with Edge). The screen turned reddish brown and the PC restarted.

The Event Viewer showed an error with Origin: WHEA-Logger and ID: 18. Here is the XML view of the error.

It didn't create a minidump but it did create one in C:\Windows\LiveKernelReports\WHEA. Here it is.
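
To check whether this WHEA-Logger event is a one-off, the System log can also be queried from the command line with `wevtutil` rather than by clicking through the Event Viewer. The Python sketch below only builds and runs that query. The function name `whea_query` and the default of ten events are arbitrary choices, and the command is only executed when the script runs on Windows.

```python
import subprocess
import sys


def whea_query(count=10):
    """Build a wevtutil command listing the newest WHEA-Logger events in the System log."""
    xpath = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",   # XPath filter on the event provider
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # plain-text output instead of XML
    ]


if sys.platform == "win32":
    result = subprocess.run(whea_query(), capture_output=True, text=True)
    print(result.stdout)
```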

Damn, this is frustrating 🙁
 
I ran the dump file through the debugger and got the following information: https://jsfiddle.net/uozj3xv4/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.

File information: WHEA-20201105-2126.dmp (Nov 5 2020 - 15:26:12)
Bugcheck: WHEA_UNCORRECTABLE_ERROR (124)
Driver warnings: *** WARNING: Unable to verify timestamp for ntoskrnl.exe
Probably caused by: ? (Process: ?)
Uptime: 0 Day(s), 0 Hour(s), 00 Min(s), and 04 Sec(s)

Comment: We are beginning to suspect the latest version of Windows is causing issues with the symbols, either that or the Microsoft server that serves them up.

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
 
I sort of assumed the reliability thing would load into the one on my system. It works for system info. How dumb lol.

I could have done without WHEA errors. WHEA = Windows Hardware Error Architecture.
The error is called by the CPU but not necessarily caused by it.
I could probably guess what the process was anyway. It's not overly useful: it's normally HAL or ntoskrnl, the parts of Windows that sit between programs and hardware.

WHEA means we have to look at the hardware a bit more closely, although these errors can be caused by drivers too. Not often, but they can be.

The screen turned reddish brown and the PC restarted.

Things already tried
What was the other GPU you used?
We have had one BSOD that I only ever see with AMD GPU (Thread stuck in device driver)
Replaced PSU - What power supply do you have?
Already ran memtest a few times

I tried driver verifier and before and it catched one of the drivers of my usb wifi adapter. I changed to a wired connection and BSOD still happen, but verifier doesn't seem to find anything.
You took the system's life in your own hands running that; it can leave the PC in a boot loop. It's the tool I hadn't suggested yet. Since it finds nothing, the problem is likely to be hardware somehow.


CPU: AMD Ryzen 5 3600 3.6GHz
RAM: Corsair Vengeance LPX DDR4 3200 PC4-25600 16GB 2x8GB CL16 (I expanded with other 16 GB but currently I have only 2 8gb installed)
Motherboard: MSI B450 Tomahawk
GPU: Sapphire Pulse Radeon RX 580 8GB GDDR5
SSD (OS W10): Samsung 850 Evo SSD Series 120GB SATA3 (5 years old)
SSD 2 (Programs): Crucial MX500 SSD 500GB SATA
HDD (Data): Seagate Barracuda ST4000DM004 4000GB Serial ATA II
Other: 4 port USB PCIe slot. I have a Valve Index Headset, a HOTAS and a webcam but currently only the keyboard, mouse and a USB wireless adapter are plugged in.

checked storage drives at all? bad storage can make drivers look bad just as much as ram can.

run Prime95 on CPU - All - https://www.mersenne.org/download/
Prime 95 Guide: http://www.playtool.com/pages/prime95/prime95.html
 
What was the other GPU you used?
The GPU I used was Sapphire R9 380 OC Nitro Dual-X 4GB GDDR5.

Replaced PSU - What power supply do you have?

I have a Corsair RM750 750W 80 Plus Gold Full Modular currently.

checked storage drives at all? bad storage can make drivers look bad just as much as ram can.

I did check and found some errors on an HDD that I've already replaced. I also removed another old (8 years) HDD just in case.

The OS SSD is oldish. As I mentioned, I ran sfc /scannow and it found some things that it didn't show previously. What are some tests I can throw at it and the other drives?
 
I ran prime95 for a little while and one of the workers showed this:

...
[Nov 6 21:56] FATAL ERROR: Rounding was 0.5, expected less than 0.4
[Nov 6 21:56] Hardware failure detected, consult stress.txt file.
...

It's late here now so I won't keep this airplane engine running.

I can keep at it tomorrow if that error is inconclusive.

EDIT: The current Prime95 is a little different from the one shown in the guide you linked; I guess the program has been updated more recently than the guide. There wasn't an "In-place large FFTs" option, so I used "Large FFTs" instead. Also, there seemed to be one worker for every core and everything (CPU and RAM) was at 100%, so maybe the current version doesn't need one instance per core?
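
When Prime95 runs for hours, it is easier to scan its saved output than to watch every worker window. Here is a minimal Python sketch that picks the failure lines out of a log. The two sample lines are the ones quoted in this post. The marker strings are taken from those lines only, so other Prime95 error wordings would not be caught, and `failures` is a made-up helper name.

```python
# Sample copied from the worker output quoted above.
SAMPLE = """\
[Nov 6 21:56] FATAL ERROR: Rounding was 0.5, expected less than 0.4
[Nov 6 21:56] Hardware failure detected, consult stress.txt file.
"""

# Assumption: these two phrases are enough to spot a failed worker.
MARKERS = ("FATAL ERROR", "Hardware failure detected")


def failures(text):
    """Return the log lines that contain one of the failure markers."""
    return [line for line in text.splitlines() if any(m in line for m in MARKERS)]


for line in failures(SAMPLE):
    print(line)
```

Any hit at all is significant here: the stress test compares its calculations against known results, so a single failed worker at stock settings already indicates that something in the CPU, RAM or motherboard path is unstable.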
 
Slightly positive update:

I managed to get in contact with the vendor and managed to start the warranty process for the CPU and Motherboard. Unless the problem comes from a <Mod Edit> invisible gremlin living in my PC's case those are the only components I haven't swapped in one way or another, so hopefully this will be the end of this issue.

I'll update when I get something.

Thanks to everyone who helped!! 🏆
May we never need to talk about this again.
 
Another good update.

After processing the warranty claim, I'll be getting a replacement (RMA) for the CPU and a refund for the motherboard (it is out of stock), but they didn't tell me if they found a fault 😕

If, after checking everything else, replacing the CPU and getting a new motherboard doesn't fix the issue, I guess I'll donate my whole PC to The Warren's Occult Museum as a haunted PC and I'll become a hermit in a mountain somewhere.
Hopefully by late next week I'll have everything assembled again and I'll see if it works.
 
I'm back with good news followed by bad news.

I got a new CPU (same model) and motherboard (Asus TUF GAMING B550-PLUS) and installed them. I've been without issues since last Friday: gaming, browsing with tons of tabs, some VR, Netflix, using the HOTAS and webcam, Prime95 running without showing hardware error messages... everything OK.

But today I had 3 BSOD in an hour:
Link
Link
Link

I also had some weird behavior, I'll try to timeline it for clarity:

I have been using the PC via Remote Desktop to run some virtual machines (using VMware) for work. Yesterday I installed VMware, and today at the beginning of the day I enabled the CPU virtualization option in the BIOS and enabled remote access (I use a wired connection). The day went without issue as usual, but after finishing work, when I moved one of my monitors' HDMI cable from my work laptop to my PC, I wasn't getting any signal. I turned the PC off (I just pressed the power button once and waited until it was off).

Then, after turning it on again, while watching some YouTube videos I kept getting Chrome errors (STATUS ACCESS VIOLATION, I think) a few seconds into each video for about 15-20 minutes, and then I had the first crash.

I tried launching a game to see if it was a one-time thing. After a while the game froze (I was still getting audio from it and from a background YouTube video), and then I had the second crash.

I inserted the CD that came with the motherboard to see if I had missed some driver installation, and while reading it Windows Explorer froze. While that happened I turned the PC off to disable virtualization in the BIOS, and I think that's when I had the third crash.

After that, while trying to see if the Event Viewer gave any info, everything kind of stopped working. I had a mouse cursor, and hovering over elements like buttons highlighted them; things like the Caps Lock light on the keyboard also worked. But nothing else did: the Windows key wouldn't bring up the Start menu, and I could not maximize or minimize anything.

I turned it off again, and here I am, frustrated as <Mod Edit> to the point of feeling sick. Half my PC is new parts at this point and it is still failing.

The only positive thing I can see is that maybe it is caused by VMware or by enabling Remote Desktop? Before that everything seemed to work fine, but it wouldn't be the first time I've had 2-3 days of everything working and then started seeing BSODs again.

So far I've uninstalled VMware and reset the BIOS to defaults.

EDIT: I forgot to mention that in the Event Viewer I saw, a few times, warnings about amdkmdag having stopped working and recovered successfully. I don't know if it's relevant, but outside of the "PC turned off badly" errors it is the only one I kind of understand.
 
I ran the dump files through the debugger and got the following information: https://jsfiddle.net/xL0vmskn/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.
File information: 112420-7562-01.dmp (Nov 24 2020 - 12:38:23)
Bugcheck: APC_INDEX_MISMATCH (1)
Probably caused by: memory_corruption (Process: explorer.exe)
Uptime: 0 Day(s), 0 Hour(s), 19 Min(s), and 18 Sec(s)

File information: 112420-7187-01.dmp (Nov 24 2020 - 13:18:44)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Probably caused by: memory_corruption (Process: ApplicationFrameHost.exe)
Uptime: 0 Day(s), 0 Hour(s), 17 Min(s), and 48 Sec(s)

File information: 112420-6812-01.dmp (Nov 24 2020 - 12:14:36)
Bugcheck: PAGE_FAULT_IN_NONPAGED_AREA (50)
Probably caused by: memory_corruption (Process: chrome.exe)
Uptime: 0 Day(s), 0 Hour(s), 19 Min(s), and 30 Sec(s)

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
 
You have a DVD drive? They're pretty rare in new PCs.

Driver CD: You're better off going to the motherboard's website than expecting the driver CD to be up to date.

RAM: Corsair Vengeance LPX DDR4 3200 PC4-25600 16GB 2x8GB CL16 (I expanded with other 16 GB but currently I have only 2 8gb installed)

I see 4 sticks of RAM in again, and well, the fact that they aren't a matching set could still be the cause of the errors. Try running with 16GB and, if the errors continue, swap sets.
 
you have a dvd player? they pretty rare in new PC

Driver CD: You better off going to website of the motherboard that the expect the driver CD to be up to date

I ended up doing just that, going to the website and installing manually. Only one of the drivers wasn't updated to the latest version.
The DVD drive is an old one; I very rarely use it, really. The PC isn't new, only a few parts are. I only mentioned it because it seemed weird that the PC had a meltdown trying to read the disc the first time.


i see 4 sticks of ram in again, and well, it could still be the fact they aren't a matching set that is cause of errors. Try running with 16gb and if errors continue, swap sets.
I already did that: I tried alternating both between the sets and between the RAM slots, but maybe some of those iterations were only for testing them with Memtest (I've tried so many things I can't remember exactly). I'm tempted to just buy a single 2x16GB kit just to rule that out at this point. Would that have caused problems since the beginning?