Question: Very frequent BSODs, need help identifying a driver issue

Nov 30, 2020
15
1
10
Hi, my recently built rig has been getting more and more frequent BSODs over the past month since I built it, going from roughly one every couple of days to the current BSOD 2 minutes after login. The BSODs really ramped up after I reset the PC with a clean install.

Parts:
MSI X570 TOMAHAWK WIFI
RYZEN 5 3600
4x8GB G.Skill Trident Z Neo 3600 CL16 (only 2 modules installed right now, both passed MemTest86, running at the system default 2133 MHz)
MSI 3x Ventus OC 3080

Errors are very frequent, and vary a lot.
Here are some of the recent ones:
SYSTEM_SERVICE_EXCEPTION
IRQL_NOT_LESS_OR_EQUAL
DRIVER_IRQL_NOT_LESS_OR_EQUAL
VIDEO_SCHEDULER_INTERNAL_ERROR
APC_INDEX_MISMATCH
KERNEL_AUTO_BOOST_INVALID_LOCK_RELEASE
KERNEL_AUTO_BOOST_INVALID_LOCK_RELEASE_WITH_RAISED_IRQL
DPC_WATCHDOG_VIOLATION

I will update this post with dump logs as soon as Windows Memory Diagnostic finishes running.

I clean installed the PC this morning, and have been slowly reinstalling drivers through safe mode. Any pointers on how to fix these BSODs would be greatly appreciated; I really need this computer working for my final project.
 
Things I've tried so far:

Booted into safe mode - seems to be very stable, hasn't crashed at all.
Disabled all non-Microsoft services in msconfig - more stable; it still crashes, but after 5-10 minutes instead of under 5 minutes.
Dump Files from these crashes: https://1drv.ms/u/s!AgBsOh_Kj0nlkhECpuiwEpdIh-iu?e=tQbc7t

Reseated RAM/GPU
Ran MemTest86 on both sticks, individually and together, for 8 passes: no errors
Tried with the XMP 3600 profile both enabled and disabled

Reinstalled latest version of Nvidia drivers, wiped old with DDU first.
Installed iCUE.

CHKDSK - No errors
SFC - No errors
DISM - No errors
 
Jul 19, 2020
52
1
45
I had a very similar problem with my PC. I'm on an Intel platform with a Tomahawk board. I never really figured out the cause, but I RMA'd my CPU and motherboard and the problem seems to be gone. I had a similar issue with similar error codes, and a few people gave me tips on how to try to fix it; I can link the thread.

Here is the link: https://forums.tomshardware.com/threads/bsod.3662272/page-2#post-22086353
 

Ideally I'd rather not RMA just yet, as I still need my computer for my final projects this semester. If it comes down to it, though, I would be willing to RMA my hardware once the semester ends in late December.

Side note: the system stutters severely about 10 seconds after login. Task Manager shows my CPU usage is really high right after login, but it settles down afterward to around 30-40%. This might be because I have a ton of non-Microsoft services disabled through msconfig.

I went through each device in Device Manager and tried updating each one. I found 3 that could be updated, restarted, and the system was stable for 20 minutes before crashing with these logs:
https://1drv.ms/u/s!AgBsOh_Kj0nlkhnBO22PD22vITcB?e=BcSnQ2
 
You have something going wrong in your power management.
I only looked at the minidumps; let me see if you have a kernel dump I can look at.

Start by turning off any sleep functions in the Windows Control Panel power options, and set the plan to High performance just so your system does not crash.

You would also want to go into your BIOS and turn off any special power management features that you might have: anything that tweaks clock speeds when the system goes idle or tells hardware to go to sleep.

In Windows, start cmd.exe as an administrator, then run
powercfg.exe /energy
and copy the report to your Documents folder and take a look at it.

You might also run
powercfg.exe with these options to see what it shows (it might crash the machine if there are bugs in the CPU/BIOS or support drivers):
/systemsleepdiagnostics - Generates a diagnostic report of system sleep transitions.
/systempowerreport - Generates a diagnostic system power transition report.
Info: Powercfg command-line options | Microsoft Docs
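For anyone who wants to repeat the /energy step, it can be scripted. This is a minimal Python sketch, assuming Python is installed on the Windows machine and that the report should land in Documents\energy-report.html (a hypothetical path; adjust it if Documents is redirected to OneDrive). It only uses the /output and /duration options of powercfg /energy, and it still has to be run from an admin prompt. `energy_report_command` is just an illustrative helper name.

```python
import subprocess
import sys
from pathlib import Path


def energy_report_command(output_path, duration_s=60):
    """Build the powercfg command line; /duration is how many seconds to observe the system."""
    return ["powercfg.exe", "/energy", "/output", str(output_path), "/duration", str(duration_s)]


if __name__ == "__main__":
    # Hypothetical destination: the current user's Documents folder.
    report = Path.home() / "Documents" / "energy-report.html"
    cmd = energy_report_command(report)
    if sys.platform != "win32":
        # powercfg only ships with Windows, so just show what would be run.
        print("powercfg only exists on Windows; would run:", " ".join(cmd))
    else:
        # Must be run from an elevated (administrator) prompt, like the manual steps.
        # check=False because the exit code is not treated as meaningful here.
        subprocess.run(cmd, check=False)
        print("Report written to", report)
```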
 
H150iRGBPROXT:
this device looks like it is on a USB 3 port that is being put to sleep. If it is an AIO cooler, that would be bad news.

Still looking at the kernel dump; it is not very helpful, but I can look at the internal error logs.
 

Sorry, I looked at a bunch of the power/sleep stuff in the debugger, but I just don't know how it is supposed to work well enough to tell what is wrong.
I do think there is some issue, since stuff was going to sleep when the system had only been up for 2 minutes.
Some of the minidumps showed power management problems: changes that Windows did not request, and bogus values being passed to Windows power management functions.
In one case a CPU core woke up from sleep without Windows telling it to wake up. Something is wrong in the CPU/BIOS or chipset driver settings.
Maybe AMD pushed out a bad BIOS fix to the motherboard vendors. I am seeing a lot of low-level problems from people with this CPU:
Identifier = REG_SZ AMD64 Family 23 Model 113 Stepping 0
ACPI\AuthenticAMD_-AMD64_Family_23_Model_113-_AMD_Ryzen_5_3600_6-Core_Processor
and only this series of CPU, with similar issues on 3 different motherboard vendors now.


I think the driver involved would be amdppm.sys.
You should also look in the list of running services, see if there is one for the AMD Processor Driver, and check whether it is running.
You might need to start it.
AMD Processor Driver (AmdPPM) Service Defaults in Windows 10 (revertservice.com)


More info here: Device Guard VBS BSOD: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M amdppm.sys | Page 3 | Sysnative Forums
Here is some text from a post about amdppm.sys and certain BIOS settings,
cut from the post below:

I believe this is a security feature, rather than a bug.
CPPC is designed to hand CPU preferred-core control from the UEFI/chipset to the OS. If CPPC initializes at boot in tandem with the strict enforcement measures of Secure Boot + CSM disabled, amdppm.sys attempts to write to or read from memory areas of the BIOS restricted by the hypervisor or UEFI, triggering the crash.
Enabling CSM allows one to use Secure Boot, HVCI, IOMMU, VBS, and CI config by relaxing UEFI/hypervisor security restrictions.
This allows greater compatibility at the expense of security.
 

First of all, thank you for your time and response.
Since I last posted, I have reseated every piece of hardware on my motherboard and done a fresh install of Windows from a USB drive. After loading in, I have not installed any driver except the latest Nvidia driver for my graphics card.
This was stable for about 5 hours, at which point it began crashing again as soon as I started working. I suspected this may have been due to me signing into Microsoft OneDrive, so I have disabled that. However, the crashes continue.
After a few crashes, I have come back to the forum. I have done as you suggested in your first post and set all of my power settings to Never sleep and/or maximum performance. A copy of the /energy report can be found in the dump log folder; it found a number of errors. As for the AIO being connected to USB 3, I am not sure how else to connect it. As far as I know, the AIO must be connected to USB 3 to be powered. Is this not the case?

Here are the latest dump logs:
https://1drv.ms/u/s!AgBsOh_Kj0nlkiF6nS9Fb2q5nkml?e=4EWsRk
 
Run msinfo32.exe, look at the entry for BIOS Mode, and tell me what it says.
Then open Services, see if you can find a service that controls amdppm, and check whether it is currently running/started.
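One likely reason amdppm is hard to find: it is a kernel-mode driver rather than a regular service, and the Services console (services.msc) only lists regular Win32 services. Its settings still live in the registry under HKLM\SYSTEM\CurrentControlSet\Services\amdppm, and `sc query amdppm` from a command prompt should show whether it is running. Below is a small Python sketch, assuming Python on the Windows machine, that reads that key's Start value and translates it; `read_amdppm_start` and `describe_start` are illustrative helper names, not part of any AMD or Windows tool.

```python
# Documented meanings of the "Start" value under HKLM\SYSTEM\CurrentControlSet\Services\<name>.
START_TYPES = {0: "Boot", 1: "System", 2: "Automatic", 3: "Manual", 4: "Disabled"}


def describe_start(value):
    """Translate a numeric Start value into a readable name."""
    return START_TYPES.get(value, f"Unknown ({value})")


def read_amdppm_start():
    """Return the Start value of the amdppm key, or None if it is missing or not on Windows."""
    try:
        import winreg  # only available on Windows
    except ImportError:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                            r"SYSTEM\CurrentControlSet\Services\amdppm") as key:
            value, _value_type = winreg.QueryValueEx(key, "Start")
            return value
    except OSError:
        # Key or value not present (FileNotFoundError is a subclass of OSError).
        return None


if __name__ == "__main__":
    start = read_amdppm_start()
    if start is None:
        print("amdppm key not found (or not running on Windows)")
    else:
        print(f"amdppm Start = {start} ({describe_start(start)})")
```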

-----------
For the USB port being suspended: open Control Panel, go to Hardware and Sound, then Device Manager, find the USB Root Hub, right-click it and select Properties. Then go to the Power Management tab and make sure "Allow the computer to turn off this device to save power" is not selected. I would also check the other USB hubs and devices and see if they have a Power Management tab where you can change the power-saving setting. You might even move the connector to its own USB tree/hub by itself.

In one of your bugchecks, the USB port to your AIO was suspended. The system had only been up for 2 minutes, which was plenty of time for the port to be working again. 2 minutes is too soon for the system to be shutting down ports to save power,
which is why I was thinking something is wrong with the power management.
----------

The reason for the first two checks is an NSA report that came out in June indicating that a PC can be hacked through some of the BIOS interfaces that motherboard vendors provide, i.e. using the interface files to send a program into the BIOS so it can be loaded again after the computer has been wiped. I think the response to this was to block some of these BIOS interfaces when you boot in secure UEFI mode, and allow them when you boot in the non-secure legacy mode (called something else now, I think).

So in UEFI mode, the service for the AMD power management might be turned off,
and in legacy boot it might be turned on.

It might also depend on where you get the file, i.e. from the Microsoft Windows install or from an AMD chipset update provided by AMD or the motherboard vendor.

This is speculation and just my guess, but something is going bad with the power management.

- Maybe the USB port being suspended stopped the CPU cooling for 2 minutes: the CPU gets hot, the system tries to throttle the CPU to a lower clock speed via the power management interface, that call is blocked, and you get a bugcheck.

Here is the AMD bug list for the CPU (it could be a bug that is not patched in the microcode):
23 decimal = 17 hex
113 decimal = 71 hex
3F hex = 63 decimal
Your CPU was
AMD64 Family 23 Model 113 Stepping 0
which would be Family 17h Model 71h,
so the docs do not include bugs for models after model 63 (3Fh).
I would guess your CPU is a Zen 2, which has some new hardware features to block Spectre Variant 4 attacks.
(You have a newer CPU with some hardware changes, but I cannot see a list of known bugs for it.)
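The decimal-to-hex step above is easy to get wrong by hand, so here is a small Python sketch of the same check. It assumes the Windows Identifier string reports family and model in decimal (as in the value quoted above), and that the revision guide covers Family 17h, Models 30h-3Fh, as its title says. `parse_identifier` and `covered_by_guide` are illustrative helper names.

```python
import re

# Range covered by "Revision Guide for AMD Family 17h Models 30h-3Fh Processors".
GUIDE_FAMILY = 0x17
GUIDE_MODELS = range(0x30, 0x3F + 1)


def parse_identifier(identifier):
    """Extract decimal (family, model, stepping) from a Windows CPU Identifier string."""
    match = re.search(r"Family (\d+) Model (\d+) Stepping (\d+)", identifier)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    return tuple(int(group) for group in match.groups())


def covered_by_guide(family, model):
    """True if this family/model falls inside the guide's 30h-3Fh model range."""
    return family == GUIDE_FAMILY and model in GUIDE_MODELS


family, model, stepping = parse_identifier("AMD64 Family 23 Model 113 Stepping 0")
print(f"Family {family} = {family:X}h, Model {model} = {model:X}h, Stepping {stepping}")
print("Covered by the 30h-3Fh revision guide:", covered_by_guide(family, model))
```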

Intel's Spectre 'Variant 4' Performance Tested: Speculative Store Bypass (techspot.com)
Zen 2 - Wikipedia

Revision Guide for AMD Family 17h Models 30h-3Fh Processors
 
SMBIOS Version 2.8. I downloaded the latest non-beta BIOS off the MSI website and flashed it. Edit: BIOS mode is UEFI, I was looking at the wrong line.

I can't find anything relating to amdppm in Services.

USB Root Hub power management has been fixed.

Crashing has definitely gone down, but is still occurring. Here are the latest dump files if you're interested.
https://1drv.ms/u/s!AgBsOh_Kj0nlkinw549jMYTZpVt5?e=7GDg0u
 

You might try changing your BIOS mode to legacy and see if you still crash. If you are in UEFI mode, the system might be blocking access to the BIOS tables for security reasons; legacy mode would allow the access. (Just a guess as to the cause.)

Also, the beta BIOS has new functions. If you have a driver from AMD that expects to access those functions and they are not there, this is the crash you would get. So you might want to try the beta BIOS version.
Here is what the new functions are supposed to do, and which settings to enable in the BIOS:
AMD Smart Access Memory | AMD
 
I'll flash the BIOS with the beta release tomorrow; in the meantime, I've switched the BIOS to legacy mode.
The blue screens have definitely gone down in frequency, and now are pretty much only APC_INDEX_MISMATCH. I'll edit this post in 2 hours with the dump logs.
Edit: dump logs: https://1drv.ms/u/s!AgBsOh_Kj0nlki6JlA7hJ0eLYxPp?e=y0kQrq

No one is really making any progress with this bugcheck, APC_INDEX_MISMATCH.
I expect AMD knows what the problem is, and they would put any potential fix in this driver update:
AMD Drivers and Support for Radeon, Radeon Pro, FirePro, APU, CPU, Ryzen, desktops, laptops
Processors -> AMD Ryzen Processors -> AMD Ryzen 5 Desktop Processors -> then select your CPU.

I would give it a try and see if it helps.
 
Alright, it's been a while and I'm still experiencing the crashes. I've been managing to ignore them for the most part, but I still crash about 5-6 times throughout the day.

I've gone to the driver support page for my AMD processor, which prompted me to install Ryzen Master, which I have done. I have also flashed the motherboard with the latest beta build.

Here are the latest dump logs. The error codes are mostly kernel-related, e.g. KERNEL_SECURITY_CHECK_FAILURE / KERNEL_MODE_TRAP. I've gotten a few codes pointing to amdppm.sys, but I'm still not sure what you mean by the service controlling it in Services. For all of my searching, I cannot find any service in Services that relates to AMD.

https://1drv.ms/u/s!AgBsOh_Kj0nlkjWA5LILJHqaUigR?e=kvrRj5
 
I looked at one of the minidumps: it was some timer going off because some hardware did not respond.
You would have to provide a kernel dump for the proper info to be in the memory.dmp; minidumps only provide limited data.

You have an old version of this driver:
C:\Program Files\AMD\RyzenMaster\bin\AMDRyzenMasterDriver.sys Tue Mar 31 21:07:41 2020

It should be newer than your BIOS release date:
BiosReleaseDate = 11/16/2020
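A quick way to do that comparison, as a Python sketch: it assumes the driver time above is the module timestamp the debugger reports, and that BiosReleaseDate is in US month/day/year order. An older timestamp is only a hint that the driver predates the BIOS, not proof that it is incompatible. `driver_predates_bios` is an illustrative helper name.

```python
from datetime import datetime

# Values copied from the debugger output quoted above.
driver_stamp = datetime.strptime("Tue Mar 31 21:07:41 2020", "%a %b %d %H:%M:%S %Y")
bios_date = datetime.strptime("11/16/2020", "%m/%d/%Y")  # assumed month/day/year


def driver_predates_bios(driver, bios):
    """True if the driver's timestamp is earlier than the BIOS release date."""
    return driver < bios


print("Driver timestamp:", driver_stamp.date())
print("BIOS release date:", bios_date.date())
if driver_predates_bios(driver_stamp, bios_date):
    print("Driver is older than the BIOS by", (bios_date - driver_stamp).days, "days")
```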

Did you find any driver update here:
AMD Drivers and Support for Radeon, Radeon Pro, FirePro, APU, CPU, Ryzen, desktops, laptops
 
When I go to the link and put in my CPU (Ryzen 5 3600), all it does is take me to a download page for Ryzen Master (version unspecified). That is what I did when I installed Ryzen Master the first time.

Just to be sure, I have uninstalled and reinstalled Ryzen Master. I have also downloaded and installed the latest chipset drivers from the motherboard website.

I will generate a kernel dump and upload it here. In the meantime, here are some more minidumps if you care to look at them. The frequency of crashing has gone back up to about 20 times in 24 hours. Oddly enough, the crashes seem to happen more during light usage, i.e. browsing in Chrome, than when I'm actively working in Verilog/MPLAB X.

Kernel dump and more minidumps: https://onedrive.live.com/?id=E5498FCA1F3A6C00!2363&cid=E5498FCA1F3A6C00

Some more symptoms, for reference: the screen briefly goes black while watching videos or using Chrome. I believe this is due to the GPU driver restarting/crashing. I have uninstalled the GPU driver and reinstalled it. Occasionally, Chrome will report errors of its own. I will make note of these as they come up and provide a list at a later point.

There is one thing I am somewhat concerned may be causing the blue screens. With your earlier mention of the H150i AIO wakeup being unpredictable, the fact that the fans on the radiator do not light up is starting to concern me. Initially, I had disregarded this as the fans spin and I don't mind the LEDs not being on. However, the LEDs not working might be indicative of a wiring problem. I will take apart my PC and rewire it later this weekend to see if that will resolve the issue.
 
Is it possible that a memory leak is occurring? In the last 24 hours a new symptom has cropped up where active applications go black, then reset as if they had crashed. E.g. a video tab will go black, reload, and start from the beginning. This usually precedes the BSOD by about 3-5 minutes.
 
The full breakdown of my build is:
CPU: Ryzen 5 3600
GPU: MSI 3x Ventus OC 3080
PSU: Seasonic Focus Plus Gold 1000W
RAM: 2 of 4x8GB G.Skill Trident Z Neo CL16, Samsung B-die, F4-3600C16D-16GTZN (I have 4 identical modules, but only 2 are installed at the moment.)
MOBO: MSI x570 Tomahawk WIFI
NVMe: Sabrent Rocket 4.0 1TB

AIO: Corsair H150i PRO RGB XT
Misc: 6x Lian Li 120mm Fans.
Razer Naga Trinity
Razer Blackwidow V3
Razer Firefly V2
Blue Snowball
Logitech Webcam
Ethernet Connection
Corsair Void Pro RGB Wireless, USB Dongle

AOC 24G2 144Hz 24" Monitor using DisplayPort
ASUS 60Hz 24" Monitor using HDMI
 
It seems that whenever I watch a longer video in Chrome, the system crashes. I have switched to Microsoft Edge and the crashing has drastically dropped to ~5 a day, with most errors being APC_INDEX_MISMATCH.
 
Something interesting that I've noticed. In the last 3 days of crashing, I have always crashed 10 minutes before a Security Intelligence Update for Microsoft Defender Antivirus is successfully installed according to the Reliability Monitor.