Question Need help with BSOD constantly happening

Status
Not open for further replies.

effektz

Commendable
Jul 15, 2019
60
0
1,540
1
Hello!

I'm frequently running into "SYSTEM_SERVICE_EXCEPTION" and other errors such as KERNEL_SECURITY_CHECK_FAILURE at random on my PC. It's been happening for about 3 weeks now, and lately it's been nothing but SYSTEM_SERVICE_EXCEPTION. I would appreciate help determining what I should test next and how to do it. I can reliably reproduce the issue by playing a game: the game starts, and the moment I make any movement inside it, the PC crashes.

Here is my machine info:
---------------------------------------------------------------------------------------------------
CPU: AMD Ryzen 7 3700X
Motherboard: GIGABYTE X570 AORUS ELITE ATX
RAM: Crucial Ballistix Sport LT 32GB (2 x 16GB) DDR4-3200 PC4-25600 CL16
SSD: SAMSUNG E 500GB 970EVO NVME M.2 SSD
GPU: ASUS AMD RADEON RX5700 8G
PSU: CORSAIR RMX750X FM 80+G ATX PSU
Chassis: Fractal Meshify Mid-Tower
OS: Windows 10, 10.0, version 2009, build: 19044 (x64)
---------------------------------------------------------------------------------------------------


Here is what I have tried so far that has not helped

  • Clean install of latest GPU drivers
  • Tried even downgrading GPU drivers
  • Updated BIOS to latest version
  • Updated AMD Chipset Drivers
  • Updated Motherboard Drivers
  • Cleaned out GPU and reseated it in the motherboard
  • Tried running Driver Verifier on Windows with no results
  • Ran MemTest86 and passed
  • Reinstalled Windows 10 (twice now)
---------------------------------------------------------------------------------------------------

I ran the WhoCrashed application and it just gives me generic information. Here are the dump files: https://drive.google.com/file/d/1qn59oQlxFbAAfW5KO66XOEOoNaiwCYRr/view?usp=sharing

Any help is appreciated, thank you!
 

effektz

Thanks! I am running that test now; however, when I first started it up I got a BSOD with the same SYSTEM_SERVICE_EXCEPTION. I got the program to run on the second try. I also noticed that when I pull up the AMD Radeon Software Performance, my CPU voltage was at 1.1V until I opened my browser. Is that too low? I tried looking online but only found information about CPU over-voltage.
 

gardenman

Distinguished
Moderator
Hi, I ran the dump files through the debugger and got the following information: https://jsfiddle.net/nsfjm6ev/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.
File information: 041522-5234-01.dmp (Apr 15 2022 - 20:09:06)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Probably caused by: memory_corruption (Process running at time of crash: MankindRemastered.exe)
Uptime: 0 Day(s), 0 Hour(s), 03 Min(s), and 39 Sec(s)

File information: 041022-6406-01.dmp (Apr 10 2022 - 19:51:51)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Probably caused by: memory_corruption (Process running at time of crash: MankindRemastered.exe)
Uptime: 0 Day(s), 0 Hour(s), 05 Min(s), and 05 Sec(s)

File information: 041022-5609-01.dmp (Apr 10 2022 - 01:33:57)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Probably caused by: memory_corruption (Process running at time of crash: MankindRemastered.exe)
Uptime: 0 Day(s), 0 Hour(s), 03 Min(s), and 26 Sec(s)

File information: 041022-5187-01.dmp (Apr 10 2022 - 20:33:58)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Probably caused by: memory_corruption (Process running at time of crash: mscorsvw.exe)
Uptime: 0 Day(s), 0 Hour(s), 10 Min(s), and 23 Sec(s)

File information: 041022-4203-01.dmp (Apr 10 2022 - 20:23:09)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Probably caused by: memory_corruption (Process running at time of crash: MankindRemastered.exe)
Uptime: 0 Day(s), 0 Hour(s), 23 Min(s), and 37 Sec(s)
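For anyone who wants to tally a batch of dumps like the five above, here is a rough Python sketch. It assumes the leading six digits of a minidump file name are the crash date as MMDDYY, which matches the dates shown in the summary; the `DUMPS` list is just the file names and process names copied from that summary, and `dump_date` is a made-up helper name.

```python
from collections import Counter
from datetime import date

# File name and active process for each dump, copied from the summary above.
DUMPS = [
    ("041522-5234-01.dmp", "MankindRemastered.exe"),
    ("041022-6406-01.dmp", "MankindRemastered.exe"),
    ("041022-5609-01.dmp", "MankindRemastered.exe"),
    ("041022-5187-01.dmp", "mscorsvw.exe"),
    ("041022-4203-01.dmp", "MankindRemastered.exe"),
]


def dump_date(filename: str) -> date:
    """Read the MMDDYY prefix of a minidump file name as a date (assumed layout)."""
    stamp = filename.split("-")[0]
    month, day, year = int(stamp[0:2]), int(stamp[2:4]), 2000 + int(stamp[4:6])
    return date(year, month, day)


# Count crashes per active process and per calendar day.
by_process = Counter(process for _, process in DUMPS)
by_day = Counter(dump_date(name) for name, _ in DUMPS)

for process, count in by_process.most_common():
    print(f"{process}: {count}")
for day, count in sorted(by_day.items()):
    print(f"{day.isoformat()}: {count}")
```

A tally like this only shows which process was active when the crash happened; it does not by itself say that process is at fault.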
Possible Motherboard page: https://www.gigabyte.com/Motherboard/X570-AORUS-ELITE-rev-10#kf
There is a minor BIOS update available for your system. Wait for additional information before deciding to update or not. Important: Verify that I have linked to the correct motherboard. Updating your BIOS can be risky. Never try it when you might lose power (lightning storms, recent power outages, etc).

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
 

ubuysa

Honorable
Jul 29, 2016
17
0
10,520
3
I've just taken a close look at all five minidumps you posted and there is one thing common to all of them. If you examine the active thread, the stack trace contains a trap frame for the system service caller (nt!KiSystemServiceUser+0x18). Examining this trap frame (in each of the dumps) reveals user mode code passing an invalid pointer, e.g.:
Code:
00007ffa`f5b8d074     ??              ???
The question marks indicate invalid data. This results in an immediate general protection fault. The trap frame for the system service routine (in each of the dumps) contains an invalid address (because of the duff data passed by the user mode process), e.g.:
Code:
fffff802`7aff8700 f785f800000000020000 test    dword ptr [rbp+0F8h],200h ss:9098:498c0c36`a057153a=????????
Note the question marks again indicating an invalid address.

I would thus suspect the Mankind Remastered game itself, since in four of the five dumps mankindremastered.exe is the active process. In the other one the active process is mscorsvw.exe, which is the .NET Runtime Optimization Service; if mankindremastered.exe is a .NET application, then a BSOD with this process in control might be expected.

I can see no driver errors flagged in any of these dumps, so rather than pointing at a driver (which runs in kernel mode) it seems these errors are ultimately down to user mode code making invalid system service calls.
 

effektz

So I ran Prime95 overnight and woke up this morning to find my PC had restarted. I think it bluescreened last night, and when I went to run WhoCrashed, it BSODed again. As for the game, I can reproduce the issue consistently by opening it; other times it happens at random when I am just working in VS Code or some other dev program. I confirmed with other players in the community that I seem to be the only one with this issue.

Here are the latest dump files: https://drive.google.com/file/d/1SjfL5UcxpW0brcsRUDGsivjfI6D6tMaB/view?usp=sharing
 
you might download autoruns from here: Autoruns for Windows - Windows Sysinternals | Microsoft Docs

find the menu option to hide microsoft entries, then find the driver
AMDRyzenMasterDriver, disable it and reboot.
This driver overrides various cpu voltages set in the bios.
otherwise you will have to track down which service was using a bad address.

the process name was whocrashedex.exe
(is WhoCrashed.exe now running a service? I have not installed it in a long time)

in the first bugcheck the process was searchfilterhost.exe
(I would disable the ryzen master driver and retest for failure.)
 

effektz

So I did this and it seemed to help initially. My computer also ran an update and upgraded to Windows 11 for some reason. I was able to play the game for longer; however, after 15 minutes or so I got another BSOD with a different message, KMODE_EXCEPTION_NOT_HANDLED. I am at a loss as to what is causing these blue screens.

Here are the dump files: https://drive.google.com/file/d/1IvCbxIK0sabBlQzWOyIUV0-SeS3SilyS/view?usp=sharing
 
first bugcheck was in the edge browser process
msedgewebview2.exe
looked like a service tried to access a memory address that was bogus:
0xaaaaaaaaaaaaaaaa

(not in user space or kernel memory range)
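To illustrate the "not in user space or kernel memory range" remark, here is a small Python sketch that sorts a 64-bit virtual address into the lower (user) half, the upper (kernel) half, or neither. It assumes ordinary 48-bit canonical addressing on x64, where bits 63 to 47 must all be equal; `classify_address` is a made-up helper name, and the sample addresses are the ones quoted in this thread.

```python
def classify_address(addr: int) -> str:
    """Classify a 64-bit virtual address, assuming 48-bit canonical addressing."""
    top17 = (addr & 0xFFFFFFFFFFFFFFFF) >> 47  # bits 63..47
    if top17 == 0:
        return "user"           # lower canonical half
    if top17 == 0x1FFFF:
        return "kernel"         # upper canonical half
    return "non-canonical"      # no valid mapping can exist here


for sample in (
    0x00007FFAF5B8D074,  # user mode address from the first set of dumps
    0xFFFFF8027AFF8700,  # kernel instruction address from the first set of dumps
    0x498C0C36A057153A,  # bad address from the trap frame
    0xAAAAAAAAAAAAAAAA,  # bad address from this bugcheck
):
    print(f"{sample:#018x}: {classify_address(sample)}")
```

A non-canonical address cannot be a valid pointer on such a system, which is why a value like 0xaaaaaaaaaaaaaaaa points at corrupted or uninitialised data rather than at a particular module.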

ryzen driver is loaded:
C:\WINDOWS\system32\AMDRyzenMasterDriver.sys Thu Jun 24 22:21:58 2021
(cpu speed looks like a slight underclock)
~MHz = REG_DWORD 3593
Component Information = REG_BINARY 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Configuration Data = REG_FULL_RESOURCE_DESCRIPTOR ff,ff,ff,ff,ff,ff,ff,ff,0,0,0,0,0,0,0,0
Identifier = REG_SZ AMD64 Family 23 Model 113 Stepping 0
ProcessorNameString = REG_SZ AMD Ryzen 7 3700X 8-Core Processor
Update Status = REG_DWORD 3
VendorIdentifier = REG_SZ AuthenticAMD
---------------
will look at second dump in a moment

driver tried to access a bogus memory address of (-1)
0xffffffffffffffff
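As a quick check of why 0xffffffffffffffff reads as -1: that is simply the two's complement interpretation of a 64-bit value with every bit set. A minimal Python sketch (`to_signed64` is a made-up helper name):

```python
def to_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit value as a signed two's complement integer."""
    value &= 0xFFFFFFFFFFFFFFFF
    # If the sign bit (bit 63) is set, subtract 2**64 to get the negative value.
    return value - (1 << 64) if value & (1 << 63) else value


print(to_signed64(0xFFFFFFFFFFFFFFFF))  # -1
print(to_signed64(0x0000000000000005))  # 5
```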

looks like windows was running this function
nt!ReadAMDMsr

I think windows was trying to read AMDMsr
AMD cpu Model Specific Registers
and got the unexpected -1 in one of the registers.

most likely to be a bug in one of the amd drivers on the machine. (or a bug in cpu that has not been patched)

here are some of the third party amd drivers:
\SystemRoot\System32\drivers\amdfendr.sys Thu Dec 9 21:13:07 2021
\SystemRoot\System32\drivers\amdfendrmgr.sys Thu Dec 9 21:13:20 2021
\SystemRoot\System32\drivers\amdgpio2.sys Wed Mar 11 04:15:48 2020
\SystemRoot\System32\DriverStore\FileRepository\u0377495.inf_amd64_58cc395c0bf03a26\B377432\amdkmdag.sys Wed Mar 9 15:58:26 2022
\SystemRoot\System32\drivers\amdkmpfd.sys Fri Nov 6 12:02:44 2020
\SystemRoot\System32\drivers\AMDPCIDev.sys Mon May 17 22:18:23 2021
\SystemRoot\System32\drivers\amdpsp.sys Fri Jun 11 13:35:10 2021
\SystemRoot\System32\DriverStore\FileRepository\amdsafd.inf_amd64_edd3335a4253bf6d\amdsafd.sys Tue Nov 2 14:48:30 2021
\SystemRoot\System32\drivers\amdxe.sys Mon Aug 16 08:48:56 2021
unexpected driver:
\SystemRoot\system32\drivers\AtihdWT6.sys Wed Oct 27 14:14:10 2021

microsoft provided amd drivers:
\SystemRoot\System32\drivers\amdppm.sys
\SystemRoot\system32\mcupdate_AuthenticAMD.dll

I would first remove the ryzen master driver again .

looks like most of your amd drivers are related to your GPU. then you have some related to your cpu security processor, an amd link processor and some amd audio streaming.
not sure about this driver: AtihdWT6.sys
just because of the old name ATI.

try and remove the ryzen master driver and see if it helps.
 
I just had another one with code REFERENCE_BY_POINTER. File dump: https://drive.google.com/file/d/1xA3USBgSf8RHC7nWrBpMHfpScmVH7Ubr/view?usp=sharing

I ran Autoruns again and saw that the RyzenMasterDriver was enabled. I disabled it again and restarted.
you have to delete the entry to get rid of it. disabling it is good for testing so that windows will just skip the loading of the driver but after some windows updates the selective load might get disabled which would reenable the driver.
you can delete the entry, if you want the driver back you can install the cpu chipset drivers directly from AMD and it will get installed again.

in this bugcheck, it looked like chrome was running. the system tracks each object with a count, and when something is done using the object the count goes down by 1. for this object the count went from zero (not in use) to -1, meaning it was released too many times. when the count goes to 0 it means the windows memory manager is free to clean up the memory. -1 is a big problem for the memory manager so it calls a bugcheck.
maybe you have a chrome extension?

the underflow is in some table that tracks the use of thread and processes.
the actual table inside windows is a target for cheat software and anticheat software. lots of info and code on hacking this table. just fyi:
UnKnoWnCheaTs - Multiplayer Game Hacking and Cheats - View Single Post - [Coding] Hide kernel thread (PsCreateSystemThread)

could be some software saying it is done with something when windows did not know it was using it, i.e. the count was never raised but got lowered later.

guess malware or utilities could do the same as well as just a bug in a driver or service.
PspReferenceCidTableEntry
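To make the over-release idea concrete, here is a toy Python sketch of reference counting with an underflow check. It is only an illustration of the concept described above, not how Windows implements it; `TrackedObject` and `ReferenceUnderflow` are made-up names.

```python
class ReferenceUnderflow(Exception):
    """Raised when an object is released more times than it was referenced."""


class TrackedObject:
    def __init__(self, name: str) -> None:
        self.name = name
        self.ref_count = 0

    def reference(self) -> None:
        """Something starts using the object: count goes up by 1."""
        self.ref_count += 1

    def dereference(self) -> bool:
        """Something is done with the object: count goes down by 1.

        Returns True when the count reaches 0 and the object may be cleaned up.
        """
        if self.ref_count <= 0:
            # The equivalent situation in the kernel ends in a bugcheck.
            raise ReferenceUnderflow(f"{self.name} released too many times")
        self.ref_count -= 1
        return self.ref_count == 0


obj = TrackedObject("thread")
obj.reference()
print(obj.dereference())      # True: count is back to 0
try:
    obj.dereference()         # one release too many
except ReferenceUnderflow as err:
    print(f"underflow: {err}")
```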
 

effektz

I really appreciate your help! I will delete that driver, as that seems to work. However, after a day of good performance, I have been getting several new BSOD errors that I was hoping you could look at. Several are "REFERENCE_BY_POINTER" and "KMODE_EXCEPTION_NOT_HANDLED", as well as "BAD_OBJECT_HEADER". Here is a link to the dump files, and again, I really appreciate your help!

Dump: https://drive.google.com/file/d/142hMaagMdtURs2eyZF3yRZXbYqxMT9IQ/view?usp=sharing
 
next bugcheck looked like a game
mankindremastered.exe
trying to use a file handle that had most likely already been released back to the system. (all zeros)

------------
next bugcheck was an access violation; the address of the driver was -1
the code was
nt!ReadAMDMsr
basically something was trying to read the AMD cpu model specific registers. I think I looked this up the other day, and on error all the bits are supposed to be set to zero. Something set them all to binary 1's
your cpu=
Identifier = REG_SZ AMD64 Family 23 Model 113 Stepping 0
ProcessorNameString = REG_SZ AMD Ryzen 7 3700X 8-Core Processor

which is family 23 = 17h
model 113 = 71h
stepping 0
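The decimal-to-hex conversion above is easy to check, and the same numbers can be recovered from a raw CPUID signature. The Python sketch below uses the standard CPUID leaf 1 EAX field layout with the AMD extended family/model convention; the value 0x00870F10 is an assumed example signature for this CPU generation, not something read from the dumps, and `decode_cpuid_signature` is a made-up helper name.

```python
from typing import Tuple


def decode_cpuid_signature(eax: int) -> Tuple[int, int, int]:
    """Decode (family, model, stepping) from a CPUID leaf 1 EAX value."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    if base_family == 0xF:
        # Extended fields only apply when the base family is 0xF.
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family, model = base_family, base_model
    return family, model, stepping


print(hex(23), hex(113))                   # 0x17 0x71
print(decode_cpuid_signature(0x00870F10))  # (23, 113, 0) for the assumed signature
```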

I tried to find the AMD cpu bug list for this cpu but they make it hard to find. I could only find bug lists for earlier versions. the first spec I could find was for stepping b0

maybe the debugger is wrong about the stepping version. Stepping is the version of the cpu.
I think I looked into a problem like this in 2019, but the story I got was that the cpu was not released in the USA. it was manufactured in china and only released in china. But that version of the chip had issues and a newer version was released in the USA. The only problem was that people in china sold the CPU on the grey market and it ended up being sold in the usa market. So, when/where did you get the CPU? (or the debugger displays a bad stepping version, just as likely)
 
options:
malware
bad chrome extension
bad program MankindRemastered.exe
problem with one of your AMD chip drivers
problem with BIOS
CPU bug that does not have the proper microcode patch.

(when I look at the cpu microcode patch level the debugger just returns an error, which often happens on amd cpus)

maybe reinstall every amd driver? or wipe Windows and only use the amd drivers that microsoft pushes out via windows update. Microsoft's versions of the drivers get reported back and fixed when they cause a bugcheck; drivers from the motherboard vendor or directly from amd do not get updated that way. Microsoft does not install the ryzen master driver, which means you got it from the motherboard vendor or directly from AMD. You might now have a mixed build of amd drivers: some from microsoft, some from the motherboard vendor, some directly from AMD. if they were built from different source code versions they could have various bugs when working together.

the only real debugging option would be to make a kernel dump so I can see what is going on with the various cpus and look at kernel resources. most likely I will not get far if the problem is reading from a CPU MSR (a register specific to a particular cpu version)

i think the register was 64 bits, where each 1 flag indicates some event happened. On error it should be all 64 bits of zeros, and instead it was 64 bits of 1's (as if everything happened; windows started processing each flag, the info was not there, so it bugchecked)
it could just be a driver bug also. it is very common for programmers to use -1 as an error and 0 to mean success. in this case the cpu indicates that on an error the register should be set to 0; otherwise each 1 in the flags means something to process.
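A small Python sketch of the flag-register idea described above: treat a 64-bit value as 64 independent event flags and list which ones are set. This is purely illustrative and does not model any real AMD MSR layout; `set_flags` and `looks_like_error_sentinel` are made-up names.

```python
ALL_ONES = 0xFFFFFFFFFFFFFFFF


def set_flags(value: int) -> list:
    """Return the bit positions (0..63) that are set in a 64-bit flag register."""
    value &= ALL_ONES
    return [bit for bit in range(64) if (value >> bit) & 1]


def looks_like_error_sentinel(value: int) -> bool:
    """True when every bit is set, i.e. the value reads as -1 rather than real flags."""
    return (value & ALL_ONES) == ALL_ONES


print(set_flags(0))                         # [] -> nothing to process
print(set_flags(0b101))                     # [0, 2]
print(len(set_flags(ALL_ONES)))             # 64 -> "everything happened"
print(looks_like_error_sentinel(ALL_ONES))  # True
```

Code that walks the flags without first checking for an all-ones value would try to process 64 events that never occurred, which fits the failure described above.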
 

effektz

@johnbl things got better for a few days, but now they are back with a vengeance. I have deleted that driver, and now it only sometimes BSODs when I play the game. It also BSODs when I am trying to use Visual Studio Code or watch YouTube. Sometimes the PC also crashes and I see the BSOD screen for a split second before the PC shuts off, and when it comes back on there are no dump files for that crash. Today it did that to me twice, along with several more BSODs. I am at a loss on what to do at this point.

Here are the latest dump files: https://drive.google.com/file/d/1wIDUr1GjR4tjUvoFg5pRJ_Q81DE79Ojn/view?usp=sharing
 

effektz

OK, so tonight I've had nonstop BSODs. I accidentally turned on XMP in the BIOS, and after that I couldn't get to the home screen without a BSOD. I was able to disable XMP but was still getting BSOD errors. I finally got the computer stable enough to grab the dump files. If anyone has suggestions, please let me know; I am ready to toss this thing out the window.

Dump Files: https://drive.google.com/file/d/1FbjIOY6Q2WqNXKmCt1YGXBzbp9Fugrfr/view?usp=sharing
 

ubuysa

This is the key data from the dumps:
Code:
042722-6234-01.dmp SYSTEM_SERVICE_EXCEPTION
Exception code 0xC0000005 (memory access violation)
Bad address nt!KiSystemServiceUser+0x18: fffff807`2e028d32 650fae142580010000 ldmxcsr dword ptr gs:[180h] gs:002b:00000000`00000180=????????
Process in control steam.exe
FAILURE_BUCKET_ID:  0x3B_c0000005_STACKPTR_ERROR_nt!KiSystemServiceUser (stack pointer error)

042722-6484-01.dmp SYSTEM_SERVICE_EXCEPTION
Exception code 0xC0000005 (memory access violation)
Bad address nt!KiSystemServiceUser+0x18: fffff807`2e028d32 650fae142580010000 ldmxcsr dword ptr gs:[180h] gs:002b:00000000`00000180=????????
Process in control steam.exe
FAILURE_BUCKET_ID:  0x3B_c0000005_STACKPTR_ERROR_nt!KiSystemServiceUser

042722-6937-01.dmp SYSTEM_SERVICE_EXCEPTION
Exception code 0xC0000005 (memory access violation)
Bad address nt!KiSystemServiceUser+0x18: fffff807`2e028d32 650fae142580010000 ldmxcsr dword ptr gs:[180h] gs:002b:00000000`00000180=????????
Process in control steam.exe
FAILURE_BUCKET_ID:  0x3B_c0000005_STACKPTR_ERROR_nt!KiSystemServiceUser

042722-17484-01.dmp SYSTEM_SERVICE_EXCEPTION
Exception code 0xC0000005 (memory access violation)
Page fault nt!KiPageFault+0x5ef: fffff802`082258af 0fae55ac        ldmxcsr dword ptr [rbp-54h] ss:0018:ffffe080`5b428dcc=00001f80
Process in control MBAMService.exe (fileinfo.sys driver)
FAILURE_BUCKET_ID:  0x3B_c0000005_VRF_STACKPTR_ERROR_fileinfo!FIPostCreateCallback

042722-17906-01.dmp IRQL_NOT_LESS_OR_EQUAL
Bad address nt!MiComputeMaximumFaultCluster+0x47: fffff802`2eb2a7a7 410fb78270010000 movzx   eax,word ptr [r10+170h] ds:fffffd8e`c4c1bc50=????
Process in control 1g-16.exe (MBAM)
IMAGE_NAME:  memory_corruption
The first three dumps are identical and appear to be Steam related. They are all invalid stack pointers (pointing to an address that doesn't exist). Typically this is a driver error, but I suspect this may be a RAM issue.

The last two have Malwarebytes processes in control and both are memory errors. Both are referencing invalid addresses. Again, I suspect RAM may be the issue here.
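For readers unfamiliar with the fields in the listing above, here is a hedged Python sketch that decodes the exception code 0xC0000005 using the standard NTSTATUS bit layout and splits a FAILURE_BUCKET_ID string into its parts. The bucket parsing is an assumption based only on the strings shown in this thread (bugcheck, exception code, tags, then module!function); `decode_ntstatus` and `parse_bucket` are made-up helper names.

```python
def decode_ntstatus(status: int) -> dict:
    """Split an NTSTATUS value into its standard bit fields."""
    return {
        "severity": (status >> 30) & 0x3,    # 3 means error
        "customer": (status >> 29) & 0x1,    # 0 means Microsoft-defined
        "facility": (status >> 16) & 0xFFF,
        "code": status & 0xFFFF,
    }


def parse_bucket(bucket: str) -> dict:
    """Split a bucket ID shaped like the ones quoted above (assumed format)."""
    bugcheck, exception, rest = bucket.split("_", 2)
    tags, symbol = rest.rsplit("_", 1)
    module, function = symbol.split("!", 1)
    return {
        "bugcheck": int(bugcheck, 16),
        "exception": int(exception, 16),
        "tags": tags,
        "module": module,
        "function": function,
    }


print(decode_ntstatus(0xC0000005))
print(parse_bucket("0x3B_c0000005_STACKPTR_ERROR_nt!KiSystemServiceUser"))
print(parse_bucket("0x3B_c0000005_VRF_STACKPTR_ERROR_fileinfo!FIPostCreateCallback"))
```

The `rsplit` on the last underscore would mis-split a function name that itself contains an underscore, so treat this as a sketch for these particular strings only.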
 

effektz

@ubuysa I purchased new RAM today and tried it out. The computer seems to have been slower to respond since yesterday (even with the old RAM). With the new RAM I still get BSOD errors: SYSTEM_SERVICE_EXCEPTION, IRQL_NOT_LESS_OR_EQUAL and REFERENCE_BY_POINTER. I have included the new dump files. Do you think it could be a bad CPU? I am not sure how to even test for that.

New dump files: https://drive.google.com/file/d/1vx0lSbk8uuxOC2T7vGGYKuDZSWmxgsOn/view?usp=sharing
 

Colif

Win 11 Master
Moderator
Jun 12, 2015
56,552
4,475
160,690
10,208
So I ran the Prime95 program overnight and woke up this morning to my find my PC restarted. I think It bluescreened last night and then when I went to go run WhoCrashed, it BSOD again.
Can you look in the folder that Prime95 was installed in? It might have a file called stress.txt, which holds the results of the test. It might show a clue, since Prime95 is the usual way to stress-test a Ryzen CPU.
Copy/paste any results you get into here.
 

effektz

I found a file called results.txt, and it looks like there is a gap between when the computer shut down and when I started it back up. Everything shows as passed before and after. I re-ran the Prime95 test for 10 hours and again everything passed. I also ran a FurMark torture test on the GPU and nothing seemed to fail. A friend wanted me to test moving the RAM from slots A2/B2 to A1/A2; I got an instant BSOD with MEMORY_MANAGEMENT and was then unable to boot. I switched them back to A2/B2 and the computer finally booted up. To see if it might help, I ran UserBenchmark on the computer. Obviously it did not perform well at all; even the new RAM I installed does not score well.

UserBenchmarks: Game 46%, Desk 84%, Work 48%
CPU: AMD Ryzen 7 3700X - 83.8%
GPU: AMD RX 5700 - 50.9%
SSD: Samsung 970 Evo NVMe PCIe M.2 500GB - 51.7%
HDD: WD Green 3TB (2011) - 55.8%
RAM: G.SKILL Ripjaws V DDR4 3200 C16 2x16GB - 65.7%
MBD: Gigabyte X570 AORUS ELITE
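On the gap in results.txt mentioned above: a rough Python sketch for spotting such gaps automatically. It assumes the log contains bracketed timestamp lines such as `[Thu Apr 21 23:10:05 2022]`; the sample lines and the one-hour threshold are made up for illustration, and `find_gaps` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta

STAMP = re.compile(r"^\[(.+?)\]")


def find_gaps(lines, threshold=timedelta(hours=1)):
    """Return (before, after) timestamp pairs separated by more than threshold."""
    stamps = []
    for line in lines:
        match = STAMP.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y"))
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > threshold]


# Hypothetical log excerpt; real entries would come from the results file.
sample = [
    "[Thu Apr 21 23:10:05 2022]",
    "Self-test 448K passed!",
    "[Thu Apr 21 23:25:40 2022]",
    "Self-test 8K passed!",
    "[Fri Apr 22 07:02:11 2022]",
    "Self-test 448K passed!",
]

for before, after in find_gaps(sample):
    print(f"no entries between {before} and {after}")
```

The last timestamp before a gap gives a rough upper bound on how long the stress test ran before the machine went down.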
 

Colif

A friend wanted me to test moving the RAM slots from A2/B2 to A1/A2 and I got an instant BSOD with MEMORY_MANAGEMENT and then unable to boot. I switched them back to A2/B2 and computer finally booted up.
I have the WiFi version of the motherboard you are using.
If you had tried A1 & B1 it would have worked.

The dual-channel pairs are A1 & B1 or A2 & B2.
You only have A2 & A1 both filled if you have more than 2 sticks.

The manual seems to contradict itself.
I can see why your friend would suggest it based on the channel descriptions, but the table matches how my own RAM is set up.

See page 12 - https://download.gigabyte.com/FileList/Manual/mb_manual_x570-aorus-elite-wifi_1002_e_v1.pdf?v=01f88767fb5b58cab7d7d233fb757234
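The slot rule described above can be written down as a tiny Python check. This only encodes the rule as stated in this post (two sticks go in A1 & B1 or A2 & B2, one per channel); it has not been checked against the manual, and `valid_two_stick_layout` is a made-up name.

```python
def valid_two_stick_layout(slots) -> bool:
    """Check a two-stick layout: one stick per channel, same slot number."""
    if len(slots) != 2:
        return False
    channels = {slot[0] for slot in slots}   # "A2" -> channel "A"
    numbers = {slot[1:] for slot in slots}   # "A2" -> slot number "2"
    return channels == {"A", "B"} and len(numbers) == 1


print(valid_two_stick_layout(["A2", "B2"]))  # True
print(valid_two_stick_layout(["A1", "B1"]))  # True
print(valid_two_stick_layout(["A1", "A2"]))  # False: both sticks on channel A
```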
 