[SOLVED] BSOD - Crashing out of nowhere - I've no clue what the next steps are?

Nov 4, 2023
7
2
15
Hello,

I have had a problem with my computer for the last few days (I bought the PC in April this year). To describe the problem: one day I was peacefully playing Counter Strike with my brother and then it happened. The screen froze and the scary BSOD showed up. So I restarted the PC and went back into the game; after a few minutes it happened again... and again. I ended up in a situation I personally call BSOD hell: I turn the PC on, log in to the system, everything loads for 10 seconds, freeze, BSOD, and again.

I tried some repairs (not relevant for now) and nothing helped, so I reinstalled Windows. While I was setting everything up with the Armoury Crate software from ASUS to get the latest drivers, it happened again, on a clean install of Windows, just while updating drivers. I ended up in BSOD hell again, so I reinstalled Windows again.

I was more careful about the next steps, so I let Windows get everything (updates, security), then I started updating drivers one by one. I installed one driver, restarted the computer, waited for a while, then installed the next driver, and so on. I installed the latest drivers for everything. Everything worked just fine, except that I found that one of the memory sticks (DRAM 0 slot) showed a temperature of 0.5 degrees in the BIOS. Weird, right? I thought, cool, I found the issue: it is a RAM stick with a bad contact. I pulled out both sticks and swapped them to see whether the odd temperature would now show in the DRAM 1 slot. It did not; everything looked fine. So I ran Prime95 for 3 hours of stress testing. No crash, not a single issue. So I started installing more software like Steam, etc.

After a day of happiness the stealthy BSODs came back. Just a few, not the whole BSOD hell. So I started debugging: Event Viewer plus Google to see if anything matched my case. I found that the HyperX Cloud 7.1 USB sound card was causing trouble, so I updated its firmware. One issue down, but it was not related to the BSODs. So I went deeper, with WinDbg plus Google.

I have two types of BSOD errors: CLOCK_WATCHDOG_TIMEOUT and MACHINE_CHECK_EXCEPTION. From what I googled, that could be an issue with the RAM, so I ran Memtest86 several times for each stick; all good, 0 errors. BTW, the DRAM 1 slot (now holding the stick from the DRAM 0 slot that showed the 0.5 temperature) shows the weird temperature again (it was fine before).

I have no clue what I should do next. Please help me. Everything was perfect since I bought this PC, but the last few days have been a struggle.

HW INFO:
MB: ASUS ROG STRIX B660-I GAMING WIFI
CPU: Intel Core i5-13600KF
RAM: Corsair 32GB KIT DDR5 5600MHz CL36 Vengeance
SSD: Kingston KC3000 NVMe 2TB
PSU: Corsair RM850 White (2021)
Windows 11 Pro

TLDR:
My computer throws two types of BSOD out of nowhere: CLOCK_WATCHDOG_TIMEOUT and MACHINE_CHECK_EXCEPTION. One of the RAM sticks shows a weird temperature. No overclocking was done at all; everything is stock.
I have:
  • Reinstalled Windows (twice)
  • Stress tested the whole system with Prime95 for a few hours with the latest drivers installed
  • Memtested both memory sticks multiple times without errors
  • Installed the latest drivers and updates everywhere (checked with all the manufacturers)
Google Drive folder with dump files <- DUMP files here

I can buy a new part and return the broken one, no problem, but I cannot buy a whole new PC.
 

Lutfij

Titan
Moderator
Welcome to the forums, newcomer!

Reinstalled Windows (twice)
Where did you source the installer for the OS? Did you install the OS in offline mode and then manually install the latest versions of all relevant drivers with elevated rights, i.e. right-click the installer > Run as administrator?
 
Nov 4, 2023
7
2
15
Thank you!

The installer came from the factory reset. I tried the online one and that failed. All drivers were installed via Armoury Crate from ASUS.
 
Nov 4, 2023
7
2
15
You're way behind on updates. I would update to the latest version, 2802, and then see if you are still having the same issues.
Thanks for the reply!

I updated the BIOS yesterday and got a BSOD twice. I am focusing on the CPU now: Intel diagnostics all OK, stress tests OK. But with some Googling I found that for some people with the very same problem and a similar PC setup, turning off CPU C States Support in the BIOS helped. I am testing the system now.

It is a super weird issue, but a lot of 13600K(F)s have these problems out of nowhere.
 
ubuysa

Distinguished
First things first; four of the dumps are (as you say) 0x101, CLOCK_WATCHDOG_TIMEOUT and these can only be fully analysed with the full kernel dump. Unfortunately there is only one of those recorded and it's always for the most recent BSOD, which isn't a 0x101 in your case. Sod's law of course!

If/when you get another 0x101 BSOD please copy the file C:\Windows\Memory.dmp to a temp folder somewhere to prevent it being overwritten by a subsequent BSOD. Then upload it to the cloud - it will be large.
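A minimal sketch of that copy step, assuming Python is available on the machine. The `preserve_dump` helper and the `C:\Temp\dumps` archive folder are illustrative choices of mine, not something from this thread. It copies the dump to a timestamped name so that a later BSOD cannot overwrite the copy.

```python
import shutil
import time
from pathlib import Path

# Default location of the kernel dump, as mentioned above.
DEFAULT_DUMP = Path(r"C:\Windows\Memory.dmp")


def preserve_dump(dump_path=DEFAULT_DUMP, archive_dir=Path(r"C:\Temp\dumps")):
    """Copy the dump into archive_dir under a timestamped name.

    archive_dir is a hypothetical folder chosen for illustration.
    Returns the path of the copy, or None if no dump file exists.
    """
    dump_path = Path(dump_path)
    archive_dir = Path(archive_dir)
    if not dump_path.is_file():
        return None
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Name the copy after the dump's modification time, so running this
    # twice on the same dump does not create a second copy.
    mtime = time.localtime(dump_path.stat().st_mtime)
    stamp = time.strftime("%Y%m%d-%H%M%S", mtime)
    target = archive_dir / f"{dump_path.stem}-{stamp}{dump_path.suffix}"
    if not target.exists():
        shutil.copy2(dump_path, target)
    return target
```

Files under C:\Windows usually need an elevated prompt to read, so this would probably have to be run as administrator.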

The 0x101 BSOD is almost always a CPU issue, so if you're overclocking that K series CPU please remove the overclock and run at stock frequencies until your problem is resolved. Same with your RAM, please disable XMP and run at stock frequencies. Any overclocking introduces instability, and 0x101 is the common excessive overclock BSOD (so too is the 0x9C, MACHINE_CHECK_EXCEPTION).

The latest dump is a 0x3B, SYSTEM_SERVICE_EXCEPTION which means that some sort of exception occurred whilst running in kernel mode. In the call stack you can see that virtual memory operations were in progress...
Code:
STACK_TEXT:
fffff086`a3fff490 fffff806`28749be4   nt!MiGetProtoPteAddress+0x38
fffff086`a3fff4e0 fffff806`2874ba2b   nt!MiQueryAddressState+0x564
fffff086`a3fff700 fffff806`28bda75f   nt!MiQueryAddressSpan+0x24b
fffff086`a3fff7c0 fffff806`28bda005   nt!MmQueryVirtualMemory+0x73f
fffff086`a3fff960 fffff806`288274e5   nt!NtQueryVirtualMemory+0x25
fffff086`a3fff9b0 00007ffb`43c0f854   nt!KiSystemServiceCopyEnd+0x25
00000039`95fffd58 00000000`00000000   0x00007ffb`43c0f854
The error occurred in the last call there (the top one)...
Code:
CONTEXT:  ffffe001ddf7e900 -- (.cxr 0xffffe001ddf7e900)
rax=ffffcc8720a80b20 rbx=00000007ffaef9ee rcx=ffffcc8720a80aa0
rdx=ffffcc8721a44520 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8062874a948 rsp=fffff086a3fff490 rbp=fffff086a3fff5e0
 r8=0000000000000004  r9=fffff086a3fff568 r10=00000007ffaebae0
r11=ffffcc8721a44568 r12=0000000000000004 r13=fffff43ffd77cf70
r14=fffff086a3fff568 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00050246
nt!MiGetProtoPteAddress+0x38:
fffff806`2874a948 f7413800000008  test    dword ptr [rcx+38h],8000000h ds:002b:ffffcc87`20a80ad8=????????
Resetting default scope
You can see the failing instruction is a TEST instruction using the RCX register as a memory pointer. However, the resulting memory address is invalid (=????????) meaning that the memory location was either not allocated, was paged-out (which is not allowed since we're running at an elevated IRQL), or the RAM page was bad.
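As a small worked check of that address, here is the arithmetic in Python, using only the values visible in the register dump above. The function name is just for illustration. It shows that RCX plus the 38h displacement gives exactly the address WinDbg flags as unreadable, and that the mask in the TEST instruction is a single bit.

```python
def effective_address(base, displacement):
    """Address accessed by an operand of the form [base + displacement]."""
    # Keep the result within 64 bits, as the CPU would.
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF


# Values copied from the context record above.
rcx = 0xFFFFCC8720A80AA0
displacement = 0x38   # from "[rcx+38h]"
mask = 0x8000000      # immediate operand of the TEST instruction

address = effective_address(rcx, displacement)
print(f"{address:016x}")   # ffffcc8720a80ad8, the address shown as ????????
print(mask == 1 << 27)     # True: the TEST checks a single flag bit (bit 27)
```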

When you tested the RAM what tester did you use? If that 32GB kit is 2x16GB then try running on just one stick for a day or two (or until you get a BSOD) and then swap RAM sticks and run on just the other for a day or two (or until you get a BSOD). That is the 'gold standard' RAM test.

I really would like to see the kernel dump for the next 0x101 BSOD however...
 
I don't know about it being the "gold standard RAM test" but it is certainly a method that can be useful in determining which stick is causing problems, if in fact it IS a physical memory issue. If it's a configuration, firmware or motherboard issue, all of which are possible in this scenario, it's not going to tell you much.

And as far as the memory testing goes, while it's also not foolproof, make sure when running memtest that you test EACH individual stick for FOUR FULL PASSES of all 11 tests using Memtest86 and if they both pass all four passes then test them again, together, for another four full passes. This will take a fairly long amount of time to do all the testing if you choose to do it.

It would also not hurt to run the advanced Windows memory diagnostic tests as well.

As well, after updating the BIOS did you perform a hard reset to ensure there are no settings that "stuck"? Because, it definitely happens. A lot.

BIOS Hard Reset procedure

Power off the unit, switch the PSU off and unplug the PSU cord from either the wall or the power supply.

Remove the motherboard CMOS battery for about three to five minutes. In some cases it may be necessary to remove the graphics card to access the CMOS battery.

During that five minutes while the CMOS battery is out of the motherboard, press the power button on the case, continuously, for 15-30 seconds, in order to deplete any residual charge that might be present in the CMOS circuit. After the five minutes is up, reinstall the CMOS battery making sure to insert it with the correct side up just as it came out.

If you had to remove the graphics card you can now reinstall it, but remember to reconnect your power cables if there were any attached to it as well as your display cable.

Now, plug the power supply cable back in, switch the PSU back on and power up the system. It should display the POST screen and the options to enter CMOS/BIOS setup. Enter the BIOS setup program and reconfigure the boot settings for either the Windows boot manager or, for legacy systems, the drive your OS is installed on, if necessary.

Save settings and exit. If the system will POST and boot then you can move forward from there, including going back into the BIOS and configuring any other custom settings you may need, such as Memory XMP, A-XMP or D.O.C.P profile settings, custom fan profile settings or other specific settings you previously had configured that were wiped out by resetting the CMOS.

In some cases it may be necessary when you go into the BIOS after a reset, to load the Optimal default or Default values and then save settings, to actually get the hardware tables to reset in the boot manager.

It is probably also worth mentioning that for anything that might require an attempt to DO a hard reset in the first place, IF the problem is related to a lack of video signal, it is a GOOD IDEA to try a different type of display connection, as many systems will not work properly for some reason with DisplayPort configurations. It is worth trying HDMI if you are having no display, cannot see the BIOS to enter it, or get no signal messages.

Trying a different monitor as well, if possible, is also a good idea if there is a lack of display. It happens.
 
Nov 4, 2023
7
2
15
I appreciate the detailed insight! I still do not know how to properly debug these .dmp files, so thanks for your help.

I tested my sticks with Memtest86 (free). I have heard that some tests are missing in the free version, but I ran 2 complete tests for each stick and one for both sticks together. Also, I ran one day with one stick and another day with the other stick. I got a BSOD on both days... I was almost losing it.

BUT since I turned off CPU C States Support in the BIOS I have had no BSOD. I ran several stress tests and played Counter Strike for 2 hours. Wish me luck. Hopefully that was the issue. I wrote more about it in my previous replies.

My next steps, if another BSOD comes, will be to upload the whole Memory.dmp so you can debug it, turn off XMP as you advise, and keep trying to solve it.
 
Nov 4, 2023
7
2
15
Nice! I appreciate these BIOS hard-reset instructions. I didn't know that. Will do that after the next BSOD.

I ran Memtest86 multiple times, Windows Memory Diagnostic as well, and Intel diagnostics, and if I find more diagnostic tools I will run them too!

Thanks! I will update the thread within a few days or after the next BSOD.
 
IMO, based on how many times I've seen this exact thing happen, I'd recommend doing a hard reset of the BIOS followed by a clean install of Windows and DO NOT install Armoury Crate. You do NOT NEED Armoury Crate to install the manufacturer drivers, which can be manually downloaded from the product support page for your motherboard. I don't know how many threads I've participated in where the end result was that Armoury Crate was the reason the user was having problems, including BSOD, freezing, errors and just generally poor performance and OS issues. Same goes for most all of the bundled motherboard utilities regardless of manufacturer. I recommend not using ANY of them.

Just clean install Windows, perform all the Windows updates until there are none remaining and then download and install the ASUS B660-I Gaming WiFi chipset, WiFi, LAN, Audio and Bluetooth drivers. Make sure you download the ones meant for Windows 11 as those for Windows 10 are not always the same as those for 11 and may themselves cause problems. Then install the latest driver for your graphics card, which you did not list and which would be useful information to have here, as well as those needed for any other add in cards or peripherals.
 

ubuysa

Distinguished
That last 0x3B BSOD was an outlier in your uploaded dumps and there are other common causes for this besides RAM. If your RAM is passing two runs of Memtest86 then we can assume it's fine for now.

Disabling C States is a common workaround for the 0x101 BSOD. It seems that in some CPUs one or more processors are tardy at coming out of the lower power state and this causes them to miss the clock synchronisation interrupt and thus we get a 0x101 BSOD. In all three of the 0x101 dumps you uploaded, processor #5 was the one missing the clock synchronisation interrupt, so it's certainly a possibility that this is the problem. I've seen this before in AMD CPUs, less often in Intel CPUs however.

Disabling C States of course stops the processors entering low power states at the expense of a bit more power consumption and is a reliable workaround. There really isn't much else you can do if that is the cause of these 0x101 BSODs. It's worth mentioning that it is also possible for a rogue driver to cause a 0x101 BSOD as well, so it's not necessarily the CPU.
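To check whether the same processor keeps missing the interrupt across several 0x101 dumps, a rough tally like the one below can help. It is only a sketch: it assumes you have pasted into a text file the "BugCheck ..." summary lines that WinDbg prints when a dump is opened, and the exact shape of that line is an assumption of mine. For bug check 0x101 the fourth parameter is the index of the processor that did not respond. The sample lines are made up for illustration and are not copied from the dumps in this thread.

```python
import re
from collections import Counter

# Assumed shape of the WinDbg summary line, for example:
#   BugCheck 101, {c, 0, ffffa88111111180, 5}
# The code and all four parameters are hexadecimal.
BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def hung_processors(log_text):
    """Count how often each processor index is named by a 0x101 bug check."""
    counts = Counter()
    for code, params in BUGCHECK_RE.findall(log_text):
        if int(code, 16) != 0x101:
            continue  # only CLOCK_WATCHDOG_TIMEOUT is of interest here
        fields = [p.strip() for p in params.split(",")]
        if len(fields) == 4:
            # Fourth parameter: index of the unresponsive processor.
            counts[int(fields[3], 16)] += 1
    return counts


# Made-up sample input, for illustration only.
sample = """
BugCheck 101, {c, 0, ffffa88111111180, 5}
BugCheck 3B, {1, 2, 3, 4}
BugCheck 101, {c, 0, ffffa88122222180, 5}
"""
print(hung_processors(sample))
```

If one index dominates the tally, that points at a single core (or whatever keeps that core busy), which fits the C States explanation; a spread across many cores would make a driver more plausible.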

If you get more CLOCK_WATCHDOG_TIMEOUT BSODs please immediately copy the C:\Windows\Memory.dmp file to a temporary folder somewhere, so it's not overwritten by the next BSOD. You can then upload several kernel dumps to give a clearer picture of what's happening.
 
Nov 4, 2023
7
2
15
Just to let you guys know, it was a defective CPU. It was really weird: when I ran a stress test in Intel's XTU (with the old CPU), nothing happened; CPU usage stayed between 1 and 4 percent. The benchmark in XTU did run, but it put the CPU into a constant thermal throttling state within 2 seconds of starting.

The new CPU runs the stress tests without any issue and the thermals look much better, mostly around 80-90 degrees. It thermal throttled a few times at the end of the stress test, but only for a few seconds to bring the temperature down.

Also, the benchmark runs smoothly with a maximum temperature of 88 degrees. Wish me luck, and BIG thanks for all your help!