Question Please Help: Random reboots and crashes

Dec 3, 2023
Hello all, I am turning to this community because I am at a total loss.

I recently built a new PC that randomly reboots and/or crashes fairly frequently. It reboots/crashes without warning (i.e. no BSOD, and no Windows Event Viewer logs that point to a problem outside of the "Critical Kernel-Power (41)" error as a result of the crash itself). There is no consistency to when it crashes. I went 5 days with no crash, and just had 2 crashes today.

These crashes do not appear to be related to system load. They occur when I am gaming, but also when I am just browsing the web (e.g. watching YouTube). I cannot say 100% that I have had it crash at idle, or when there wasn't some form of video playing. This would be difficult to validate because I don't use the PC for other purposes very frequently, or for a long enough usage period to cause a crash.

Things I have tried:
- Disabling automatic restarts on system failure, in hopes to get a better read (still restarts w/o notification)
- Undoing/redoing all cable connections
- Getting all OS + driver updates (latest drivers direct from manufacturer)
- Updating to the latest BIOS
- Replacing cables and reseating hardware
- Running a complete memtest86 (100% pass with 0 failures)
- Disabling all "performance" features in BIOS for vanilla boot configuration
- Replacing PSU with a brand new unit

I have been actively monitoring the temperatures and load for the CPU, GPU, memory and storage. Everything is running within a reasonable range. I have attached a log file after the latest crash, which occurred after a gaming session (no games were running during the crash). If you go back ~30m you will see the system under load while playing Cyberpunk at max settings. The metrics are still all good, and since the crash occurred in even more optimal ranges, it is even more mysterious.

I used GenericLogViewer (v6.4) to look at the graphs from 1 hour prior to the crash (see screenshot and CSV log file), and after clicking through every sensor metric, nothing looks anomalous.
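
For anyone who would rather scan a sensor CSV log programmatically than click through every graph, here is a minimal sketch. It assumes the log has a header row and one numeric column per sensor. The column names, the sample rows, and the limit values below are all made-up placeholders, not values taken from the attached log, so swap in the real headers and whatever limits you consider safe.

```python
import csv
import io


def find_out_of_range(csv_text, limits):
    """Return (row_number, column, value) for every reading outside its allowed range."""
    hits = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, (low, high) in limits.items():
            raw = row.get(column)
            try:
                value = float(raw)
            except (TypeError, ValueError):
                continue  # skip missing, blank, or non-numeric cells
            if not (low <= value <= high):
                hits.append((row_number, column, value))
    return hits


# Hypothetical sample data: column names and readings are placeholders.
sample_log = """Time,CPU Temp [C],SoC Voltage [V]
12:00:00,45.0,1.25
12:00:02,47.5,1.26
12:00:04,48.0,1.34
"""

# Hypothetical limits: (minimum, maximum) per column.
limits = {"CPU Temp [C]": (0.0, 89.0), "SoC Voltage [V]": (0.0, 1.30)}

anomalies = find_out_of_range(sample_log, limits)
print(anomalies)
```

A clean result only means no logged sample crossed a limit. A short spike between polling intervals would not show up in the log at all.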

I am truly hoping that someone in the community can spot something I am not, or offer other suggestions. The only remaining things I can think to do are replacing the MOBO, GPU and CPU, but I am truly hoping it doesn't come to that.

System specs:
- Motherboard: ASUS TUF Gaming B650-PLUS WiFi Socket AM5
- CPU: AMD Ryzen 7 7800X3D
- GPU: ZOTAC Gaming GeForce RTX 4070 Ti AMP Extreme AIRO
- RAM: CORSAIR VENGEANCE RGB DDR5 RAM 64GB (2x32GB) 6000MHz CL30
- Storage: Corsair MP700 2TB PCIe Gen5 x4 NVMe 2.0 M.2 SSD
- PSU: be quiet! Dark Power 13 1000W Quiet Performance Power Supply | 80 Plus Titanium Efficiency | ATX 3.0 | PCIe 5
- OS: Windows 11 Home Edition (although this issue also occurred with W10 before I upgraded).

PLEASE HELP! I am going crazy because the $$$ for this system is causing so much anxiety and I don't know what to do. Whoever solves this, I will buy you lunch ;-)

Thanks in advance.
 
Welcome to the forums, newcomer!

Updating to the latest BIOS
What BIOS version are you currently on for your motherboard? Did you clear the CMOS after verifying that your BIOS was successfully flashed to the latest version?

OS: Windows 11 Home Edition (although this issue also occurred with W10 before I upgraded)
You said upgraded. Did you upgrade to Windows 11 using the in-place upgrade path? If so, you should create a bootable USB installer for Windows 11, then format the drive and reinstall the OS in offline mode. Then manually install all the latest relevant drivers.

How are you cooling that processor?
 
Thanks folks for the replies!

  • Yes I have the latest BIOS and cleared CMOS
  • I do not imagine it is an OS issue and reinstalling windows will not help (see below)
  • I am cooling with an AIO, and thermals are all fantastic and in the green (i.e. < 50°C under extreme load for an extended duration)
  • XMP (in my case, DOCP) is in fact the issue.
    • I ran a Prime95 blended test for 10 hours with the default BIOS settings (i.e. no DOCP), and there were 0 failures.
    • I ran a Prime95 blended test for 5 hours with DOCP enabled, and 4/8 cores had FATAL ERRORS
With DOCP enabled, the SoC voltages are in a safe range and below the Ryzen CPU threshold of 1.3 V, so it is unlikely that Over Current Protection (OCP) is being triggered. The fact that the Prime95 run did not crash the computer also corroborates this assumption.

This points me to two possibilities:
  1. Unstable memory chip (need new RAM hardware)
  2. Unstable memory controller (need new CPU hardware)
I purchased some new RAM with EXPO instead of DOCP, and am going to try it out by running the same Prime95 blended tests. If that still results in worker failures, the only thing left that I can consider is to replace the CPU entirely, and then the MOBO.

Any other thoughts?
 
When an XMP problem comes up, there are several choices:
1. Change the RAM (but you should consider whether the problem comes from the mainboard; personally, I have no confidence in any AMD product), or add two more modules. Sometimes filling all slots can be more stable; I ran into this on my Z97 board 9 years ago.
2. Change the mainboard (some boards really do not support XMP well), or even change your platform to Intel.
3. Disable DOCP and run the RAM at 4800 MHz.
 
I have the same problem with the same motherboard. Did you fix it?
Yes. The issue for me was the RAM when DOCP was enabled. I was able to verify this by doing a Prime95 test with and without DOCP enabled.

I swapped the RAM for G.Skill (and learned that EXPO is far superior to DOCP), and haven't had any issues since.
 
What really happened here is that your mobo does not officially support the RAM you purchased. I tried to look up the part number of the modules you purchased on the memory support list, and it does not show up.

https://www.asus.com/motherboards-c...l_memory?model2Name=TUF-GAMING-B650-PLUS-WIFI

CMH64GX5M2B6000C30


I had the same issue when I bought RAM that's not officially supported. It turns out newer boards are sensitive to RAM compatibility.
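
If you have copied a memory support list into plain text, a lookup like the one described above can be sketched as follows. This is only an illustration: `qvl_text` holds made-up placeholder entries, not the real ASUS list, and the assumption that each line contains the part number as a whitespace- or comma-separated field may not match how the vendor formats its list.

```python
import re


def normalize(part_number):
    """Upper-case and strip everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", part_number.upper())


def is_on_qvl(part_number, qvl_text):
    """Return True if the part number matches a whole field on any line of the list."""
    wanted = normalize(part_number)
    for line in qvl_text.splitlines():
        fields = {normalize(field) for field in re.split(r"[\s,;]+", line) if field}
        if wanted in fields:
            return True
    return False


# Placeholder entries only; paste the real list text here.
qvl_text = """
EXAMPLE-VENDOR, EXAMPLE-PART-0001, 32GB
EXAMPLE-VENDOR, EXAMPLE-PART-0002, 64GB
"""

print(is_on_qvl("CMH64GX5M2B6000C30", qvl_text))
print(is_on_qvl("example-part-0002", qvl_text))
```

Normalizing both sides avoids misses caused by case differences, hyphens, or invisible characters picked up when copying a part number from a web page.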
 
That is good to know, thank you. I will definitely check this list in the future.

That being said, my current G.SKILL RAM is not supported according to that reference, but it still works.

Still, probably better to just stick with something that is known to work.
 
What cooler are you using? There are some issues with the way these chips are held in the socket. I've mainly seen it in Intel systems, where I helped a previous user fix similar crashing. It mostly came down to the lever system used on these chips, which puts a lot of uneven strain on the chip, so it sometimes doesn't make proper contact with the pins. AM5 isn't as bad as Intel, but it still has some issues because of it.
View: https://www.youtube.com/watch?v=tSzKW53FQt8

And while some coolers say they're AM5 compatible, if they're using the AM5 backplate they may not be.

Fix:
https://www.amazon.co.uk/Thermalrig...81660&sprefix=am5+thermal+right,aps,93&sr=8-3