Question Strange Crash/Reboot, but only when system is left unattended?

Feb 27, 2024
Summary: After a fresh install of Win11 and a BIOS update, the system will crash/reboot, but ONLY while not in use/unattended. The fresh install was done because the OS was 3 years old and to remove vestiges of my school account from the PC. Additional verbiage at bottom of post. I've done a ton of searching and can't find anything that gives me a direction to go next.

Potentially of note: PC was co-located near my 3D printer and that might have caused the m.2 screw for my OS drive to loosen - unconfirmed however.

Quick Answers:
  • No new hardware installed; no unusual software packages (O365, Steam, Adobe)
  • Pulled drivers from ASRock for Mobo
  • Pulled chipset driver updater from AMD
  • Do not have any DMP files, mini or other, as they aren't being created
Troubleshooting steps:
  • sfc /scannow
    • Found and repaired corrupted files
  • Ran an app called Heavy Load to 100% my CPU to check temps and such
    • Nothing to report; CPU stays around 76/77 C and no BSoD or crashes
  • Did a simple GPU test using matthew-x83's online GPU test tool
    • Ran this with 1000 objects for 10 minutes with no issues
    • Gaming presents without crashes as well
  • Checked for loose/unseated devices (near my 3D printer which is a huge vibration source)
    • Found the screw for my OS m.2 was loose; tightened
  • Confirmed PSU fan was, in fact, working
  • Tested PC with only 3 USB connections active on mainboard and not add-in (K+M, Headset receiver)
    • No change in issue
  • Used the PC for over 8 hours without incident. Within 25 minutes of walking away, crash/reboot
  • Export of Critical/Severe items from Event Viewer: I can provide this through several means, but it's XLSX right now and I know that can be suspect when it comes to file sharing. I have included text from the two Criticals that I see regularly.
    • WHEA-Logger Event Details:

      WHEA-Logger, Event ID 18
      A fatal hardware error has occurred.
      Reported by component: Processor Core
      Error Source: Machine Check Exception
      Error Type: Cache Hierarchy Error
      Processor APIC ID: 10
      The details view of this entry contains further information.

      WHEA-Logger, Event ID 18
      A fatal hardware error has occurred.
      Reported by component: Processor Core
      Error Source: Machine Check Exception
      Error Type: Bus/Interconnect Error
      Processor APIC ID: 0
      The details view of this entry contains further information.
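
For anyone comparing a longer list of these entries, here is a minimal Python sketch that tallies pasted WHEA-Logger text by error type and APIC ID. It assumes the text has the same shape as the two entries above (a "WHEA-Logger, Event ID N" line followed by "Key: Value" lines). The function names `parse_whea_events` and `summarize` are made up for illustration and are not part of any Windows tool.

```python
from collections import Counter


def parse_whea_events(text):
    """Split pasted WHEA-Logger text into one dict per event.

    Assumes each event starts with a line like 'WHEA-Logger, Event ID 18'
    and that the details are 'Key: Value' lines, as in the post above.
    """
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("WHEA-Logger, Event ID"):
            # The last token on the header line is the numeric event ID.
            current = {"event_id": int(line.rsplit(" ", 1)[-1])}
            events.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return events


def summarize(events):
    """Count events per (error type, APIC ID) pair."""
    return Counter(
        (e.get("Error Type"), e.get("Processor APIC ID")) for e in events
    )


# The two Critical entries quoted in the post.
SAMPLE = """
WHEA-Logger, Event ID 18
A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 10
The details view of this entry contains further information.
WHEA-Logger, Event ID 18
A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0
The details view of this entry contains further information.
"""

if __name__ == "__main__":
    for (error_type, apic_id), count in summarize(parse_whea_events(SAMPLE)).items():
        print(f"{count}x {error_type} on APIC ID {apic_id}")
```

If the same APIC ID keeps showing up, that may point at one core; errors spread across many IDs would fit a more general power or memory issue, though that is only a rough rule of thumb.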

Some steps I have not done yet:
  • Pull CMOS/reset BIOS to default settings
  • Test RAM using any tool - not sure what tool to use
  • Test internal components - not sure what the best method is
  • Remove any internal components
System:
Speccy Snapshot: http://speccy.piriform.com/results/yPnIouF1MUrTpHD2xtImPBD

ASRock X570 Phantom Gaming 4 | BIOS v5.6

AMD Ryzen 9 5950X

64GB XPG D50 3200MHz

MSI GeForce RTX™ 3090 GAMING X TRIO 24G | Driver Version: 27.21.14.5671

500GB XPG Spectrix S40G (m.2)

1TB ADATA SX8200PNP (m.2)

1TB TEAMGROUP TM8FP4001T (PCI-E)

PCI-E USB Add-in Card

Thermaltake Toughpower Grand 750

Windows 11, Version 23H2 (Build 22631.3155)


Decided to wipe system because Windows (I had issues cropping up from their crappy updates). I downloaded the latest BIOS, chipset drivers, and stand-alone drivers from ASRock (my mobo) and reinstalled Windows. Everything seemed great, no issues during flashing or the reinstall. Drivers picked up and I began reinstalling my software. Once I had this finished I noticed that my monitors wouldn't go into standby, nor the PC itself into sleep, so I began working on this issue. Found that it was something to do with Steam and noticed some janky behavior as I was trying to change Power plan settings. Ran SFC and it found corrupted files - this fixed the standby/sleep issue (I don't use sleep, but I was for testing purposes).

This brings me to the current dilemma: my PC will crash and reboot (or attempt to reboot) when I am not actively using it. I have used the PC for 10 hours straight (gaming, school work, browsing) without so much as a new entry being added to my Event Viewer, yet within 20 minutes of me walking away it will crash/reboot. No dump files or minidumps are being created when the machine crashes/reboots, yet I know it's happening because my peripherals reset and I have to log back in.
 
Replaced my motherboard with a B550-A ROG Strix and replaced my PSU with a ROG Strix 800W.

Replaced my OS and second m.2 drives with TEAMGROUP TM8FPW001T0C101 1TB, leaving my existing TEAMGROUP and my SATA Seagate still in the system.

Quite literally the ONLY remaining items from the previous build are those two drives and my RAM, and my processor.

I'm losing my mind and my hope - I need the computer for school among other things and I can't get it stable. Even the info that I got from the reddit post in post #23 was applied to the new MoBo and BIOS and I am getting WHEA errors again and the crash/lockup is still happening.

I'm going to swap back to my old RAM (it was replaced in the last year) and see if that changes anything.

When bringing my PC back online after the mobo swap the only settings I changed in my BIOS were the RAM speed and turning Fast Boot off.

New Speccy Snapshot: http://speccy.piriform.com/results/eSsxUERp8KqCEfTBBsSmrfj
 
Leaving this here for posterity; some other poor soul may find this useful...



Well, I don't know if I would call it "figured out" yet as I haven't hit 24 hours of stability, but, I'm not far off and so far so good.

So, depending on main board and processor your mileage may vary but here is what I posted in another thread on the Reddits:



For anyone still looking at this thread, I am experiencing WHEA-Logger errors and have replaced my mobo, PSU, primary and secondary m.2 drives so far. Previously, I had used the solution located here (https://www.reddit.com/r/Amd/s/kcu0mkyFbH - Change Power Supply Idle Control to Typical) which removed most of them. I just now disabled Core Performance Boost and Global C-State Control so we shall see if that removes the crashes. All this on a fresh install of Win11, fresh chipset drivers and drivers from ASUS.
Whether these crashes happen under load or at idle, they all seem to be power-delivery related, meaning how the CPU and BIOS are handling power delivery. Either a core or cores is getting too much or too little voltage, causing momentary "brain farts" that lead to crashes, BSODs, or simple lock-ups/hangs (which is what I was experiencing).

Additional Reading:

https://forums.tomshardware.com/threads/whea-log-event-id-18.3779101/#post-22809514

https://www.techpowerup.com/forums/threads/random-reboots-and-cache-hierarchy-error.313090/

My System (Currently):

ASUS ROG Strix B550-A

AMD Ryzen 9 5950X

64GB XPG D50 3200MHz

MSI GeForce RTX™ 3090 GAMING X TRIO 24G

1TB TEAMGROUP TM8FPW001T (m.2)

1TB TEAMGROUP TM8FPW001T (m.2)

1TB TEAMGROUP TM8FP4001T (PCI-E)

PCI-E USB Add-in Card

ASUS ROG Strix 850w 80+GOLD

Windows 11 Pro, Version 23H2 (Build 22631.3155)

These settings are located under the Advanced menu in the BIOS - depending on your motherboard you may have to hunt for them a little. Also, read the other posts that I linked since they are all very similar. I ran Prime95 for almost 2 hours and not a single core was off from the others.



EDIT: Just a troubleshooting step that I had used - check to see if you have DMP or Dump files in C:\Windows or Minidump files in C:\Windows\Minidump. My system wasn't creating them, ever, and that led me to look for a hardware cause instead of software or the OS. And be sure not to go nuts trying to solve every problem that may be listed in the Event Viewer - Microsoft breaks their own stuff often.
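
If you would rather script that dump-file check than click through Explorer, here is a rough Python sketch. It looks for any .dmp file directly in the Windows folder (the default full/kernel dump is named MEMORY.DMP) and in the Minidump subfolder. The function name `find_dump_files` is just something I made up, and reading C:\Windows\Minidump may need an elevated prompt on some systems.

```python
from pathlib import Path


def find_dump_files(windows_dir="C:/Windows"):
    """Return crash dump files found under the given Windows directory.

    Checks for *.dmp files directly in the folder (e.g. MEMORY.DMP) and
    inside its Minidump subfolder. The default path is an assumption;
    pass a different one if Windows lives elsewhere.
    """
    root = Path(windows_dir)
    found = []
    if root.is_dir():
        # Full/kernel dumps sit directly in the Windows folder.
        found.extend(
            sorted(
                p for p in root.iterdir()
                if p.is_file() and p.suffix.lower() == ".dmp"
            )
        )
    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        # Minidumps are written one file per crash.
        found.extend(
            sorted(
                p for p in minidump_dir.iterdir()
                if p.is_file() and p.suffix.lower() == ".dmp"
            )
        )
    return found


if __name__ == "__main__":
    dumps = find_dump_files()
    if dumps:
        for path in dumps:
            print(path)
    else:
        print("No dump files found - crashes are not leaving dumps behind.")
```

An empty result after a known crash is the same clue I had: the system is going down without Windows getting a chance to write anything.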
 