Question Unexpected Reboots, Possible Hardware Fault

The_Ginja_Ninja

Reputable
Nov 17, 2020
117
18
4,615
My PC has run well since I built it in November 2020. This week I started getting complete crashes (straight to reboot, no BSOD). So far they have only happened while gaming, but they are becoming more and more frequent.

PC specs below. No OC on CPU (nor has there ever been)

AMD Ryzen 9 3900X, MSI Mag X570 Tomahawk, 32GB RAM 3600MHz CL18 Vengeance LPX, Zotac Amp Holo RTX 3080
Antec Earthwatts 750W Gold Pro, Cougar Aqua 240 AIO, 7 x 120mm Fans
Sabrent Rocket Q4 1TB NVMe, WD Blue 250GB NVMe, 2 x TCSunbow 1TB SSD, Seagate FireCuda 1TB SSHD, Seagate Barracuda Green 2TB
Creative Sound Blaster ZxR
Corsair Airflow 275R
Corsair K55, GigaByte G27Q, Dell P2719H, Dell U2419H


To date I have done the following:
  • Latest chipset drivers installed
  • Latest BIOS installed
  • Removed OC from GPU
  • sfc /scannow (all reported as being fine)
I have no idea whether this is CPU, memory, GPU, mobo or PSU. I'm reluctant to start replacing parts randomly, so any advice gratefully received.

I see the following logs in Event Viewer, but there is nothing immediately preceding the crash:

Warning - WHEA Logger, Event 19
A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Unknown Error Source
Error Type: Bus/Interconnect Error
Processor APIC ID: 0

The details view of this entry contains further information

Critical - Kernel-Power (visible after crash)
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

Error - WHEA Logger, Event 19
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0

The details view of this entry contains further information.
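
For anyone comparing a lot of these entries, a minimal Python sketch like the one below can split the message text into its fields, which makes it easier to see whether the same APIC ID and error type keep coming up. The function name `parse_whea_entry` and the `SAMPLE` constant are made up for illustration; the sample text is just the corrected-error entry pasted from above.

```python
from typing import Dict


def parse_whea_entry(text: str) -> Dict[str, str]:
    """Split a WHEA-Logger message into its 'Key: value' fields.

    Lines without a colon (such as the summary sentence) are skipped.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Text copied from the corrected-error entry in this post.
SAMPLE = """A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Unknown Error Source
Error Type: Bus/Interconnect Error
Processor APIC ID: 0
"""

if __name__ == "__main__":
    for key, value in parse_whea_entry(SAMPLE).items():
        print(f"{key}: {value}")
```

Running the same function over the fatal entry and comparing the two dictionaries would show that only the "Error Source" field differs between them.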

The following log is from 15th Feb, around the time I experienced the first crash (I have since unplugged the headset to rule it out). This log has not reappeared.

Critical - DriverFramework-UserMode
The device HID-compliant headset (location (unknown)) is offline due to a user-mode driver crash. Windows will attempt to restart the device 5 more times. Please contact the device manufacturer for more information about this problem.
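
To double-check that nothing is logged shortly before each crash, one option is to export the events and look at a time window ahead of every Kernel-Power entry. The sketch below is only a rough illustration under some assumptions: the provider name `Microsoft-Windows-WHEA-Logger` and Kernel-Power event ID 41 are what Windows normally uses for these entries, the timestamps in `EVENTS` and `CRASH` are invented, and the `Event` tuple and both function names are hypothetical. The command is built but not executed, so it can be pasted into a prompt by hand.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Event(NamedTuple):
    time: datetime
    source: str
    event_id: int


def wevtutil_query(provider: str, count: int = 50) -> List[str]:
    """Build (but do not run) a wevtutil command listing recent System events
    from one provider, newest first, as plain text."""
    xpath = f"*[System[Provider[@Name='{provider}']]]"
    return ["wevtutil", "qe", "System", f"/q:{xpath}", "/f:text", f"/c:{count}", "/rd:true"]


def events_before(crash: Event, events: List[Event], window: timedelta) -> List[Event]:
    """Return the events logged within `window` before the crash, oldest first."""
    start = crash.time - window
    hits = [e for e in events if start <= e.time < crash.time]
    return sorted(hits, key=lambda e: e.time)


# Hypothetical timestamps, made up for illustration only.
EVENTS = [
    Event(datetime(2021, 2, 15, 20, 1), "WHEA-Logger", 19),
    Event(datetime(2021, 2, 15, 21, 28), "WHEA-Logger", 19),
]
CRASH = Event(datetime(2021, 2, 15, 21, 30), "Kernel-Power", 41)

if __name__ == "__main__":
    print(" ".join(wevtutil_query("Microsoft-Windows-WHEA-Logger", 20)))
    for event in events_before(CRASH, EVENTS, timedelta(minutes=10)):
        print(event.time.isoformat(), event.source, event.event_id)
```

Because the Kernel-Power entry only becomes visible after the reboot, a fairly generous window is probably safer than a tight one.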
 

Eximo

Titan
Ambassador
Just as a point of advice: it is fine to put your build in your signature, but the signature changes on every post whenever you update it. Best to post your specs in the question itself for posterity.

Monitor your temperatures, as always. An improperly installed AIO could easily let your CPU get a little too warm and become unstable. Air bubbles can also get into the pump, so if temperatures are high, shake it around or tilt the case while it is running.
In this case, I would run MemTest86 to rule out RAM instability, which is highly possible. I know 3600MHz isn't exactly blazing fast, but the kit could still need a little extra voltage or a timing adjustment to be stable. (Weirdly, overclocking might help you here, as it will get you to touch most of the relevant settings.)

Audio drivers are among the most prone to causing problems; check with Realtek to see if they have anything newer than MSI does for the ALC1200.

Given the errors related to the CPU, try reseating all PCIe devices. A loose one could be causing an issue: as the system warms up under a gaming load, it might be losing contact somewhere.
 

The_Ginja_Ninja

Thanks for all your tips, updates below.

I've updated the original post with my PC specs. I see your point about the signature being mutable.

Generally, whilst gaming I have MSI Afterburner open on another screen to monitor temps. There has been nothing scary there (CPU max 66°C, GPU max 70°C).

I have run MemTest86 for 15 minutes with no errors. I have not tried overclocking the RAM as I'm unsure how to do this (other than enabling XMP, which has been on since day 1).

Onboard audio is disabled in the BIOS as I have the Creative Sound Blaster ZxR card.

Regarding PCIe cards, I have removed and re-inserted the sound card, as I had to move it to reset the CMOS after a failed initial attempt to flash the BIOS using Dragon Center. (The subsequent flash from within the BIOS using a USB stick was successful.) I have not re-inserted the GPU, but I have given it a bit of gentle pressure to make sure it's seated. Gently rocking the PC does not seem to cause any issues.

Any further ideas or suggestions?
 

TheJoker2020

Commendable
Oct 13, 2020
219
64
1,690
If the problems are becoming more frequent, this strongly suggests that the underlying fault is getting worse, i.e. something is failing.

IMHO, this is most likely a drive.


Try disconnecting all drives except your boot drive. If Steam and your games are not on the boot drive, you can move one game's data location there (you only need one game to test with) and test again. Back up your data first.

Good luck.
 

The_Ginja_Ninja

Interesting thought. For reasons I can't really explain (call it intuition), I suspect that my Sabrent Rocket is not a 'good drive'. It just happens to be my boot drive too (and the newest drive in there).
 

TheJoker2020

Entirely possible.

Do you have Acronis? If so, back up another drive, image the Sabrent Rocket onto it, then pull the Sabrent Rocket and see what happens.

Drive problems are a PITA, especially if it is the boot drive.

Personally, I have a small boot drive with my data and programs on it, a second SSD with all of my games, and a RAID10 array for backups, Acronis images and random stuff. If my games drive dies it will be a pain, but it is backed up to a USB drive (infrequently). Losing everything in one go is IMHO a stressful nightmare, and I have taken pains to mitigate it should that happen.

I have only seen one failed SSD (plus one DOA) in many years of using and supplying dozens. Compared to hard drives, SSDs are incredibly reliable, but the downside is that they are difficult to test for weird problems (like yours, perhaps) and they generally just die out of the blue with zero chance of any data recovery. This gives people a false sense of security and makes regular backups all the more important.
 

The_Ginja_Ninja

Generally, my HDDs aren't in use at all (they hold my MP3 collection and photos). If an SSD (SATA or NVMe) were failing, would I not expect to see more crashes? I use my PC all day Mon-Fri (light use), with heavier gaming use some evenings and weekends. The HDDs and my documents are all backed up to Google Drive. The worst case on failure is a fresh Windows and software install.