Question She was running so well...

Bjg13190

Prominent
Nov 12, 2021
30
1
545
Built a PC in August.

Specs:

MSI B550 Gaming Edge WiFi
Ryzen 5 5600X
16GB Corsair Vengeance Pro
Aorus RTX 3070
Corsair 750W PSU
Samsung 980 M.2 SSD
Seagate 2TB HDD

This thing ran like a dream until November 5th, when I started getting random crashes both while gaming and while just browsing the internet. Each crash logs a WHEA-Logger error:

A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0
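
For comparing one crash against the next, the event text above can be split into fields. This is only a rough Python sketch: `parse_whea_event` is a made-up helper name, and it assumes the description is copied out of Event Viewer as plain "Key: Value" lines like the ones shown here.

```python
def parse_whea_event(text: str) -> dict[str, str]:
    """Split 'Key: Value' lines of a WHEA-Logger event description into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Lines without a colon (such as the opening sentence) are skipped.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


event_text = """A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0"""

event = parse_whea_event(event_text)
print(event["Error Type"], "on APIC ID", event["Processor APIC ID"])
```

If the same APIC ID shows up in every crash, that may hint at one particular core, though it is not proof on its own.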

After attempting several fixes:
-default BIOS setting (no OC or PBO)
-update BIOS
-uninstall/reinstall chipset and gpu drivers

The issues have persisted to the point where the system now attempts to boot 3 times and then goes to recovery mode. Anything I attempt in recovery mode (reset PC, sfc /scannow in CMD) starts running, and then at some point during the process the system restarts again...into the same loop.

I've removed and tested each ram stick individually, no change.

I've removed and reseated gpu

I've unplugged HDD and attempted boot

I've ensured all connections are seated properly on the mobo and the PSU

...anyone got anything on this one? Any help is appreciated.
 

Ralston18

Titan
Moderator
Are you able to get into BIOS? If so, set POST to display everything that it is doing and what the results are. Look for a "Verbose" setting.

Will the PC successfully boot into Safe Mode?

Once booted into Safe Mode check Reliability History and Event Viewer. Either one or both may be capturing errors related to the observed crashes.
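
If Event Viewer is hard to keep open long enough between reboots, the same System log can be queried from a command prompt with `wevtutil`. Below is a minimal Python sketch that builds such a query; it assumes the crashes are logged under the `Microsoft-Windows-WHEA-Logger` provider, and `build_whea_command` is just an illustrative name.

```python
import platform
import subprocess

# XPath filter for events written by the WHEA-Logger provider.
WHEA_QUERY = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"


def build_whea_command(count: int = 10) -> list[str]:
    """Build a wevtutil command that prints the newest WHEA-Logger events as text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{WHEA_QUERY}",
        f"/c:{count}",   # limit the number of events returned
        "/rd:true",      # read newest events first
        "/f:text",       # plain-text output instead of XML
    ]


command = build_whea_command(5)
if platform.system() == "Windows":
    result = subprocess.run(command, capture_output=True, text=True)
    print(result.stdout or result.stderr)
else:
    print("Not on Windows; the command would be:", " ".join(command))
```

The output can be pasted into a reply here so the exact error details and timestamps are visible.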
 

Bjg13190

I can get in BIOS, though the only POST option I see is to enable the POST "beep" during startup.

When I attempt to boot into Safe Mode it will OCCASIONALLY reach the Windows lock screen, and then immediately reboot into a loop until it gets back to recovery mode.
 

Ralston18

Titan
Moderator
Regarding POST what key(s) are being used?

Reference:

https://smallbusiness.chron.com/bios-msi-motherboard-54604.html

= = = =

Does the following link present the applicable User Guide/Manual for your motherboard?

https://download.msi.com/archive/mnu_exe/mb/E7C91v1.2.pdf

[Do verify that I found the correct User Guide/Manual.]

Double check the current system configuration beginning on physically numbered Page 45.

Any beeps/ beep codes ? Reference Page 66.

You may need to clear CMOS to reset everything to default values. Then reconfigure thereafter.
 

Bjg13190

Yes, that is the correct manual. When I change the setting to enable POST beeps, save changes, and exit, the system reboots and I get no beeps at startup. Sometimes it reaches the Windows login screen; other times it still loops to recovery mode.

Should I pull the CMOS battery and replace it after 10 minutes?
 

Bjg13190

So...from recovery mode I rolled back to a restore point from 5 November and the PC was able to boot. However, the intermittent restarts continue, with the same error code:

A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

Is this something a clean install of Windows 10 would fix?
 
Nov 4, 2021
59
2
45
I think the issue is related to your CPU. In the BIOS Hardware Monitor, does it show any overheating components or voltage irregularities? Also, have you checked that your CPU's thermal paste is applied properly and the cooler is seated correctly?
 

Bjg13190

Hardware monitor shows no overheating at all; even using monitoring software under gaming load I see no abnormal temps.

I have a Kraken X53 AIO cooler on the CPU, and it comes with thermal paste pre-applied. It has run smoothly with very good temps over the past 3 months, so it still doesn't seem to be the culprit.

-

I now have the system up and continue to try to run:

sfc /scannow

AND

DISM /Online /Cleanup-Image /RestoreHealth

Both commands run for a bit and then crash the pc.

Could this be a registry issue? I'm not well versed here.

EDIT:

I'm not quite sure how to monitor for voltage irregularities (aka what to look for).
 
Nov 4, 2021
59
2
45
Do you mean specifically an external drive with a copy of Windows on it and then try to run in that environment to see if issues persist?
I mean connect a spare drive (if you have one) to the motherboard, then do a fresh install of Windows onto it, leaving all other drives disconnected.

Someone had an issue similar to yours a few days ago, and it was a bad NVMe drive causing the system to crash.
 

Bjg13190

Okay, I got you. So I ran a disk check on the NVMe and it came back clean. Does that indicate the drive is fine, or should I still troubleshoot it? I have a 2TB HDD on the mobo for storage, but I'd have to create Windows 10 installation media on a USB drive or something to install onto it.
 
Nov 4, 2021
59
2
45
It could still potentially be a drive issue, despite a clean disk check. I was suggesting that so we could try to narrow down the issue a bit.

Since it might be a hassle, see if you can try to repair the system files first. I know you already tried, but try to complete it like the disk check.
 

Bjg13190

Went ahead and tried a few times again to repair, no joy.

Just did a clean Windows install and am currently removing the Windows.old directory from the SSD. No issues yet with the clean install; going to let a game idle for a few minutes in the background to see if I can replicate the original issue.
 

Ralston18

Titan
Moderator
While letting the system and the game idle, observe system performance using Resource Monitor and Task Manager.

Use both tools but only one at a time.

Then leave the observation window open and continue to observe while actually playing the game.

Try to identify any changes that occur: What resources are being used, to what extent (%), and what may be using any given resource.

Play normally but remember you are playing to troubleshoot and not necessarily to win.
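
Because the reboot wipes whatever was on screen, it can help to write readings to a file as you go, so the last sample before the crash survives. The following is a rough Python sketch of that idea, not a finished tool: `log_samples` and `example_sampler` are made-up names, and the sampler returns placeholder numbers that would need to be swapped for real readings from whatever monitoring tool is available.

```python
import csv
import os
import time
from typing import Callable


def log_samples(sampler: Callable[[], dict], path: str,
                count: int, interval_s: float = 1.0) -> None:
    """Append one CSV row per sample, forcing each row to disk immediately."""
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        for _ in range(count):
            sample = sampler()
            writer.writerow([time.strftime("%H:%M:%S"), *sample.values()])
            handle.flush()             # push Python's buffer to the OS
            os.fsync(handle.fileno())  # ask the OS to write it to the drive
            time.sleep(interval_s)


def example_sampler() -> dict:
    """Hypothetical stand-in; replace with real CPU/disk/temperature readings."""
    return {"cpu_percent": 12.5, "disk_percent": 3.0}


log_samples(example_sampler, "crash_log.csv", count=3, interval_s=0.1)
```

After a crash, the final rows of `crash_log.csv` show what the system was doing just before the reset. Flushing and syncing every row costs a little performance, but that is the point of the design: nothing sits in a buffer when the machine goes down.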
 

Bjg13190

Currently back in the boot loop. I let the system run overnight with no reboots and nothing in Event Viewer.

Today I installed Steam and Hades and launched Hades. The system allowed me to play for about 1.5 hours before the first crash occurred again.

I had Task Manager open, and the only abnormality I could see in the brief second before the shutoff was that the SSD's activity shot to (or beyond) 100%.

All other hardware looked normal. Low temp and usage on CPU and GPU. Plenty of available RAM while playing.

Seems unlikely to be the PSU, as nothing fails or stops running in the PC while the reboots occur. It doesn't lose power at any time.

Right now I'm thinking CPU/MOBO/SSD, one of these three, but I don't know enough to say which.

EDIT: Also of note, the CPU Max Frequency is at or above 100% at all times with no overclocking. Unsure if that is relevant information.

EDIT 2: Attempted to run chkdsk on the SSD, but while it runs the system crashes at random points, and chkdsk starts over at each boot...super frustrating
 

Bjg13190

Remember that PSUs provide three different voltages (3.3, 5, and 12) to different system components.

It only takes one faltering voltage rail to cause all sorts of problems while giving the appearance that "all is well".
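
As a rough guide to what counts as an irregularity on those rails, the ATX spec allows about 5% either way on the 3.3V, 5V, and 12V outputs. Here is a minimal Python sketch of that check; `check_rails` is an illustrative name and the sample readings are made up, not taken from this system.

```python
# Nominal ATX rail voltages; the tolerance on these positive rails is +/-5%.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def check_rails(readings: dict[str, float]) -> dict[str, bool]:
    """Return True for each rail whose reading is within tolerance of nominal."""
    result = {}
    for rail, nominal in NOMINAL_RAILS.items():
        low = nominal * (1 - TOLERANCE)
        high = nominal * (1 + TOLERANCE)
        result[rail] = low <= readings[rail] <= high
    return result


# Hypothetical readings for illustration only.
sample = {"+3.3V": 3.34, "+5V": 5.02, "+12V": 11.30}
print(check_rails(sample))
```

Keep in mind that BIOS and software readings are approximate, and a brief dip under load can be missed entirely, so an in-range reading does not fully clear the PSU.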

Ah okay. Well, I installed Win10 on my 2TB HDD and the restarts persisted (even during the Win10 installation), so that probably rules out an SSD issue.

I suppose it's down to CPU/MOBO/PSU now..

Adding BIOS hardware monitor screengrab for reasons:

BIOS Hardware Monitor

EDIT: Increased CPU core voltage to 1.3V. System stable at the moment. Going to test running a game once again.
 

Bjg13190

So, I've only had 1 crash since the voltage increase, and that was when I downloaded and attempted to install the latest Windows 10 21H1 update. I restored to a point prior to the update and it has been stable since then...so far.
 