Question Power-related crashes despite PSU replacement, I'm at a loss!

Sklarlight

Hi! I think this might be my best port of call: most of my problems have been solved by a thread on Tom's Hardware, and I'm hoping that making my own will result in the same! (Apologies if this is in the wrong area.)

In summary: frequent, random crashes (no BSoD, just the standard Kernel-Power event logged after the unexpected shutdown), which became much more frequent in the last week. Before, they happened once a month or less; now they're so frequent that I cannot use my PC.

I suspected my PSU (EVGA 750 G2), as it was coming up to the end of its expected lifespan. I replaced it (ROG STRIX 1200 Aura) and the problems are still occurring. I reseated all my components, replaced the thermal paste on my CPU, and tested with one RAM stick at a time in the second slot, to no avail. I used stress testing software for my CPU and GPU, as well as memory diagnostic tests (both Windows' own and third-party ones), which came back clear. I used OCCT, and all the tests were clear except for one, the Power test. I'm not sure why it's crashing when I start the Power test, but it does. Any help would be hugely appreciated if possible! Thank you so much. Happy to answer any questions that would help with troubleshooting.

Update 1: I just performed a combined CPU & GPU test and it crashed on starting the test.

Update 2: CPU test now causes crash. Event Viewer highlights the WHEA error as: "A corrected hardware error has occurred. Component: Memory, Error Source: Corrected Machine Check." HWiNFO highlights the issue as a CPU Cache L1 Error. I can't tell if this means the RAM is the problem, or the CPU? Or both?
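
For anyone wanting to tally these events, here is a minimal Python sketch that pulls the Component and Error Source fields out of WHEA message text. It assumes the messages have been copied out of Event Viewer as plain strings worded like the one quoted above. The function names parse_whea_message and tally_whea are made up for illustration, and real exports may be worded differently.

```python
import re
from collections import Counter

# Matches "Component: ..." and "Error Source: ..." as worded in the message
# quoted in this post (an assumption; other exports may differ).
FIELD_RE = re.compile(r"(Component|Error Source):\s*([^,.\n]+)")


def parse_whea_message(message: str) -> dict:
    """Extract the Component and Error Source fields from one message."""
    return {key: value.strip() for key, value in FIELD_RE.findall(message)}


def tally_whea(messages) -> Counter:
    """Count messages per (component, error source) pair."""
    counts = Counter()
    for msg in messages:
        fields = parse_whea_message(msg)
        key = (fields.get("Component", "unknown"),
               fields.get("Error Source", "unknown"))
        counts[key] += 1
    return counts


sample = ("A corrected hardware error has occurred. "
          "Component: Memory, Error Source: Corrected Machine Check.")
print(parse_whea_message(sample))
```

Grouping by these two fields makes it easier to see whether the warnings all point at the same part or are spread around.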


Build:
MB: ROG STRIX B550-F GAMING
CPU: AMD Ryzen 7 5800X
GPU: MSI GeForce RTX 2070
RAM: 4x16GB Corsair Vengeance DDR4
PSU: EVGA 750G2 (Previous) ROG STRIX 1200 Aura (Current)
HDDs installed: 3
M.2 SSD: 1
 
I'll go through a list of what I would do in a situation like yours. The order doesn't really matter, but I'll go through the easiest things to try first.

1) Temps. What are the temps of everything? M.2 SSD temps, CPU temps, GPU temps, motherboard temps: any of which can cause a sudden shutdown in some cases.

2) RAM. To me this would be higher up on the list of things to try. What speed are you running your RAM at? Since you've got 4 sticks and a 5800X, you may need to set the RAM to 3200MHz if you are running higher speeds, and you might even have to go lower; 4 sticks of RAM are harder on the CPU's memory controller. You could even try running with 2 sticks of RAM for a while and see if all the problems go away.

3) Reset the BIOS to factory defaults. Don't even bother setting XMP/DOCP, and see if it's stable. If so, then play with the RAM speed and see if it stays stable.

4) Update the BIOS. Though unlikely, an updated BIOS can help stability in some cases. Stay away from the beta BIOS.

5) Last resort: reinstall Windows. You can try a different drive just to see if everything is stable after it's installed. If you go that route, unplug the main drive that already has Windows on it so the installer doesn't try to screw that one up. I would unplug all of the drives other than the one you want to test a new Windows install on.

6) You could be unlucky and have gotten a bad 5800X. I've seen some of these have a failing memory controller and cause crashes; replacing the CPU should fix it if that's the case.

Good Luck!
 
Thank you for helping! Temps seem good, CPU is resting at 30-40 degrees, GPU and m.2 SSD around the same, idling temps. During the tests in which they'd get hot, all seemed okay, never got to a point in which it seemed like an overheating crash.

Windows was cleanly reinstalled and the BIOS was flashed and fully updated and reset to defaults, sadly no difference in the crashes. As far as I know, I've always had them set to their defaults but I'll look into the RAM speeds.

I've had the 5800x for 3 years now, if it's become a problem more recently, that's genuinely a shame.

I'm going to try and run on just 2 RAM sticks in the second and fourth slots and see how I get on, thank you again!

I've also updated my original post since I've learned more information from HWiNFO and Event Viewer from my OCCT tests.

Edit: I'm currently running a Power test again (CPU & GPU) and it's been going for 6 minutes so far without a crash (before, it would crash at 3 seconds) - I'm not sure what's different, but it's still reporting WHEA errors during the test. (6 as of 6 minutes.)
 
RAM is a weird thing. When it's not stable, it can make things in Windows crash in a way that looks like the program or file is at fault, and it can make video drivers crash. I recently ran into a system where I thought the SSD was going bad: things would just not get written to the SSD even though it showed they were written, and it turned out to be a stick of RAM. That problem was a fun one, let me tell ya lol.

I'm still strongly convinced it's a RAM or memory controller issue myself. Since they are 16GB sticks, it might even be a good idea to run 1 stick at a time and see if the errors go away or come back depending on the stick or channel.

That's the one thing I hate in the 20 years I've been working on PCs: memory/RAM issues. They always make it look like something else has the problem if it's not a blatant issue.
 
Hopefully it's just a case of unstable RAM. I'm not sure why it's appeared now out of the blue without any changes, though! I guess that's just the nature of it, and perhaps BIOS & Windows updates indirectly played a part? I'd love to configure the RAM so that these crashes are resolved. That would be far preferable to replacing a part, which I'm willing to do but would prefer to avoid.

I did another CPU test, since that's what was triggering crashes, this time it succeeded and it came back with just over 200 WHEA errors, all in the "CPU Cache L1" which I've read may be due to undervolting? I have everything set to default but I'll see how to go about adjusting this and see if that helps. It doesn't look like the cause of the crash, though, although I can't be 100% sure. Thank you so much again for your help and support, I appreciate it and hope I can get to the bottom of this! I'll share any other relevant updates just in case others face similar issues and are on the hunt for a solution. More than happy to answer any other troubleshooting-related questions to see if that helps identify the cause as well.
 
Have you run a thorough RAM test using MemTest86+ booted from a USB stick? With 64GB, a complete test could take 3 to 4 hours, but it's worth taking the time to see if the RAM is OK.
https://memtest.org/

If MemTest86+ throws up any errors, reduce the XMP overclock. If the RAM still fails at stock JEDEC 2133MT/s, scrap the offending DIMM(s). If you test all four DIMMs at once, you may need to re-test each stick on its own, to locate the offending part(s). If you change XMP speed, run MemTest86+ again.
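
The decision steps above could be sketched roughly like this in Python. The function name memtest_next_step and its parameters are hypothetical, and it only encodes the advice in this post, not anything MemTest86+ itself does.

```python
def memtest_next_step(errors: int, at_jedec_stock: bool, sticks_installed: int) -> str:
    """Suggest what to try next after one MemTest86+ run.

    errors           -- number of errors reported by the run
    at_jedec_stock   -- True if the RAM ran at stock JEDEC speed (no XMP)
    sticks_installed -- how many DIMMs were fitted during the run
    """
    if errors == 0:
        return "pass: no errors at this speed"
    if not at_jedec_stock:
        # Errors with an XMP overclock: back the speed off and test again.
        return "reduce the XMP speed and re-run the test"
    if sticks_installed > 1:
        # Still failing at stock with several DIMMs: find the culprit.
        return "re-test each stick on its own"
    # A single stick failing at stock speed is the offending part.
    return "replace this stick"


print(memtest_next_step(errors=3, at_jedec_stock=True, sticks_installed=4))
```

Each change of speed or stick count means another full run, so it helps to note the result of every pass as you go.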

I've got four Vengeance DDR4-3000 DIMMs in an old 3800X rig and they're stable at 3000MT/s, but other people on this forum report problems with Vengeance and Ryzens.

Faults on RAM can appear out of the blue if they suffered ESD (static) damage at any time in their life, e.g. during fitting or removal. Problems may not appear for months or years.
 
If the issue is RAM related it will almost always cause Windows corruption and this can have all sorts of weird symptoms which eventually leads to crashing. There is a scan that can be done in an admin elevated CMD that checks windows for this type of corruption and attempts to fix it. If you have bad RAM I would say the odds that this scan comes back saying corruption was found is nearly 100%.

1. Run CMD as admin by right clicking the shortcut.
2. Type, or copy paste the following command in quotes and press enter: "sfc /scannow"

If the scan comes back clean, this is a strong indication that the RAM sticks are fine, though it's not impossible to get a clean scan here with bad RAM. This scan takes about 2-5 minutes to complete.
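
If you save the scan output to a text file, a rough Python sketch like the one below could sort it into the usual outcomes. It assumes the English wording of the standard sfc result messages. The function name classify_sfc_output and the labels are made up for illustration, and other Windows languages or versions may word these differently.

```python
# Phrases from the English sfc result messages, mapped to short labels.
SFC_OUTCOMES = [
    ("did not find any integrity violations", "clean"),
    ("successfully repaired", "repaired"),
    ("unable to fix some of them", "unrepaired"),
    ("could not perform the requested operation", "failed"),
]


def classify_sfc_output(text: str) -> str:
    """Return a short label for the result line found in sfc output text."""
    lowered = text.lower()
    for phrase, label in SFC_OUTCOMES:
        if phrase in lowered:
            return label
    return "unknown"


print(classify_sfc_output(
    "Windows Resource Protection did not find any integrity violations."))
```

As noted above, a "clean" result is only an indication, not proof that the RAM is good.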

As @geofelt above suggested, this very well could still be some sort of software/driver-related issue rather than a hardware fault. I suggest doing a "clean boot" of Windows, detailed here, to rule out as many software issues as possible. Run your tests after the clean boot to check if there is any improvement. Make sure to revert the clean boot changes after testing if you prefer.

Report back after these short tests!
 
I've performed a MemTest86+ check with all four sticks and it's been absolutely fine. However, I haven't tried it with them individually, so I can certainly give that a go just in case. As far as I am aware, everything is set to its default or Auto in the BIOS, so there shouldn't be any XMP-related adjustments.

What might have changed recently?
What is different from when all was well?

I might expect something in the software arena,
A malware or virus, perhaps?
A windows update?
A new app?
Sadly it's really hard to pinpoint, given that the crashes have gone on for a while but were extremely rare until last week. Apart from an earlier upgrade from Windows 10 to Windows 11 and a BIOS update, nothing has really changed, hardware-wise. I've also done a completely clean installation of Windows, and the issue still occurs.

Before I did a clean reinstall of Windows, there were definitely corruption issues. I remember doing sfc and dism checks and the computer would outright crash during them. On the one occasion it did complete an sfc check, it found corruption that it was unable to fix, and dism couldn't either; it reported an error about the component store being corrupted. After I performed a clean reinstall of Windows, the sfc and dism checks completed successfully and came back clean, but the issue persisted. I can also confirm it still happens on a clean boot, frustratingly. It lasted longer, but it still happened eventually, randomly.
 
With a known-good set of RAM, the only ways to get these types of corruption are if the corruption is occurring in a different layer of memory, or if not all of the data being sent between components is being received. This implicates the CPU, OS drive, and motherboard. The issue is likely only one of these pieces of hardware. If I had to guess, I would say the CPU is most likely, because of the types of errors being reported. Unfortunately, the only reasonable way to test this is to swap out suspected components and see if that solves the issue. If anyone else has a better suggestion I am all ears.
 
I definitely want to test another CPU, that's my next port of call! I'll have to see if I can borrow one, or if it's worth trying to buy one for testing purposes.
 
Usually if you bring the PC into a local shop they would be able to help you out with that.
Absolutely, I've reached out to one to see if that's something that can be done. I live in quite a small area so it might be difficult to find something relatively local.

New PSU = ROG STRIX 1200 Aura - correct?

Did you use only the cables that came with that PSU?
Yes, absolutely, all cables from the new PSU, including the plug.

My computer has managed to stay on all day today on just two sticks of RAM. I'm still getting WHEA warnings, but none have resulted in a crash. I'm still expecting it to happen at any moment, but it's difficult to say.
 
Update the BIOS. If you never updated it when installing the 5800X, it could be running an old BIOS.

That can cause weird behaviour if it's using the wrong microcode.

Also, before updating the BIOS, set it to default settings, save, and restart.

Then update your BIOS.

Then turn XMP (or whatever RAM equivalent is supported) back on.
 
Sorry if I forgot to mention it in the original post: the BIOS is fully updated, but everything is set to its defaults. Is it still worth enabling XMP (or DOCP, as I believe it shows up in the ASUS BIOS)?
 
To me, it's either a failing CPU (rare) or a quirk with the motherboard.

Personally, I'd go with the motherboard, as I have an ASUS board that has been flaky for me and has been registering power errors. The PSU isn't the issue, as I've run stress tests on it just fine; the system just randomly reboots for no reason.

I'm personally going to change my B550 board to ASRock or Gigabyte, as I've had zero issues with those boards. Their BIOS seems to be more rock solid than ASUS, MSI, etc.
 
I have not had any issues with any of the ASUS boards I own or have used in builds for others, besides the occasional BIOS bug soon after a platform launch. I have the X570-F from ASUS.
 
I forgot to mention that your CPU may still be under warranty from AMD. They have 3-year warranties. Here is the page I am referencing. It may be worth putting in an RMA request ASAP for the CPU even if it's slightly out of warranty. AMD is likely to send you a new one with few questions asked.
 
Thank you! I had fully believed my CPU was out of warranty, so I'm hoping it should still just about be covered. I've submitted an RMA request and I'll wait and see what they say.
 
With some help, it's believed that this might be a voltage issue with the CPU: https://mr-kayz.github.io/KayZ_TS_Wiki-Current_Tech_Issues/docs/Ryzen-AM4-bug - I've tried adjusting voltages as recommended there but sadly that didn't resolve it, so perhaps that's either only part of the problem, or something else entirely. Just wanted to share this in case it helps anybody else who may have stumbled across this thread in need of support.
 