Question Difficult to Replicate Crash - PSU Issue?

Jan 10, 2024
Recently, I upgraded my GPU and CPU and put an additional SSD in my PC and found that my old PSU was not sufficiently powerful for the job. After replacing my old PSU, I was still experiencing sporadic crashes and, after some troubleshooting, concluded I had received a defective PSU. However, I am still experiencing intermittent, instantaneous crashes to black and really just need some guidance, as the chances of me receiving two defective PSUs in a row are vanishingly slim compared to the chance of this being an error of some kind on my part.

The crashes began after I upgraded my hardware. Initially, my old PSU was 100% the culprit for my crashes, as it was woefully underpowered for the job and I was receiving crashes to black under strain indicative of such. Afterwards, I upgraded to a new PSU and the consistent crashes under strain vanished. However, I was now dealing with extremely sporadic crashes with no indication, warning, or pattern. As before, they were instant shutdowns with no accompanying BSOD, with the only error message to be found in the log being "error 41" - that of an unexpected shutdown. Sometimes, they would happen under strain (gaming, rendering videos) and other times they would happen for seemingly no reason at all (receiving a message notification from Discord or Telegram). At this point, I began using benchmarking software (OCCT, memtest, Prime95, FurMark, Heaven, CrystalDisk) to try and see what the cause of my crashes was. An OCCT power test always caused a crash to black almost instantaneously, even though things like my CPU temperature were not at critical points, and the crash on the power test persisted even after putting in my old GPU. From this, I concluded that my PSU must be defective.
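One way to check whether shutdowns like these really have no pattern is to list the times of the "error 41" entries and look at the gaps between them. The following is only a minimal sketch, assuming the timestamps are copied by hand from the event log into ISO format. The function name `crash_gaps`, the 10-minute threshold, and the sample times are all hypothetical and not taken from the actual log.

```python
from datetime import datetime, timedelta


def crash_gaps(timestamps, quick_repeat=timedelta(minutes=10)):
    """Return (gaps, quick_repeats) for a list of ISO-format crash times.

    gaps: time between each crash and the one before it.
    quick_repeats: the gaps that are no longer than `quick_repeat`.
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    quick = [gap for gap in gaps if gap <= quick_repeat]
    return gaps, quick


# Hypothetical example times, NOT taken from a real event log
example = [
    "2024-01-08T20:15:00",
    "2024-01-08T20:18:30",
    "2024-01-09T13:02:00",
]
gaps, quick = crash_gaps(example)
print(f"{len(gaps)} gaps, {len(quick)} within 10 minutes of the previous crash")
```

A high share of short gaps would suggest that a crash makes the next one more likely, which is a different picture from evenly scattered shutdowns.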

Thus, about two weeks ago, I exchanged my PSU for a new one of the same model. This PSU was undoubtedly immediately "better" than the previous one - fewer crashes and not instantaneously crashing on an OCCT power test. However, I quickly came to realize that some version of the same problem was still at play; my PC would crash to black at random times and would fail an OCCT power test more often than not. As such, I naturally assumed that I was mistaken about the PSU being at fault (since these problems were persisting) and that whatever is going on must be because of some type of installation error or something of the sort on my end. I have been digging through forums and performing every type of fix I could find to no avail. In short, I have:

  1. Monitored my PC's performance using CPUID HWMonitor during regular usage.
  2. Run every type of stress test I could on OCCT
  3. Run GPU stress tests using Heaven Benchmark and FurMark
  4. Run CPU and GPU stress tests using Cinebench
  5. Tested my CPU and memory with Prime95
  6. Tested my memory with memtest and the Windows Memory Diagnostic tool
  7. Tested my HDD and SSDs with CrystalDisk
  8. Reseated my RAM, including trying only one stick at a time
  9. Made sure my BIOS was up-to-date (It was running the latest beta version at the time)
  10. Downgraded my BIOS to the latest stable version
  11. Tried undervolting my CPU
  12. Tried running my CPU in eco mode
  13. Tried overvolting my CPU
  14. Tried running my RAM with its XMP profile
  15. Enabled Spread Spectrum in BIOS
  16. Set my power management mode to "prefer maximum performance" from NVIDIA control panel
  17. Uninstalled and reinstalled my graphics drivers
  18. Rewired my PSU several times
  19. Made sure my GPU connection is in accordance with the manufacturer's suggestion (really, making sure I was not using any daisy-chained cables where I shouldn't be)
I think this is the extent of it, though I have really scoured for as many potential fixes as I could find, so I may be forgetting some other BIOS changes I experimented with. Ultimately, running my CPU in eco mode decreased the frequency of the crashes and allows my PC to consistently pass the OCCT power test. Unfortunately, however, this has made the crashes completely impossible to recreate consistently. Now, I will just sometimes randomly crash to black with no warning. It is not under particularly intense stress and there is no indication of an impending crash (no slowdown, stuttering, etc.). I have found no conditions which make the crash exceptionally likely, though it tends to happen while playing games with 3D environments more than at any other time (however, it is still unpredictable). Importantly, however, if the PC crashes once, it will often crash on startup if I try powering it on again immediately.

As a last confirmation attempt, I purchased a PSU tester from Amazon. According to the instructions, if any of the readings are flashing, it is a sign of a defective unit. When I plugged in my PSU, all of the readings were within expected ranges; all of the voltages were either exactly as they should be or extremely close to what they should be and my PG reading was consistently between 120 and 140 ms; however, my PG light was flickering regardless. Testing on my old PSU to see if this was just an issue with the tester, I found no such flickering.

I think this is a PSU issue but I just really need some confirmation, because at this point, it seems as though it is either my PSU or my motherboard and I just cannot believe the insane luck required to receive two defective PSUs in a row like this. As well, I have never upgraded my own PC or done this type of troubleshooting before, so I am trying to exhaust all other possibilities to avoid getting a new PSU just to have this problem continue.

Here are my specs:
  • PSU: Corsair RM750x
  • CPU: Ryzen 9 5900X
  • GPU: Zotac Gaming NVIDIA GeForce RTX 4070Ti
  • Motherboard: ASRock B450M/ac
  • Storage: PNY CS900 500GB SSD, Samsung 870 EVO 2TB SSD, Western Digital WDC 1TB HDD
  • OS: Windows 11
  • RAM: 32 GB Dual-Channel DDR4 (both Corsair)
At any given time, I have a mouse, keyboard, and an audio interface at a minimum in my USB ports.

Please let me know if any further information is needed and thank you in advance.
 
I would rather suspect the VRM not being enough for that processor and losing power. Which CPU did it replace? Its power consumption is up to 125W at least. Earlier 6- and 8-core models were way under 100W; the 3600X, for instance, barely 60W.
 
It was a very large CPU upgrade - previously, I had a 2800x. Without eco mode, you are right that the CPU wattage was about 125W. With eco mode, at 100% usage, it goes to about 90W (according to OCCT, at least).

Thanks for the insight. Is there any way I could further narrow down that the VRM is at fault? I'm inclined to believe it given the circumstances, but I'm unsure what it would entail.

If it helps at all, I will note that at no point during this process were any of the CPU stress tests enough to induce a crash on their own.
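For reference, here is a rough steady-state power budget using the figures in this thread. It is only a sketch: the 125W and 90W CPU figures are the OCCT readings quoted above and the 750W rating comes from the PSU model name, but the 285W GPU figure (the nominal board power I assume for a 4070 Ti) and the 75W allowance for drives, RAM, fans and USB devices are assumptions, not measurements.

```python
PSU_RATED_W = 750  # Corsair RM750x, from the spec list


def headroom(cpu_w, gpu_w=285, other_w=75, psu_w=PSU_RATED_W):
    """Return (estimated total draw, spare capacity) in watts.

    gpu_w and other_w are ASSUMED values, not measured ones.
    """
    total = cpu_w + gpu_w + other_w
    return total, psu_w - total


for label, cpu_w in (("stock", 125), ("eco mode", 90)):
    total, spare = headroom(cpu_w)
    print(f"{label}: ~{total} W sustained, ~{spare} W spare on a {PSU_RATED_W} W unit")
```

Under these assumptions the sustained draw sits well below the PSU's rating in both cases. This says nothing about short spikes or about what the motherboard VRM can deliver to the CPU.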
 
I didn't know a 2800X was ever released. I had a 1600X, 1700X, 2700X and 3700X at one time or another, and they were not a burden on any B- or X-chipset motherboards, though A320 boards were a bit tight. The first signs are just like that: instability during fast changes in power demand, with not enough reserve power to keep the voltage stable. By its nature the VRM is always slightly slow to react to any change. The next sign is the VRM chips heating more than usual and the chain reaction that causes. Under steady load it may not be as apparent, but it's the BIOS that regulates voltage and power, and it may overreact by lowering voltages.
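To look for the fast power changes described above, one option is to log sensor readings to a file while using the PC and then find the largest jumps between consecutive samples, especially just before a crash. This is a minimal sketch, assuming a CSV export with hypothetical column names `time_s` and `cpu_power_w`; real monitoring tools name their columns differently, and the sample data is made up. Software sensors poll far more slowly than a VRM reacts, so this can only show coarse load swings, not the actual voltage dips.

```python
import csv
import io


def largest_power_steps(log_text, column="cpu_power_w", top=3):
    """Return the `top` biggest sample-to-sample changes as (time, delta) pairs.

    The time is that of the later sample; delta is signed (negative = drop).
    """
    rows = list(csv.DictReader(io.StringIO(log_text)))
    steps = []
    for prev, cur in zip(rows, rows[1:]):
        delta = float(cur[column]) - float(prev[column])
        steps.append((abs(delta), cur["time_s"], delta))
    steps.sort(reverse=True)  # largest absolute change first
    return [(time, delta) for _, time, delta in steps[:top]]


# Made-up sample log, NOT real sensor data
SAMPLE_LOG = """time_s,cpu_power_w
0,35.0
1,38.0
2,118.0
3,121.0
4,40.0
"""

print(largest_power_steps(SAMPLE_LOG, top=2))
```

If crashes tend to follow the largest jumps rather than long stretches of steady load, that would fit the VRM explanation better than a simple shortage of PSU capacity.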