Question: AMD PC crashes at idle, then fails to POST, but is stable under load.

licenceforlove

Honorable
May 3, 2019
8
2
10,515
Hey everyone,

I'm turning to you regarding a very strange issue I keep getting. I've been building PCs for 15 years and not once has this occurred.

What's been happening:
My PC occasionally reboots when I leave it alone for 30 minutes or more. After the reboot it gets stuck at POST with the GPU debug light lit on the motherboard. From this point on the PC won't POST until I long-press the shutdown button. If I press the reset button it won't POST either; it just attempts the POST check and stops with the GPU light lit. During gaming, AI workloads, or rendering the PC is completely fine; I've never had a crash in those scenarios. It only happens when I leave it alone and go get lunch or something (mostly I have the browser, Steam, and maybe VS Code open, nothing that would put serious load on the CPU/GPU). Although, come to think of it, all of these apps can use a ton of RAM, so maybe it's a memory issue? Anyway, even at idle it only rarely crashes; I would say it's fine 96% of the time.

Errors around the crash in Event Viewer:
I mostly get these 2 errors in Event Viewer around the time of the crash; neither is really helpful IMO:
  • Event 41: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
    • this is logged after the crash, once I finally long-press the shutdown button and the system is able to POST
  • Event 6008: The previous system shutdown at 8:31:26 PM on 6/29/2024 was unexpected. (the last time this happened; again, not helpful)

I seem to get this error a few mins before the crash:
  • Event 7000: The AMDRyzenMasterDriverV20 service failed to start due to the following error:
    Cannot create a file when that file already exists.
    • I'm not sure about this one either, because this error shows up in the event log almost every day, and like I've said, the crash followed by the failure to POST is rare.
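One way to check whether Event 7000 really lines up with the crashes, instead of just being daily background noise, is to compare how many crashes were preceded by it against how many days it shows up on overall. Below is a minimal Python sketch of that comparison. The function names, the 10-minute window, and all timestamps except the 6/29/2024 8:31:26 PM crash are hypothetical; the times would have to be copied out of Event Viewer by hand or from an export.

```python
from datetime import datetime, timedelta

def crashes_preceded_by(crash_times, event_times, window_minutes=10):
    """Count crashes that had at least one event logged in the window before them."""
    window = timedelta(minutes=window_minutes)
    return sum(
        any(crash - window <= ev <= crash for ev in event_times)
        for crash in crash_times
    )

def days_with_event(event_times):
    """Count the distinct calendar days on which the event was logged."""
    return len({ev.date() for ev in event_times})

# Crash time taken from the Event 6008 message; it is the only real value here.
crash_times = [datetime(2024, 6, 29, 20, 31, 26)]

# Hypothetical Event 7000 timestamps, for illustration only.
event_7000_times = [
    datetime(2024, 6, 27, 9, 2, 0),
    datetime(2024, 6, 28, 8, 55, 0),
    datetime(2024, 6, 29, 20, 27, 0),
]

preceded = crashes_preceded_by(crash_times, event_7000_times)
print(f"{preceded} of {len(crash_times)} crashes preceded by Event 7000")
print(f"Event 7000 logged on {days_with_event(event_7000_times)} distinct days")
```

If the event appears on nearly every day but only a handful of those days end in a crash, it is probably unrelated. The crash times should come from the Event 6008 message text, since Event 41 and Event 6008 themselves are only written on the next successful boot.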


We are talking about an all-AMD build, here are the specs:

Ryzen 9 7900X3D (factory settings)
ASRock B650M PG Riptide motherboard
2x 16 GB Patriot Viper Venom RGB DDR5-6200 DIMM CL40-40-40-76 dual kit
240 mm AIO water cooler
Radeon RX 7900 XT (base model manufactured by PowerColor)
1000 W Corsair RMe Series RM1000e modular 80+ Gold PSU
I think I also have 2 HDDs, a 2.5" SSD, and 2 NVMe SSDs connected, plus about 4 case fans.

What I've done so far:
Replaced the previous 750 W Endorfy PSU with the current Corsair one; no change.
Before the 7900X3D I had a Ryzen 7 7700 installed, which had this issue a lot more often. I actually first experienced the issue with that CPU after about 6 months of use. It started out rare but then became more frequent. I finally got angry enough and upgraded the CPU, which seemed to solve the issue for 4-5 months or so, but now it's back. AMD did RMA the 7700 for me, but that's not relevant now.
I also updated the BIOS. It's not the latest version now, but with the 7700 I had the latest version, with no effect.
The only OC on my system is the RAM, which runs at 6200 MT/s with EXPO (the RAM's advertised speed). I think with the 7700 I clocked the RAM at 6000 because that CPU was not stable at 6200.
I've disassembled and reassembled the PC multiple times and re-seated everything very carefully; no change.

The situation now:
So the thing is I'm a bit stumped. I have no idea what could be wrong. I originally suspected the CPU, but it's almost brand new (I bought it in February this year, I think) and it's not overclocked.
My other suspect was the mobo, but then why did the issue disappear for months with the new CPU?
My next guess is that it's either the GPU or maybe the RAM, but both seem to work fine under load. I also haven't reinstalled Windows yet, but this is a fairly new install (the PC is about 16 months old, and before these errors started happening only the RAM and the GPU were changed [the GPU swap was from an AMD RX 6800 to the RX 7900 XT, and I did it with DDU. I've actually run DDU 3 times already, with GPU re-seats, but no effect]).

Any thoughts or help would be greatly appreciated.
 
Are you sure the mobo doesn't push settings too far?! This sounds like the settings are on auto and on low load either volts or amps get too high (because watts are low) and crash the system.

Or it could be EXPO slowly chipping away at the memory controller, which is part of the CPU.
 

licenceforlove

I'm not sure whether it's pushing things too far. It's factory stock apart from the RAM, but yeah, maybe the RAM settings are slowly killing the memory controller.
With the 7700 it seemed like a degradation issue: the first crash came after about 6 months of use, then nothing for months, then by the one-year mark I had 1 crash/week.

The problem is I have no way of knowing whether this is a degradation issue, and I have no recourse but to wait for it to get worse and then RMA the CPU. Then I can lower the memory speed with the next one.
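Short of a definitive test, one rough way to see whether things are trending the same way as with the 7700 is to log each crash time and check whether the gaps between crashes are shrinking. The sketch below is only an illustration: the function names are made up, the crash dates are hypothetical, and shrinking gaps would be consistent with degradation without proving it.

```python
from datetime import datetime, timedelta

def gaps_in_days(crash_times):
    """Return the gaps, in days, between consecutive crashes (sorted by time)."""
    ordered = sorted(crash_times)
    return [(b - a).total_seconds() / 86400 for a, b in zip(ordered, ordered[1:])]

def is_getting_more_frequent(crash_times):
    """Crude check: is the mean gap in the later half shorter than in the earlier half?

    Returns None when there are fewer than 4 gaps, since that is too little data.
    """
    gaps = gaps_in_days(crash_times)
    if len(gaps) < 4:
        return None
    half = len(gaps) // 2
    early, late = gaps[:half], gaps[half:]
    return sum(late) / len(late) < sum(early) / len(early)

# Hypothetical crash log: days since the first crash, for illustration only.
start = datetime(2024, 1, 1)
crash_log = [start + timedelta(days=d) for d in (0, 60, 100, 120, 127)]

print(gaps_in_days(crash_log))              # gaps between consecutive crashes
print(is_getting_more_frequent(crash_log))  # True if the gaps are shrinking
```

The crash times can be taken from the Event 6008 entries, which record when the previous unexpected shutdown happened.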

It's also strange that after the crash the system won't POST. I'm not sure what's going on there.
 
