Question New unexpected Power-Offs

Triss_

Reputable
Mar 3, 2017
8
0
4,510
Hello.

Let me start by saying I'm an IT professional and I've dealt with Windows and Linux PCs since 1998.
I tried to analyze, test and troubleshoot the problem the best I could, given the available software and hardware, and now I turn to you, the professionals, for help.

Problem: New unexpected power-offs
They started today: the system powered off 3 times while I was playing Far Cry 5. I've played it for 2 weeks without any issues, so the game could be a coincidence. I have also played much more resource-demanding games without problems.

The system ran for 167 min, 82 min and 105 min respectively before each power-off.

Windows Event Viewer only reported the Kernel-Power "system has rebooted without cleanly shutting down first", nothing else, as expected.

No new software for over a month, no new drivers for months.
No new Windows Updates since October 2019, no important updates left to get.

In terms of safety, I'm super conscious about:
  • anything I run: the temp folder is on a RAM drive and any unsigned programs run in a sandbox without any elevation. UAC is always set to max (always notify).
  • the case is always closed, no cables obstructing the airflow, clean fans, good ambient temps, dry air.
Done so far:
  • CPU/GPU stability tests passed over 15 minutes, run to put max load on the PSU
  • temps normal as seen here, system monitoring (Aida64, Afterburner, Radeon Software) report no issues
  • HDD/SSD error/S.M.A.R.T. checks passed
  • Performed a full deep scan for viruses
Currently moving home - no pendrive at hand to run memtest (ran it half a year ago OK though).
I do not have another PSU or motherboard to test.

System:
Since November 2015 it is:
  • OS: Windows 7 Ultimate SP1 x64
  • Board: Asus Sabertooth 990FX R2.0
  • CPU: AMD FX-8350, 4 GHz
  • RAM: HyperX Savage 16 GB (2 x 8 GB) 1866 MHz DDR3 CL9
  • PSU: Antec True Power 550W PSU 80 Plus Gold
  • SSD: Samsung SSD 850 EVO 250GB
  • HDD: TOSHIBA P300 1TB
  • Case: Zalman Z3 Plus ATX
and since October 2019:
  • GPU: ASUS DUAL-RX580-O4G ROG Radeon RX 580 OC 4 GB
In July 2020 I dusted the computer again and replaced the battery on the motherboard.
Other than that no hardware changes in last 10 months.
The system was never overclocked.

Suspicions:
Since this has never happened before, only started today and happened 3 times while I played the same game, I suspect either:
  • Power source unstable (though I connected a spare router to the same splitter and it didn't restart, but its capacitor could have held it up if the power interruption lasted only a split second)
  • PSU can't keep up, likely if worn out (though the GPU used 150W, the CPU 60-120W and the rest no more than 50W, so at most 320W on a 550W PSU)
  • Motherboard current regulation fails (VRM1 peaked at 75C and VRM2 at 65C, but it's always been like that. No capacitor deformation found)
  • A bad incompatibility between the game and the GPU driver causes the driver to shut down the system? Huh?
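The power estimate in the second suspicion can be spelled out as a quick calculation. This is only a sketch: the wattages are the estimates given above (taking the upper CPU figure), not measurements at the wall, and the variable names are made up for illustration.

```python
# Rough power-budget check using the estimates from the post.
# All wattages are software-reported estimates, not wall measurements.
PSU_RATED_W = 550

loads_w = {
    "GPU": 150,
    "CPU (upper estimate)": 120,
    "everything else": 50,
}

total_w = sum(loads_w.values())
load_fraction = total_w / PSU_RATED_W
headroom_w = PSU_RATED_W - total_w

print(f"Estimated peak draw: {total_w} W")
print(f"Share of rated capacity: {load_fraction:.0%}")
print(f"Headroom: {headroom_w} W")
```

On paper that leaves plenty of headroom. Averaged figures like these do not capture short load spikes, though, and an aged PSU may no longer deliver its rated output, so the arithmetic alone probably cannot rule the PSU out.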
Additional info:
For about 2 years I had to wait 5-10 minutes before I could power the system on.
Since I replaced the battery last month, I only need to wait 1-2 minutes before I can power it on.
This suggests a PSU issue, am I wrong?

Questions:
  1. What else can I do to narrow down the source of the problem?
  2. What are your suspicions (please tell why)?


Thanks!
 
Let me start by saying I'm an IT professional and I've dealt with the windows and linux PCs since 1998.
I tried to analyze, test and troubleshoot the problem the best I could, given the available software and hardware, and now I turn to you, the professionals, for help.
Well, I'm not a professional by any means, but hopefully I can still provide some thoughts as I read your problem.

I have had my share of GPUs and motherboards turning faulty over the years. In some cases, the benchmark programs somehow don't make the computer fail, but games do. So there are cases where you may have a faulty mainboard or PSU and still be able to complete benchmark runs. That is very weird and I have no explanation for it.

Anyway, for a PSU that is used on a daily basis, I'd say the odds alone point to the PSU as a plausible source of failure. Can you spot any bad capacitors inside? (Please don't open it, that is dangerous.)

It can be the motherboard as well. If it's clean (i.e. not full of collected dust), try to look for any visible damage to the capacitors (they tend to leak at the bottom if faulty).

If you are still not able to figure out whether the mobo or the PSU is faulty (e.g. because you have no replacement PSU to test with over time), you should run the OCCT stresstest and watch the voltages. See if any of the voltages seem off in any way.
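As a rough guide to what "off" means: the ATX specification allows the +12V, +5V and +3.3V rails to deviate by 5% from nominal. Here is a minimal sketch of that check. The sample readings are hypothetical, and `check_rails` is just an illustrative helper, not part of OCCT.

```python
# ATX nominal rail voltages; the positive rails are specified at +/-5 %.
TOLERANCE = 0.05
NOMINAL_RAILS_V = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}

def check_rails(readings_v):
    """Return {rail: (reading, deviation_fraction, within_tolerance)}."""
    report = {}
    for rail, nominal in NOMINAL_RAILS_V.items():
        reading = readings_v[rail]
        deviation = (reading - nominal) / nominal
        report[rail] = (reading, deviation, abs(deviation) <= TOLERANCE)
    return report

# Hypothetical readings taken under load, for illustration only.
sample = {"+12V": 11.30, "+5V": 4.98, "+3.3V": 3.31}
for rail, (reading, deviation, ok) in check_rails(sample).items():
    status = "ok" if ok else "OUT OF RANGE"
    print(f"{rail}: {reading:.2f} V ({deviation:+.1%}) {status}")
```

Keep in mind that voltages reported by motherboard sensors can be imprecise, so a reading slightly out of range in software is a hint rather than proof. A multimeter on the connectors is more trustworthy if one is at hand.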

Also, you should run Memtest as soon as you can, just to clear the RAM from the list of suspects. But I think it's most likely either the motherboard or the PSU that causes this issue.
 

Triss_

It took me a while to respond as I thought I discovered something and wanted to try it out first.

Can you spot any bad capacitors inside
Nope, not really.

It can be the motherboard as well. If it's clean
It is. I do not let my PC collect dust.

damage to the capacitors
None found as stated in the first post: "No capacitor deformation found".

OCCT stresstest
Was going to run it but then I discovered something and put it on hold.

So, what I discovered is strange, as the temps I had were not too bad. In gaming they were pretty much like this, and I did not record any worse temps:
[Image: 79wWyZT.png]


But when I cranked the GPU dual fans to 100%, the PC stopped powering off.

I played many more hours of Far Cry 5 without a single power-off. Then I changed back to a fan curve, but a more aggressive one, and played many more hours in both Far Cry 5 and Far Cry New Dawn: not a single power-off occurred.

Today the computer powered off once, for the first time in a long while. The ambient temp today feels like 22-25C, and the PC temps seconds before the power-off were as follows:
[Image: QfMz6Ng.png]


Conclusion:
It appears that somewhere in my computer there's a component that gets too warm and triggers a power-off, because when I crank the GPU fans up to maximum, everything is fine during intense gaming sessions.
I don't even have to do it if the room is chilly.

My system records the temps from 17 different sensors in various areas within the case every 2 seconds. I de-cluttered the above table by removing most of the temps below 40-45C.
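One way to get more out of such a log is to rank the sensors by how much they rose in the last samples before the power-off. The sketch below assumes the log can be exported as CSV with a `time` column and one column per sensor. That format, the sensor names and the values are all hypothetical, and `rank_sensors` is only an illustrative helper.

```python
import csv
import io

def rank_sensors(csv_text, window=30):
    """Rank sensors by temperature rise over the last `window` samples."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    tail = rows[-window:]
    sensors = [name for name in tail[0] if name != "time"]
    ranking = []
    for name in sensors:
        values = [float(row[name]) for row in tail]
        ranking.append((name, values[-1] - values[0], max(values)))
    # Largest rise first: the sensor heating fastest before the log ends.
    return sorted(ranking, key=lambda item: item[1], reverse=True)

# Hypothetical log excerpt; sensor names and values are made up.
sample_log = """time,GPU,VRM1,CPU
0,70,68,55
2,71,71,55
4,71,75,56
6,72,80,56
"""

for name, rise, peak in rank_sensors(sample_log, window=4):
    print(f"{name}: rose {rise:.0f} C, peaked at {peak:.0f} C")
```

At 2-second sampling, a window of 30 samples covers the last minute before the log stops. If no sensor stands out, the component that overheats may simply not have a sensor of its own.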

Any ideas what to do next, how to find out more? I do not have access to a thermal camera.

Thanks!