Question Computer shuts off without apparent cause

David_676

Honorable
Apr 6, 2017
65
0
10,530
I have had problems with the system freezing up: the DRAM LED lights up, and flipping the power switch on the PSU is the only way to reset it (holding the power button doesn't work). I decided to ignore it as it only happened once a week or so, but now the computer is shutting off completely. No blue screen or error; it's as if the power switch was flipped off and then back on again. I just got a new UPS and assumed it was the issue, as the shutdowns started after I installed it, but after plugging the PC into a separate outlet, it happened again. I now suspect the PSU is at fault. Is there any way to confirm this? All the voltages seem to be normal from what I can see; I will attach a screenshot of the sensor readings. I'll keep the logger running and hopefully catch something when it shuts down next time. I tested the RAM with memtest and got no errors. I have reached the limit of my diagnostic abilities, so any help would be much appreciated.

System info:
CPU: AMD Ryzen 9 5950X
RAM: G.Skill Trident Z Royal 3600 CL14 (2 x 16GB)
MB: ASUS ROG X570 Crosshair VIII Hero (Wi-Fi)
GPU: EVGA 3080 FTW3
PSU: be quiet! Straight Power 11 Platinum 1000W, BN644
Storage: 2 HDDs, 1 SATA SSD, 1 Samsung 980 Pro 2TB NVMe M.2 Gen4
OS: Windows 10

Voltage Screenshot
 

David_676

Does it shut off when idle or only under load?

If it were a hardware issue I would say PSU or motherboard. Have you only just built this machine?

Idle as well, I have built at least 5 pcs, so I'm not a newbie, but I don't have the spare parts to swap and test so I'm stuck guessing at the moment.
I should add that the 3080 is new and could be the source as well.
 

Ralston18

Titan
Moderator
Look in Reliability History and Event Viewer.

Either one or both tools may be capturing some error codes, warnings, or informational events that occur just before or at the time of the shut-offs.

If you have a multi-meter and know how to use it then you can do some voltage testing on the PSU.

https://www.lifewire.com/how-to-manually-test-a-power-supply-with-a-multimeter-2626158

This is not a full test because the PSU is not under load. However, any voltage out of tolerance makes the PSU suspect.
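
To make "out of tolerance" concrete, here is a minimal sketch that compares multimeter readings against rail tolerances. It assumes the commonly quoted ATX figures (±5% on the positive rails, ±10% on -12V); the sample readings are hypothetical and the function names are just for illustration. Check the PSU's own documentation before drawing conclusions.

```python
# Nominal ATX rail voltages and allowed deviation (fraction of nominal).
# Assumption: commonly quoted ATX tolerances; verify against your PSU's datasheet.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def check_rail(rail, measured):
    """Return True if the measured voltage is within tolerance for the rail."""
    nominal, tolerance = ATX_RAILS[rail]
    return abs(measured - nominal) <= abs(nominal) * tolerance


def report(readings):
    """Return the names of rails whose readings fall outside tolerance."""
    return [rail for rail, volts in readings.items() if not check_rail(rail, volts)]


if __name__ == "__main__":
    # Hypothetical multimeter readings, for illustration only.
    sample = {"+3.3V": 3.31, "+5V": 5.02, "+12V": 11.30}
    print("Rails out of tolerance:", report(sample))
```

A reading that passes at idle does not clear the PSU, since the rails can sag only under load, but a failing reading is a strong hint.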
 

David_676

I will link the error reports, I didn't notice anything specific, but maybe you will. I will have to get my multimeter back from my brother to give that a go. Can I rule out RAM if it passed multiple memtest runs? I am trying to eliminate hardware as best I can. The 3080 was purchased used on eBay, so I hope it's not that, as I don't have any warranty on it. It does appear to be a power issue. Can I also rule out CPU? From what I understand, that is an all-or-nothing kind of item. I also noticed in HWiNFO that Corsair had voltages that were reporting out of spec, 3.3 up to 5 and down to 0, but I assume that is just poor reporting from Corsair. I use their AIO and fan/RGB hub.

Here are the Error Logs
 
Idle as well, I have built at least 5 pcs, so I'm not a newbie, but I don't have the spare parts to swap and test so I'm stuck guessing at the moment.
I should add that the 3080 is new and could be the source as well.
Has this system always behaved like this or did this only start recently?

I will link the error reports, I didn't notice anything specific, but maybe you will
It doesn't seem to say much other than that the PC has rebooted. If you press Start, type "Reliability", and click "View reliability history", it may tell you a little more about what happened around the time the system shut down. The Event Viewer is stuffed with so many log entries that it can be trickier to see what's going on.

From my experience hardware errors don't often leave much of a paper trail.

Can I also rule out CPU?
Damaged pins on the CPU can cause problems, and it's also possible to get a faulty CPU. However, CPU hardware problems are quite rare relative to those of other components. I've never had a faulty CPU personally.

Can I rule out RAM if it passed multiple memtest runs?
It doesn't sound like it is RAM if it's passed memtest, but sometimes it can take a long time to fail. Your RAM is relatively low latency, so you could try disabling XMP to see if that makes any difference, just in case your CPU/motherboard is a bit picky.

I've had to troubleshoot these sorts of issues before. Unfortunately, it can be very difficult without spare components.
 

Ralston18

This:

"I also noticed in HWiNFO that corsair had voltages that were reporting out of spec, 3.3 up to 5 and down to 0, but I assume that is just poor reporting from corsair. I use their AIO and Fan/RGB hub."

Multi-meter recommended. If you do not have one or know how to use it then find a knowledgeable family member or friend who can help.

= = =

Event Viewer is not all that user friendly. Just to help a bit...

FYI:

http://www.tomshardware.com/faq/id-3128616/windows-event-viewer.html

Take a look in Reliability History. Much easier to use and understand. The timeline may prove revealing.

= = = =

Just as a matter of routine and elimination:

Power down, unplug, open the case.

Clean out dust and debris.

Verify by sight and feel that all connectors, cards, RAM, jumpers, and case connections are fully and firmly in place.

Use a bright flashlight to inspect for signs of damage.

If nothing is found amiss then it does indeed become a matter of swapping in known working components and/or trying existing components in other known working systems.

Hopefully, as suggested, you can obtain some spare components to swap in.

Key is careful and methodical troubleshooting and making only one change at a time. Plus allowing time between changes.

Some problems do not immediately appear and may be a function of temperature, time, some combination of apps running, etc.

= = = =

And for the record regarding that new UPS: make, model, how connected? What other devices are connected to the UPS? Any separate surge protectors, extension cords, power strips, and so forth?
 

David_676

The system is about 8 months old. I believe the first lockup happened 4 months ago, and the issue persisted through a CMOS clear, a fresh Windows install, and I believe a BIOS update, but I can't quite remember. I did just update the BIOS today tho; maybe that will help, but I suspect not. The shutdown problem started when I installed the new UPS but, as I said, I tested it off of the UPS and it still happened.

I did check the Reliability History. That report on the right came from there, and it was listed as a hardware error. The Event Viewer named the source as "Kernel-Power". I checked the keywords listed and they seem to indicate it was a power issue as well, but then the suggested fixes were things like turning off Fast Boot and updating the graphics drivers, so who knows really lol.

Here is the Reliability History

Again, I don't see anything that stands out as a cause. There are other apps that stop working at other times, but I never noticed any issues at those times tho. The first 2 images are the complete shutdown, where you can see "Hardware Error", and the last image is when the system locked up but didn't shut down: no hardware error, just a note that the system didn't shut down properly, as I had to hit the switch to restart it. Not sure what to make of that exactly, but I agree, I don't think there is much of a paper trail here.

As for the CPU, I have never had one go bad either, as I said, I assumed it would be all or nothing, but I'm probably wrong about that. I do however inspect the pins carefully upon install and use extreme care, so I don't believe that to be the problem anyway, but I may have to pull it and check as a last resort.

For the RAM, I did triple-check compatibility. I did get the low-latency kit, but it was listed on the QVL for the MB and on other lists, so it should work unless I lost the silicon lottery really badly and the CPU can't handle it. Also, I believe I did try running it at base settings before and the lockup still happened. I will give it a shot tho, as that was before this new shutdown problem. And when the lockup happens, the DRAM Q-LED is lit, indicating a RAM issue. That is what prompted me to do multiple memtest runs, but since those came back fine, I started to suspect the MB or the PSU, and now I am leaning PSU after this new problem started.

Yes, it is very difficult. I can normally fix most problems myself, but for this I figured I would appeal to better experts than me. I used to fix computers for friends and considered doing it for a living until I realized the pay doesn't match the skill needed. I have fixed desktops and laptops and replaced phone screens (the glued-on ones, not the screw-on ones), and only failed once, when I was tired and made a sloppy mistake lol. All this to say I am not new to this; I have just reached a point where "phone a friend" seemed like a good idea whilst still troubleshooting, both to confirm my thinking and to get some different views on it.

I do appreciate your input
 

David_676

Thank you. I do have a decent multimeter and I know how to use it; I just have to get it back from my brother's place (I had to diagnose a faulty seat motor and left it there to diagnose his dryer when I got around to it). I am disabled and need a wheelchair to get around, so it usually takes me time to get things done, unfortunately. I will use it as soon as I get it back tho.

I posted the Reliability History in the previous post, but it didn't seem to point to anything specific either, other than "Hardware Error". The other errors are at very different times and seem unrelated to me.

I will absolutely give everything a good check once I get the multimeter and get a chance to pull it apart. I think it will come down to getting spare parts, but I did a quick look and everything is so expensive right now that I am really starting to regret getting such high-end components. Even the cheap options are up there in price. I will probably have to wait a month or more before I can afford to buy spares, so hopefully something becomes more apparent lol.

The UPS is the CyberPower CP1500PFCLCD, a 1000W model. I upgraded to it because the 450W APC that I had for my old computer let my new computer die instantly when power was lost; it didn't even stand a chance. Anyway, the UPS is connected directly to the wall outlet and the PC is connected directly to one of the battery-protected outlets. I checked the UPS logs for power problems at the time of shutdown but found nothing out of spec. I have it set to intervene under the tightest possible constraints, so it should have saved the PC even if there was some power problem. I did test it by pulling the fuse for the room and it performed as it should. Other than the system, I just have my 2 monitors, my Echo Show, and my CPAP machine plugged into it. The CPAP is very low draw, and the monitors only pull 60W combined.

Some final thoughts: these problems mostly happen at idle or web-browsing loads. I set the BIOS to return to the last state upon power loss, so if the MB truly sees it as a power loss then it should boot back up. I may be wrong in my thinking there tho, as any problem of that nature might look like power loss to the MB? Temps all seem fine. Also, I forgot to mention that previous to all of these issues, I had my Samsung 980 Pro go bad on me. I RMA'd it and the new one is running fine, but that may be a clue that power issues could have fried it. They didn't really say what happened to it, other than that the controller failed.

I really appreciate the input, thank you.
 
As for the CPU, I have never had one go bad either, as I said, I assumed it would be all or nothing,
I've heard of instances with new CPUs where it's DOA or the system is never stable and a replacement of the same CPU fixed it. I still think it's the least likely source of your issues though.

I've had RAM compatibility issues cause freezing on the desktop and motherboard issues cause intermittent restarts. I've never had an issue with the CPU or PSU. Unfortunately, the Kernel-Power event tells you nothing, because anything that causes the PC to restart unexpectedly will also show a Kernel-Power event in the Event Viewer.
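
Since the Kernel-Power entry (Event ID 41) only says that Windows came back up without a clean shutdown, the entries logged just before it may be more telling. The following is a rough sketch of that idea, assuming the System log has already been exported into simple records; the `Event` type, the function name, and the sample entries are all hypothetical. Note that Event 41 is written during the next boot, so the window has to be wide enough to cover the downtime.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    """One exported log entry (hypothetical record layout)."""
    time: datetime
    source: str
    event_id: int
    message: str


def events_before_unclean_restarts(events, window_minutes=5):
    """Map each Kernel-Power 41 timestamp to the events logged shortly before it."""
    window = timedelta(minutes=window_minutes)
    ordered = sorted(events, key=lambda e: e.time)
    result = {}
    for marker in ordered:
        # The source may appear as "Kernel-Power" or with a provider prefix.
        if marker.source.endswith("Kernel-Power") and marker.event_id == 41:
            result[marker.time] = [
                e for e in ordered
                if marker.time - window <= e.time < marker.time
            ]
    return result


if __name__ == "__main__":
    # Made-up entries, for illustration only.
    log = [
        Event(datetime(2022, 1, 1, 10, 0), "Disk", 7, "placeholder"),
        Event(datetime(2022, 1, 1, 10, 58), "WHEA-Logger", 18, "placeholder"),
        Event(datetime(2022, 1, 1, 11, 0), "Microsoft-Windows-Kernel-Power", 41, "placeholder"),
    ]
    for when, preceding in events_before_unclean_restarts(log).items():
        print(when, [f"{e.source} {e.event_id}" for e in preceding])
```

If the list comes back empty every time, that fits the point above: a sudden power cut leaves nothing behind to log.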

Do you have any components at all, like an old GPU, that you could try? I would unplug all non-essential devices, such as any peripherals and any drives other than the boot drive. I've had a dodgy SATA connection cause freezing before. It's not that I think it's got anything to do with your issue, but trial and error says you should eliminate as many variables as possible.

I had my Samsung 980 Pro go bad on me. I RMA'd it and the new one is running fine, but that may be a clue that power issues could have fried it
Unfortunately impossible to determine.
 

Ralston18

Varying errors and increasing numbers of errors (error codes) are a sign of a faltering PSU.

Especially if they are of the "Windows was not properly shut down" type and the shutdown happened on its own or you were forced to power off, reset, or unplug.

However, loose connections can mimic a power/PSU problem, i.e., some connector "making and breaking" in response to temperature and vibrations.

Reliability History's timeline can be revealing with respect to the errors themselves and any patterns. Look at all of the errors. Clicking any given error will provide more details. The details may or may not be helpful.

Delve into Event Viewer a bit more.

Other things you can do in the interim:

Look in Update History for any failed or problem updates. Maybe one from 4 months ago.

Run the built-in Windows troubleshooters. The troubleshooters may find and fix something.

Run "sfc /scannow" and "dism".

https://www.lifewire.com/how-to-use-sfc-scannow-to-repair-windows-system-files-2626161

How to use DISM command tool to repair Windows 10 image | Windows Central

No harm in eliminating other possibilities.
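
For anyone who would rather script the two repair commands than type them, here is a small sketch. It assumes an elevated (administrator) prompt on Windows and uses the standard `DISM /Online /Cleanup-Image /RestoreHealth` and `sfc /scannow` invocations; the DISM-first ordering is a common suggestion rather than a requirement, and the `run_repairs` helper is only an illustration.

```python
import subprocess
import sys

# The two repair commands, in the order often suggested (DISM first, then SFC).
REPAIR_COMMANDS = [
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
    ["sfc", "/scannow"],
]


def run_repairs(runner=subprocess.run):
    """Run each repair command in turn and return (command, exit code) pairs."""
    results = []
    for cmd in REPAIR_COMMANDS:
        completed = runner(cmd)
        results.append((" ".join(cmd), completed.returncode))
    return results


if __name__ == "__main__":
    if sys.platform == "win32":
        for command, code in run_repairs():
            print(f"{command} -> exit code {code}")
    else:
        print("These tools only exist on Windows; nothing was run.")
```

A non-zero exit code only means the tool reported a problem; the on-screen output and the log files it points to carry the details.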
 

David_676

Yeah, I kinda figured Kernel-Power wouldn't tell me much. Unfortunately, I don't have a GPU that I can borrow for a significant amount of time. I do have a lot of peripherals tho, and I did plan on unplugging all non-essentials. I think it's just a matter of taking the time to do some trial and error. Sometimes it will go for a few days without having any issues, so it's gonna be a long haul.
 

David_676

I am leaning more toward the PSU as well. I'll have to get that multimeter back as soon as possible. I did notice some failed updates in the logs; I will check some more and try those commands as well. Thank you for the advice. At this point it's just gonna take time to go through the steps.
 

Ralston18

Double-edged question. :)

For most situations the answer is no.

For the rest, likely so with the right testing equipment (beyond a multi-meter; e.g., an oscilloscope) along with specific system schematics and verified/trustworthy manufacturer test procedures. The procedures would include, of course, measurement values, ranges, tolerances, etc.

Along with having the proper test bench, power sources, clips/connectors, lighting, and skill set.

Good eyes and a steady hand.

Bottom line from my viewpoint - no probing while pc or any component is on. Or even plugged in nowadays.
 

David_676

That is what I thought. I've been meaning to get a scope, but good ones are so expensive, and it would take a long time of using it before I would be comfortable poking at my computer with it.

Thanks for the info tho. Now I gotta finish getting this thing back together. I'm cleaning up the wiring in the back, as I had it packed in a far from ideal way and it was putting pressure on my Corsair hubs. I think that is the reason for the weird logs they were reporting.
 

David_676

I found the problem with the shutdowns: the SATA power cable for the Corsair Commander hub was damaged and likely not fully connected as a result. Only the "L" part of the connector broke off, so it is still usable, and after rewiring everything, it's not likely to come undone or get more damaged.

I also set my RAM back to the default profile of 2133... The program crashes have almost stopped: I only got one failed update and one app crash in 3 days, whereas before I had many apps crashing in a day. I thought I had tested it at the lower speed and still had the problem (that would have been one of my first guesses as well), but I gave it a try anyway and it seems to be doing better, so maybe I didn't. Either way, it didn't have the problem when the computer started.

Midway through typing this to link my exact components, I found that I may have made the error of not checking all the QVLs. My RAM lists my MB with 5000 series CPUs as compatible. But upon checking the 5950X QVL, my RAM is not listed, though a very similar version is: the Trident NEO has the exact same timings as my Trident Royal, but my exact model number is not listed. I further checked the MB QVL, which I really remember checking and thought my RAM was listed on... it is not listed. Furthermore, the MB QVL doesn't list any RAM at 3600 CL14. I fear I have made a bad mistake.

From this point forward, I assume that G.Skill will not stand behind its QVL and RMA my RAM? Can I just pop in looser timings, say 3600 CL15 or 3200 CL15? Did I damage my RAM or MB doing this? As I said, it worked flawlessly for at least 4 months. I just now also noticed that only one stick is being recognized at the moment. I will have to reseat the RAM and check the BIOS; I really hope I didn't damage it. As always, I really appreciate the help.
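
For a rough sense of what the candidate settings cost in latency, first-word CAS latency in nanoseconds can be estimated as CL × 2000 / data rate (MT/s). The sketch below applies that formula to the settings mentioned above; the CL15 figure for the 2133 default profile is an assumption, since the post does not state it, and the function name is just for illustration.

```python
def cas_latency_ns(data_rate_mts, cas_latency):
    """First-word CAS latency in nanoseconds for DDR memory."""
    # One clock cycle lasts 2000 / data_rate nanoseconds (two transfers per cycle).
    return cas_latency * 2000 / data_rate_mts


if __name__ == "__main__":
    # (data rate in MT/s, CAS latency); the 2133 CL15 entry is an assumed default.
    for rate, cl in [(3600, 14), (3600, 15), (3200, 15), (2133, 15)]:
        print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

This only covers CAS latency; stability depends on the other timings, the voltage, and the memory controller as well, so it is a comparison aid and not a guarantee that a given setting will run.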

EDIT:
Just wanted to add the links to the QVL sources:
Motherboard ROG x570 Crosshair VIII Hero (WI-FI)
RAM F4-3600C14D-32GTRG
CPU Ryzen 9 5950x

UPDATE:
The other stick came back after reseating it; hopefully the errors don't come back. Now I just need to monitor and figure out what to do about the speeds. I really don't want to run it at 2133, but I haven't messed with RAM settings beyond applying the built-in XMP/DOCP, so I don't know what is possible or even advisable at this point. I apologize if I'm bugging you. I know a lot of this can be googled at this point; I just had a rough day and was hoping to offload some stress. I really appreciate all the advice. After months of issues, I think it finally helped me isolate the problems.
 