Question Help diagnose a problem with my old, dying rig?

nick1232

Reputable
Jul 16, 2020
50
3
4,545
I have had this PC for about 7 years.

Specs:
Mobo: ASUS Prime Z270M-Plus
CPU: Intel Core i7-7700 Quad Core
GPU: GeForce GTX 1070
RAM: 2 sticks with XMP enabled
PSU: Chieftec Polaris 1250W (it was not meant to be used in this PC, but used anyway)

Symptoms:
1. Sudden system instability: the video signal is lost, although other processes such as music keep playing and the whole system seems to keep working for a couple of minutes after it happens.
2. Sometimes it goes straight into this state without symptom 1: all fans run at full speed or erratically, and the system does not respond to anything, including a long press of the power button.

So I can only turn it off with the power switch on the PSU. After I do that, I can't get the system to start. It stays in this unresponsive state with the fans running erratically, sometimes at full speed and sometimes at normal speed.

I can switch it off and on a couple of times while watching the motherboard's lights. In many cases there is no power indication on the PCIe slots or on the LAN port. In some cases they are lit but the system is still unresponsive.

It does supply power to my keyboard and mouse the whole time the PC is on.

On the third or fourth try it starts and I can enter the UEFI, but then it freezes. The next series of switching off and on finally starts the system normally and I can work until the next instability issue appears.

What could that be? My guess is that it is somehow related to power: either the capacitors on the motherboard are dying or it is a PSU issue. What do you think?
 

Ralston18

Titan
Moderator
Look in Reliability History/Monitor and Event Viewer.

Either one or both tools may be capturing error codes, warnings, and even informational events just before or at the times of the described problems.

Use both tools but only one tool at a time.

Reliability History/Monitor is end user friendly and the timeline format may reveal patterns.

Event Viewer requires more time and effort to navigate and understand. To help with Event Viewer:

How To - How to use Windows 10 Event Viewer | Tom's Hardware Forum (tomshardware.com)
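
One way to look for patterns is to note the time of each crash and then list only the events logged shortly before it. The following is a minimal Python sketch of that idea. The event rows, the date, and the five-minute window are hypothetical placeholders, not values read from any real log; in practice the rows would be copied or exported from Event Viewer.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows (timestamp, level, source), e.g. copied by hand
# from Event Viewer. None of these values come from a real log.
events = [
    (datetime(2024, 1, 10, 9, 2, 0), "Information", "Service Control Manager"),
    (datetime(2024, 1, 10, 13, 47, 5), "Error", "Display"),
    (datetime(2024, 1, 10, 13, 48, 40), "Warning", "Kernel-PnP"),
    (datetime(2024, 1, 10, 13, 55, 12), "Error", "VirtualBox"),
]

def events_before_crash(events, crash_time, window_minutes=5):
    """Return events logged within `window_minutes` before the crash, oldest first."""
    start = crash_time - timedelta(minutes=window_minutes)
    return sorted(e for e in events if start <= e[0] <= crash_time)

# Assumed crash time; the date is made up for illustration.
crash = datetime(2024, 1, 10, 13, 49)
suspects = events_before_crash(events, crash)

for timestamp, level, source in suspects:
    print(timestamp.isoformat(sep=" "), level, source)
```

Events logged after the crash time are left out on purpose, since those are more likely to be consequences of the failure than causes.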

= = = =

Is all important data backed up at least twice, to locations away from the computer in question? Verify that the backups are recoverable and readable.

= = = =

Other things you can do:

Power down, unplug, open the case.

Clean out dust and debris.

Verify by sight and feel that all cards, connectors, RAM, jumpers, and case connections are fully and firmly in place.

Install a fresh CMOS battery by following the instructions provided by the motherboard's User Guide/Manual.

Use a bright flashlight to inspect for signs of damage, loose or missing screws, cracks, pinched or kinked wires, bare conductor showing, swollen components, corrosion, etc.

Overall, my thought is that something is loose. After three or four startup attempts, the system warms up, something expands from the heat, and some loose connection tightens up. It loosens again at the next cool down.

Windows should only be shut down via the power icon. When you are forced to power down via the PSU switch, Windows does not get the opportunity to do any "housecleaning" etc. in preparation for the next boot.

That can and does corrupt files and makes it all even worse...

Run "dism" and "sfc /scannow" to address file-related problems.

https://www.windowscentral.com/how-use-dism-command-line-utility-repair-windows-10-image

https://www.lifewire.com/how-to-use-sfc-scannow-to-repair-windows-system-files-2626161
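
As a rough illustration of running the two tools in sequence, here is a small Python sketch. The post above only says "dism"; the specific switches used here (/Online /Cleanup-Image /RestoreHealth) are an assumption about which DISM operation is meant, so check the linked guides first. The function name and the dry-run default are made up for this example. Both commands need an elevated (Administrator) prompt on Windows.

```python
import subprocess

# Assumed repair sequence: DISM image repair first, then the System File Checker.
# The DISM switches are an assumption; the original advice only names "dism".
REPAIR_COMMANDS = [
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
    ["sfc", "/scannow"],
]

def run_repairs(commands=REPAIR_COMMANDS, dry_run=True):
    """Run each command in order and return (command, return code) pairs.

    With dry_run=True nothing is executed; the commands are only printed
    and the return code is recorded as None.
    """
    results = []
    for cmd in commands:
        if dry_run:
            print("Would run:", " ".join(cmd))
            results.append((cmd, None))
        else:
            completed = subprocess.run(cmd)
            results.append((cmd, completed.returncode))
    return results

# Dry run by default; pass dry_run=False from an elevated prompt on Windows.
results = run_repairs()
```

The same two commands can of course simply be typed one after the other into an Administrator Command Prompt; the script only makes the order explicit.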
 

nick1232

I will do some physical maintenance this weekend. Thanks. The heating idea is interesting, but it does not explain why the instability appears while I am working. For example, I started my system today at 9 AM and everything was fine until about 13:49. My system must have been warm by then. Then it crashed and lost the video signal for no obvious reason. It must still have been warm when I tried to power it on again. It did start: I saw the ASUS logo, but it was frozen and no boot followed. Switching off and on got it to boot further.


There are a lot of errors in Event Viewer, but they seem to be a result of the system instability. For example, VirtualBox could not do something, and Device Manager logged a lot of 'Metadata staging failed' entries. These look more like a result of a global system failure than a cause of it.
 

Ralston18

A lot of errors, and varying errors, are a sign of a faltering/failing PSU.

How old is that Chieftec PSU? Does it have a history of heavy gaming use, video editing, or even bitcoin mining?

Remember that PSUs provide three different voltages (3.3, 5, and 12 V) to various system components. A fault on any one voltage can cause problems in some components while others seem to be working.
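
To make the three-rail point concrete, here is a minimal Python sketch that checks measured rail voltages against a ±5% tolerance, which is the commonly cited ATX limit for the positive rails. The readings are hypothetical and would ideally come from a multimeter, since motherboard sensor values shown in software are often inaccurate.

```python
# Nominal positive rail voltages named in the post above.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05  # +/-5%, the commonly cited ATX limit for these rails

def check_rails(readings, tolerance=TOLERANCE):
    """Return {rail name: True if the reading is within tolerance}."""
    report = {}
    for rail, nominal in NOMINAL_RAILS.items():
        low = nominal * (1 - tolerance)
        high = nominal * (1 + tolerance)
        report[rail] = low <= readings[rail] <= high
    return report

# Hypothetical measurements, for illustration only.
readings = {"+3.3V": 3.31, "+5V": 5.02, "+12V": 11.2}
report = check_rails(readings)

for rail, ok in report.items():
    print(rail, readings[rail], "OK" if ok else "OUT OF RANGE")
```

In this made-up example only the 12 V rail is out of range (below 11.4 V), which is the kind of single-rail fault that could take out the GPU while USB devices on 5 V keep working. A static reading within range does not prove the PSU is healthy under sudden load changes.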

Do you have access to another known working PSU that can be swapped in? (Remember to use only the cables that come with the swapped-in PSU.)

FYI:

https://www.tomshardware.com/reviews/best-psus,4229.html


Not with the intent to immediately purchase another PSU.

One purpose is to learn more about PSUs and how important they are. You can easily find other links and tutorials about PSUs. You may note other symptoms etc. that make the PSU even more suspect.

Simply read the review and perhaps apply two or three of the calculators. Hopefully 1250 watts is more than sufficient.

However, that does not matter if the PSU is no longer able (for whatever reasons) to meet sudden changes in power demands.
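
For a sense of scale, the kind of estimate a PSU calculator produces can be sketched in a few lines of Python. The CPU and GPU figures are the published TDP ratings for the i7-7700 and GTX 1070; the allowance for everything else is a rough assumption, and TDP is not the same as peak draw, so treat the result as a ballpark only.

```python
PSU_RATED_WATTS = 1250  # Chieftec Polaris rating from the original post

cpu_tdp_watts = 65      # Intel Core i7-7700 rated TDP
gpu_tdp_watts = 150     # GeForce GTX 1070 rated TDP
other_watts = 75        # assumption: motherboard, RAM, drives, fans, USB devices

estimated_load = cpu_tdp_watts + gpu_tdp_watts + other_watts
load_fraction = estimated_load / PSU_RATED_WATTS

print(f"Estimated load: {estimated_load} W "
      f"({load_fraction:.0%} of rated capacity)")
```

Under these assumptions the system draws roughly 290 W, well under a quarter of the rated capacity, so raw wattage is unlikely to be the issue. That fits the point above: what matters is whether an aging unit can still respond cleanly to sudden changes in demand.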
 
When did this problem start?
Could it have started with a Windows update?
Or possibly with some sort of malware?

As a test, run memtest86+.
It boots from a USB stick and does not use Windows.
You can download it here:

If you can run a full pass with NO errors, your RAM should be OK.

Running several more passes will sometimes uncover an issue, but it takes more time.
Probably not worth it unless you really suspect a RAM issue.

All that said, I am guessing a PSU issue.
Find a known good PSU of sufficient power to test with.