Need help narrowing down failing hardware

Darvolin

Reputable
Nov 30, 2015
9
0
4,510
Just like the title says, I have been struggling for months to determine the cause of my computer lockups.

First off, the symptom: regularly, while under heavy load, my computer will suddenly lock up completely, i.e. whatever was on the display at the time stays frozen there, audio from the speakers gets stuck in a very short stuttering loop, and all peripherals freeze as well (the lights on my keyboard and mouse stop changing and stay locked in whatever state they were in at the time of the crash, just like the monitors).

Occasionally, anywhere from 5 to 30 seconds into this, it manages to bounce back. Normally, however, it stays in this state until I manually power cycle the machine.

Event Viewer shows a critical Kernel-Power 41 event.

I've done a lot of research trying to fix this, and I am hoping the symptoms sound like the failure of a specific component to someone so that I can start replacing parts intelligently.

Things I've tried...

Updating all drivers with Driver Easy (even paid for the license so I wouldn't muck it up)

Monitoring temperatures at crash times (always within safe limits)

Replacing a very old HDD as the likely cause

Dusting and reseating all cables, cards, and RAM sticks inside the case

Disconnecting the case's front USB panel in case of a short (I know one of the USB ports there doesn't work)

Replacing old RAM

Checking hard drive health

Completely reinstalling Windows twice

My case is very old, so I am replacing that. I will list my components below. It is worth noting that I replaced my R9 380 GPU with the GTX 970 that I have now. My best educated guess is that I added too power-hungry a GPU, as well as an SSD, without upgrading my PSU. I was confident my 750W Gold-certified Corsair was sufficient, but at this point I am trying anything.

I have a 1000W Rosewill PSU on the way, but I want to hear what the community thinks. I am hoping the issue is specific enough that it's a lightbulb moment for someone lol!

2x monitors (one FreeSync on DisplayPort, the other on HDMI)
750W Gold-certified Corsair PSU
AMD FX-8350 8-core 4.0 GHz
NVIDIA GTX 970
MSI 970 Gaming motherboard
4x 4 GB Kingston HyperX Blu RAM
128 GB SSD (OS installed here)
250 GB SSD
1 TB HDD
Lots of Razer peripherals (most have been unplugged while troubleshooting)
 
Eximo

Titan
Ambassador
More power wasn't really needed, so I hope you can return that Rosewill. If you do have a power supply problem, the Corsair is almost certainly still under warranty, and you can get Corsair to send you a new unit, assuming it is faulty.
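As a rough sanity check on the point that more power wasn't needed, here is a back-of-the-envelope budget in Python. The CPU and GPU figures are the published TDPs (125 W for the FX-8350, 145 W for the GTX 970); every other number is a loose allowance assumed purely for illustration, not a measurement, and real peak draw can differ from TDP.

```python
# Rough power budget for the build listed in this thread.
# CPU and GPU values are the published TDPs; every other number is a
# loose allowance assumed for illustration, not a measurement.
ESTIMATED_DRAW_W = {
    "FX-8350 (125 W TDP)": 125,
    "GTX 970 (145 W TDP)": 145,
    "motherboard + 4 RAM sticks (assumed)": 60,
    "2 SSDs + 1 HDD (assumed)": 20,
    "fans + USB peripherals (assumed)": 30,
}

PSU_RATING_W = 750


def power_budget(draws, psu_rating_w):
    """Return (total estimated draw, remaining headroom) in watts."""
    total = sum(draws.values())
    return total, psu_rating_w - total


total_w, headroom_w = power_budget(ESTIMATED_DRAW_W, PSU_RATING_W)
print(f"Estimated load: {total_w} W of {PSU_RATING_W} W")
print(f"Headroom: {headroom_w} W ({total_w / PSU_RATING_W:.0%} loaded)")
```

Under these assumptions the estimate comes out at roughly half of the 750 W rating, which is why a healthy unit of that size should not be the bottleneck.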

When you say you monitored all the temperatures, what were they? FX chips use an offset so the max recommended temperature is usually like 63C. VRM temperatures would be the only other thing to keep a close eye on.

The only other thing might be to check the power situation in your dwelling. A faulty appliance on the same circuit could be tripping the power supply, or you could just be getting dirty power from your supplier. Something like a UPS/power conditioner could help with that if the problem can't be resolved upstream.
 

CaptainCretin

Respectable
Jul 18, 2016
625
0
2,160
I had that board; the VRMs and Southbridge got VERY hot. I had to install extra heatsinks on the Southbridge and fit a fan specifically to blow air over it.

The board has a reputation for poor thermal glue between the VRMs and the heatsink; some are fine, others hit over 80C almost straight after boot-up with the CPU at default clocks.

What CPU cooling are you running? If it is water, you definitely need to run extra airflow over those VRM heatsinks.
 
Solution

Darvolin

Thanks for the responses!

@Eximo Don't worry, I've got a return label already. I appreciate the heads up. The temperatures I'm seeing are honestly even lower than expected. My processor seems to stay under 50C while running the Prime95 Blend test, but I will run it for a longer period tonight.

Honestly, I'm thinking you might be right about the power situation in my home. Recently the landlord had to jack up support columns that I believe were sinking. During this process we have seen cracks in the plaster around doorways, which was to be expected, but some of the overhead lights in the apartment also alternate between very dim and properly bright every so often. I could see there being a correlation... I will look into a UPS/power conditioner as well as speaking to the landlord. (I will also try to eliminate faulty appliances on the same circuit as the cause.)

@CaptainCretin My temps seem okay on the mobo and the CPU. Currently I have a CRYORIG H7 heatsink on my processor.

 

Darvolin

So after removing some appliances that could be causing the issue and replacing a very old surge protector, the issue still persists.

It could still be the bad power in my home, but now I am thinking CPU overheating could be the issue, as you guys have suggested.

I am not sure why the stress tests didn't produce a high temperature reading, but while playing World of Warcraft I noticed another hiccup, though it didn't completely lock up. I quickly opened HWMonitor to check the temps, and the CPU was cruising at 78C, having recently hit a max of 81C. This was after only about twenty minutes of playing while watching Netflix.

I am going to reapply thermal paste to my heatsink and refasten it. I will report back if there is any change.
 

Darvolin

Turns out it was a huge amount of dust buildup on my CPU fan. I did not notice it because it is a sideways-blowing heatsink. Figures it was something so simple! All I can say to anyone reading this thread is to monitor your temperatures very closely around crash times, even if you think your chassis is nice and clean. My thermal paste wasn't completely spent, but it could also use replacing, so I went ahead and did that as well.
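For anyone who wants to do that check after the fact, here is a minimal Python sketch that scans a temperature log for hot readings in the minutes before a crash. The CSV layout (columns named `time` and `cpu_temp_c`) and the function name are hypothetical, so adapt them to whatever your monitoring tool actually exports. The 63C default limit is simply the FX figure quoted earlier in this thread.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical log layout: one row per reading with "time" and "cpu_temp_c"
# columns. Real monitoring tools name their columns differently.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def hot_readings_before(log_file, crash_time, window_minutes=10, limit_c=63.0):
    """Return (timestamp, temp) pairs above limit_c in the window before crash_time."""
    window_start = crash_time - timedelta(minutes=window_minutes)
    hot = []
    for row in csv.DictReader(log_file):
        stamp = datetime.strptime(row["time"], TIME_FORMAT)
        temp = float(row["cpu_temp_c"])
        if window_start <= stamp <= crash_time and temp > limit_c:
            hot.append((stamp, temp))
    return hot


# Made-up sample data standing in for a real log file.
sample_log = io.StringIO(
    "time,cpu_temp_c\n"
    "2016-01-01 20:00:00,48.0\n"
    "2016-01-01 20:15:00,78.0\n"
    "2016-01-01 20:18:00,81.0\n"
)
crash = datetime(2016, 1, 1, 20, 20, 0)
for stamp, temp in hot_readings_before(sample_log, crash):
    print(f"{stamp}  {temp:.0f}C")
```

With a real log you would pass an opened file instead of the in-memory sample, and use the timestamp of the Kernel-Power 41 event as the crash time.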