Random shutdowns on thermal events

Cliffsta

May 26, 2016
Hi folks,

I have a Dell T3500 tower with a quad-core Xeon W3565, running Windows 10.
I have been getting thermal shutdowns. At first the system labelled them as such, with a warning on screen before startup, but more recently they have become more frequent and come without the warning.

I have run fan tests (they are all working) and have used HWMonitor and also SpeedFan. With SpeedFan, I have set up fan controllers so that the temperature rarely goes beyond 50 degrees Celsius. The fans happily kick in and do their job, but I am still getting the same shutdown behaviour.

There is no record in Event Viewer, but there is a record within the Dell BIOS logs, showing a CPU/Core thermal event. I have also taken the precaution of updating to the latest Dell BIOS.

I have opened up the case and the dust is quite minimal. I also searched for the above chip and found that people have run it all the way up at 80 Celsius, quite happily. Admittedly, a lot of them were using Apple Macs, which are no doubt better designed to handle strain.

It just seems like I am getting these damn over-eager shutdowns at very moderate temperatures of about 50-55 Celsius. From what I can understand, this is very much in the expected range of the chip.

I admit that I haven't pulled the chip and reapplied thermal grease. The reasons are twofold: I don't think I am getting that high a temperature to warrant it (the machine is only 3 years old), and I once fried another machine completely by accidentally dropping the heat-sink on the CPU plate and bending some pins, so I am forever careful!

Any knowledge / experience greatly appreciated.
Cliff



 
Solution
If the event is in the BIOS log, it may well relate to a different part of the motherboard, and it may be a failing component such as a VRM.

I assume that, if you have been inside the case, the case is clean and dust-free? If not, that would be my first port of call.

After that, a visual inspection of all components on the motherboard for signs of damage, bulging capacitors, or burn/scorch marks.

Confirm that all fans are working and that the symptoms cannot be resolved by improving airflow (leave case covers off).

After that, in consultation with Dell Customer support, evaluate whether the machine is repair-worthy or due for a replacement. Your warranty status may be a factor.

 
Solution
---- Folks - I may actually have a solution for this! ----

I was dubious of the connection to a thermal event, because I know Dells and Intels are reasonably intelligent.
I had a look at Event Viewer on my machine and noted a critical event logged on each restart under the System log:

Kernel-Power
Event ID: 41
Task Category: 63
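
For anyone who would rather pull these entries from a script than click through Event Viewer, here is a minimal Python sketch. It assumes Windows with the built-in `wevtutil` tool on the PATH; the function names and the default count of 10 are my own choices for illustration, not anything from the original troubleshooting.

```python
import subprocess

# XPath filter for Kernel-Power Event ID 41 entries in the System log.
KERNEL_POWER_41 = (
    "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] "
    "and (EventID=41)]]"
)


def build_query_command(count=10):
    """Build a wevtutil command that lists the newest `count` matching events."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{KERNEL_POWER_41}",  # structured query (XPath)
        f"/c:{count}",            # maximum number of events to return
        "/rd:true",               # reverse direction: newest first
        "/f:text",                # human-readable text output
    ]


def recent_unexpected_shutdowns(count=10):
    """Run the query (Windows only) and return wevtutil's text output."""
    result = subprocess.run(
        build_query_command(count),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `print(recent_unexpected_shutdowns())` on the affected machine should list the timestamps of each unexpected power loss, which can then be lined up against when other appliances were switched on.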

I jumped on to Google, had a quick mooch around, and instantly came across a whole bunch of folk complaining of PSUs, power loss, etc. Some folk had mentioned dodgy cables.

I put two and two together and realised that the events had started at the same time I purchased a 2500 W oil heater. I had thought that the ambient heat generated by the heater had messed things up, and moved the PC far away from the heater, yet the events continued. I should have realised earlier that the extension lead I was using was only rated for 3.5 kW, of which the new heater was using just over 70%. Therefore, I was getting plain old power shortages masquerading as thermal events.
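
To put rough numbers on that, here is a small Python sketch of the load arithmetic. The heater wattage and lead rating come from the post above; the PC figure of 400 W is purely an assumption for illustration, since the actual draw was never measured.

```python
def load_share(appliance_watts, lead_rating_watts):
    """Fraction of the extension lead's rated capacity used by a load."""
    return appliance_watts / lead_rating_watts


heater_w = 2500        # oil heater, from the post
lead_rating_w = 3500   # extension lead rating (3.5 kW), from the post
pc_w = 400             # ASSUMPTION: illustrative PC draw, not a measured value

heater_share = load_share(heater_w, lead_rating_w)
total_share = load_share(heater_w + pc_w, lead_rating_w)
headroom_w = lead_rating_w - heater_w  # what is left for everything else

print(f"Heater alone: {heater_share:.0%} of the lead's rating")
print(f"Heater + PC:  {total_share:.0%} of the lead's rating")
print(f"Headroom after heater: {headroom_w} W")
```

The heater alone works out to about 71% of the lead's rating, leaving 1000 W for everything else plugged into it.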

If you have thermal shutdown events without the corresponding high CPU core temperatures, I suggest having a look at how your PC is powered, and whether it's sharing a supply with any new power-hungry appliances.