Question What's killing my CPUs?

Dec 22, 2020
I've had two CPUs die in the last 4 months. Is part of my system frying my CPUs? Or is it just bad luck?

I built my first system in April 2019 (specs below) and it ran without a hitch for over a year. I never overclocked anything except bumping the RAM to 2933MHz (it’s rated for 3000MHz). In August 2020, it started getting occasional BSODs with the message “WHEA UNCORRECTABLE ERROR,” or on startup it would give a blank screen and the mobo’s CPU error light. Restarting fixed both issues for a time, but eventually every startup gave a blank screen with the CPU light. I went through all of Intel’s normal troubleshooting steps (cycling a RAM stick through each RAM slot, removing the GPU, resetting the BIOS), but always got the CPU light and no display.

In September, Intel decided that OCing the RAM above 2600MHz (their recommended max for the 9600K) had damaged the CPU, and they exchanged the CPU under warranty. I put in the new CPU, everything worked perfectly (RAM at 2600MHz), and I thought my troubles were over. Fast forward to yesterday, and it’s back to not starting up at all. Same as before: the case fans and lights go on and stay on, but the mobo’s CPU error light goes on and there is no display. I’ve tried Safe mode, removing the GPU, cycling the RAM, and resetting the BIOS, but nothing has worked.

If I need a new CPU, I need a new CPU; I can get on board with that. My question is: is it possible that something in my system is responsible for two CPUs dying in close succession? What’s the most likely culprit? It seems unlikely to be just coincidence. I obviously don't want to replace the CPU if it's just going to die again.

System:
CPU – i5-9600K
Cooler - DeepCool Gammaxx GT
Mobo – MSI Z390-A Pro
GPU – MSI GeForce RTX 2060 VENTUS 6G OC
RAM – G.SKILL Aegis 2x8G, F4-3000C16S-8GISB
PSU – SeaSonic M12II EVO Edition 620 W 80+ Bronze Fully Modular ATX
SSD – Crucial MX500 500 GB
 
Dec 22, 2020
Is it possible? Yes. Most likely motherboard, but could also be from certain software and/or drivers.

Motherboard would have been my first guess based on the WHEA errors, especially considering you weren't overclocking. Did Intel ever confirm that your RMA'd CPU was actually faulty or damaged? I doubt that there was anything wrong with it, or with your current CPU. PSU would be my second guess as to a hardware culprit, though I would look at software first, RMA the mobo second if no change, and PSU would be last if the cycle repeats itself a 3rd time.

A question: if you're not getting video, and the mobo threw a code, how are you booting into Safe mode or doing a BIOS reset? Is it booting intermittently? You should uninstall all overclocking, hardware monitoring, and RGB control apps in Safe mode, including Afterburner. Do this after loading optimized defaults in the BIOS, and make sure to leave the RAM at whichever low settings it defaults to.
 
Dec 22, 2020
Thanks for the suggestions.

I tested the 24-pin connector from the PSU, and the only notable pin was pin 8 (PWR_OK), which rested at 5V most of the time but dropped to 2-3V every few seconds (I don't know what this pin's voltage is supposed to look like; the chart just says "power good"). All the other pins were rock solid and within tolerance. Do I need to check the 8-pin CPU power cable as well?
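For anyone repeating the multimeter test, here is a minimal sketch of the "within tolerance" check. It assumes the commonly cited ATX12V tolerances (±5% on +3.3V, +5V, +12V and +5VSB, ±10% on -12V). The function names and the sample readings are made up for illustration and are not measurements from this system. The power-good signal on pin 8 is a logic signal rather than a supply rail, so it is deliberately left out.

```python
# Nominal voltage and allowed fractional deviation per rail.
# Assumption: tolerances as commonly cited from the ATX12V design guide.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def rail_in_spec(rail, measured):
    """Return True if a measured voltage is within tolerance for the rail."""
    nominal, tolerance = ATX_RAILS[rail]
    return abs(measured - nominal) <= abs(nominal) * tolerance


def check_readings(readings):
    """Map each rail name to True (in spec) or False (out of spec)."""
    return {rail: rail_in_spec(rail, volts) for rail, volts in readings.items()}


if __name__ == "__main__":
    # Hypothetical multimeter readings, for illustration only.
    sample = {"+3.3V": 3.31, "+5V": 5.02, "+12V": 11.3, "-12V": -11.6, "+5VSB": 5.05}
    for rail, ok in check_readings(sample).items():
        print(rail, "OK" if ok else "OUT OF SPEC")
```

Keep in mind that a multimeter on an idle or unloaded PSU only shows steady-state voltage; it will not catch ripple or sag under load, so a pass here does not clear the PSU.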

Intel never told me anything about the RMA'd CPU, but everything instantly worked the moment I installed the new one, so I assumed the CPU was the issue. The BIOS can be reset physically on this mobo by shorting a specific pair of pins or removing the CMOS battery (I did both of these). And I tried and FAILED to boot in Safe mode; sorry I wasn't clear. So there's nothing I can do about software then, right?
 
Dec 22, 2020
Gotcha. I don't know how else to help you troubleshoot this one, maybe someone else has an idea?

Personally, I would RMA the motherboard. Ideally, you'd want to test the CPU in another known-working Z370/Z390 system first.

Back to your original question: voltage and heat can kill a CPU. A faulty or damaged motherboard, or one that is aggressively over-volted by the manufacturer, could damage the CPU. Same for the PSU, though that more commonly happens only when a PSU fails outright. Statistically speaking, heat is probably the most common CPU killer, whether it comes from an incorrect cooler mount, bad or no case airflow, OC/high voltage, or a combination of all of those things.
 

Karadjgne

Motherboard VRMs. If one MOSFET has failed or gone out of spec, the entire load goes through the remaining MOSFETs, and you'll end up with cascade failures over time as they aren't really spec'd for that abuse. If one is out of spec, it'll supply a funky voltage, which can sag or fall short when a core suddenly demands more power, and you'll get WHEA errors.
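To put rough numbers on the load-shifting point, here is a back-of-the-envelope sketch. It assumes ideal, equal current sharing across phases, a 95 W package power (the 9600K's rated TDP), and a 1.2 V core voltage. The phase counts are hypothetical and are not the actual VRM layout of this board; real VRMs do not share current perfectly, so treat the result as an illustration only.

```python
def per_phase_current(total_amps, working_phases):
    """Current each phase carries, assuming ideal equal sharing."""
    if working_phases < 1:
        raise ValueError("at least one working phase is required")
    return total_amps / working_phases


if __name__ == "__main__":
    package_watts = 95.0  # assumption: CPU running at its rated TDP
    core_volts = 1.2      # assumption: typical core voltage under load
    total_amps = package_watts / core_volts

    # Hypothetical phase counts: a healthy VRM, then one and two phases lost.
    for phases in (6, 5, 4):
        amps = per_phase_current(total_amps, phases)
        print(f"{phases} working phases: {amps:.1f} A per phase")
```

Each lost phase raises the current, and therefore the heat, in every surviving phase, which is why one failure tends to lead to more.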

Eliminate the improbable: RMA the mobo. If that doesn't fix the errors, you've at least seriously narrowed down the possible culprits.
 
