[SOLVED] Can a faulty CPU really cause this? (images+videos inside) (long post)

Sep 3, 2021
8
1
15
So this might get rather long, just a warning. I also hope I picked the right sub forum!

Yesterday my 2-year-old gaming PC just broke down out of nowhere. The game I was playing crashed, and almost every running app (Steam etc.) reported errors and closed down. Windows Defender also complained that something was wrong (“can’t update defender definitions”), which made my first guess some virus or malware.

I also couldn’t launch any programs anymore, chrome would crash before showing a website, windows explorer loaded and showed me an (intact) filesystem but then also crashed after 30-40 seconds.

This did not change after restarts, reboots in safe mode - restoring from restore point also resulted in an error and so did trying to repair windows.

I also got weird “filesystem errors” when trying to load files like pictures - but after trying the same file 1-2 more times it suddenly worked. Basically everything behaved completely erratically .. but the system would boot up every time at least to the login screen.

So I thought duh, guess I have to do a clean reinstall. Because I still had this virus scenario in mind, I decided to just wipe my C: (boot SSD, M.2) since I keep all my data and applications on other drives. The install went through until the first restart and then, while configuring Windows, the screen suddenly started glitching (see pics+videos) and it threw an error message. The error messages I got randomly when restarting were “Windows install could not be finished, please restart the installation”, “Windows cannot be configured for this hardware.” and occasionally just a bluescreen.

So I guess no virus, my next guess - M.2 SSD faulty. Removed it, installed on one of my other drives instead. Same for both of them. So now I can exclude faulty drives and I guess also faulty SATA cables since the M.2 does not use SATA.

Next guess - maybe the GPU? Graphical glitches after all. Thank god my MB comes with DisplayPort port. Nope, same. Still glitching. GPU is fine. Note I am doing a fresh install every time just to be sure.

Next guess - maybe RAM? I use 2x8GB so I tried with just one of them alternating (next day also with a completely different set of ram) - same.

So now my conclusion (maybe too early) was that it has to be the mainboard, because a non-overclocked i9 breaking .. never heard of that before.

Fast forward to the next day: a friend comes around with a motherboard with almost the same chipset (Z370) and a bunch of spare hardware components (as well as more knowledge about building PCs than me).

We put everything on the new motherboard (heck of a lot of work because those beQuiet coolers are a pain to assemble+disassemble..) .. fresh install .. same .. :S

Then we do some more tests, the extra RAM test, different power supply unit, even a 4th SSD that has never been in contact with the system. Same.

The only thing we are not able to test is a different CPU, since alas no one has a 1151 CPU lying around and neither do the hardware stores close to me.

So I have no proof that something is wrong with my i9, but I can exclude all other possibilities .. or can I?

The error just seems really really weird in connection with a faulty CPU. Wouldn’t a faulty CPU cause very different symptoms? I am completely at a loss. 20 years of PC building and I feel totally helpless right now.

Thanks for reading all of this!



Videos + Images:

This is how my windows behaved before I wiped the boot partition. I could get to the login screen and (sometimes) log in, but then nothing else worked. https://streamable.com/z9wcej

Occasionally I could open my explorer, but then clicking any files gave me this. Twice. Then the image opened. Completely random. https://ibb.co/8YGqfbY

My login screen - had to censor the full name for obvious reasons, but you can see how graphical elements are glitched out similar to what I then saw during boot-ups after clean installs https://ibb.co/jb3D40m

One of many attempts to reinstall - it always works when running the install from the stick until the first restart, somewhere between “Configuring windows” and this screen the glitching starts and then it crashes with an error like this https://ibb.co/Bw6gqzW

And here the glitching in full video glory. This was taken after we swapped mainboards (in vain) to rule that out, that's why you see the Gigabyte logo. The MSI logo of my other MB behaved the same https://streamable.com/igxrtw



Specs:

CPU: Intel i9 9900K 3.6 GHz (socket 1151)

Mainboard: MSI MPG Z390 GAMING PRO CARBON

Cooler: beQuiet Dark Rock PRO 4

RAM: Corsair Vengeance DDR4-3000 (2x8GB)

PSU: 600 Watts beQuiet PurePower 11 CM Modular 80+

GPU: ASUS ROG STRIX NVIDIA GeForce RTX 2070

SSD1 (Boot Volume): Intel 660P M.2 (2TB)

HD1 (Data): 4TB Western Digital

SSD2 (Gaming): Slightly older Samsung model with 1TB

Screen: ASUS ROG Swift P278 G-Sync-something



No other components except for a case obviously (also beQuiet) and peripherals
 

revodo

Proper
BANNED
Jun 10, 2021
241
35
120
I'm not going to lie, that's a massive problem you've got. I'm quite old and well versed in computers myself, and I'm stumped as well. Especially with the same issues presenting not only in Safe Mode, but during an attempt at a fresh install.

The first video you linked shows Explorer crashing. That's not a visual glitch, however the other videos show visual glitches. It very well may be your CPU, as you've performed a lot of other troubleshooting steps.

I really doubt it's the source of the problems, but have you tried running a memory test on the RAM?
 
Sep 3, 2021
8
1
15
Thanks for reading through my post. We have not done a memory test due to a lack of tools (guess we would need a mem checker on a bootable USB stick?). However, we tested 3 times: with each of my RAM modules alone (it's 2x8GB) and then with a 3rd RAM module that was never in this system before.

Also yeah, I know it's tricky. Not expecting a magical fix, but rather some last things to try before going ahead and ordering a new i9. Imagine installing a new CPU and it's still doing this ... maybe the case is cursed
 
Wouldn’t a faulty CPU cause very different symptoms? I am completely at a loss.
  1. A Z370 board requires a BIOS update to support the i9 9900K. Was the BIOS on the Z370 board updated to the required version?
  2. Did you have only a single drive connected while reinstalling Windows? It appears you had multiple drives connected.
  3. Did you check CPU and GPU temperatures?
  4. Did you have any CPU or RAM overclocking enabled? You should disable it while testing.
  5. BTW - was your 2-year-old PC running in OC mode all the time (before the problems started)?
 
Solution
Sep 3, 2021
8
1
15
Thanks for some good new points!

1. I just realized I made a mistake, we actually used a Gigabyte Z390 M (I don't know why I thought it was a different chipset, maybe I am going a bit crazy over this) and that one supports the i9-9900K out of the box. Also, symptoms were exactly the same as with the MSI; we only used the Gigabyte for the final few test runs after everything else failed.

2. We always disconnected all drives except the one we were installing Windows on, just to be sure.

3. Yes, BIOS worked fine and we could monitor temperatures there.

4. No

5. Never. It was a (mostly) prebuilt system from mindfactory.com, they do not tamper with OC settings and neither did I during those 2 years. I am not a "PC tweaker", just someone who wants a reliable good machine.
 
Sep 3, 2021
8
1
15
Reading this, I ask this question - can this issue be related to a Windows update somehow?

Can you get hold of a slightly older Windows ISO image and try installing it, without the computer having any internet access?
Thanks for your reply! We also had this hunch (although I thought it was unlikely, since my PC didn't break after a Windows update or reboot but while playing a game, with the system already running for a few hours).

We also did an unsuccessful install attempt (during day 2 with the Gigabyte MB) using a stick with an older Windows version from 2019 - which was around the time the whole PC was assembled.
 
OK. If all of this is correct, it has to be one of:
  • A very rare problem that also affects your replacement components, or two components that failed simultaneously for some reason.
  • A very rare kind of fault that makes a component suddenly incompatible with some specific components (RAM, motherboard, CPU), possibly due to a physical malfunction inside an IC.
  • An outside factor.
The first and second points can potentially explain why swapping components doesn't help. My first point also includes the possibility that one faulty component causes a second component to fail. On top of that, there may be a delay: if, say, a faulty motherboard also causes RAM to fail, then when you switch RAM sticks the new ones may not take damage at first, perhaps because the fault is intermittent.

Because of that, I strongly recommend that you TEST the replacement components afterwards to ensure they haven't been damaged (i.e. that they don't cause instability in one way or another).


And when I talk about outside factors, I mean issues related to the local electrical grid or voltage spikes in the mains that somehow make it nearly impossible for the PSU to deliver a stable output voltage. It shouldn't be possible, but I don't think PSU manufacturers can make a unit withstand every kind of disturbance on the input voltage. Therefore I have to ask: is there other equipment that tends to fail in the same room/house?
What happens if you move your computer to a different location (somewhere you can find a similar computer that does not have problems)?

If there is a line-to-ground fault in your house or somewhere else (at your neighbour's or outside the house), that can lead to issues when connecting different equipment together, such as a computer to a monitor.

ESD damage is also on that list. But if you move your computer to another location and it works perfectly there, and you repeat this several times with consistent results, I believe ESD damage cannot be the cause. Anyway, have you done everything to ensure that all computer parts were protected whenever they were handled?
 
Sep 3, 2021
8
1
15
3. I think I can dismiss the third point. I've been in this apartment for 10 years now, always having at least 1 gaming PC, and there have never been any power troubles. I also have a couple of consoles and other electronic devices that never showed any weird behaviour. The apartment is in the center of Berlin with a very stable power grid - I think I remember one single power outage that lasted a few minutes ever since I moved here.

I do get your point though, as a student I lived in a crappy place where even turning on the vacuum cleaner could cause other devices to fail or lose voltage.

2. That would sound like a motherboard IC defect though right? We tested the CPU with 2 different motherboards and 3 different bars of RAM (also in different combinations)

1. Now that sounds terrifying. Are you saying this could be the case IF the CPU is not faulty, or do you consider this more likely than "just" a faulty CPU? I have no other system available to "single-test" the different components, but on the first day of diagnosis we had a test run where - except for my MB+CPU - all other components were from outside (RAM, hard disk, GPU, PSU).
 
1 I can't put any odds on this, but given the complexity it can't be ruled out either. Any microscopic defect can cause small errors, and it is practically impossible to foresee every kind of issue they may lead to.

2 It would help if you could make a list of all the combinations you have tried, and the result of each test.

3 Yes, and having the setup tested at a friend's apartment would let you strike that from the list of possible causes.
 
Sep 3, 2021
8
1
15
Combinations we tried:

CPU + MSI Mainboard + My RAM + My GPU + My SSD+HDs (the original setup that had been working so far)

3x CPU+MSI Mainboard+My RAM+My GPU+each of the disks (M.2 SSD, HD, SSD) alone to rule out disk faults (we then kept using the M.2 for most further tests as it formats/installs/boots fastest)

CPU+MSI MB+My RAM + My SSD (system also runs without GPU)

3x CPU+MSI MB+My SSD+each of my 2 RAM modules + 1 Test RAM module alone to rule out RAM faults

CPU+MSI MB+Test RAM+Test HD (that was our final test, after which we came to the apparently false verdict that it's a motherboard problem)

Next day:

CPU+Gigabyte MB+My RAM+My SSD

3x CPU+Gigabyte MB+My SSD+each of my 2 RAM modules + 1 Test RAM module (we repeated the RAM test)

CPU+Gigabyte MB+My SSD+Test RAM+Test PSU (so far we had always used mine, just another test to rule out power)

All of these tests yielded the same results. For the Gigabyte test runs refer to the video I posted, the MSI test runs looked roughly the same except there was the distorted MSI logo in the background during windows configuration. Occasionally there also was a blue screen instead, but that occurred randomly with setups that also showed the error in the video.
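
For anyone who wants to sanity-check the elimination logic, here is a minimal Python sketch. It encodes a few of the runs listed above as sets of parts and intersects the failing ones; whatever is present in every failing run stays a suspect. The labels (`my_psu`, `test_hd` and so on) and the `common_suspects` helper are made up for illustration, "My SSD" is assumed to be the M.2, and parts that were never listed or never swapped (case, cooler, install stick, monitor) are outside what this can rule out.

```python
from typing import Iterable, NamedTuple


class Run(NamedTuple):
    parts: frozenset  # labels of the parts installed for this test run
    failed: bool      # True if the install glitched or crashed


def common_suspects(runs: Iterable[Run]) -> set:
    """Return parts present in every failing run and in no passing run."""
    runs = list(runs)
    failing = [r.parts for r in runs if r.failed]
    if not failing:
        return set()
    suspects = set(failing[0]).intersection(*failing[1:])
    for r in runs:
        if not r.failed:
            suspects -= r.parts  # a part that worked elsewhere is cleared
    return suspects


# Hypothetical encoding of four of the runs from the list above.
RUNS = [
    Run(frozenset({"cpu", "msi_mb", "my_ram", "my_gpu", "my_m2", "my_psu"}), True),
    Run(frozenset({"cpu", "msi_mb", "my_ram", "my_m2", "my_psu"}), True),
    Run(frozenset({"cpu", "msi_mb", "test_ram", "test_hd", "my_psu"}), True),
    Run(frozenset({"cpu", "gigabyte_mb", "my_m2", "test_ram", "test_psu"}), True),
]

print(common_suspects(RUNS))
```

This only narrows things down to what all failing runs share; it cannot tell a single bad part from two parts failing together.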
 
After reading through this again, I realize one question hasn't been asked yet. Was the Gigabyte replacement motherboard in complete working order prior to this?

Also, you use the term "3x CPU", please explain. Do you have 3 different CPUs to test?

If a component is not listed in your test report, can we assume that component was not mounted?
 
Sep 3, 2021
8
1
15
I meant that we tested 3 times with the setup "CPU+Gigabyte MB+SSD+RAM" - with 3 different RAM modules.

In any case, today we tried a different CPU, same model (9900K) - that immediately fixed the issue, we reinstalled windows, runs fine again.

First time ever I had a CPU break (out of the blue as well) - but oh well.

Really appreciate your time and ideas @Grobe !
 
So I read through your post and am glad you figured it out. Some things to note about Intel CPUs and motherboard manufacturers. Most motherboard manufacturers let Intel CPUs draw more power than the minimum spec for prolonged bursts, even with no OC options enabled in the motherboard BIOS. They are allowed to do this because Intel lets them use it as a way to differentiate themselves from their competitors. I am not saying that this has anything to do with your specific problem with that 9900K. One can speculate, though, that if your MSI motherboard allowed this, which is highly likely, then there is a logical connection between repeated prolonged power bursts of 50-60 seconds at a time, over a span of years, and such a result. This is doubly true if the CPU had an existing flaw that was too insignificant to be caught by typical QA processes, but significant enough that the extra wear and tear of turbo boosting exacerbated it over time until the CPU broke.
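
To make the boost-budget idea a bit more concrete, here is a rough Python sketch. It assumes a simplified model in which the CPU may draw a short-term power limit until an exponentially weighted running average of package power reaches the long-term limit (these are often called PL2 and PL1), with a time constant tau. The function name and all numbers in the example are illustrative assumptions, not values read from a datasheet or any BIOS, and real firmware behaviour may differ.

```python
import math


def seconds_at_boost(pl1_w: float, pl2_w: float, tau_s: float,
                     start_avg_w: float) -> float:
    """Estimate how long a constant draw of pl2_w can last before the
    running average of package power climbs from start_avg_w to pl1_w.

    Simplified model (an assumption):
        avg(t) = pl2 + (start - pl2) * exp(-t / tau)
    """
    if start_avg_w >= pl1_w:
        return 0.0       # average already at the long-term limit, no headroom
    if pl2_w <= pl1_w:
        return math.inf  # the average can never reach pl1, so no time limit
    return tau_s * math.log((pl2_w - start_avg_w) / (pl2_w - pl1_w))


# Hypothetical numbers, for illustration only.
stock = seconds_at_boost(pl1_w=95, pl2_w=119, tau_s=28, start_avg_w=20)
# A board that raises the long-term limit up to the short-term one
# removes the time limit entirely in this model.
relaxed = seconds_at_boost(pl1_w=119, pl2_w=119, tau_s=28, start_avg_w=20)
print(f"stock-like limits: {stock:.0f} s, relaxed limits: {relaxed}")
```

In this model, a longer tau or a higher long-term limit both stretch the time the chip spends at its higher power draw, which is the kind of board-level tuning described above.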
 
Sep 3, 2021
8
1
15
Very interesting. I had no idea about this (but I am also not following hardware developments as closely anymore as I used to in the past) - that sounds like a quite likely scenario then indeed.

I am currently doing a lot of testing with the "repaired" setup and it seems I am getting considerably better benchmark results. Very intriguing - maybe the CPU already had some kind of fault or "weak point" from the beginning which simply never showed up as outright crashes before last week. Still need more testing; unfortunately I have no saved benchmarks from before to compare.
 
Here is some more information on how 9th generation CPUs handle boosting and power usage during that boosting. Pay close attention to the PL1 in this case.
 