Question Potential CPU Cache problem?

Feb 25, 2020
5
0
10
UPDATE: The bulk of my issue was caused by out-of-date SSD firmware, not my CPU cache. However, I am still trying to fix Windows UI elements (context menus, Task Manager, some text in Explorer/Control Panel, etc.) flashing like crazy and being illegible after playing some games.

Original post below.
---------------------------

Hi,
Apologies if this is in the wrong subforum, I didn't see a general "support" forum, but I think my problem is my CPU.

A little background:
I recently built my first computer (primarily for gaming):
ASRock Phantom Gaming 9 mobo,
Intel i9 9900KF cooled by a Corsair H150i Pro water cooler,
Corsair RGB Pro (4x8GB),
and an old(er) Nvidia GTX 1080 from EVGA (until the Ampere GPUs come out :) ).

Old hard drives in use:
Crucial CT250MX200SSD4 250GB in the M2-1 slot,
Samsung 860 Evo 1TB (Boot partition of a completely fresh install of Windows 10, plus other partitions).

There is also a Creative Soundblaster Z in there but I don't think it has anything to do with my problem.

Anyway, whenever I start a CPU-intensive load, e.g. benchmarks or some games (WoW is the biggest culprit), the Windows UI will essentially break. Explorer freezes, Task Manager will open but it will flash uncontrollably and be basically unresponsive, all other Windows UI features will crawl and/or flash like Task Manager, and game performance will take a huge dive. I have tried to use applications like ProcExp to see what's going on, and while that works better than Task Manager does, they both seem to report my CPU usage as zero after this occurs.
I have tried using only one of the memory sticks in each of the DIMM slots, no change. I've disabled all the peripherals I could, no change, and even tried to disable as many services and close as many programs as possible, no change.

Interestingly, the computer works great unless I launch something that triggers the problem, and most of the culprits seem to be DirectX12 games, although it doesn't appear to be exclusive to DX12, and I've even had it happen after walking away from my computer for a few hours with nothing running besides Firefox.

Another development happened recently when I tried to extract a rather large zip file (1.5GB or so) onto my M2 drive. Twice this froze my computer entirely and I had to force a reboot. The third and fourth times, I actually received a lovely BSoD, causing me to shake my fist angrily at Microsoft for their butchered modern "error reporting". Luckily, however, Event Viewer still exists.

The first issue was a DPC watchdog violation, and I guess that points towards my M2 drive. While it could be, I don't think it's the drive as the main issue isn't exclusive to workloads where the data source is on that drive.
The second issue was a WHEA uncorrectable error, of the type "Cache Hierarchy Error" on processor APIC ID 0, which would be the i9.

The next thing that I did was run a CPU stress test using OCCT. The first stress test caused a hard computer freeze, but the next 3 or 4 did not. One thing that I noticed from the ones that didn't freeze my computer was that my CPU temperature would jump within a second or two to 80+ C, sometimes even up to 88 or so. Upon ending the test, they would just as quickly jump back down to normal operating temperature, around 38C. I'm not sure if the rate of change here is normal or not, but I was concerned by the quick and excessive temperature spike, so I underclocked my CPU, limiting it to 3.5GHz and tried again. Temperatures still almost instantly reported ~80 C, and quickly jumped down to 40 upon conclusion of the test.

So, that brings us to today, where the only troubleshooting left for me to try is a CPU reseat and/or a reinstall of Windows. I don't have any thermal paste at the moment, but some is on the way and I will receive it by the end of the week. But as this is my first computer build, I am a little naive, and I am interested in seeing what other people think I should check.

Has anyone seen this kind of thing before and have any ideas for me before I do that? I would like to exhaust my options before going nuclear, but I am worried that I have a faulty CPU.

Thanks for any advice!

EDIT: Additional information:
My power supply is a Corsair CX750, and my BIOS is the most recent version.

CPU/GPU temperatures are both around 35-40C when idle, and my GPU never exceeds around 60 when maxed out. I've never seen the CPU go above 60C unless I run a stress test as mentioned; however, I can't seem to gather the CPU temp when the main bug I've described is triggered, as all the apps I use to measure it either stop responding (CPUtemp, ASRock's software, Corsair's iCUE) or start to flash like crazy, making them illegible (e.g. Task Manager).

I have also tried a complete video driver uninstall/reinstall, of both a previous version Nvidia driver and current, which did not remedy the problem.
 
latest bios?
make and model of the psu?
cpu/gpu temp when idle and on load?

  1. yup, I've updated the bios to the latest version.
  2. Oops, I forgot about that. The power supply is a Corsair CX750, from my old computer.
  3. They are both around 35-40C when idle, and my GPU never exceeds around 60 when maxed out. I've never seen the CPU go above 60C unless I run a stress test as mentioned; however, I can't seem to gather the CPU temp when the main bug I've described is triggered, as all the apps I use to measure it either stop responding (CPUtemp, ASRock's software, Corsair's iCUE) or start to flash like crazy, making them illegible (e.g. Task Manager).
 
I also forgot to mention that I have also tried a complete video driver uninstall/reinstall, of both a previous version Nvidia driver and current, which did not remedy the problem.
 

jodo_kast2

Prominent
Mar 5, 2018
49
4
545
Try reseating your M.2 drive first and re-check its SMART status, if available. Try Prime95 for stress tests, since it has far better options for that. HWiNFO should be able to give you a rundown of the system while Prime95 gets to work. And yes, some tests can spike the CPU temp to 80 °C, but it should never get within 20 °C (or, to be a bit more generous, 15 °C) of TjMax. That should also alleviate any doubts about the L caches themselves being defective.
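
The TjMax margin rule above can be written out as a quick check. This is only a rough sketch: the default `tjmax_c` of 100 °C is an assumption for this class of CPU (confirm it against Intel's spec sheet or the value HWiNFO reports for your chip), and the function names are made up for illustration.

```python
def thermal_headroom(peak_temp_c, tjmax_c=100.0):
    """Return how many degrees C the peak temperature sits below TjMax.

    tjmax_c defaults to 100.0, which is an assumed value; check your own CPU.
    """
    return tjmax_c - peak_temp_c


def within_safe_margin(peak_temp_c, tjmax_c=100.0, margin_c=15.0):
    """True if the peak stays at least margin_c degrees below TjMax."""
    return thermal_headroom(peak_temp_c, tjmax_c) >= margin_c


if __name__ == "__main__":
    # Peak temperatures mentioned in this thread during stress tests.
    for peak in (80, 85, 88):
        print(peak, thermal_headroom(peak), within_safe_margin(peak))
```

With the generous 15 °C margin and the assumed 100 °C TjMax, a peak of 80 °C passes while 88 °C falls inside the margin, which is why the cooler profile is worth checking before blaming the CPU.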
 
It looks like your OS is on the Samsung drive. Have you tried unplugging your M.2 drive to see if the issue continues?

While the drive might not be bad it could be a bad socket or driver with the motherboard.
I reseated the M2 drive, and found a dog hair in the port. Whoops! Still, it didn't fix the issue.
If I move the culprit applications off the M2 drive, they don't seem to cause an issue.
If I try the other two M2 slots, the problem is still there.
SMART is reporting 99% health for the drive.
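
A single health percentage hides the individual SMART attributes, so it can help to look at them directly. Below is a rough Python sketch that summarizes an already-parsed report. It assumes the report comes from smartmontools' `smartctl --json -a <device>` and that the field names (`smart_status.passed`, `ata_smart_attributes.table`, `value`, `thresh`) match your smartmontools version, so verify them against real output first. The function name and the sample report are made up for illustration and are not data from the drive in this thread.

```python
def summarize_smart(report):
    """Summarize a parsed smartctl JSON report (field names are assumed)."""
    passed = report.get("smart_status", {}).get("passed")
    table = report.get("ata_smart_attributes", {}).get("table", [])
    # An attribute is flagged when its normalized value has dropped to or
    # below its nonzero failure threshold.
    at_risk = [
        attr["name"]
        for attr in table
        if attr.get("thresh", 0) > 0 and attr.get("value", 0) <= attr["thresh"]
    ]
    return {"passed": passed, "at_risk": at_risk}


# Hypothetical sample report, for illustration only.
sample = {
    "smart_status": {"passed": True},
    "ata_smart_attributes": {
        "table": [
            {"name": "Reallocated_Sector_Ct", "value": 100, "thresh": 10},
            {"name": "Power_On_Hours", "value": 99, "thresh": 0},
        ]
    },
}

if __name__ == "__main__":
    print(summarize_smart(sample))
```

A clean SMART report does not rule out firmware or compatibility problems, so it only narrows things down.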

Interesting that faalin would mention a bad driver; this particular M2 drive is not on the list of supported M2 devices in the mobo manual, although its more recent iteration is. That's a bad look for this M2 drive. It's a little irritating that it would be "unsupported," but that's probably the problem.

I ran Prime95 with default settings. Temperatures averaged around 85 and peaked at 88, but I also noticed my cooler was set to "quiet" instead of "balanced" or even "extreme." I set it to balanced and the temperatures went down to 70-75. Prime95 never once brought my computer down like OCCT did. Temperatures dropped to 35/baseline within a second of ending the test.

Looks like it is more likely a problem with this drive than my CPU. The thermal paste I ordered will arrive tomorrow and I will be able to see if a pesky dog hair found its way into my CPU bus, but I don't have high hopes for that. For now I will shuffle some data around and remove this M2 drive since it appears to have some issues. I'll get a replacement soon.

Thank you for your replies. If anyone can think of anything else I can try to confirm that this drive is the culprit, I am all ears.
 
Reseating the CPU of course changed nothing, but I learned that there were firmware updates for my SSDs which, once installed, eliminated Windows Explorer breaking and the general slowdown/inability to use the computer when the described bug is triggered. However, the flashing UI elements still remain. Not sure how to fix that, but at least my computer is usable now.

I had thought that the firmware would be handled by the mobo, so it's good to know that you need to install updates to these things manually.

1/2 way there!

Does anyone have any idea how to approach the flashing UI elements (Task Manager, context menus, etc.)?