Question: Unstable system - at my wits' end

Mar 21, 2025
Hello everyone,

I'm really hoping someone can help me. I've been dealing with an unstable system for over a year now: constant BSODs, random crashes, and the system becoming totally unresponsive.

It all started in February last year. I put my PC to sleep and left for a few hours to do some shopping. When I came home and woke the system up, the monitor showed no signal. At the time I tried a new HDMI cable, which made no difference. A few days later, with the original HDMI cable, everything seemed fine, which was odd.

However, ever since then my system has got worse and worse, with crashes mostly related to nvlddmkm.sys and also DPC_WATCHDOG_VIOLATION (0x133). I've tried reinstalling Windows and cleaning the display drivers with DDU multiple times, and I have tested my RAM with Memtest, which came back clean. I have also updated the BIOS, which seemed to make everything run smoothly for a week, but then the crashes started again, worse than before.

I also had a random reboot today with a new error from WHEA Logger; I'll paste it below.

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

The details view of this entry contains further information.

My system specs are:
Ryzen 5 3600
Gigabyte x570i Pro Wifi
MSI RTX 2080 Super
Corsair SF750 Platinum PSU
Kingston HyperX Fury 2x8GB RAM

I'd really appreciate any support you can give me.
 
It might be worth trying sfc/DISM, but the fresh install should have taken care of that.

Do you happen to have any other branded RAM available to try, even a single stick?

Are you monitoring temps?

What are you seeing in Event Viewer?

Edit: not for nothing, but several of the YouTube hardware channels have come across a high failure rate with the 3600. I am not saying that is definitely what you have going on, but...
 
I have also run sfc/DISM multiple times; it has only ever changed a Bluetooth driver and found no other issues.

I have no other RAM to hand unfortunately.

I am monitoring temps: the CPU goes to 70°C while gaming and the GPU to 80°C. It may be worth mentioning that another forum asked me to run FurMark yesterday, which I did. It only ran for 2 minutes, and the GPU reached 77°C before the whole system locked up completely. I know when a crash is coming because the GPU fans slow right down and the screen flickers; then the fans ramp up again, the system crashes completely, and the fans slow down again.

Event Viewer just shows nvlddmkm errors, Kernel-Power errors, etc.

This is my second Ryzen 5 3600; my first died in 2021 along with my motherboard, but luckily both were still under warranty then. Back in 2020, being a PC newbie and a bit foolish, I mined with the CPU (which is probably what killed it) and with this GPU too, which could now be coming back to bite me.
 
Try something, if possible.

Take a regular house fan and set it up so it blows directly into the case with the side panel or glass off, then try some of these tests. Let us know how that goes.

Alongside that, if you are on a long extension cord or a power strip with lots of things plugged in, try to limit or turn off everything on that circuit that you can.
 
Lots of updates, none of them positive.

I reset the BIOS and changed the settings, with no change.

I ran a FurMark stress test, and this caused a crash.

I had a random reboot, which produced this WHEA Logger error in Event Viewer:

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

The details view of this entry contains further information.

I stress-tested the CPU with Prime95. This gave no errors.

I performed a clean boot. At first I had no crashes, but then they returned.

I got a 0x133 DPC_WATCHDOG_VIOLATION again. Then the display randomly lost signal and the system locked up again, and I had to hold the power button to shut down as usual. When I rebooted, the screen refused to show anything, but the system was fully responsive: I logged in and heard the case fans respond to the login, the Num Lock light responded, and the LEDs reacted and set themselves as configured in the software. I shut the system down, switched it back on, and a picture came up fine. The GPU fans were running throughout all of this.

Then I got a new error code: 0x144 BUGCODE_USB3_DRIVER.

Then new errors appeared in Event Viewer and Reliability Monitor.

Event Viewer: Dwminit - The Desktop Window Manager process has exited. (Process exit code: 0x0000042b, Restart count: 1, Primary display device ID: NVIDIA GeForce RTX 2080 SUPER)

About 30 seconds before that crash there was an Application Hang error, saying:
The program explorer.exe version 10.0.26100.3323 stopped interacting with Windows and was closed. To see if more information about the problem is available, check the problem history in the Security and Maintenance control panel.

Reliability History also showed a new live kernel dump:

VIDEO_MINIPORT_BLACK_SCREEN_LIVEDUMP (1b8)

This time it was caused by dxgkrnl.sys.

I thought it would be worth mentioning that my RAM and GPU are the oldest original components still in my system. They are now over 5 years old, and the system is used every day. My CPU and motherboard died in mid-2021 and were replaced. I'm not sure what the lifespan of these components is, but perhaps they are wearing out?

Another thing has just popped into my head. I can't remember whether it was 2022 or 2023, but one day I switched my system on and, as soon as I pushed the power button, there was a spark near the RAM sticks and the system immediately shut down. However, when I pushed the power button again, the system was perfectly fine.

I attempted to open a game today, noticed a big slowdown in mouse responsiveness, and the system locked up completely after a couple of minutes. I had to hold the power button as usual. I checked Event Viewer and Reliability Monitor, but no errors showed up this time.

I tried using a fan and plugging the system straight into a wall socket, and saw no change.