Question Win11 frequently freezes / crashes - but not when VM is up

Sep 13, 2023
4
0
10
Hey all,

I'm running into some very weird problems, and before going for the great reset I wanted to ask whether anyone can figure out the root cause. Judging by the symptoms, it seems rather unusual...

So, I'm running Win 11 on a Ryzen 9 5900X here (full specs at the very end). The system is freezing more and more frequently. It started when the system had been idle for a while: all screens would go dark (regular monitor power saving after 15 mins), and after some time one monitor would come back on showing a black screen (backlight on, but only a black signal). The PC is then unresponsive and requires a hard reset.
It has become more frequent now, with freezes also happening during idle while the monitors are still on.
Sometimes the screen would just freeze and show a black / inverted frame.
On top of that, bluescreens started appearing (DPC watchdog violation…)

As it all seemed to be connected to the graphics output, which was sometimes even distorted, I swapped the graphics card for another 3080. That didn't change anything for the better, so I'm taking the GPU off the list of suspects.

Now the odd thing: whenever I really utilize the machine, or alternatively have a VM running in the background (VMware Workstation), it never crashes. Never. It seems the CPU virtualization somehow prevents it?

Stress tested RAM in memtest86 for 8 hrs straight - 0 errors.

Stress tested CPU in Win - as long as it was running, 0 errors - once it was idle, crashed after a while.

So I was pretty sure my Win installation was just messed up. But then I booted into a live Linux from a USB stick, and there too it froze after some idle time. So a hardware issue after all?

I would just reinstall Win and see how far I get with a fresh system, but I have quite a set of extensively configured software here, and while I do have backups of course, I currently don't have the time to reinstall just to find out it's a hardware issue after all. And Linux from USB freezing too doesn't look promising, either.

Any tips on how best to proceed? Or on how to nail down the root cause: software or hardware? Of course, I updated all drivers and the BIOS as well, so everything is up to date. Hardware settings look clean, no overclocking (disabled XMP on the RAM, too). I unplugged *all* USB devices except keyboard / mouse. Didn't help. I took out all additional non-OS drives. Same issue.

System Event Viewer doesn't show any helpful events before the machine dies... at least I couldn't identify any consistent bad actors.

To me that basically leaves as potential culprits: CPU, board (and maybe PSU), maybe Win itself.

Looking forward to any tips and hints in the right direction. Many many thx!

Hunchi

Specs:
R9 5900X
Gigabyte B550 Aorus Pro V2 (latest and greatest firmware and drivers...)
64 GB DDR4-3200 (G.Skill RipJaws V)
OS is running on a M.2 SSD... Samsung 980 pro
RTX3080
PSU: Fractal Design Ion+ 860P
 

Lutfij

Titan
Moderator
Welcome to the forums, newcomer!

As it all seemed to be connected to the graphics output, which was sometimes even distorted, I swapped the graphics card for another 3080. That didn't change anything for the better, so I'm taking the GPU off the list of suspects.
Another step that you could try is using DDU to remove all GPU drivers from your platform, then manually reinstalling the latest driver for your RTX 3080, sourced from Nvidia, with elevated privileges, i.e., right-click the installer > Run as Administrator.

For the sake of relevance, what BIOS version are you on for your motherboard?

As for your OS, if you do eventually end up reinstalling it, make sure to recreate the installer using the Windows Media Creation Tool to rule out a corrupt installer.

PSU: Fractal Design Ion+ 860P
How old is the PSU in your build?
 

Ralston18

Titan
Moderator
Also look in Reliability History/Monitor for any error codes, warnings, or even informational events being captured just before or at the time of the freezes/crashes.

Much more end user friendly than Event Viewer and much easier to navigate and understand.

Plus the timeline format may reveal some pattern to the crashes.
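
If the machine hard-freezes before anything gets logged, a tiny heartbeat logger can at least timestamp each freeze so a pattern becomes visible. The following is only a rough Python sketch under my own assumptions, not a tool mentioned in this thread: `write_heartbeat`, `last_heartbeat`, `run`, and the `heartbeat.log` file name are hypothetical names chosen for illustration.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, now=None):
    """Append one UTC timestamp line and force it onto the disk."""
    now = now or datetime.now(timezone.utc)
    with open(path, "a", encoding="utf-8") as f:
        f.write(now.isoformat(timespec="seconds") + "\n")
        f.flush()
        # Ask the OS to write its cache out so the line survives a hard reset.
        os.fsync(f.fileno())


def last_heartbeat(path):
    """Return the last parseable timestamp, or None if nothing was logged."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = [ln.strip() for ln in f if ln.strip()]
    except FileNotFoundError:
        return None
    for ln in reversed(lines):
        try:
            return datetime.fromisoformat(ln)
        except ValueError:
            continue  # skip a line that was cut off by the freeze
    return None


def run(path="heartbeat.log", interval_s=10, beats=None):
    """Write a heartbeat every interval_s seconds; beats=None runs until killed."""
    written = 0
    while beats is None or written < beats:
        write_heartbeat(path)
        written += 1
        time.sleep(interval_s)
```

Call `run()` before leaving the machine idle; after the hard reset, `last_heartbeat("heartbeat.log")` gives the last moment the system was demonstrably alive, to within `interval_s` seconds. Whether even such a light background task changes the idle behavior is something I can't say, so treat the result with care.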

How old is that Fractal Design PSU? Does it have a history of heavy use for gaming, video editing, or even bit-mining?

Increasing numbers of errors and varying errors make the PSU a likely suspect.
 
Sep 13, 2023
Hi all,

thanks for your comments so far. Took me a bit to find the time to follow them all….

BIOS version: F16f (latest); earlier F15something, same issues.

Reliability History: Thx! Didn't know the tool. No info though, except one driver (MuseFX Hub) failing regularly. Uninstalled it. No improvement though.

PSU: It's max 2 years old (I think from Nov 2021). I used it for mining for a while, but stopped a long time ago. Swapped it for another PSU, a BeQuiet that runs fine in another system. Issues persist. Also, the RAM load test / memtest runs fine, so I guess it's not a power issue.

DDU / Graphics driver: Tried to reboot into safe mode, which completely f***'d the system up (once it arrives in safe mode it freezes. If it arrives. Mostly it's already caught in a boot loop between trying to repair the OS and freezing).

Hence….

Took out all SSDs but the original OS SSD, which I finally formatted… The Win installer (newly created USB stick) didn't want to install, with varying error messages: first "This PC is not fit for Win 11" (wtf?), then, after taking the SSD out and putting it back in, it said it can't create the partition on the drive. Played around with the boot sequence, swapped SSDs multiple times and put the same one in again; in the end the install onto the SSD worked.
BUT: As soon as the installer is done and I boot from the SSD, it crashes after 1-5 mins. Mostly while it's still booting up.

Took out all SSDs and put in a trusty old HDD. Same story: it installs, but crashes once booting from the HDD. The HDD works fine in other systems.

Tried to boot a live Linux (UEFI) from a USB stick. Sometimes it crashes while booting, but at the latest a few seconds after X has come up.

With all these issues…. I'm really a bit lost. If I had to take a wild guess without any technical clue: Some PCIe lanes fried? North bridge fried? Anything wrong with UEFI?

What bugs me most is that the system runs normally for hours in the BIOS, just monitoring health, and also runs memtest for hours without any freezes or even errors.

I can't swap out much more; the only things left are CPU and board, I guess?

I've been building systems since 2000 and would consider myself rather tech-savvy, but with this one I really struggle to find a root cause…

Any other hints?
 
Sep 13, 2023
So… after receiving a new mainboard: Same story. Ultimately swapped the CPU for an old 1700X, and all is fine. It was the bloody Ryzen 5900X in the end.
Filing a claim with AMD now, it's only 2 yrs old… so fingers crossed.

It was the component I least expected to fail… How is this even possible? It was never under outrageous conditions and had a 360 AIO on top. Yes, I was giving it a lot of work, that's what you have a 9 series for, but nothing really out of the ordinary.
 
Sep 13, 2023
Short feedback for the one(s) who care(s) ;-)

RMA request with AMD was successful; the CPU was exchanged after testing. RMA time from sending it in to getting it back: ca. 2 weeks.