Question My workstation crashes ALMOST randomly.

Apr 23, 2025
Hello all,
My PC crashes and I don't know how to proceed.

Suddenly the monitors go black and the fans start spinning at default speed. The PC doesn't shut down entirely, but no input is possible and no output is visible or audible (except the fans); even the power button on the front of the case doesn't respond. I always have to reach around to the back of the case, flip the power switch there off and on, and then start the PC anew.
This has been happening every other day on average (sometimes twice a day, sometimes a few days are fine) for some time now. It occurs without any obvious reason. It can happen when I'm working. It can happen during gaming. But it has also happened many times when only the web browser was running, or even before I had started doing anything, just looking at the desktop right after booting up. It seems to happen more during work, but that could be observation bias, since most of the time this PC is running, software like 3ds Max, CAD, etc. is open.
I've spent some time with tools a computer layman like me can wrap his head around:
- I've run antivirus and antimalware tests
- I checked my disk(s) health via diagnostic tools
- I've run system file check
- I've updated my drivers. As this has been going on for months now, some more than once.
- I've updated the UEFI (BIOS) firmware. Something I didn't know was a thing one should do, until then.
- I've cleaned the registry
- I've checked the Event Viewer. No entry there that I could link to the crashes (except the one after the reboot, which points out that the system was shut down incorrectly last time).
- I've run memtest from a USB flash drive
- The tools checking the CPU temperature do not show any spikes, max temp is at 69°C (156 F), usually 55-60°C (130-140 F)
- I do not have a log of GPU temperatures, but since this often happens with nothing GPU-intensive going on, I assume it stays at the usual 37°C (98 F) during those crashes.
- during this period I once clean-reinstalled the system (Win10) and some time later upgraded from Win10 to 11.
- at this point I started to wonder whether something is wrong with the power supply, but this is where the territory I know nothing about begins: voltages, watts, how to log them when the system suddenly shuts down,...
From here on I'd like to ask for help from people who know what steps to take to identify the culprit.

For the longest time I wasn't able to reproduce this crash, until recently. One particular work file triggers it every time, shortly after I open it. It is an Archicad file (CAD software for architecture and civil engineering documentation), a heavy one, provided by a client. The file doesn't trigger the crash on either of the 2 PCs of my friends that I tested it on. Even with this option now in hand, the points stated above (about Event Viewer, temperature,...) stand.

Is there a way for me to identify what the source of this trouble is before I delve into swapping each part one by one with spares (which I currently do not have)? That is a step I somewhat fear, since the last time I had to take apart and put together a PC was 20 years ago, and that one was a cheap old thing, not the source of my livelihood. And there is no good repair shop anywhere close by...
Thank you.

CPU : AMD Ryzen 9 5950X 16-Core Processor
Memory : 2x DIMM PATRIOT Viper 4 Blackout DDR4 64GB,(=128GB), 130981MB(3333)
Motherboard : ROG CROSSHAIR VIII DARK HERO
OS : Microsoft Windows 11 Pro
GPU : NVIDIA GeForce GTX 1650 SUPER
Hard drives : Samsung SSD 970 EVO Plus 1TB(931GB,SCSI), Samsung SSD 980 PRO 1TB(931GB,SCSI)
Power source: CORSAIR ATX 750W RM750x (2018)
BIOS version: 4402
 
You can download CPUID HWMonitor (https://www.cpuid.com/softwares/hwmonitor.html); it will show you your voltages next to the expected voltages. A little over is OK, but lower is probably bad.
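To make "a little over" and "lower" more concrete, here is a rough Python sketch that compares a reading against the nominal rail voltage with the ±5% tolerance the ATX specification allows on the +12V, +5V and +3.3V rails. The readings in the example are made-up values for illustration, not from your machine, and software sensor readings are not always precise, so treat the result as a hint rather than a verdict.

```python
# Nominal rail voltages; the ATX specification allows +/-5% on these rails.
ATX_TOLERANCE = 0.05
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}


def check_rail(name, measured, tolerance=ATX_TOLERANCE):
    """Return 'low', 'high' or 'ok' for one measured rail voltage."""
    nominal = NOMINAL_RAILS[name]
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    if measured < low:
        return "low"
    if measured > high:
        return "high"
    return "ok"


# Hypothetical readings typed in by hand from a monitoring tool.
readings = {"+12V": 11.2, "+5V": 5.05, "+3.3V": 3.31}
for rail, volts in readings.items():
    print(rail, volts, check_rail(rail, volts))
```

With these example values the +12V rail would be flagged as low (the lower limit is about 11.4 V), while the other two fall inside the window.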
I know you said you ran Memtest, but you can try pulling one stick of RAM at a time to see if this behavior continues to happen, and then try the next stick of RAM.
You could try running different Windows power settings to see if this affects the behavior at all.
Do you have any peripheral items plugged into the usb ports? You can try unplugging one at a time to check for any difference.
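On the question of how to log readings when the system suddenly dies: the general idea is to write each sample to disk immediately, so the last lines before the crash survive. Below is a minimal Python sketch of that idea. The `read_sensors` callable is hypothetical (it is not part of HWMonitor or any specific tool) and stands for whatever source of readings you have; the file name and column names are also just placeholders.

```python
import csv
import os
import time
from datetime import datetime


def log_sample(path, sample):
    """Append one timestamped row and force it onto the disk."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            # Write a header the first time the file is created.
            writer.writerow(["timestamp", *sample.keys()])
        stamp = datetime.now().isoformat(timespec="seconds")
        writer.writerow([stamp, *sample.values()])
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write it to the drive


def run_logger(path, read_sensors, interval_s=1.0, max_samples=None):
    """Poll read_sensors() (hypothetical) and log until stopped."""
    count = 0
    while max_samples is None or count < max_samples:
        log_sample(path, read_sensors())
        count += 1
        time.sleep(interval_s)
```

After a crash, the last few rows of the CSV show what the readings looked like in the final seconds. If the voltages or temperatures drift just before the blackout, that points somewhere; if they look perfectly normal right up to the end, that is useful information too.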
 
Also look in Reliability History/Monitor for error codes, warnings, and even informational events being logged just before or at the time of the crashes.

Reliability History/Monitor is much more end user friendly and the timeline format may reveal patterns.

Good that you looked at Event Viewer. However, Event Viewer does indeed require more time and effort to navigate and understand.

Still, do take another look there.

To help:

How To - How to use Windows 10 Event Viewer | Tom's Hardware Forum (tomshardware.com)
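If clicking through Event Viewer is too tedious, the same System log can be queried from a script. The sketch below uses the built-in Windows `wevtutil` command to pull the most recent entries with Event ID 41 (Kernel-Power, the system rebooted without shutting down cleanly) and 6008 (the previous shutdown was unexpected). I am assuming those are the entries you saw after each reboot; the function names are my own.

```python
import subprocess


def build_query(max_events=10):
    """Build a wevtutil command for recent unexpected-shutdown events."""
    # XPath filter on the System log: Kernel-Power 41 or EventLog 6008.
    xpath = "*[System[(EventID=41 or EventID=6008)]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{max_events}",   # maximum number of events to return
        "/rd:true",           # newest first
        "/f:text",            # human-readable output
    ]


def recent_unexpected_shutdowns(max_events=10):
    """Run the query (Windows only) and return the text output."""
    result = subprocess.run(
        build_query(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `print(recent_unexpected_shutdowns())` on the affected PC gives a list of crash timestamps that you can line up against the Reliability Monitor timeline and against whatever else was logged in the minutes before each one.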
 
Memory : 2x DIMM PATRIOT Viper 4 Blackout DDR4 64GB,(=128GB), 130981MB(3333)
I've been trying to locate a 128GB (2 x 64GB) Patriot Viper 4 Blackout memory kit online and so far I've not found any data sheets. They all seem to be 2 x 16GB (32GB) or 2 x 32GB (64GB).

If possible, can you pull a stick of RAM and post the exact part number on this forum?

It might be an idea to check whether your RAM is in the Qualified Vendors List (QVL) for your mobo.

I understand why you need 128GB, but I was slightly surprised to learn that 64GB DDR4 DIMMs are available. Perhaps I'm just not keeping up with the times.

From your description, my first thoughts are the problem is RAM related. Did MemTest pass with zero errors?

Are you running your memory at stock DDR4 speed or do you have XMP memory overclocking enabled?

If your RAM is currently running XMP at 3600MT/s or similar, switch off XMP and see if the system becomes stable at the JEDEC default DDR4 RAM speed (probably 2133 or 2400MT/s).

If any of your friends are prepared to let you "borrow" RAM from their PCs, it might be worth swapping out your RAM to see if their RAM is more stable. This presupposes they have compatible DDR4, not DDR3 or DDR5.

Although your PSU is roughly 7 years old, the RM750X series should be capable of supplying a 5950X and GTX1650 with ease. It might be worth checking the spec for your particular RM750X to see if it came with a 7 or a 10 year warranty.

GPU : NVIDIA GeForce GTX 1650 SUPER
This is quite an old card (I've got a standard GTX1650 in my old multimedia rig). Are you running NVidia's Game or Studio Drivers?

In theory the Studio Driver (if available for your card) might be more stable than the Game Driver. I run NVidia Studio drivers on the GPU in my 7950X editing rig.

I assume you have a decent (big) CPU cooler on your 5950X? Something like a Thermalright Peerless Assassin 120 or a 240/360/420mm AIO?

As this review shows, it's not difficult to push the 5950X up past 200W dissipation in some circumstances.
https://www.techpowerup.com/review/amd-ryzen-9-5950x/19.html

I see 190W to 200W on my 7950X (Noctua NH-D15 cooler) at 85 to 92°C all cores during Handbrake conversions in HWMonitor.
https://www.cpuid.com/softwares/hwmonitor.html

If you're using a really small CPU cooler, your 5950X might be throttling or overheating.
 
