Question PC Freezes went from rare to constant.

Feb 4, 2025
Hello! I have quite the problem and am looking for help from some hardware experts. All signs point to hardware, but some perplexing stuff is going on.

HARDWARE
Built Late 2021
CPU, AMD Ryzen 9 5900X
RAM, G.SKILL Trident Z Neo DDR4 3600
MOBO, GIGABYTE B550 VISION D-P AM4 AMD B550 ATX
M2, Western Digital WD BLACK SN750 NVMe M.2 2280 1TB (Windows)
Western Digital Blue 2TB SSD (Linux)
Graphics, GIGABYTE Eagle GeForce RTX 3080 Ti 12GB
PSU, Corsair HX750 model rps0074
CPU Fan, Noctua NH-U12S SE-AM4


PROBLEM
Constant freezes (they seem to happen quicker in linux, within 10~15 minutes of use at most). I have to hard boot the machine; rarely, the computer will just restart on its own. This even happens while using a Kubuntu Live CD. I've tried reinstalling linux thinking it could be OS related, but the OS install always fails around 50%.

HISTORY
I use two separate hard drives for linux & windows, I do not dual boot.
Linux started freezing on me about 3 months ago, but not very often and not while in use. Maybe once every 5 days that would require a hard boot. However, my children use windows to play games and never ran into freezing issues.

Hard drives
The freezes began happening more regularly and started happening while in use; it is now to the point where the freezes happen almost instantly. At first this only happened in linux, but it then happened a few times in windows while I was looking at logs (windows seems to go longer without freezing??). I figured it was a hard drive issue, so I swapped my linux HD into another machine and ran SMART tools and a bad block scan (took 2 hours), and everything was fine; the fact it didn't freeze within the first 10 minutes was eye opening. I did find a failing sandisk SSD that had been spitting out bad sectors for the last 6 months. I was able to back up important files, and that SSD has now been retired.

RAM
(I re-seated this.) The HD seemed clear, so I ran memtest on my RAM for approximately 3 hours with no errors. I've also run the Windows Memory Diagnostics Tool 3x with no issues.

Drivers/Bios
My bios version was pretty outdated so I updated from F11 -> F18.5
My AMD chipset drivers were outdated so I updated from 3.x to 8.x (I think 8, maybe 5?)

GPU
I've run furmark twice for approximately 15 minutes each time and everything looked fine.

CPU
I ran a bunch of stress-ng tests while in linux and everything looked fine (linux didn't freeze for a while, so I locked the screen and went upstairs, and it was frozen 15 minutes later -- under no load).

ERRORS I'VE SEEN:
Forgive me, I do not know the order in which this all took place, but here are some errors I've seen over the past few days of troubleshooting:

After a freeze that restarted the computer, only seen this once
[0.814896] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 27: baa000000000080b
[0.814905] mce: [Hardware Error]: TSC 0 MISC d0120001000000000 SYND 5d000000 IPID 1002e00000500
[0.814915] mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1738635997 SOCKET 0 APIC 0 microcode a20102b
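For anyone wanting to read more out of that status word, here is a minimal Python sketch that decodes only the generic flag bits. It assumes the bit positions used by the Linux kernel's MCI_STATUS_* masks; the function name and table are made up for illustration, and the vendor-specific fields (including what Bank 27 maps to) are deliberately left uninterpreted.

```python
# Decode the generic flag bits of a machine-check status value.
# Assumption: bit positions match the MCI_STATUS_* masks in the Linux kernel.
# Vendor-specific fields and the low error-code bits are not interpreted here.
MCA_STATUS_FLAGS = {
    "VAL": 63,    # the register holds a valid error record
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # the error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # the MISC register holds valid data
    "ADDRV": 58,  # the ADDR register holds valid data
    "PCC": 57,    # processor context may be corrupted
}


def decode_mca_status(status: int) -> dict[str, bool]:
    """Return {flag_name: is_set} for the generic status flags."""
    return {name: bool((status >> bit) & 1) for name, bit in MCA_STATUS_FLAGS.items()}


if __name__ == "__main__":
    status = int("baa000000000080b", 16)  # value copied from the log line above
    for name, is_set in decode_mca_status(status).items():
        print(f"{name:6} {'set' if is_set else 'clear'}")
```

With the value from the log, UC and PCC come out set, which would be consistent with an uncorrected hardware error rather than a purely software fault. Which unit reported it would need AMD's documentation for this CPU family.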

Went down this rabbit hole (nvidia proprietary drivers vs. not); I don't think it's the issue.
[drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00005200] Failed to grab modeset ownership


NEXT Steps & Random thoughts
This is my work PC and having it out of commission is putting me behind. Does anyone have any ideas what I should test next? What might the next likely culprit be?

My PSU fans spin, my GPU fans spin, my CPU fan spins.
I took out the CMOS battery for a while while inspecting the machine.
I've reapplied thermal paste on CPU (previous job was kind of shoddy, whoops).
I can run intense stress tests while on windows and everything seems okay. I've yet to be doing anything intense and have the system freeze, seems to always happen during low usage (even on linux).
Linux consistently freezes after a max usage of 15 or so minutes.
Kubuntu Live CD freezes near 50% on install (I've tried 6x).
I'm having a hard time getting windows to freeze again.

Any ideas?
Thank you!


EDIT:
I've never overclocked. Hardly ever messed w/ bios settings. Most I've ever done in there is switch my RAM profile.

MISC:
Here is my: inxi -Fxz

System:
Kernel: 6.8.0-52-generic arch: x86_64 bits: 64 compiler: gcc v: 13.3.0
Desktop: Budgie v: 10.9.1 Distro: Budgie 24.04.1 LTS (Noble Numbat)
base: Ubuntu
Machine:
Type: Desktop System: Gigabyte product: B550 VISION D-P v: -CF
serial: <superuser required>
Mobo: Gigabyte model: B550 VISION D-P serial: <superuser required>
UEFI: American Megatrends LLC. v: F18d date: 09/02/2024
CPU:
Info: 12-core model: AMD Ryzen 9 5900X bits: 64 type: MT MCP arch: Zen 3+
rev: 0 cache: L1: 768 KiB L2: 6 MiB L3: 64 MiB
Speed (MHz): avg: 2962 high: 4584 min/max: 2200/4950 boost: enabled cores:
1: 2873 2: 3383 3: 2200 4: 2904 5: 4584 6: 3466 7: 2200 8: 2200 9: 2200
10: 2864 11: 4008 12: 3592 13: 3445 14: 2200 15: 2200 16: 2200 17: 3667
18: 2200 19: 2200 20: 3339 21: 3589 22: 2200 23: 3360 24: 4034
bogomips: 177254
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3
Graphics:
Device-1: NVIDIA GA102 [GeForce RTX 3080 Ti] vendor: Gigabyte driver: nvidia
v: 550.120 arch: Ampere bus-ID: 52:00.0
Display: x11 server: X.Org v: 21.1.11 driver: X: loaded: nouveau
unloaded: fbdev,modesetting,vesa failed: nvidia gpu: nvidia,nvidia-nvswitch
resolution: 1: 3840x2160~60Hz 2: 2560x1440
API: EGL v: 1.5 drivers: nvidia,swrast platforms:
active: x11,surfaceless,device inactive: gbm,wayland,device-1
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 550.120
glx-v: 1.4 direct-render: yes renderer: NVIDIA GeForce RTX 3080 Ti/PCIe/SSE2
Audio:
Device-1: NVIDIA GA102 High Definition Audio vendor: Gigabyte
driver: snd_hda_intel v: kernel bus-ID: 52:00.1
Device-2: AMD Starship/Matisse HD Audio vendor: Gigabyte
driver: snd_hda_intel v: kernel bus-ID: 54:00.4
API: ALSA v: k6.8.0-52-generic status: kernel-api
Server-1: PipeWire v: 1.0.5 status: active
Network:
Device-1: Realtek RTL8125 2.5GbE vendor: Gigabyte driver: r8169 v: kernel
port: f000 bus-ID: 4e:00.0
IF: enp78s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus-ID: 4f:00.0
IF: wlp79s0 state: down mac: <filter>
Device-3: Intel Ethernet I225-V vendor: Gigabyte driver: igc v: kernel
port: N/A bus-ID: 51:00.0
IF: enp81s0 state: down mac: <filter>
IF-ID-1: docker0 state: down mac: <filter>
Bluetooth:
Device-1: Intel AX200 Bluetooth driver: btusb v: 0.8 type: USB
bus-ID: 1-6.1:6
Report: hciconfig ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.2
lmp-v: 11
Drives:
Local Storage: total: 8.19 TiB used: 370.73 GiB (4.4%)
ID-1: /dev/nvme0n1 vendor: Western Digital model: WDS100T3X0C-00SJG0
size: 931.51 GiB temp: 53.9 C
ID-2: /dev/sda vendor: Western Digital model: WD Blue SA510 2.5 2TB
size: 1.82 TiB
ID-3: /dev/sdb vendor: Western Digital model: WD6003FZBX-00K5WB0
size: 5.46 TiB
Partition:
ID-1: / size: 78.19 GiB used: 24.74 GiB (31.6%) fs: ext4 dev: /dev/sda3
ID-2: /boot/efi size: 1.05 GiB used: 6.1 MiB (0.6%) fs: vfat
dev: /dev/sda1
ID-3: /home size: 1.56 TiB used: 334.12 GiB (20.9%) fs: ext4
dev: /dev/sda5
ID-4: /opt size: 124.93 GiB used: 11.86 GiB (9.5%) fs: ext4 dev: /dev/sda4
Swap:
ID-1: swap-1 type: partition size: 32 GiB used: 0 KiB (0.0%) dev: /dev/sda2
Sensors:
System Temperatures: cpu: 47.6 C mobo: N/A gpu: nvidia temp: 54 C
Fan Speeds (rpm): N/A gpu: nvidia fan: 0%
Info:
Memory: total: 32 GiB available: 31.24 GiB used: 4.14 GiB (13.2%)
Processes: 461 Uptime: 13m Init: systemd target: graphical (5)
Packages: 2457 Compilers: gcc: 13.3.0 Shell: Zsh v: 5.9 inxi: 3.3.34
 
I'm in the fortunate position of having amassed a collection of PCs over the years, so I wouldn't suffer with one out of commission.

It also means I can start swapping components around until the fault goes away, or I've replaced everything and am still none the wiser as to the exact cause.

The most awkward part to replace is the motherboard, because it's such a big investment. Next comes the CPU if it's a high end device. It's easier to change RAM, GPU, SSD, etc. Even a low end GPU is a valid test.

I'd be inclined to source the cheapest compatible CPU you can find, perhaps second hand on eBay, then substitute it for the 5900X.

You've tested your RAM with MemTest (86 or 86+ ?) and it's passed, but if you're overclocking it with XMP/EXPO up near 3600MT/s, try setting it back to JEDEC 2133 or 2400MT/s (XMP off) for additional stability. Sometimes you need to run several complete MemTest runs to uncover a fault.

Finally it's motherboard swap time. Again, I'd suggest looking on the likes of eBay for the cheapest, lowest spec microATX board you can find that's "guaranteed" working. Check the seller's feedback. I've bought dozens of second hand boards and not had any major disappointments. You've also got eBay's guarantee if the description is inaccurate. My cheapest working mobo/CPU combo (AMD FM2) cost me $7. It's now in a server.

A second hand mobo/CPU combination is probably best. That way you get a spare CPU for testing your B550 Vision mobo. Provided the eBay purchase works, you could test your 5900X in the second board if the BIOS supports it.

Bearing in mind it's your work PC and just over 3 years old, if you're losing significant business or annoying clients, now might be the time to consider a new computer, or a new DDR5 mobo/CPU/RAM upgrade. If you can't justify the expense of either, get a second hand office PC to tide you over.

We're assuming here that your HX750 PSU is still OK, but the Corsair.com web site shows it (or a similar model) is out of stock in the US?, EU and UK. Did you buy the HX750 brand new in 2021, or bring it over from an older build? If it's more than 5 years old, treat it as suspect.

Power supplies don't last forever, even if they have a 7 or 10-year warranty. It's yet another component you need to swap out for a few hours, if you can "borrow" a spare PSU from a friend or local repair shop.

To sum up, replace components one at a time, until (hopefully) the fault goes away. Good luck.
 
Thanks for your well thought out reply!

I do have a few computers around here but unfortunately all Intel.

RAM: I used memtest86+. I am not overclocking it -- I'm so inept at that I don't even know what any of those settings mean, ha!

"My cheapest working mobo/CPU combo (AMD FM2) cost me $7. It's now in a server." Wow, this sounds kind of appealing as I've wanted to get a reliable always on postgres server locally.

"We're assuming here that your HX750 PSU is still OK, but the Corsair.com web site shows it (or a similar model) is out of stock in the US?, EU and UK. Did you buy the HX750 brand new in 2021, or bring it over from an older build? If it's more than 5 years old, treat it as suspect."

(US)

Good call, I finally remembered to search it up in my email and I RMA'ed a defective one back in July 2018. So, I've been using this replacement since.

"Power supplies don't last forever, even if they have a 7 or 10-year warranty. It's yet another component you need to swap out for a few hours, if you can "borrow" a spare PSU from a friend or local repair shop."

My wife has a janky 550 watt something I got off amazon... I can throw it in, do light tasks like programming (day job), and see what happens.

Just so perplexing. I've had windows up for 1 hour 30 minutes so far, streaming YouTube, twitch, music and randomly running GPU benchmarks. No issues.

I wonder if there is something in the linux kernel that isn't playing nice with my PSU? That would explain the quick freezes in linux and USB Live CD...

My next steps are to leave windows running w/ a bunch of tasks overnight and see if it ever freezes.

If no, I'll try using linux (daily driver) again and see if I have freezes.

If yes, I'll swap out the PSU with my wife's PSU and report back.

If I still have issues w/ swapped PSU I'll start looking at mobo/cpu combo replacements, ugh!
 
3600 on memory is considered an overclock, even on XMP.

Dial it back to 3200: under RAM speed, keep XMP on but clock it back from 3600 to 3200.

The supported speed is 3200, so anything higher is an overclock.

Also your CPU cooler isn't up to cooling a Ryzen 5900x.

For context I'm using a 7 heat pipe cooler.
And 3 fans hooked on that.

Also set a temp target in bios to 85c.

Max temperature is 90c, but setting it to start clocking down when it hits 85c should help prolong the life of the CPU.
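To see how close the CPU actually gets to that 85c target while in linux, a rough Python sketch like the one below could poll the kernel's hwmon sensors. It assumes the usual hwmon sysfs layout (a `name` file plus `temp*_input` files in millidegrees Celsius); the function names are made up for illustration, and which sensor corresponds to the CPU depends on the drivers loaded.

```python
from pathlib import Path


def read_hwmon_temps(root: str = "/sys/class/hwmon") -> dict[str, float]:
    """Return {"<hwmonN>/<chip>/<sensor>": degrees_celsius} for readable sensors."""
    temps: dict[str, float] = {}
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        try:
            chip = (chip_dir / "name").read_text().strip()
        except OSError:
            chip = "unknown"  # no name file, or it could not be read
        for temp_file in sorted(chip_dir.glob("temp*_input")):
            try:
                millidegrees = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors refuse reads or return junk
            sensor = temp_file.name.removesuffix("_input")
            temps[f"{chip_dir.name}/{chip}/{sensor}"] = millidegrees / 1000.0
    return temps


def over_target(temps: dict[str, float], target_c: float = 85.0) -> list[str]:
    """Names of sensors at or above the target temperature."""
    return [name for name, value in temps.items() if value >= target_c]
```

Calling `over_target(read_hwmon_temps())` during a stress run and again at idle would show whether temperature is even in play, given that the freezes here seem to happen under light load.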
 
Gotcha. I enabled XMP but clocked it back to 3200.

I was actually able to successfully install Kubuntu so I had hope. However, about 10 minutes into using the fresh install I'm back to freezing.

"Also your CPU cooler isn't up to cooling a Ryzen 5900x."

Thanks, I'll look into this.

"Also set a temp target in bios to 85c."

I tried to find this in bios but to no avail. Not sure if my bios has that option.
 
I ran across this: https://www.reddit.com/r/linuxhardware/comments/dk1g44/system_freezes_with_a_ryzen_cpu/

Apparently my CPU might have some wonkiness w/ linux. What's really weird is I ran linux w/ this CPU for roughly a year before I started having issues. Perhaps older kernels handled this better? Not really sure.

In BIOS, switching the "Power Supply Idle Control" to "Typical current idle" has granted me 26 minutes in linux, a record for the past few days!
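That BIOS option is generally described as affecting how deeply the CPU idles, so it may help to see which idle states the kernel exposes and how often they are entered. The Python sketch below assumes the standard cpuidle sysfs layout (`state*/name`, `state*/usage`, `state*/disable`); the function name is hypothetical and the list of states will differ between machines and kernels.

```python
from pathlib import Path


def read_idle_states(cpu_dir: str = "/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return a list of (state_name, entry_count, disabled) tuples for one CPU."""
    states = []
    for state_dir in sorted(Path(cpu_dir).glob("state*")):
        name = (state_dir / "name").read_text().strip()
        usage = int((state_dir / "usage").read_text().strip())
        disabled = (state_dir / "disable").read_text().strip() != "0"
        states.append((name, usage, disabled))
    return states
```

Comparing the entry counts before and after changing the BIOS setting (or a kernel parameter) gives a rough check that the change actually altered idle behaviour, rather than the longer uptime being luck.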

Currently running some stress-ng tests to simulate a moderate amount of stress. I'll continue that off and on throughout the day if I don't run into any freezes.
 
A small change in the kernel can cause huge problems, but what's striking me is that you said Windows has also become unstable, and now you mention this really old (legacy) setting for PSUs in the BIOS...

Could this really be related to the PSU, maybe?

Maybe I missed the data, how old is your PSU?
 
Yeah, it's all kind of a blur because of stress (work computer and deadlines, haha). Let me try to recall the windows freezes....

I think I had two while doing really light activities in windows, I believe I was just looking at system events for issues regarding hardware.

Maybe windows is more power intense so it hardly ever triggered whatever a 'Low' power state would be?

I've had my PSU since July 2018, I had to RMA the previous one.

Found the original ticket w/ Corsair

"Early this year (2018) on January 13th I purchased a CORSAIR HX Series HX750 powersupply from newegg.com. Three days ago my computer would not boot fully. I did some troubleshooting with a friend and when we swapped out the powersupply for one in his computer it solved the problem. I am fearful the powersupply has died only after approximately 5 months of use."

So I've had the same PSU roughly since July 2018.
 

Mmm... so it's about six and a half years old, plus there's the previous issue where they gave you a new one because the original wasn't working as intended.

I'm guessing trying a different PSU may be a good idea, and be sure to return the PSU setting in the BIOS to the default one.

In fact, as a rule of mine, every time I do a BIOS update I load the default/optimized settings when I'm done. Then I go right back into the BIOS and set XMP, FTP, FPTm, PBO, and any other settings I need before loading into Windows.
 
I have an M.2 that I use as an external hard drive.
If I connect it via USB and the M.2 gets hot, the whole computer freezes. The computer unfreezes when I unplug the external drive.

So a problematic drive connected to the computer can freeze it, even if the drive is not being used.
 
I'll kind of keep tossing what I've done in here in case anyone else eventually runs across it.

All times EST:

FEB 4, 2025, 10:55AM
I swapped my power supply idle control to "typical current idle"

FEB 4, 2025, 12:41PM
Froze clicking around settings in Dolphin.

FEB 4, 2025, 1:40PM
processor.max_cstate=1

In grub menu.
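A parameter typed into the grub menu only applies to that one boot, so it is worth confirming it reached the running kernel. Here is a small Python sketch for that; the helper name is made up, and the parsing is deliberately simple (whitespace-separated tokens, so quoted values containing spaces are not handled).

```python
def kernel_param(cmdline: str, name: str):
    """Return the value of `name` on a kernel command line.

    Returns "" for a bare flag such as `quiet`, or None if the parameter
    is absent. If it appears more than once, the last occurrence wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def read_running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the current kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the affected machine, `kernel_param(read_running_cmdline(), "processor.max_cstate")` should return `"1"` if the setting took effect for the current boot.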

FEB 4, 2025, 1:45PM
Swapped RAM back to OC 3600 XMP profile (saw on reddit that this can help ?)

FEB 4, 2025, 2:00PM
Installed nvidia drivers.

Freezes since 12:41PM: 0
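Since a hard freeze leaves nothing in the logs, one way to timestamp freezes without watching the screen is a heartbeat file. The Python sketch below is only an illustration under assumptions: the function names, the 10-second interval and the file location are all hypothetical. Each timestamp is synced to disk so it should survive a hard reset.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path: str) -> None:
    """Append one ISO timestamp and force it to disk."""
    with open(path, "a") as log:
        log.write(datetime.now().isoformat(timespec="seconds") + "\n")
        log.flush()
        os.fsync(log.fileno())


def find_gaps(path: str, max_gap_s: float = 30.0):
    """Return (last_seen, next_seen) pairs where heartbeats are too far apart."""
    with open(path) as log:
        stamps = [datetime.fromisoformat(line.strip()) for line in log if line.strip()]
    return [
        (earlier, later)
        for earlier, later in zip(stamps, stamps[1:])
        if (later - earlier).total_seconds() > max_gap_s
    ]


def run_heartbeat(path: str, interval_s: float = 10.0) -> None:
    """Write a heartbeat forever; stop it with Ctrl+C."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

Leaving `run_heartbeat(os.path.expanduser("~/heartbeat.log"))` running in a terminal and calling `find_gaps` on the same file after the next reboot would show the last moment the system was alive before each freeze. Note that a normal shutdown also shows up as a gap.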

Notes, Ryzen issues with linux?

https://www.google.com/search?q=amd...Z_aMSM4FBDy0wN6BAgEEAQ&biw=1555&bih=917&dpr=2
 
I did see some errors about interacting w/ the hard drive that was failing. I've had it disconnected for a while.
 
Yeah, we solved half the problem. The other half is the weak cooler; the setting is under Advanced CPU Settings.


Full instructions below

On a Gigabyte AM4 motherboard, access the BIOS by pressing "Delete" during boot, navigate to the "Tweaker" section, then go to "Advanced CPU Settings", where you can find the "Turbo Power Limits" option. Usually you'll need to enable this option and adjust the "Package Power Limit" values. Note that these are power limits, so they lower temperatures indirectly rather than setting an exact temperature threshold.

Key steps:
  • Enter BIOS: Turn on your PC and press "Delete" repeatedly to access the BIOS.

  • Navigate to Tweaker: Use arrow keys to go to the "Tweaker" section.

  • Access Advanced CPU Settings: Select "Advanced CPU Settings".

  • Find Turbo Power Limits: Locate the "Turbo Power Limits" option and enable it.

  • Set Temperature Limit: Adjust the "Package Power Limit" values until the CPU stays under your desired temperature.

Important points to consider:
  • Consult your motherboard manual:
    Specific BIOS options may vary slightly depending on your Gigabyte AM4 motherboard model, so always refer to your manual for exact settings.

  • Thermal throttling:
    Setting a CPU temperature limit will trigger thermal throttling if the CPU reaches the set temperature, reducing performance to prevent damage.