Hello! I have quite the problem and am looking for help from some hardware experts. All signs point to hardware, but I have some perplexing stuff going on.
HARDWARE
Built Late 2021
CPU, AMD Ryzen 9 5900X
RAM, G.SKILL Trident Z Neo DDR4 3600
MOBO, GIGABYTE B550 VISION D-P AM4 AMD B550 ATX
M2, Western Digital WD BLACK SN750 NVMe M.2 2280 1TB (Windows)
Western Digital Blue 2TB SSD (Linux)
Graphics, GIGABYTE Eagle GeForce RTX 3080 Ti 12GB
PSU, Corsair HX750 model rps0074
CPU Fan, Noctua NH-U12S SE-AM4
PROBLEM
Constant freezes (they seem to happen sooner on linux -- max 10~15 minutes of use) that require a hard boot. Rarely, the computer will just restart on its own. This even happens while using a Kubuntu Live CD. I've tried reinstalling linux thinking it could be OS related, but the OS install always fails around 50%.
HISTORY
I use two separate hard drives for linux & windows, I do not dual boot.
Linux started freezing on me about 3 months ago, but not very often and not while in use: maybe once every 5 days, each time requiring a hard boot. However, my children use windows to play games and never ran into freezing issues.
Hard drives
The freezes began happening more regularly and started happening while in use; it is now to the point where the freezes happen almost instantly. At first this only happened in linux, but it has happened a few times in windows while I was looking at logs (windows seems to go longer without freezing??). I figured it was a hard drive issue, so I swapped my linux HD to another machine and ran SMART tools & a bad block scan (took 2 hours) and everything was fine; the fact it didn't freeze within the first 10 minutes was eye opening. I did find a failing sandisk SSD that had been spitting out bad sectors for the last 6 months. I was able to back up important files and that SSD has now been retired.
RAM
(I re-seated this.) The HD seemed clear, so I ran memtest on my RAM for approximately 3 hours with no errors. I've also run the Windows Memory Diagnostics Tool 3x, no issues.
Drivers/Bios
My bios version was pretty outdated so I updated from F11 -> F18.5
My AMD chipset drivers were outdated so I updated from 3.x to 8.x (I think 8, maybe 5?)
GPU
I've run furmark twice for approximately 15 minutes each time and everything looked fine.
CPU
I ran a bunch of stress-ng tests while in linux and everything looked fine (linux didn't freeze for a while, so I locked the screen and went upstairs, and it was frozen 15 minutes later -- no load).
ERRORS I'VE SEEN:
Forgive me, I do not know the order in which this all took place, but here are some error codes I've seen over the past few days of troubleshooting:
After a freeze that restarted the computer (only seen this once):
[0.814896] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 27: baa000000000080b
[0.814905] mce: [Hardware Error]: TSC 0 MISC d0120001000000000 SYND 5d000000 IPID 1002e00000500
[0.814915] mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1738635997 SOCKET 0 APIC 0 microcode a20102b
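For anyone wanting to read the raw values in that log, here is a minimal Python sketch that pulls apart the status word and the TIME field. It assumes the generic x86 machine-check status layout (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57, error code in the low 16 bits) and that TIME is seconds since the Unix epoch. The function names are hypothetical, and it makes no attempt to say what Bank 27 is or what the error code means on this CPU.

```python
from datetime import datetime, timezone

# Assumed generic machine-check status flag bits (bit position -> short name).
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MISC register holds valid extra info
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def decode_mce_status(status: int) -> dict:
    """Split a 64-bit status word into named flags plus the low 16-bit error code."""
    decoded = {name: bool((status >> bit) & 1) for bit, name in STATUS_FLAGS.items()}
    decoded["error_code"] = status & 0xFFFF
    return decoded


def mce_time_utc(epoch_seconds: int) -> str:
    """Convert the TIME field (assumed Unix epoch seconds) to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    # Values copied from the log lines above.
    for name, value in decode_mce_status(0xBAA000000000080B).items():
        print(name, hex(value) if name == "error_code" else value)
    print("logged at", mce_time_utc(1738635997))
```

Under those assumptions, the status word from the log has VAL, UC, EN, MISCV and PCC set, which would make it an uncorrected error rather than a corrected one that was merely logged.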
Went down this rabbit hole, nvidia proprietary drivers vs not; I don't think this is the issue.
[drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00005200] Failed to grab modeset ownership
NEXT Steps & Random thoughts
This is my work PC and having it out of commission is putting me behind. Does anyone have any ideas for what I should test next? What might the next likely culprit be?
My PSU fans spin, my GPU fans spin, my CPU fan spins.
I took out the CMOS battery for a while while inspecting the machine.
I've reapplied thermal paste on the CPU (previous job was kind of shoddy, whoops).
I can run intense stress tests on windows and everything seems okay. I've yet to have the system freeze while doing anything intense; it seems to always happen during low usage (even on linux).
Linux consistently freezes after a max usage of 15 or so minutes.
Kubuntu Live CD freezes near 50% on install (I've tried 6x).
I'm having a hard time getting windows to freeze again.
Any ideas?
Thank you!
EDIT:
I've never overclocked. I've hardly ever messed w/ bios settings. The most I've ever done in there is switch my RAM profile.
MISC:
Here is my: inxi -Fxz
System:
Kernel: 6.8.0-52-generic arch: x86_64 bits: 64 compiler: gcc v: 13.3.0
Desktop: Budgie v: 10.9.1 Distro: Budgie 24.04.1 LTS (Noble Numbat)
base: Ubuntu
Machine:
Type: Desktop System: Gigabyte product: B550 VISION D-P v: -CF
serial: <superuser required>
Mobo: Gigabyte model: B550 VISION D-P serial: <superuser required>
UEFI: American Megatrends LLC. v: F18d date: 09/02/2024
CPU:
Info: 12-core model: AMD Ryzen 9 5900X bits: 64 type: MT MCP arch: Zen 3+
rev: 0 cache: L1: 768 KiB L2: 6 MiB L3: 64 MiB
Speed (MHz): avg: 2962 high: 4584 min/max: 2200/4950 boost: enabled cores:
1: 2873 2: 3383 3: 2200 4: 2904 5: 4584 6: 3466 7: 2200 8: 2200 9: 2200
10: 2864 11: 4008 12: 3592 13: 3445 14: 2200 15: 2200 16: 2200 17: 3667
18: 2200 19: 2200 20: 3339 21: 3589 22: 2200 23: 3360 24: 4034
bogomips: 177254
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3
Graphics:
Device-1: NVIDIA GA102 [GeForce RTX 3080 Ti] vendor: Gigabyte driver: nvidia
v: 550.120 arch: Ampere bus-ID: 52:00.0
Display: x11 server: X.Org v: 21.1.11 driver: X: loaded: nouveau
unloaded: fbdev,modesetting,vesa failed: nvidia gpu: nvidia,nvidia-nvswitch
resolution: 1: 3840x2160~60Hz 2: 2560x1440
API: EGL v: 1.5 drivers: nvidia,swrast platforms:
active: x11,surfaceless,device inactive: gbm,wayland,device-1
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 550.120
glx-v: 1.4 direct-render: yes renderer: NVIDIA GeForce RTX 3080 Ti/PCIe/SSE2
Audio:
Device-1: NVIDIA GA102 High Definition Audio vendor: Gigabyte
driver: snd_hda_intel v: kernel bus-ID: 52:00.1
Device-2: AMD Starship/Matisse HD Audio vendor: Gigabyte
driver: snd_hda_intel v: kernel bus-ID: 54:00.4
API: ALSA v: k6.8.0-52-generic status: kernel-api
Server-1: PipeWire v: 1.0.5 status: active
Network:
Device-1: Realtek RTL8125 2.5GbE vendor: Gigabyte driver: r8169 v: kernel
port: f000 bus-ID: 4e:00.0
IF: enp78s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus-ID: 4f:00.0
IF: wlp79s0 state: down mac: <filter>
Device-3: Intel Ethernet I225-V vendor: Gigabyte driver: igc v: kernel
port: N/A bus-ID: 51:00.0
IF: enp81s0 state: down mac: <filter>
IF-ID-1: docker0 state: down mac: <filter>
Bluetooth:
Device-1: Intel AX200 Bluetooth driver: btusb v: 0.8 type: USB
bus-ID: 1-6.1:6
Report: hciconfig ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.2
lmp-v: 11
Drives:
Local Storage: total: 8.19 TiB used: 370.73 GiB (4.4%)
ID-1: /dev/nvme0n1 vendor: Western Digital model: WDS100T3X0C-00SJG0
size: 931.51 GiB temp: 53.9 C
ID-2: /dev/sda vendor: Western Digital model: WD Blue SA510 2.5 2TB
size: 1.82 TiB
ID-3: /dev/sdb vendor: Western Digital model: WD6003FZBX-00K5WB0
size: 5.46 TiB
Partition:
ID-1: / size: 78.19 GiB used: 24.74 GiB (31.6%) fs: ext4 dev: /dev/sda3
ID-2: /boot/efi size: 1.05 GiB used: 6.1 MiB (0.6%) fs: vfat
dev: /dev/sda1
ID-3: /home size: 1.56 TiB used: 334.12 GiB (20.9%) fs: ext4
dev: /dev/sda5
ID-4: /opt size: 124.93 GiB used: 11.86 GiB (9.5%) fs: ext4 dev: /dev/sda4
Swap:
ID-1: swap-1 type: partition size: 32 GiB used: 0 KiB (0.0%) dev: /dev/sda2
Sensors:
System Temperatures: cpu: 47.6 C mobo: N/A gpu: nvidia temp: 54 C
Fan Speeds (rpm): N/A gpu: nvidia fan: 0%
Info:
Memory: total: 32 GiB available: 31.24 GiB used: 4.14 GiB (13.2%)
Processes: 461 Uptime: 13m Init: systemd target: graphical (5)
Packages: 2457 Compilers: gcc: 13.3.0 Shell: Zsh v: 5.9 inxi: 3.3.34