PC freezes when idle and CPU fails Prime95 "Small FFTs" test?

fakedad

Distinguished
Jan 20, 2014
11
0
18,510
Problem:

I am trying to diagnose and fix a problem that has been happening since I built this PC back in November. After the PC has been idle for some (varying) amount of time, the system freezes completely. I can't use the keyboard or mouse, and the Magic SysRq key doesn't work. The display is frozen, I can't SSH in, and newly plugged-in USB devices don't get power. But the power stays on: the fans keep spinning, the LEDs stay lit, and so on. It almost always happens after around 12 hours of idle time, but it has also happened after as little as 45 minutes and as long as five days.

I ran PCMemtest-64 (a fork of memtest86) for 18 passes (100 hours on this system), and it found no errors. To check whether the CPU might be at fault, I ran Prime95's Small FFTs test, and after around five minutes Prime95 reported a hardware error during the 200K self-test. The results are at https://pastebin.com/Jbpqc0qA .

Most of the reports I find of people failing Small FFTs involve overclocking, but the only BIOS settings I've changed are enabling fTPM, Resizable BAR, and hardware virtualization support.

This PC runs Ubuntu 21.10 almost exclusively. I've twice tried to reproduce the problem on Windows 11. The system didn't freeze within 12 hours either time, but I know it sometimes takes longer. I will try testing on Windows again later today.

The journal on Linux doesn't show anything useful. I've tried setting up netconsole with a syslog-ng receiver to catch any messages that couldn't be written to disk because of a kernel panic, but the remote logs don't show anything useful either. At this point, I don't know if it's even a software issue.
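For anyone setting up the same kind of remote capture, here is a minimal sketch of a receiver that could stand in for syslog-ng. It assumes netconsole on the affected PC is configured to send to this machine on UDP port 6666 (netconsole's default target port); the function names are hypothetical, not part of any existing tool. It timestamps each datagram on arrival and flushes after every write, so the last line in the log gives a rough time of the freeze even if the kernel never gets to print a panic message.

```python
import socket
from datetime import datetime, timezone
from typing import Optional, TextIO, Tuple

# Assumption: netconsole on the freezing PC sends to this host on UDP port 6666,
# netconsole's default target port. Change it to match your netconsole= setting.
DEFAULT_PORT = 6666


def format_record(data: bytes, addr: Tuple[str, int], when: datetime) -> str:
    """Prefix one netconsole datagram with an arrival timestamp and the sender's IP."""
    # Kernel log text is mostly ASCII; replace undecodable bytes instead of failing.
    text = data.decode("utf-8", errors="replace").rstrip("\n")
    return f"{when.isoformat(timespec='seconds')} {addr[0]} {text}\n"


def serve(sock: socket.socket, out: TextIO, max_packets: Optional[int] = None) -> int:
    """Append each received datagram to `out`; stop after max_packets (None = forever)."""
    count = 0
    while max_packets is None or count < max_packets:
        data, addr = sock.recvfrom(65535)
        out.write(format_record(data, addr, datetime.now(timezone.utc)))
        # Flush every line so the last message before a freeze is written out promptly.
        out.flush()
        count += 1
    return count


def main(port: int = DEFAULT_PORT, path: str = "netconsole.log") -> None:
    """Hypothetical entry point: listen on all interfaces and log until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(path, "a", encoding="utf-8") as log:
        sock.bind(("0.0.0.0", port))
        serve(sock, log)
```

Calling main() on the receiving machine appends everything it hears to netconsole.log. If the freezing PC sends nothing at all before it hangs, an empty tail is itself a hint that the kernel stopped before it could log anything.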

Could the freezing be caused by the issue detected by Prime95? Is there anything I can do to troubleshoot or fix either issue?

The specs are below.

Specs:

Motherboard: Asus PRIME X570-PRO
CPU: AMD Ryzen 9 3950X @ 3.5 GHz
PSU: EVGA SuperNOVA 850 G3 850W
RAM: 2 x Team T-Force Zeus 32 GB DDR4-3200
GPUs: ZOTAC Gaming GeForce GTX 1650 OC 4GB GDDR6, Nvidia GeForce RTX 3080 Ti FE 12GB GDDR6X
SSDs: Samsung 970 EVO Plus 1TB, Samsung 970 EVO Plus 2TB
HDDs: HGST Travelstar 1TB (HTS721010A9), Seagate Barracuda 4TB (ST4000LM024-2AN1)
Operating systems: Ubuntu 21.10 (Impish Indri), Windows 11 Professional 21H2

The BIOS is the most recent (non-beta) version, 4021.

Here's a screenshot from CPU-Z. I've uploaded the detailed report to https://pastebin.com/GKBUzbkQ .

[CPU-Z screenshot]
 

fakedad

Distinguished
Jan 20, 2014
11
0
18,510
It turns out that some Zen 2 systems can freeze when the processor enters the C6 ("deep idle") state. I'm not sure why a workaround for this isn't built into Linux, but C6 can be disabled by building and installing amd-disable-c6. I installed it 8 days ago, and my system hasn't hung since.
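amd-disable-c6 itself is not reproduced here. As a separate, read-only check, the kernel's generic cpuidle sysfs interface (/sys/devices/system/cpu/cpuN/cpuidle/stateM/) reports each idle state's name, exit latency, entry count, and whether it is disabled. The sketch below summarizes those files across CPUs; the function names are hypothetical. It only shows the kernel's view: which ACPI-named state (for example C2) corresponds to the hardware C6 state depends on the platform, and a state disabled at the hardware level may still be listed and counted here, so treat any such mapping as an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List

# Generic Linux cpuidle sysfs location; each CPU has cpuidle/stateN/ subdirectories.
SYSFS_CPU = Path("/sys/devices/system/cpu")


@dataclass
class IdleState:
    cpu: int
    index: int
    name: str        # e.g. "POLL", "C1", "C2" (names come from the idle driver)
    latency_us: int  # exit latency reported by the kernel, in microseconds
    usage: int       # number of times the kernel has requested this state
    disabled: bool   # value of the writable "disable" attribute


def _read(path: Path) -> str:
    return path.read_text().strip()


def _num(p: Path, prefix: str) -> int:
    return int(p.name[len(prefix):])


def idle_states(root: Path = SYSFS_CPU) -> List[IdleState]:
    """Read every cpuidle state the kernel exposes, ordered by CPU then state index."""
    states = []
    for cpu_dir in sorted(root.glob("cpu[0-9]*"), key=lambda p: _num(p, "cpu")):
        state_dirs = (cpu_dir / "cpuidle").glob("state[0-9]*")
        for st in sorted(state_dirs, key=lambda p: _num(p, "state")):
            states.append(IdleState(
                cpu=_num(cpu_dir, "cpu"),
                index=_num(st, "state"),
                name=_read(st / "name"),
                latency_us=int(_read(st / "latency")),
                usage=int(_read(st / "usage")),
                disabled=_read(st / "disable") == "1",
            ))
    return states


def summarize(states: List[IdleState]) -> Dict[str, dict]:
    """Per state name: exit latency, total entries across CPUs, CPUs with it disabled."""
    summary: Dict[str, dict] = {}
    for s in states:
        entry = summary.setdefault(
            s.name, {"latency_us": s.latency_us, "usage": 0, "disabled_cpus": 0})
        entry["usage"] += s.usage
        entry["disabled_cpus"] += int(s.disabled)
    return summary


def report(root: Path = SYSFS_CPU) -> None:
    """Print one line per idle state name, aggregated over all CPUs."""
    for name, info in summarize(idle_states(root)).items():
        print(f"{name:>6}  latency={info['latency_us']}us  "
              f"entries={info['usage']}  disabled_on={info['disabled_cpus']} CPUs")
```

Calling report() on the affected machine prints one line per idle state. The same stateN directories expose a writable disable attribute (writing 1 as root asks the kernel not to use that state), but whether that has the same effect as amd-disable-c6 on Zen 2 is not established here.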

I think the Prime95 errors are an unrelated problem; I'm still not sure why I get errors despite the CPU not being overclocked.
 
