Question System instability: Random errors proportional to CPU use - New AM5 build

Oct 15, 2023
Hello, everyone

I recently put together my first build, on AM5. As far as I can tell I haven't made any rookie mistakes: the RAM is installed according to the manual, BIOS settings are stock, there are no overclocks, and the BIOS is up to date. The system POSTs, installs the OS, and runs fine for the most part. Specs below:

CPU: AMD Ryzen 9 7900X
GPU: MSI NVIDIA GeForce RTX 3060 Ti
DIMMs: 2x16GB DDR5 Kingston CL32 6000 MHz Fury Renegade Silver (from the mobo RAM list)
Motherboard: ROG STRIX B650-A Gaming Wifi
PSU: EVGA 750W Gold
SSD: M.2 2280 WD Blue SN570 500GB
OS: Arch Linux
Boot: EFI GRUB

So what's the issue? Why am I making this thread? Games. Or rather, games seem to expose a problem with this build: programs crash at random under high CPU load.
I have run GPU benchmarks, which perform quite well, so I don't think the GPU is at fault. The RAM passes memtest without errors (stock settings), so that shouldn't be the issue either.
The crashes also happen in my browser and other applications given enough time, and crash frequency seems to scale with CPU use. My guess is that the fault really is random, and that more CPU use simply makes it show up sooner.

How do I troubleshoot this? Do you have any guesses?
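
One way to test the "random errors scale with CPU use" guess without a game is to run the same deterministic computation on every core and compare the results. The following is only a minimal sketch of that idea, not a replacement for a proper stress tool: the function names and the `ROUNDS` / `PASSES` values are made up for illustration, and the reference digest is computed on the same CPU, so a clean run does not prove the system is stable. A mismatch or a crash during the run would, however, point at the CPU, memory, or power delivery rather than at any particular game.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tuning knobs; raise both for a longer soak test.
ROUNDS = 20   # 1 MiB blocks hashed per worker per pass
PASSES = 2    # how many times the whole check is repeated

# Fixed 1 MiB buffer so every worker hashes exactly the same bytes.
BLOCK = bytes(range(256)) * 4096


def chained_digest(rounds: int) -> str:
    """Hash BLOCK `rounds` times; the result depends only on `rounds`."""
    h = hashlib.sha256()
    for i in range(rounds):
        h.update(BLOCK)                    # large updates release the GIL
        h.update(i.to_bytes(8, "little"))  # mix in the round counter
    return h.hexdigest()


def find_mismatches(results, expected):
    """Return the indices of workers whose digest differs from `expected`."""
    return [i for i, r in enumerate(results) if r != expected]


def run_check(workers: int, rounds: int):
    """Run the same workload in `workers` threads and compare the outputs."""
    expected = chained_digest(rounds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(chained_digest, [rounds] * workers))
    return find_mismatches(results, expected)


if __name__ == "__main__":
    workers = os.cpu_count() or 1
    for n in range(1, PASSES + 1):
        bad = run_check(workers, ROUNDS)
        status = "OK" if not bad else f"MISMATCH in workers {bad}"
        print(f"pass {n}/{PASSES}: {status}")
```

Threads are used instead of processes because `hashlib` releases the GIL for large updates, so the workers can still load several cores at once.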
 
Hi, and thanks for the help.

I am using the DIMMs in slots A2 and B2 (as recommended in the B650-A manual).
The BIOS is at the most recent version (1807, released 2023/10/11), updated through BIOS flashback.
The issue already existed before I updated the BIOS, so I don't think the update is to blame.

For the CPU cooler I'm using the Cooler Master 212; the CPU temp monitor shows a stable 40°C at idle. lm-sensors only gives me a few sensors, so here is a composite with the all-time max values for everything:
amdgpu-pci-0c00
Adapter: PCI adapter
vddnb: 1.01 V
vddgfx: 1.47 V
edge: +45.0°C
PPT: 65.11 W

nvme-pci-0200
Adapter: PCI adapter
Composite: +41.9°C (low = -5.2°C, high = +79.8°C)
(crit = +84.8°C)

k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +69.4°C
Tccd1: +76.0°C
Tccd2: +74.8°C


I'm not sure what the k10temp-pci-00c3 readings are, but the values only get this high while the game is initially loading its assets. During gameplay they are much more normal, so I don't think overheating is the cause: around 63°C for Tctl and either 60 or 40°C for each of the Tccd sensors.
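
For context, k10temp is the Linux kernel driver for AMD CPU temperature sensors: Tctl is the control temperature and the Tccd entries are per-CCD die temperatures. A composite of maxima like the one above can also be collected automatically. The sketch below assumes the plain-text layout that `sensors` printed in this post (a chip name line, then `label: value°C` lines). The function names, `SAMPLES`, and `INTERVAL_S` are hypothetical, and other lm-sensors versions or chips may format their output differently.

```python
import re
import shutil
import subprocess
import time

# Hypothetical sampling settings; raise SAMPLES to cover a whole gaming session.
SAMPLES = 3
INTERVAL_S = 0.5

# Matches lines such as "Tctl: +69.4°C"; voltage and power lines do not match.
TEMP_LINE = re.compile(r"^([^:]+):\s*([+-]?\d+(?:\.\d+)?)\s*°C")


def parse_temps(text: str) -> dict:
    """Map (chip, label) to the temperature in °C found in `sensors` output."""
    readings = {}
    chip = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Chip headers such as "k10temp-pci-00c3" have no colon and no spaces.
        if ":" not in line and " " not in line:
            chip = line
            continue
        match = TEMP_LINE.match(line)
        if match and chip is not None:
            readings[(chip, match.group(1).strip())] = float(match.group(2))
    return readings


def update_max(maxima: dict, readings: dict) -> None:
    """Keep the highest value seen so far for every sensor."""
    for key, value in readings.items():
        if key not in maxima or value > maxima[key]:
            maxima[key] = value


if __name__ == "__main__":
    if shutil.which("sensors") is None:
        print("lm-sensors does not appear to be installed")
    else:
        maxima = {}
        for _ in range(SAMPLES):
            out = subprocess.run(["sensors"], capture_output=True, text=True).stdout
            update_max(maxima, parse_temps(out))
            time.sleep(INTERVAL_S)
        for (chip, label), value in sorted(maxima.items()):
            print(f"{chip} {label}: {value:.1f}°C")
```

Leaving this running in a terminal while the game loads would record the peaks at the moment they happen, instead of reading them off by hand afterwards.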

Hope this can help :)