Question System instability (13700K, RTX 4080, 1000W)

Aug 1, 2024
Hi guys,

To start off I'm a tech guy myself, so I've done a lot of checks already and I am mainly looking for a second opinion here.

A couple of months back I started having an issue with my main rig, which had been running beautifully for a year and a half:

OS: Windows 11
CPU: 13700K
Cooler: ROG Ryuo III 360
RAM: 64GB (2x32GB G.Skill Trident Z5 Neo)
Mobo: ROG Maximus Z790 Hero
GPU: MSI Suprim X 4080
SSD1: Firecuda 530 1TB
SSD2: Firecuda 530 4TB
PSU: Seasonic 1000W Prime PX Titanium

The symptom is as follows: randomly, seemingly without any causal factor such as load, the machine will turn off and then immediately turn back on again.

I've run the usual benchmarks with HWiNFO monitoring the machine, and the PSU rail readings stay within 100 mV of nominal at all times. Sometimes the machine has crashed during a benchmark and sometimes it hasn't. Sometimes it has crashed while gaming, and sometimes while just sitting at the desktop.
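For anyone wanting to repeat the rail check on their own log, here is a minimal sketch of how an exported sensor CSV could be scanned for readings outside that 100 mV window. The column names (`+12V`, `+5V`, `+3.3V`) and the function name are assumptions for illustration; the actual headers in a HWiNFO log depend on the board and sensor chip, so they would need adjusting.

```python
import csv
import io

# Nominal rail voltages. The column names are hypothetical and must be
# matched to the headers in the actual exported log.
NOMINAL_V = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE_V = 0.100  # the 100 mV window mentioned above


def out_of_range_rows(csv_text, nominal=NOMINAL_V, tol=TOLERANCE_V):
    """Return (row_number, column, value) for each reading outside nominal +/- tol."""
    flagged = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, target in nominal.items():
            raw = row.get(column)
            if raw is None or raw.strip() == "":
                continue  # column missing or empty in this row
            try:
                value = float(raw)
            except ValueError:
                continue  # skip non-numeric cells
            if abs(value - target) > tol:
                flagged.append((row_number, column, value))
    return flagged


sample = (
    "Time,+12V,+5V,+3.3V\n"
    "10:00:00,12.05,5.02,3.31\n"
    "10:00:02,11.85,5.01,3.30\n"
)
print(out_of_range_rows(sample))
```

Worth keeping in mind that software sensors are polled at intervals, so a clean log only shows the rails were fine at each sample. A very short transient between samples would not appear.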

Temperatures are all where they should be.
Machine has had its BIOS updated since (including the recently released eTVB microcode fix)
BIOS is set to Intel's specifications (enforced)
Windows has been fully re-installed.
I've previously disabled SpeedStep/C-states for testing purposes
Event Viewer gives no indication other than that the system had an unexpected power loss.

There seem to be no reproducible factors - it has gone over a month without a crash and then happened out of the blue. There is no commonality I can find in what I'm doing with the machine when it trips.
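To put numbers on how irregular this is, the crash times from Event Viewer could be turned into gaps between crashes. A minimal sketch, assuming the timestamps have been copied out by hand in ISO format; the dates below are made-up placeholders, not my actual crash log, and the function name is just for illustration.

```python
from datetime import datetime


def crash_gaps_days(timestamps):
    """Return the gaps in days between consecutive crashes, oldest first."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [
        round((later - earlier).total_seconds() / 86400, 1)
        for earlier, later in zip(times, times[1:])
    ]


# Placeholder timestamps for illustration only.
example = [
    "2024-05-01T10:00:00",
    "2024-05-03T22:00:00",
    "2024-06-10T10:00:00",
]
print(crash_gaps_days(example))
```

If the gaps cluster around particular times of day or days of the week, that would point towards something external like mains events rather than a random hardware fault.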

Before I start replacing parts I'd like to find a way to narrow things down.

I suspect the following:
1. Faulty PSU and/or power cables
2. Something is tripping OCP or another protection mechanism within the PSU, meaning the PSU itself is not faulty.

At first I thought it might be the surge-protected power strip, but no other devices are affected when this happens. I've since ruled it out by first changing the socket the machine was plugged into and then replacing the entire power strip to be sure.

My other thought is that it somehow relates to the Intel 13th/14th gen voltage degradation issue, but I'm reluctant to blanket blame that and replace the CPU without more evidence.

So, in summary, this is my current plan of action:

1. Wait for the microcode fix and see if the issue continues
2. Replace PSU and power cables if the issue continues
3. Replace CPU if the issue continues
4. Replace motherboard if the issue continues

I'd like your opinion on this, and whether you can suggest anything else. It's driving me nuts.
Any advice you can provide would be great.