Question System instability (13700k, RTX4080, 1000w)

Aug 1, 2024
5
1
15
Hi guys,

To start off, I'm a tech guy myself, so I've done a lot of checks already and am mainly looking for a second opinion here.

A couple of months back I started having an issue with my main rig, which had been running beautifully for a year and a half:

OS: Windows 11
CPU: 13700K
Cooler: ROG Ryuo III 360
RAM: 64GB (2x32GB Gskill Trident Z5 Neo)
Mobo: ROG Maximus Z790 Hero
GPU: MSI Suprim X 4080
SSD1: Firecuda 530 1TB
SSD2: Firecuda 530 4TB
PSU: Seasonic 1000w Prime PX Titanium

The symptom is as follows: randomly, seemingly without any causal factor such as load, the machine will turn off and then immediately turn back on again.

I've run the usual benchmarks with HWiNFO monitoring the machine, and the PSU voltage readings are within 100 mV of where they should be at all times. Sometimes the machine has crashed during a benchmark and sometimes it hasn't. Sometimes it has crashed during gaming, and sometimes while just sitting at the desktop.
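
For anyone wanting to make the "within 100 mV" check repeatable, here is a minimal sketch that scans a CSV sensor log for readings outside that band. The column names and nominal voltages are assumptions for illustration; a real HWiNFO log names its columns after the board's own sensors, so they would need adjusting. Bear in mind that a software log sampled every second or two can easily miss a brief dip, so a clean result does not clear the PSU.

```python
import csv
import io

# Assumed column names and nominal voltages (hypothetical); adjust them to
# match the headers in your own log file.
NOMINAL_VOLTS = {"+12V [V]": 12.0, "+5V [V]": 5.0, "+3.3V [V]": 3.3}
TOLERANCE_VOLTS = 0.1  # the 100 mV band mentioned in the post


def find_deviations(csv_file, nominal=NOMINAL_VOLTS, tolerance=TOLERANCE_VOLTS):
    """Return (row_number, column, value) for each reading outside tolerance."""
    deviations = []
    reader = csv.DictReader(csv_file)
    for row_number, row in enumerate(reader, start=1):
        for column, target in nominal.items():
            raw = row.get(column)
            try:
                value = float(raw)
            except (TypeError, ValueError):
                continue  # missing column or non-numeric cell
            if abs(value - target) > tolerance:
                deviations.append((row_number, column, value))
    return deviations


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the input.
    sample = io.StringIO(
        "Time,+12V [V],+5V [V],+3.3V [V]\n"
        "12:00:00,12.096,5.040,3.312\n"
        "12:00:02,11.850,5.040,3.312\n"
    )
    for row_number, column, value in find_deviations(sample):
        print(f"row {row_number}: {column} = {value:.3f} V")
```

With a real log you would pass an opened file instead of the in-memory sample.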

Temperatures are all where they should be.
The BIOS has since been updated (including the recently released eTVB microcode fix).
BIOS is set to Intel's specifications (enforced).
Windows has been fully re-installed.
I've previously disabled SpeedStep/C-states for testing purposes.
Event Viewer gives no indication other than that the system had an unexpected power loss.
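
To line the shutdowns up against dates and times, the relevant entries can be pulled out of the System log in one go. This is a rough sketch that assumes the entries in question are the usual Kernel-Power Event ID 41 records Windows writes after an unclean reboot; it shells out to the built-in `wevtutil` tool, so it only runs on Windows. The function names are made up for illustration.

```python
import subprocess

# Event ID 41 from the Kernel-Power provider is what Windows logs after a
# reboot that was not preceded by a clean shutdown.
KERNEL_POWER_QUERY = (
    "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=41)]]"
)


def build_wevtutil_command(max_events=20):
    """Build a wevtutil command listing the newest Kernel-Power 41 events."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{KERNEL_POWER_QUERY}",
        f"/c:{max_events}",
        "/rd:true",  # newest events first
        "/f:text",   # human-readable output
    ]


def recent_unexpected_reboots(max_events=20):
    """Run the query (Windows only) and return wevtutil's text output."""
    result = subprocess.run(
        build_wevtutil_command(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(recent_unexpected_reboots())
```

The timestamps in the output can then be compared against what was running at the time, which is one way to look for a pattern across weeks of crashes.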

There seem to be no reproducible factors: it has gone over a month without a crash and then happened out of the blue. There seems to be no commonality in what I'm doing with the machine when this trips.

Before I start replacing I'd like to try and find a way to narrow things down.

I suspect the following:
1. Faulty PSU and/or power cables
2. Something is tripping OCP or some other protection mechanism within the PSU, meaning the PSU itself is not faulty.

At first I thought it might be the surge-protected power strip, but no other devices are affected when this happens. I've since ruled it out by first changing the power socket it was connected to and then replacing the entire power strip to be sure.

My other thought is that it somehow relates to the Intel 13th/14th gen voltage degradation issue, but I'm reluctant to blame that outright and replace the CPU without more evidence that it is the cause.

So, in summary, I'm leaning towards this plan of action:

1. Wait for microcode fix and see if issue continues
2. Replace PSU and power cables if issue continues
3. Replace CPU if issue continues
4. Replace motherboard if issue continues

I need to know your opinion on this and if you can suggest anything else. It's driving me nuts.
Any advice you can provide would be great
 
Aug 1, 2024
See what your options are for replacing the CPU under warranty. I'd start there before looking at replacing anything else.

https://www.theverge.com/2024/8/1/24211616/intel-crashing-13th-14th-gen-cpus-warranty-two-more-years

https://www.reddit.com/r/overclocking/s/s5GmZ9KnbT
Do you think it's the overvoltage issue? Because my PSU is turning off, part of me thinks that, if that's the case, it is shutting off to protect my CPU from the overvoltage.

Maybe the microcode patch will fix it?
 

boju

Titan
Ambassador
The OP in the Reddit post describes what he believes might be going on with these CPUs. Might it be possible that the PSU is sensing a short? I'm not sure. Since we're aware of the issue, it's probably a good idea to rule that out first, as you've had the CPU for a while. He also gives his view on what to do if you happen to RMA the CPU: limit single-core boost. I think that would be worth exploring rather than relying solely on a BIOS update.
 
I highly doubt the PSU is the issue. Here is my thinking: if it were the PSU, you would be having issues consistently under load, or consistently under little to no load, not this intermittent issue regardless of load. PSUs either work or they don't. When a PSU has some defect or damage, the point at which it fails is usually consistent. For instance, the PSU causes an unexpected shutdown whenever the load reaches 400 W.

The CPU, on the other hand, is known to have stability issues right now due to accelerated degradation. Intel will cross-ship you a replacement with a $25 hold, which is refunded once they receive your CPU in the mail. If you are still having issues after a CPU RMA, I would post back.
 
Aug 1, 2024
Out of curiosity, how much of an OC did you have on the K processor for that year and a half beforehand?
I didn't increase any clocks or voltages. I did have it running at Asus specs (multicore enhancement) rather than Intel's, though. In fact, during summer I tend to drop the ratio to reduce temps a bit; it's been running at 4.8 GHz on all cores for a while. I recently realised that switching to Intel's specification has saved me slightly over 10 degrees in temps.

I highly doubt the PSU is the issue. Here is my thinking: if it were the PSU, you would be having issues consistently under load, or consistently under little to no load, not this intermittent issue regardless of load. PSUs either work or they don't. When a PSU has some defect or damage, the point at which it fails is usually consistent. For instance, the PSU causes an unexpected shutdown whenever the load reaches 400 W.
This is exactly what I thought. I would expect a more consistent or reliable symptom, but it's completely random: it can happen during gaming, or minutes after I've quit a game and returned to the desktop for general browsing or YouTube.

Once it happened twice in a row (again shortly after hitting the desktop), but the timing is so hit and miss. I could experience it once or twice a day for a couple of days, and then it could be weeks until the next one.

It is very bizarre, but my gut tells me it's not the PSU either. I think it's simply reacting to some other defect and shutting my machine off.
 
Aug 1, 2024
Intel CPU: I'd really, really try to test another CPU before anything else. Sounds to me like it's one of the faulty ones, just like my 13900K, which I have also just recently RMA'd to Intel.
What were the symptoms? I ask out of curiosity, as I've seen little about what the ACTUAL symptoms of the issue are other than "instability".
 

Geoff Leven

Distinguished
Jun 2, 2013
116
3
18,595
I was getting random crashes to the desktop. Sometimes the crashes were so bad that no game would run again until I re-installed the NVIDIA drivers, because they had become corrupt. Event Viewer would always get flooded with big red crosses (critical errors) and yellow exclamation marks at the time of the crashes. And I MEAN FLOODED.