Question: Intermittent lock-up with DRAM LED lit up?

David_676

Honorable
Apr 6, 2017
Computer Type: Desktop
GPU: EVGA RTX 3080
CPU: Ryzen 9 5950x 16 core 32 Thread
Motherboard: Asus x570 Crosshair VIII Hero (Wi-Fi)
BIOS Version: 4501 04/19/2023
RAM: G.Skill Trident Z Royal 3600 CL14 F4-3600C14D-32GTRG
PSU: Be quiet! Straight Power 11 Platinum 1000W BN644 Fully Modular
Case: Lian Li 011 Dynamic XL
Operating System & Version: WINDOWS 11 Pro
GPU Drivers: GEFORCE GAME READY DRIVER - WHQL Driver Version: 536.67

Description of Original Problem: I built the computer almost 3 years ago. It ran fine for 1.5 years, and I have been slowly troubleshooting it ever since. The lock-up happens once every 2 or 3 days, sometimes sooner. The screen freezes but stays on, the computer doesn't respond to the power button, and the PSU switch is the only way to reset it. The debug code sometimes stays at AA and sometimes changes to 00. The yellow DRAM LED sometimes lights and sometimes doesn't; when it does, it lights slowly from off to a weak, flickering glow, almost like it has limited power, like a candle in the wind.

Troubleshooting: I have updated the BIOS, reset BIOS, reinstalled OS and all drivers, RMA'd everything but the CPU. Downclocking the RAM helps a bit but everything I researched says that it should work fine at the 3600 I paid for. Wondering if anyone has seen this before.

As said, I replaced the MB and the PSU, so the flickering DRAM light is very confusing. Could this be a CPU problem? The CPU was my last guess, but unless there is an AMD-specific thing I don't know about, it seems this is the last option.

EDIT: I posted the wrong MB, the correct board is the Crosshair VIII not the VII.
 
Usually these kinds of issues can be pinned down to the motherboard's BIOS or the board itself. You stated being on the latest BIOS for your motherboard; did you clear the CMOS after you'd verified that the BIOS was successfully updated?

BIOS Version: 4501 04/19/2023
4501 isn't shown on their support site, meaning it was pulled off the site, if your BIOS version wasn't a typo. I'd advise flashing the BIOS to the latest version, then clearing CMOS, and then going back to see if you're able to get to DDR4-3600MHz via D.O.C.P.

reinstalled OS
Where did you source the installer for the OS?

and all drivers
Manually in an elevated command, i.e., right-click installer > Run as Administrator?
 
Yes, there may be a new BIOS update since then, as I have been troubleshooting this for a long time now, but I did clear CMOS. I just put in the new MB 2 days ago. I can update the BIOS again, since the board likely didn't ship with the newest BIOS, but unless there was an update since then specific to this, I don't think that will fix it. The flickering DRAM LED is what confuses me the most, since that looks entirely like a HW problem, but I have replaced everything but the CPU. I will update the BIOS and clear CMOS again just because it's a good idea either way.
 
Have you operated it after resetting CMOS and with memory at default settings? If it's flawless like that, then the memory is probably borderline stable at 3600. It could be BIOS...or CPU memory controller...or memory itself. Whichever it is, it's important to remember that Ryzen CPUs are tested and warranted only to 3200. And hunting down the "perfect BIOS" seems an exercise in futility to me, since each revision can also fix things such as security exploits and Windows compatibility.

Something to try is enabling XMP but manually setting the clock speed below 3600: maybe 3400 or even 3200 for instance. See if the trouble light stays out. If it does then almost certainly the memory is borderline stable at 3600 for whatever reason.

If it were my kit I'd increase DRAM voltage as a first step. DDR4 is safe up to 1.5V, but unless you know you have Samsung B-die memory I'd not operate above 1.4V 24/7; being CL14/3600, yours quite likely is, but know first. Samsung B-die actually likes higher voltage and is oftentimes run above 1.5V at higher clocks. Your motherboard should have both a DRAM voltage and DRAM temperature sensor so you can see that it's all good. I run my Samsung B-die at 3600/1.47V and DRAM temp never goes above 40C or so, even during memory stress tests.
 
Oh crap, I'm sorry, it's the Crosshair VIII; I mistyped the board and left out an I. It does show that 4501 is the latest revision:
https://rog.asus.com/us/motherboards/rog-crosshair/rog-crosshair-viii-hero-wi-fi-model/helpdesk_bios
I'll update my post.

I get my OS right from Microsoft.

I did not reinstall drivers manually. Windows catches most of them upon reinstall, and the rest I elevate with the usual UAC prompt when running them.
 
I have used it; as I said, I have been troubleshooting this for over a year now, and since it's intermittent, it takes longer. The computer worked flawlessly for the first year and a half of its life at the D.O.C.P. spec of 3600 CL14.
I have downclocked to 3200 and it helped, but the reliability report was still showing failed updates and program crashes that would normally go unnoticed. I tried 3600 CL16, 3200 CL14 and CL16. I am currently trying 3600 18-22-22-42. I was so thrown off by the flickering DRAM LED that I RMA'd everything but the CPU, and I was shocked to see it still happening; I thought for sure it was the MB by the time I got to that. Now I am back to square one, trying different timings and such. Currently running complete stock default BIOS except the RAM is set to D.O.C.P. with 18-22-22-42 timings.

As far as it being B-die, it is listed on the B-die finder:
https://benzhaomin.github.io/bdiefinder/
And I believe I verified it with software as well. Should I try higher voltage then? I am still thrown by the flickering DRAM light tho. Again, it's fine when booting up, nice and bright, but when the system freezes it starts from off and slowly flickers up to about half brightness. Would a low voltage setting cause that? I can't think of anything but the board that would cause it, cuz the LED itself should be on a completely different circuit and just get told to light or not. I am obviously wrong on this assumption, but I don't know how they wired it to produce that kind of behavior. Does the voltage for the LED come directly from the DRAM voltage? I don't know, and it's pointless to speculate, but does that change the equation for you at all? Again, the same behavior persisted across 2 separate boards.
 
Since you're confident it's B-die totally run it up to 1.5V and test it a while like that. If it stops misbehaving, bring it down a bit at a time. I'd be comfortable around 1.45V although 1.5V is not a problem for it.

And speaking of testing...are you? have you ever? I mean with a memory stress/stability test like Memtest. Even Windows' Memory Diagnostic (type it in the search box and run it) would be a decent test of stability.

EDIT add: oh yes, and I didn't see it anywhere: how many DIMMs are you running? 4 DIMMs are more difficult to get stable at high clocks on boards with a daisy chain DDR4 topology and, conversely, 2 DIMMs can be more difficult to get stable on a board with T-topology. I don't know which your motherboard has, but T-topology is more expensive to implement and a CHVIII is pretty much the top-end board of all AM4 motherboards so maybe...
 
I had such problems in my old PC.
If the problem persists after all that, then it's almost certainly the GPU.
 
I did run Memtest but it didn't reliably reproduce the problem: I ran it 4 times in a row and it only had the problem once. The RAM is currently set to 1.45V, as that is what D.O.C.P. sets it at. I will do a bit more testing with different voltages. It's a pain because stress testing doesn't cause the problem; it really is intermittent and not related to stress at all, and most of the time it happens when the computer is idle or close to it.
 
I did run Memtest but it didn't reliably produce the problem, I ran it 4 times in a row and it only had the problem once....
Once is all you need. It's not a stable memory configuration.
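
A rough way to see why a single failure in four passes matters, and why a short clean run proves little: assuming (and this is only an assumption) that each Memtest pass trips the fault independently with roughly the 1-in-4 rate reported above, the chance of catching it in n passes is 1 - (1 - p)^n. A minimal Python sketch, with hypothetical helper names:

```python
import math


def detection_probability(p_fail_per_pass: float, passes: int) -> float:
    """Chance that at least one of `passes` independent passes shows an error."""
    if not 0.0 <= p_fail_per_pass <= 1.0:
        raise ValueError("p_fail_per_pass must be between 0 and 1")
    if passes < 0:
        raise ValueError("passes must not be negative")
    return 1.0 - (1.0 - p_fail_per_pass) ** passes


def passes_needed(p_fail_per_pass: float, confidence: float) -> int:
    """Smallest number of passes that catches the fault with the given confidence."""
    if not 0.0 < p_fail_per_pass < 1.0:
        raise ValueError("p_fail_per_pass must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail_per_pass))


# Assumed rate: 1 failure observed in 4 passes (a very small sample).
P_OBSERVED = 0.25
print(f"4 passes catch it with probability {detection_probability(P_OBSERVED, 4):.2f}")
print(f"passes needed for 95% confidence: {passes_needed(P_OBSERVED, 0.95)}")
```

Under that assumption, four passes catch the fault only about 68% of the time, and it takes 11 passes to reach 95%, so a clean overnight run means far less than one logged error.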

I added a note in the above...repeated here because CTRL-C/CTRL-V is so easy to do:

I didn't see it anywhere: how many DIMMs are you running? 4 DIMMs are more difficult to get stable at high clocks on boards with a daisy chain DDR4 topology and, conversely, 2 DIMMs can be more difficult to get stable on a board with T-topology. I don't know which your motherboard has, but T-topology is more expensive to implement and a CHVIII is pretty much the top-end board of all AM4 motherboards so maybe...

EDIT: and oh yeah: more voltage. Give 1.5V a shot, B-die is good for it. If it works well, lower to 1.495, 1.49 and so forth. And you can TRY to RMA it, but remember, the CPU's warranted only to 3200 speed. So if you say "it's not stable with my 3600 kit", well, you can try, but you are overclocking.
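
The step-down procedure can be sketched like this. It is only an illustration: the 1.45V floor is the D.O.C.P. value mentioned earlier in the thread, and `is_stable` is a hypothetical stand-in for however you judge a setting (for example several days without a lock-up plus clean Memtest passes). Nothing here touches hardware; each voltage still has to be set by hand in the BIOS.

```python
from typing import Callable, List, Optional


def step_down_schedule(start_mv: int = 1500, floor_mv: int = 1450,
                       step_mv: int = 5) -> List[float]:
    """DRAM voltages to try, highest first, in volts.

    Works in whole millivolts so repeated subtraction cannot drift.
    The floor is only included if the step divides the range evenly.
    """
    if step_mv <= 0:
        raise ValueError("step_mv must be positive")
    if floor_mv > start_mv:
        raise ValueError("floor_mv must not exceed start_mv")
    return [mv / 1000 for mv in range(start_mv, floor_mv - 1, -step_mv)]


def lowest_stable(schedule: List[float],
                  is_stable: Callable[[float], bool]) -> Optional[float]:
    """Walk down the schedule and return the last voltage that tested stable."""
    last_good = None
    for volts in schedule:
        if not is_stable(volts):
            break  # first unstable setting: stop and keep the previous one
        last_good = volts
    return last_good


schedule = step_down_schedule()
print(schedule[:3])  # first three settings to try, highest first
```

Stopping at the first unstable setting and then going back up one step, rather than hunting for the absolute minimum, leaves some margin for the aging effects discussed in this thread.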
 
Ah sorry, I didn't see the edit up there. I am running 2 DIMMs. Yeah, I went with really high-end components, so that's why I really don't want to have to run at stock speeds, but hopefully just fiddling with the numbers a bit can help sort it out.

Yeah, I figured they only warrant for a certain amount. Every other company hasn't even asked tho lol; I think they just figure you paid a ton for it, so if it's not doing what it's supposed to do then they'll replace it. I'll do some more testing first tho. If it's a matter of CL14 vs CL18 or 1.46V instead of 1.45V then I'll just accept that. It's just a matter of finding the point where it is for sure stable. I appreciate the advice; I was kinda scared to mess with voltages but I feel a bit better about it now.

Just to be sure tho, if it runs great at say 1.49, is it safe to leave it there forever? Or should I be setting everything to base when it's not getting used? I tend to leave my PC on 24/7.
 
....Just to be sure tho, if it runs great at say 1.49, is it safe to leave it there forever....
I know several people running their B-die at 1.55V and have been since not long after 3000 series CPUs came out. I'm running mine at 1.47V and have been about as long. I've read some say it's safe even up to 1.65V, but I consider that kind of apocryphal. Just watch the temperature at first; if the memory is not getting really hot you'll be fine. I feel certain G.Skill has one or more temperature sensors on each DIMM since there is one on my FlareX.

It might be worth your while to ask around to see if a CHVIII has T-topology memory routing. If it does, then getting 2 DIMMs stable is more difficult, so that would explain a lot. It worked before, but probably with very little margin. But components age, it's inevitable, and with so little margin it went unstable after a while. Ironically, the same setup might be stable on a cheaper daisy chain motherboard.
 
I found a spreadsheet that says it is daisy chain. That kinda surprises me, but it looks like most ASUS X570 boards are daisy chain. Does that mean upgrading to 4 DIMMs will be a struggle if I ever want to do that in the future?
 
... upgrading to 4x will be a struggle if I ever wanted to do that in the future?
It almost definitely will be if adding another 2 DIMM kit. You need to get a matched 4 DIMM kit for best results.

But going to 4 DIMMs is usually a necessity only for an extremely memory-demanding use case. When running out of memory, Windows will start using virtual memory, which is several orders of magnitude slower than even base-speed DDR4, 2133. So memory clock speed becomes a willing sacrifice if you need the memory capacity.
 
Right, I want to avoid virtual memory as best as I can, I haven't run into any problems with limited memory at the moment, but who knows in the future, I may get into 3D modeling or Video editing which was the reason for the 5950x, I wanted that workstation power with the ability to game as well. I appreciate the advice, I will tinker with it and hopefully come up with something stable.
 
If you intend using Adobe Premiere Pro for video editing, check out the Puget Systems web site, where they have a benchmark test showing minimal gains when overclocking RAM.

It's pointless trying to squeeze another 3 to 5% out of your system RAM, when much of the acceleration comes from OpenCL or OpenGL on your graphics card and not from the CPU.

I'm running 2 x 32GB (64GB total) DDR5 at only 4,800MT/s on my 7950X video rig for stability. The last thing I want is a system crash when I'm nine hours into a video render.

You seem to be spending enormous amounts of time and effort trying to overclock your RAM, instead of just using the machine at a slightly lower XMP. I enjoy overclocking CPUs and RAM on many machines, but I aim for stability, not a total can of worms. If you need more speed, buy a new system.

If you intend to devote your machine to video editing, consider changing from Gaming drivers to Studio drivers for your GPU. They're supposedly more stable.
 
Yeah, I understand that, my goal isn't to overclock as much as possible, I just wanted to get the 3600 that I paid for which is not an extreme overclock at all and from what I have read is the sweetspot for my CPU, so it should have no problems with it.

My main worry was the flickering DRAM LED, which appeared to be a hardware issue and not an overclocking issue. If I know for certain that the issue is overclocking then I will likely accept the lower speed, but if it still has issues at 3200 then I won't accept that, because that is the 5950X's rated limit, and if it won't run at that speed then I paid a whole lot of money for something that much cheaper hardware could easily achieve.

I'm really not trying to nitpick. I realize that it doesn't amount to a big difference in real-life scenarios; I just expected these expensive components to do a simple overclock that they should all easily do. And the system ran flawlessly for a year and a half, so the capability is there. I just need to figure out what happened, and that is the point here: trying to troubleshoot to see if the CPU has degraded (which seems to be the case) and if it has degraded enough to RMA it.

In my opinion, 3600 isn't an extreme overclock from the research I did before buying it, and I paid for more expensive components for stability at a small OC rather than pushing OC as high as I can. That is my opinion tho, I could be wrong, but I think most ppl would agree that this shouldn't be a problem with these components.

As far as the use case goes, when I asked about upgrading the RAM I meant size rather than speed, 64 vs 32 GB. I do understand that OC has very little gain; that wasn't really my goal. As I said, everything I read said that 3600 was the best speed, so that is what I got. Gaming is its primary purpose at the moment, but I paid for the ability to do workstation loads just so I have that option. I will absolutely reconfigure for workstation when that happens, but first I need stability.

The one thing I overlooked was the timings; CL14 may be too aggressive for this CPU. The RAM listed my system on its QVL so I assumed it was fine, but the MB didn't list the RAM on its QVL. I tried lowering the timings and still had problems, so I moved past that. Again, the flickering LED made me think hardware, so I didn't spend a whole lot of time trying different timings and voltages, which I am currently doing.

The main question for this post was: could a bad CPU cause the flickering LED? None of the feedback I got had an answer for that, but I did get other feedback about the RAM timings and voltages, so that is what I am trying. I appreciate your feedback as well. I have built 5 PCs, but they were all Intel and never had any issues at all, so I have a decent knowledge base, but I have never seen this before, so I am looking for as much info as I can get. I don't know what I don't know, so I'll take any advice I can get.
 
If you've been running PBO on the 5950X I have a sneaking suspicion that invalidates your AMD warranty. How they can tell you've enabled PBO I don't know, but it's worth checking the small print.

From what I've been reading online, AMD systems are more fussy about DDR5 than Intel. I never take it for granted that all combinations of Intel or AMD processor, plus mobo and RAM will be 100% compatible, even if they're listed in the QVL.

I had to reduce the 25% overclock on an AMD FM2 processor after a few years when it became unstable. I suspect electro-migration had set in. The same might be true for your 5950X.

When XMP fails, I relax the CL (CAS) value by several steps and sometimes stability returns. By the sounds of it you've already tweaked CL14. If you've only tried CL15, try CL16 or 17.
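
For a sense of what relaxing CL costs, first-word CAS latency in nanoseconds can be worked out as CL x 2000 / data rate in MT/s (the memory clock is half the data rate). A small Python sketch using the speed/CL combinations already tried in this thread; the function name is hypothetical, and it looks at CAS only, not the other timings or stability:

```python
def cas_latency_ns(cl: int, data_rate_mts: int) -> float:
    """First-word CAS latency in nanoseconds.

    DDR transfers twice per clock, so the clock in MHz is data_rate / 2
    and one clock cycle lasts 2000 / data_rate nanoseconds.
    """
    if cl <= 0 or data_rate_mts <= 0:
        raise ValueError("cl and data_rate_mts must be positive")
    return cl * 2000.0 / data_rate_mts


# (data rate in MT/s, CAS latency) pairs mentioned earlier in the thread
SETTINGS = [(3600, 14), (3600, 16), (3600, 18), (3200, 14), (3200, 16)]

for rate, cl in SETTINGS:
    print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(cl, rate):.2f} ns")
```

By this measure 3600 CL18 and 3200 CL16 both come out at 10 ns, against roughly 7.8 ns for the kit's rated 3600 CL14, so loosening CL trades a couple of nanoseconds of latency for margin.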

If your video editing is 1080p, 32GB RAM should be enough. For 4K fit 64GB. For 6K/8K fit 128GB. Most of the video rendering apps I use do not run the CPU at 100%. 60 to 80% is more typical. However, my GPU is pinned to the end stop for hours on end, so GPU stability is vital. Handbrake and WinRAR push the CPU hard.

Two DIMMs will usually be more stable than four DIMMs, especially if you intend to overclock with XMP. The general (cautious) advice seems to be "don't mix pairs of DIMMs, even if they have the same part number". If you do decide to add two more DDR5 DIMMs, forget thoughts of high XMP speeds. A matched kit of 4 is better than two pairs.

Good luck.
 
Yeah, the system still locks up without PBO, but I have tried it. I don't think they can prove that tho. This is DDR4, but I think what you said still applies. I probably should have just gone with Intel, but it's too late now unfortunately.

I may have been a little too ambitious with the RAM at CL14; it ran fine for a year and a half, but maybe that made it degrade faster? I just tried 18-22-22-42 and it failed. I tried adjusting the voltage to 1.495 as was suggested, since it's B-die RAM. That made it lock up on boot after POSTing. And now it won't boot at the 1.45 that was listed on the XMP. It appears that voltage is the problem and I made it worse. The problem persisted through the RAM and MB swaps tho, so if it got worse, that means it must be CPU hardware. Likely just wear and degradation, but I think that confirms it is hardware, and the CPU specifically. Right?

Now it's just a matter of trying to get AMD to RMA it. Which if I tell them I never enabled XMP and act like I don't even know what that is then they should accept it. From what I understand there is no way they can tell that it was OC'd.
 
If they de-lid the CPU and scan the dies with an electron microscope (costly) they'll be able to see the damage. Don't assume there aren't methods to detect over enthusiastic overclocking, outside the warranty.

Although people are merrily telling you it's OK to apply up to 1.55V (for example) to a particular type of memory chip, have you considered the effect of such high voltages on the integrated memory controllers in the CPU? You may have damaged not only the RAM but also the CPU with the overclock and overvoltage.

If you want long-term reliability (5 years+) it's much safer to run at stock speeds. Think of it like a top fuel dragster. If you want insane power levels out of a V8, you pay the price with short component life. Admittedly, you won't have to rebuild your PC after each quarter-mile run like a nitromethane engine, but you might have paid the price for 18 months of go-faster operation.

I doubt it would make much difference if you switched to Intel. The 13900K runs even hotter than the 7950X. Abusing silicon too far eventually ends in tears. At least you had fun while it lasted.
 
As an update, AMD did RMA the CPU and it seems
It's been a little over a week, but they RMA'd the CPU and it's running fine now. I believe you are correct about the voltages on the MC... I really don't feel like 3600 is a big overclock, and it shouldn't need the 1.45V that the XMP applies. I have it running at 1.4V right now and haven't seen any issues, but do you know if anyone has found the safe level for the MC? Is 1.4V still too high for long-term stability?
 
Two different voltages. One doesn't apply to the other. RAM voltage is only used by the RAM, MC voltage by the MC. It's like going for a walk: you use a certain amount of energy to move your arms and pump your legs, and if you run, you use more energy, but that has zero effect on the energy required by the monitors in your cellphone to keep track of heartbeat, GPS, speed, etc. The RAM requires a certain amount of voltage to open doors, sort, shunt and whatever else it does; when you apply a higher speed, it often requires more voltage to do so and remain stable.