News AMD Ryzen 7000 Burning Out: Root Cause Identified, EXPO and SoC Voltages to Blame

In my original post I said "I don't understand why users overclock".
Not them, but these days it's just e-peen points. Heck, I tune my video cards now because I've noticed they consume a disproportionate amount of power for whatever gain in performance I get. Like I could get about 12-15% more performance out of my card if I didn't kneecap it, but it'd also use 50-60% more power.
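To put rough numbers on that trade-off, here is a minimal sketch. The function name is made up for illustration, and the inputs are just the endpoints of the ranges quoted above, not measurements.

```python
def perf_per_watt_change(perf_gain: float, power_gain: float) -> float:
    """Relative change in performance-per-watt when performance rises by
    perf_gain and power draw rises by power_gain (both given as fractions)."""
    return (1.0 + perf_gain) / (1.0 + power_gain) - 1.0


# Endpoints of the ranges quoted in the post (illustrative, not measured):
# 12-15% more performance for 50-60% more power.
best_case = perf_per_watt_change(0.15, 0.50)   # most performance, least power
worst_case = perf_per_watt_change(0.12, 0.60)  # least performance, most power

print(f"best case:  {best_case:+.1%}")
print(f"worst case: {worst_case:+.1%}")
```

Under those assumptions, running the card flat out costs somewhere around a quarter to a third of its efficiency, which is the argument for tuning it down.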

I forget which generation, but I felt like there was a time when everyone was buying Intel platforms because they could brag that they have a 5.0GHz system.
 
Apr 25, 2023
2
0
10
I like how AMD and the motherboard manufacturers are handling this issue, getting to the bottom of it while releasing safe BIOS updates. I think we need more information about the issue happening to some of the standard Ryzen 7000 processors before AMD updates its statement.
 

CeltPC

Distinguished
Jun 8, 2017
75
55
18,610
I checked the Asus website and sure enough a spanking new BIOS, version 1412 dated today, was there; my previous version was 1409. Did my usual flash procedure, enabled the EXPO tweaked and AI overclock settings, and ran some new Cinebench passes with monitoring from HWiNFO64. Scores seem unaffected. The SoC voltage was slightly reduced, from around 1.35 volts to a locked-in 1.335 volts.

Not much of a change from my previous results, but I do feel more comfortable that I won't run into flaming (or bulged) CPU issues. I also appreciate Asus jumping to get the new BIOS out quickly.
 

atomicWAR

Glorious
Ambassador
Wow, what a bunch of whiners here! Intel and AMD both support various overclocking features because a lot of their customers asked them to. But vendors do not deliberately leave performance on the floor for overclockers to pick up, and it's always been the case that your overclocking results will vary, and you stand a decent chance of decreasing the life of your CPU, possibly severely. If the vendors reacted to the complaints about lack of warranty support for advertised overclock features by removing every feature that was possibly unsafe, people would be right back here in the forums whining about that instead.

The simple fact is that CPU vendors do a lot of careful testing to establish safe operating limits for a chip. I've worked on the engineering team for a CPU, and have personal experience with this. If you want warranty support, and don't want to risk your CPU, don't overclock. Simple as that.
It's not as simple as that. When you have AMD saying things like "DDR5 6000 is going to be the sweet spot for performance", it gives people a false sense of security, and it sounds like that is what AMD expects users to run their CPU and RAM at.

I have EXPO issues myself, though I think I have them dialed in now (should know in a few more days if no crashes occur). I never trust EXPO/XMP settings. They always want to use too much voltage, have their timings too tight, or both. Thankfully my second thought was to lower voltages from 1.4 to 1.35 volts (my first was to loosen timings) to fix my issues.

Regardless, my larger point is that AMD's marketing of DDR5 6000 as a sweet spot is problematic if they aren't going to support it.
 

InvalidError

Titan
Moderator
Point being I never trust XMP/EXPO
When I put my i5-11400 together, my DDR4-3200 kit wanted 1.35V in XMP and that seemed excessive to me. I experimented to find the lowest stable voltage and got down to 1.16 or 1.17V, so I settled on an even 1.20V. There was no reason whatsoever for my DIMMs to request more than the standard 1.2V DDR4 voltage.

Presets probably shouldn't request voltages higher than stock memory spec since memory manufacturers have no clue what their DIMMs will ultimately get used on. Some platforms may require more voltage to make a given speed stable, some may not gain anything from needlessly bumping operating voltages, and others may fry themselves like what appears to be going on with Zen 4 here.
 

rluker5

Distinguished
Jun 23, 2014
911
594
19,760
That just shows you how much performance manufacturers were leaving on the table in the past.

Nowadays, they don't leave that much headroom for OC-ing.

So any OC-ing done today is really climbing up that hockey stick for power consumption while having minimal gains in performance.
You are right that there isn't much room left for OC.

My 13900KF gets about 200MHz with a lot of extra power. It is climbing up that hockey stick, and probably the most appealing thing is seeing 6.0. My 3080 overclocks infinitesimally. My 6800 does more, but hits a power limit that is a hassle to overcome, so still not that much. Even my A750 is more dialed in than older hardware, and Intel has little experience in that type of hardware. The most recent stuff I had that did +25% was a 4770K and a 780 Ti.

The 13600k seems to be the exception with new stuff since it is intentionally clocked low for product segmentation and will hold nearly the same clocks as my 13900kf. And there are noticeable gains like gaming at the level of a stock 13700k.
I'm not sure what you are getting at. What you stated above is not a reason why, all it says is that you overclocked something.

In my original post I said "I don't understand why users overclock".
1. I enjoy it. 2. I get a bit more performance and/or quieter operation, with the 13600K being a good example. 3. I'm just buying stuff from an e-tailer like everybody else. It feels more like I make it my own if I tune that stuff to my liking. Even if the performance isn't that different, I still wind up liking my toys a bit more. That 6.0 GHz is an example.
 

Joseph_138

Distinguished
I doubt that any motherboard manufacturer is going to continue to allow overvolting beyond the limits specified in AMD's fix. They would be shifting the burden of responsibility for customers' burned-out CPUs onto themselves by ignoring AMD's advice not to do it anymore. I can foresee AMD rejecting warranty claims after a certain point and telling customers to submit claims to their motherboard manufacturer instead, if that manufacturer continued to allow overvolting, and AMD would be right to do so.
 

Joseph_138

Distinguished
Seems like "hey I have no idea what my temperature is here" would be cause to throw an error.

Nah.
You are right that there isn't much room left for OC.

My 13900KF gets about 200MHz with a lot of extra power. It is climbing up that hockey stick, and probably the most appealing thing is seeing 6.0. My 3080 overclocks infinitesimally. My 6800 does more, but hits a power limit that is a hassle to overcome, so still not that much. Even my A750 is more dialed in than older hardware, and Intel has little experience in that type of hardware. The most recent stuff I had that did +25% was a 4770K and a 780 Ti.

The 13600k seems to be the exception with new stuff since it is intentionally clocked low for product segmentation and will hold nearly the same clocks as my 13900kf. And there are noticeable gains like gaming at the level of a stock 13700k.

1. I enjoy it. 2. I get a bit more performance and/or quieter operation, with the 13600K being a good example. 3. I'm just buying stuff from an e-tailer like everybody else. It feels more like I make it my own if I tune that stuff to my liking. Even if the performance isn't that different, I still wind up liking my toys a bit more. That 6.0 GHz is an example.

Then you should accept the risk of damage to your system that goes with that. Anyone who continues to overvolt their CPU after this should be prepared to shoulder the financial burden of replacing their CPU (and possibly their motherboard) and not have their warranty claims honored.

Ryzen 7k already runs on the edge of its power limits out of the box. It is essentially a factory-overclocked CPU. Nobody should even be attempting to overclock them further.
 
Apr 25, 2023
2
0
10
Seems like this would represent a pretty large attack surface for viruses, especially since the voltage changes can be done in real time in Windows. Isn't all this unlocked on all AMD systems, not just EXPO? Could set voltages over 1.3V and poof.
 

Joseph_138

Distinguished
i wondered how long it would take to start seeing failures. once they said "95 degrees is normal operating temp" it was only a matter of time.

to just let it keep ramping it up until it hits the max allowable temp is just stupid no matter who is making it. 95 may still be temporarily safe-ish but to force it to stay there all the time is asking for this type of thing.

this is happening with pushing it even further than it already is which makes me wonder if there is any long term damage from "normal" 95 degree all the time even at 1.25-1.35 volts. may not happen this fast but is it slowly happening anyway and these extreme oc people just helped it along so we see it faster? are we a couple years away from a mass of dead and dying chips even though they ran within specs?
If 95 is the 'normal' operating temp, should you be trying to overclock it at all? It's obviously already at, or near, its thermal limits. It doesn't take a rocket scientist to understand that a CPU that runs this hot as its baseline is going to run much hotter when overclocked.
 

InvalidError

Titan
Moderator
Seems like this would represent a pretty large attack surface for viruses, especially since the voltage changes can be done in real time in Windows. Isn't all this unlocked on all AMD systems, not just EXPO? Could set voltages over 1.3V and poof.
If secondary voltages work anything like the core voltages, there should be a platform management micro-controller responsible for managing VRM voltages and any software attempts to change voltages have to go through it. In that case, a firmware update could lock the voltage range down to whatever AMD deems safe enough to allow.
 

tamalero

Distinguished
Oct 25, 2006
1,231
246
19,670

rluker5

Distinguished
Jun 23, 2014
911
594
19,760
Then you should accept the risk of damage to your system that goes with that. Anyone who continues to overvolt their CPU after this should be prepared to shoulder the financial burden of replacing their CPU (and possibly their motherboard) and not have their warranty claims honored.

Ryzen 7k already runs on the edge of its power limits out of the box. It is essentially a factory-overclocked CPU. Nobody should even be attempting to overclock them further.
I guess. These latest Intel chips seem pretty indestructible though. They will overwhelm the cooler and thermal throttle long before they see voltages that are dangerous to them.
I think I have been running my IMC at 1.45v+ ever since I put in the 13th gen bios. Stock XMP was over 1.5v when I got into Windows. Only noticed because I was checking HWinfo64 because of this article. I just turned it back to 1.3v.
I've accidentally put 1.6v into the cores. But now they all core 1.22, double 1.35v. I would dump in 1.5v for benches if I had a 600w cooler.

The delidded 4770K in my daughter's PC has seen up to 1.55V, but on that mobo it liked about 1.45V best for a 1GHz OC.
It is still running fine. But really 22nm seems like a paper tiger compared to 10nm. That node CPU in a normal mobo/cooler combo is like a paper bag of sleeping squirrels. Give the squirrels enough volts and the paper bag doesn't stand a chance.
 

Vanderlindemedia

Commendable
Jul 15, 2022
132
73
1,660
Auto voltages have been a problem for the last 20 years. What your BIOS reports and what really goes in can differ by as much as 40%.

I think this is where the problems are to be found. And people often forget that the Ryzen series, no matter which generation, is sensitive to high voltages and high current, and is capable of degrading in days.
 
Apr 25, 2023
5
3
15
Problem is most people don't view running DRAM at its intended XMP/EXPO settings as overclocking. It makes no sense for AMD and Intel to introduce standards for automating high-speed memory setup if it isn't covered under warranty. The range of whatever parameters automatic setup can play with should be limited to safe values so people don't automatically fry their components through no real fault of their own.
EXPO is short for Extended Profiles for Overclocking. The AMD web page says: "Get easy DDR5 memory overclocking ..." Every tech site I can recall writing about EXPO says it's overclocking. I think AMD makes it clear enough that it's overclocking. Intel has similar language describing XMP.
 
Apr 25, 2023
5
3
15
This is a bad thermal sensor blowing up at potentially as low as 1.35V. Overclocking might speed it up, but it's pretty safe to say that it could happen on 100% safe settings as well.
Every time you boot up the CPU gets a pretty increased amount of Vcore and that could be enough to fry the sensor over time.

What job did you do on the engineering team to not know something as basic as that?!
Maybe it will eventually be determined that the chip has a design flaw leading to excessive failure rates at stock settings. Good news: people suffering that failure are covered by warranty! If you run your CPU at a higher voltage in an attempt to overclock, the kinds of voltage instabilities you reference will become more severe.

As for my expertise? I've worked on inter-chip SerDes links for memory fabrics, register window fill/spill engines, hardware transactional memory, MESI/MOESI cache coherence protocols, ECC failure correction pipeline stages, and hardware page table caches. While my experience in these areas is a few years old now, I feel quite qualified to comment on the risks one takes when running a VLSI chip out of spec.
 
Apr 25, 2023
5
3
15
...

Regardless, my larger point is that AMD's marketing of DDR5 6000 as a sweet spot is problematic if they aren't going to support it.
I'd certainly agree with that. And it's not just AMD: it's hard to find any tech website that advises people to purchase DDR5-5200, which seems to be as high as is supported without overclocking.
 
I doubt that any motherboard manufacturer is going to continue to allow overvolting beyond the limits specified in AMD's fix. They would be shifting the burden of responsibility for customers' burned-out CPUs onto themselves by ignoring AMD's advice not to do it anymore. I can foresee AMD rejecting warranty claims after a certain point and telling customers to submit claims to their motherboard manufacturer instead, if that manufacturer continued to allow overvolting, and AMD would be right to do so.
No, if this turns out to really be a thermal sensor then AMD has no defense, I don't think that they can argue in court that it's reasonable to never get above 1.3V in a system.
They would get a class action lawsuit and it would be a much clearer case than the "is it a core or isn't it a core" class suit that they lost recently.
 
Feb 13, 2023
26
22
35
As someone who has just built a new PC with a 7800X3D, this is very concerning!

I was one of those who didn't really question it when I read things like "DDR5 6000 is the sweet spot", or think too hard about what the letters in EXPO stood for. I get that yes, it's overclocking, but the way it is talked about, not only on tech websites and forums but in official manufacturer marketing copy, gives a layman (such as I am) a definite sense of it being right at the bottom of the ladder compared to 'real' overclocking (PBO, manual overclocking, etc.).

The fact that you have to enable XMP/EXPO to get your RAM at the speeds advertised on the packaging solidifies the impression that this is standard procedure and as close to safe as overclocking gets. There is even a little AMD EXPO logo on the front of the box for my memory! (Although to be fair, on the back it does say that 'system stability with overclocked memory kits may depend on the capability of the motherboard & CPU'.)

1.35V SoC also seems to be standard on this RAM, as it's listed on the label, and that's the value I'm seeing in HWiNFO with EXPO on the most basic profile. It leaves me with a lot of uncertainty. Is this a dangerous voltage? Opinions seem to be split. What will happen to memory performance if I lower it (or perform a BIOS update that lowers it for me)? I've been experimenting with my MSI mobo's 'High Efficiency' memory feature - is that exacerbating the risk?

Bottom line - am I faced with the dilemma of picking between safety and performance? You can tell me that this is the obvious choice facing anyone who wants to overclock and I'll agree, but as someone who just wants to reach all the numbers this expensive kit was sold to me on, it feels pretty terrible. Caveat emptor, I guess.
 

abufrejoval

Reputable
Jun 19, 2020
614
451
5,260
This is going to be expensive for AMD.

The biggest issue that I see is that it's not doing binary damage to the SoC. It's not "good" or "burnt" but going through various stages of "damaged-but-still-working", and there is no way of knowing where your personal chip is unless you buy it new and put it into a system with a known-good BIOS.

If I had one of those chips today, I'd be tempted to burn it intentionally in order to get a known-good replacement instead of living with the uncertainty of it failing in days, weeks, months or years and being left without a replacement warranty.

AMD really needs to come out with a Pentium FDIV-bug like lifelong replacement warranty or chaos will ensue. And that essentially means that this "unwise" policy of refusing warranty after PBO activation needs to go, too.

I don't overclock my systems, because I need reliability. Well I do overclock them perhaps just a little bit during "burn-in" to be sure they have a reliability margin and they will be cool enough to work in the heat of summer. And I might overclock them once they go to the kids for gaming.

But of course I activated PBO on my Zen 3 chips. Because I consider PBO a regular feature designed by AMD to self-regulate clocks in a safe manner. And of course I expect them to be replaced when they fail. It's a vendor provided advertised feature not a mechanism to duck out of a legally prescribed warranty!

They either need to make the chips failsafe or be ready to replace them "forever" (no, I wouldn't want a brand new Pentium today.)

I sure hope AMD understands that and reacts accordingly. And I hope they arrange/compensate for the motherboard vendors to do the same.

Imagine if Intel hadn't given the FDIV pledge...
 