News AMD Ryzen 7000 Burning Out: Root Cause Identified, EXPO and SoC Voltages to Blame

A little typo, DDR4 -6000 is a bit crazy for DDR4 lol.

These are the problems you have to face when adopting a brand-new platform. I hope those people get replacements; I know Asus will probably try everything they can to get out of replacing broken stuff.
 
Seems like "hey I have no idea what my temperature is here" would be cause to throw an error.

Nah.
You should always add the corresponding quote since most people never read the article itself.
Yeah, the thermal sensor burns out and the CPU just keeps running willy nilly.
Our sources also added further details about the nature of the chip failures — in some cases, excessive SoC voltages destroy the chips' thermal sensors and thermal protection mechanisms, completely disabling its only means of detecting and protecting itself from overheating. As a result, the chip continues to operate without knowing its temperature.
 
As with all forms of overclocking, any damage from using an EXPO overclocking profile is not covered by your warranty, but given the situation, we don't think that AMD or the motherboard vendors would use the lack of warrantied EXPO support to invalidate warranties.

I like that part. A great way to not lose this customer.
 
So new general rules for Ryzen 7000 series SoC Voltage.
1.25V is the "recommended safe SoC voltage limit".
1.35V "appears to be safe."
1.40V and beyond "definitely increases the likelihood" of the Burn-Out condition occurring

So if you're OC-ing past 1.25 V, make sure your SoC voltage stays below 1.40 V.

1.35 V "appears to be safe" for the CPUs; anything above 1.35 V but below 1.40 V carries some risk of burning out your CPU.
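
To make those buckets concrete, here is a minimal sketch in Python. The function name and the bucket labels are made up for illustration; the only inputs taken from this post are the 1.25 V, 1.35 V, and 1.40 V thresholds, and none of this is an official AMD rule.

```python
def classify_soc_voltage(volts: float) -> str:
    """Bucket an SoC voltage using the thresholds quoted in this post.

    The labels are illustrative (hypothetical), not official AMD wording.
    """
    if volts <= 1.25:
        return "recommended"    # at or below the recommended safe limit
    if volts <= 1.35:
        return "appears safe"   # above recommended, but reportedly still OK
    if volts < 1.40:
        return "some risk"      # between 1.35 V and 1.40 V
    return "high risk"          # 1.40 V and beyond


if __name__ == "__main__":
    for v in (1.20, 1.30, 1.38, 1.45):
        print(f"{v:.2f} V -> {classify_soc_voltage(v)}")
```

You would feed it whatever your monitoring tool reports for SoC voltage, since the value set in the BIOS and the value actually applied can differ.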
 
Interesting.

I have issues with EXPO and sleep, so I only turn EXPO on if I'm going to play a game, which happens quite seldom.

But I do use Eco mode, EXPO or not, so I'm never running at the thermal limits. In fact, I don't think I ever exceed 75C even at the highest load.

And if the problem is overheating, then Eco mode should be able to help, right? Somehow only safe and unsafe voltages are discussed, but not thermals and Eco mode.
 
It's a lesson for AMD and its partners to test more thoroughly when launching a brand-new platform, including overclocking headroom and limits (both memory and CPU), to ensure their customers' safety.

People who experience such an issue might never come back. If it were me, I wouldn't touch their products again for a long time, because it's really painful and discouraging to have something burn, even if they replace it for free.
 
Sounds like a very plausible root cause, so I hope AMD and motherboard vendors actually DO TALK now.

Talk about growing pains, oof.

And I agree: I hope they realize that the "on paper" restriction of EXPO/XMP invalidating warranties is stupid. If you won't warranty it, then don't advertise it as part of the platform, you stupid people from marketing.

That also begs the question: can we start talking about not using EXPO/XMP going forward? Not even advertising using higher clocked kits, unless it's for OC investigations and always remind people it will void their warranty. Until both AMD and Intel stop being stupid about it.

Regards.
 
So this is a widespread problem with the entire 7000 series, which I suspected. Sure am glad I didn’t buy one of this series.

Inadequate testing is one hypothesis.

I’ve used AMD processors all my life and career and still that’s all I buy, but this seems really super sloppy, and I may have to reconsider my purchase decisions from now on. I agree that if a sensor stops reporting data, the CPU should shut itself down for safety reasons and report an error. This is unacceptable performance from AMD. I will never recommend this series of processor to anybody for any reason.

For the affected people they should replace the CPU with one that doesn’t have these problems and also reimburse them for their motherboard and anything else that got damaged. It’s the least they could do.
 
I have some pretty mundane G.Skill 6400C32 Hynix M-die DDR5, and the XMP profile and motherboard combo sets some of the voltages to over 1.4 V (VDD, VDDQ = 1.41 V; Agent = 1.233 V; mem controller = 1.312 V). (It's an Intel chip that takes a lot of voltage if I don't limit it.)

EDIT: My Z690 P BIOS said 1.312 V on the mem controller, but HWiNFO64 said 1.456 V when I got into Windows. I set it to auto and it went up to 1.506 V in Windows. I turned it down to 1.406 V in Windows for now. Not having problems, but I don't want degradation.


I think that some of the EXPO presets, combined with motherboard adjustments, may set voltages too high. It would be good to check, and to remember that timings may have to be loosened if you need to lower the voltage.

Losing thermal sensors on an architecture that is designed to throttle power according to those sensors, and that will clearly exceed its cooling in some scenarios otherwise, is not something you want to have happen.

I know my 13900KF could get pretty hot if the thermal sensors didn't tell the fans to cool it.
 
i wondered how long it would take to start seeing failures. once they said "95 degrees is normal operating temp" it was only a matter of time.

to just let it keep ramping it up until it hits the max allowable temp is just stupid no matter who is making it. 95 may still be temporarily safe-ish but to force it to stay there all the time is asking for this type of thing.

this is happening with pushing it even further than it already is which makes me wonder if there is any long term damage from "normal" 95 degree all the time even at 1.25-1.35 volts. may not happen this fast but is it slowly happening anyway and these extreme oc people just helped it along so we see it faster? are we a couple years away from a mass of dead and dying chips even though they ran within specs?
 
i wondered how long it would take to start seeing failures. once they said "95 degrees is normal operating temp" it was only a matter of time.

to just let it keep ramping it up until it hits the max allowable temp is just stupid no matter who is making it. 95 may still be temporarily safe-ish but to force it to stay there all the time is asking for this type of thing.

this is happening with pushing it even further than it already is which makes me wonder if there is any long term damage from "normal" 95 degree all the time even at 1.25-1.35 volts. may not happen this fast but is it slowly happening anyway and these extreme oc people just helped it along so we see it faster? are we a couple years away from a mass of dead and dying chips even though they ran within specs?
That is actually a very good point... I would like to believe they're not linked, but I can't help but align myself with your thoughts in that regard.

I've been very vocal about not liking the "this is fine" mentality with the over 90°C operating temps on consumer-grade CPUs being "normal", so it would be interesting to check if those higher operating temps would make the situation worse. I have the feeling it wouldn't when the voltage is well within safe margins, but if the stupid IHS was thinner, this may have been avoided? Perhaps?

There may be a correlation, but proving causation is a different topic altogether.

Regards.
 
So this is a widespread problem with the entire 7000 series, which I suspected. Sure am glad I didn’t buy one of this series.

Inadequate testing is one hypothesis.

I’ve used AMD processors all my life and career and still that’s all I buy, but this seems really super sloppy, and I may have to reconsider my purchase decisions from now on. I agree that if a sensor stops reporting data, the CPU should shut itself down for safety reasons and report an error. This is unacceptable performance from AMD. I will never recommend this series of processor to anybody for any reason.

For the affected people they should replace the CPU with one that doesn’t have these problems and also reimburse them for their motherboard and anything else that got damaged. It’s the least they could do.
Shutting down and giving an error is not really much better because you would still have an unusable system.
It should fall back to base 100% safe settings so you have at least a working system even if it is a bit slower.
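
Roughly the behavior I mean, as a Python sketch. This is not how AMD's firmware works; `SAFE_BASE`, `choose_settings`, and the settings keys are all hypothetical names for illustration. The 1.25 V and 95 °C figures are just the numbers mentioned elsewhere in this thread.

```python
from typing import Optional

# Hypothetical conservative baseline: no boost, SoC voltage at the
# "recommended safe" limit quoted in this thread.
SAFE_BASE = {"soc_voltage": 1.25, "boost": False}


def choose_settings(temp_c: Optional[float],
                    requested: dict,
                    limit_c: float = 95.0) -> dict:
    """Pick operating settings from a temperature reading.

    temp_c is None when the sensor returns nothing (for example, it died).
    """
    if temp_c is None:
        # No reading: assume the worst and drop to the safe baseline
        # instead of continuing to boost blind.
        return dict(SAFE_BASE)
    if temp_c >= limit_c:
        # At or over the limit: also fall back.
        return dict(SAFE_BASE)
    # Valid reading below the limit: honor the requested settings.
    return dict(requested)
```

The point is that a missing reading is treated like an over-limit reading, so the system stays usable at reduced speed rather than either shutting down or running with no protection.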
 
i wondered how long it would take to start seeing failures. once they said "95 degrees is normal operating temp" it was only a matter of time.

to just let it keep ramping it up until it hits the max allowable temp is just stupid no matter who is making it. 95 may still be temporarily safe-ish but to force it to stay there all the time is asking for this type of thing.

this is happening with pushing it even further than it already is which makes me wonder if there is any long term damage from "normal" 95 degree all the time even at 1.25-1.35 volts. may not happen this fast but is it slowly happening anyway and these extreme oc people just helped it along so we see it faster? are we a couple years away from a mass of dead and dying chips even though they ran within specs?
The temp itself is not the main issue; the lack of temp reporting just makes it go into a feedback loop with ever-increasing voltages/clocks, and that overvoltage is what is causing the blow-ups.
Max allowed temp at max allowed vcore should be fine for many years because the allowed limit is a bunch below the real limits.
 
I don't understand why users want to overclock their hardware. If I want something to run at a certain speed, then I purchase the hardware that does what I want.

Also...if AMD wants a high performance CPU that natively runs at higher temps, then invent one. Don't tease users into thinking that it's OK to run a processor (designed to run safely below 90 degrees) over the specified limits and then leave it there.
 
Ouch. That's gonna hurt.... but wait... what??? Using EXPO voids the warranty?!?! What kind of absolute nonsense is that? AMD had a hand in creating that as an answer to Intel's XMP, so they had better own up to it and get with the memory vendors to fix their profiles or find another way to protect their sensitive processors ASAP. But you can't void warranties on something you yourself designed and blessed. No, that's just wrong. Everyone uses XMP and EXPO. Fix it.
 
I don't understand why users want to overclock their hardware. If I want something to run at a certain speed, then I purchase the hardware that does what I want.

Also...if AMD wants a high performance CPU that natively runs at higher temps, then invent one. Don't tease users into thinking that it's OK to run a processor (designed to run safely below 90 degrees) over the specified limits and then leave it there.
Your opinion is wrong.
 
Ouch. That's gonna hurt.... but wait... what??? Using EXPO voids the warranty?!?! What kind of absolute nonsense is that? AMD had a hand in creating that as an answer to Intel's XMP, so they had better own up to it and get with the memory vendors to fix their profiles or find another way to protect their sensitive processors ASAP. But you can't void warranties on something you yourself designed and blessed. No, that's just wrong. Everyone uses XMP and EXPO. Fix it.
XMP also voids the warranty. Anything above stock does so, for both AMD and Intel, and for Nvidia and anybody else as well.
 
And I agree: I hope they realize that the "on paper" restriction of EXPO/XMP invalidating warranties is stupid. If you won't warranty it, then don't advertise it as part of the platform, you stupid people from marketing.
I agree. Advertising a feature that voids the warranty makes no sense. Either limit a "standard" feature's scope and range to something covered under warranty or don't implement it at all.
 
Buildzoid does not think it's the SoC. There's simply not enough amperage on that rail to cause the kind of damage we're seeing here. It's all speculation at this point, but I'm sure someone will get to the bottom of it soon.
Traces within the chip are only a few nm apart. A catastrophic failure on stuff powered by Vsoc could cause a short on stuff powered by Vcore and things go downhill from there.