News AMD Ryzen 7000 Burning Out: Root Cause Identified, EXPO and SoC Voltages to Blame

Admin · Apr 25, 2023

We reached out to industry contacts to find out why AMD's Ryzen 7000 processors are burning out.

Viking2121 · Apr 25, 2023

A little typo, DDR4 -6000 is a bit crazy for DDR4 lol.

These are the problems you have to face when adopting a brand-new platform. I hope those people get replacements; I know Asus will probably try everything they can to get out of replacing broken stuff.

jkflipflop98 · Apr 25, 2023

Seems like "hey I have no idea what my temperature is here" would be cause to throw an error.

Nah.

TerryLaze · Apr 25, 2023

jkflipflop98 said:
Seems like "hey I have no idea what my temperature is here" would be cause to throw an error.

Nah.

You should always add the corresponding quote since most people never read the article itself.
Yeah, the thermal sensor burns out and the CPU just keeps running willy nilly.

Our sources also added further details about the nature of the chip failures — in some cases, excessive SoC voltages destroy the chips' thermal sensors and thermal protection mechanisms, completely disabling its only means of detecting and protecting itself from overheating. As a result, the chip continues to operate without knowing its temperature.

Deleted member 2838871 · Apr 25, 2023

As with all forms of overclocking, any damage from using an EXPO overclocking profile is not covered by your warranty, but given the situation, we don't think that AMD or the motherboard vendors would use the lack of warrantied EXPO support to invalidate warranties.

I like that part. A great way to not lose this customer.

Kamen Rider Blade · Apr 25, 2023

So new general rules for Ryzen 7000 series SoC Voltage.
1.25V is the "recommended safe SoC voltage limit".
1.35V "appears to be safe."
1.40V and beyond "definitely increases the likelihood" of the Burn-Out condition occurring

So if you're OC-ing past 1.25V, make sure your voltage is < 1.40 V
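
The bands in this post can be written down as a small lookup. This is only a sketch of the thresholds as summarized here (1.25 V, 1.35 V, 1.40 V); the function name `classify_soc_voltage` and the band labels are made up for illustration and do not come from AMD or any vendor tool.

```python
def classify_soc_voltage(volts: float) -> str:
    """Map an SoC voltage reading to the bands summarized in this post.

    The thresholds are the ones quoted in the thread, not an official
    AMD specification.
    """
    if volts <= 0:
        raise ValueError("voltage must be positive")
    if volts <= 1.25:
        return "recommended"    # at or under the recommended safe limit
    if volts <= 1.35:
        return "appears safe"   # reported as apparently safe
    if volts < 1.40:
        return "some risk"      # above 1.35 V but still below 1.40 V
    return "elevated risk"      # 1.40 V and beyond


for reading in (1.20, 1.30, 1.38, 1.41):
    print(f"{reading:.2f} V -> {classify_soc_voltage(reading)}")
```

Boundary values land in the lower band (1.35 V counts as "appears safe"), except 1.40 V itself, which is treated as "elevated risk" to match the "1.40V and beyond" wording.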

1.35V "appears to be safe" for the CPUs; anything above 1.35 V and below 1.40 V carries some risk of burning out your CPU.

Sergei Tachenov · Apr 25, 2023

Interesting.

I have issues with EXPO and sleep, so I only turn EXPO on if I'm going to play a game, which happens quite seldom.

But I do use Eco mode, EXPO or not, so I'm never running at the thermal limits. In fact, I don't think I ever exceed 75C even at the highest load.

And if the problem is overheating, then Eco mode should be able to help, right? Somehow only safe and unsafe voltages are discussed, but not thermals and Eco mode.

heickel.ramadhan · Apr 25, 2023

It's a lesson for AMD and its partners to do deeper testing when launching a brand-new platform, including overclocking headroom and limits (both memory and CPU), to ensure their customers' safety.

People who experience such an issue might never come back. If it were me, I wouldn't touch their products again for a long time, because it's really painful and discouraging to have something burn, even if they replace it for free.

TerryLaze · Apr 25, 2023

Sergei Tachenov said:
But I do use Eco mode, EXPO or not, so I'm never running at the thermal limits. In fact, I don't think I ever exceed 75C even at the highest load.

The problem is that the thermal sensor blows up which means that if it happens it will keep showing you the same temp until the whole CPU blows up because it won't get any new update on the real temp.
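
To illustrate the "keeps showing the same temp" symptom, here is a rough sketch of how monitoring software could flag a reading that never changes. It is a heuristic only: the class name `StuckSensorDetector` and the `max_identical` window are hypothetical, it says nothing about how the CPU firmware actually works, and a healthy sensor on an idle system can also repeat a value for a while.

```python
from collections import deque


class StuckSensorDetector:
    """Flag a temperature feed whose value has not changed for too long.

    Hypothetical helper; `max_identical` is an arbitrary illustration value.
    """

    def __init__(self, max_identical: int = 5) -> None:
        if max_identical < 2:
            raise ValueError("max_identical must be at least 2")
        # Keep only the most recent `max_identical` samples.
        self._recent = deque(maxlen=max_identical)

    def update(self, temp_c: float) -> bool:
        """Record a sample; return True if the feed looks frozen."""
        self._recent.append(temp_c)
        window_full = len(self._recent) == self._recent.maxlen
        # Frozen means: the window is full and every sample is identical.
        return window_full and len(set(self._recent)) == 1


detector = StuckSensorDetector(max_identical=3)
for sample in (61.0, 64.5, 64.5, 64.5):
    print(sample, detector.update(sample))
```

With a window of three, only the last sample in the loop is flagged, because that is the first point at which three identical readings have been seen in a row.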

-Fran- · Apr 25, 2023

Sounds like a very plausible root cause, so I hope AMD and motherboard vendors actually DO TALK now.

Talk about growing pains, oof.

And I agree: I hope they realize that the "on paper" restriction of EXPO/XMP invalidating warranties is stupid. If you won't warranty it, then don't advertise it as part of the platform, you stupid people from marketing.

That also begs the question: can we start talking about not using EXPO/XMP going forward? Not even advertising using higher clocked kits, unless it's for OC investigations and always remind people it will void their warranty. Until both AMD and Intel stop being stupid about it.

Regards.

Deleted member 14196 · Apr 25, 2023

So this is a widespread problem with the entire 7000 series which I suspected. Sure am glad I didn’t buy one of this series

Inadequate testing is one hypothesis

I’ve used AMD processors all my life and career and still that’s all I buy, but this seems really super sloppy, and I may have to reconsider my purchase decisions from now on. I agree that if a sensor stops reporting data, the CPU should shut itself down for safety reasons and report an error. This is unacceptable performance from AMD. I will never recommend this series of processor to anybody for any reason.

For the affected people they should replace the CPU with one that doesn’t have these problems and also reimburse them for their motherboard and anything else that got damaged. It’s the least they could do.

bolweval · Apr 25, 2023

"it's better to burn out than fade away"

said some guy once.

rluker5 · Apr 25, 2023

I have some pretty mundane GSkill 6400c32 Hynix m-die DDR5 and the XMP profile, motherboard combo sets some of the voltages to over 1.4v (VDD,VDDQ=1.41) Agent=1.233, mem controller = 1.312. (Intel chip that takes a lot of voltage if I don't limit it.)

EDIT: My Z690 P BIOS said 1.312 on mem controller, but HWiNFO64 said 1.456 when I got into Windows. I set it to auto and it went up to 1.506 in Windows. I turned it down to 1.406 in Windows for now. Not having problems, but I don't want degradation.

I think that some of the EXPO presets with motherboard adjustments may push voltages too high. It would be good to check, and to remember that timings may have to be loosened if the voltage has to come down.

Losing thermal sensors on an arch that is designed to throttle power according to those thermal sensors and clearly will exceed cooling in some scenarios otherwise is not something you want to have happen.

I know my 13900kf could get pretty hot if the thermal sensors didn't tell the fans to cool it.

Math Geek · Apr 25, 2023

i wondered how long it would take to start seeing failures. once they said "95 degrees is normal operating temp" it was only a matter of time.

to just let it keep ramping it up until it hits the max allowable temp is just stupid no matter who is making it. 95 may still be temporarily safe-ish but to force it to stay there all the time is asking for this type of thing.

this is happening with pushing it even further than it already is which makes me wonder if there is any long term damage from "normal" 95 degree all the time even at 1.25-1.35 volts. may not happen this fast but is it slowly happening anyway and these extreme oc people just helped it along so we see it faster? are we a couple years away from a mass of dead and dying chips even though they ran within specs?

-Fran- · Apr 25, 2023

Math Geek said:
i wondered how long it would take to start seeing failures. once they said "95 degrees is normal operating temp" it was only a matter of time.

to just let it keep ramping it up until it hits the max allowable temp is just stupid no matter who is making it. 95 may still be temporarily safe-ish but to force it to stay there all the time is asking for this type of thing.

this is happening with pushing it even further than it already is which makes me wonder if there is any long term damage from "normal" 95 degree all the time even at 1.25-1.35 volts. may not happen this fast but is it slowly happening anyway and these extreme oc people just helped it along so we see it faster? are we a couple years away from a mass of dead and dying chips even though they ran within specs?

That is actually a very good point... I would like to believe they're not linked, but I can't help but align myself with your thoughts in that regard.

I've been very vocal about not liking the "this is fine" mentality with the over 90°c operating temps on consumer-grade CPUs being "normal", so it would be interesting to check if those higher operating temps would make the situation worse. I have the feeling it wouldn't when the voltage is well within safe margins, but if the stupid IHS was thinner, this may have been avoided? Perhaps?

There may be correlation, but proving causation is a different topic altogether.

Regards.

wakuwaku · Apr 25, 2023

I would like to see some RAM manufacturers comment on whether we can have DDR5-6000 at lower than 1.35 V.

TerryLaze · Apr 25, 2023

Mandark said:
So this is a widespread problem with the entire 7000 series which I suspected. Sure am glad I didn’t buy one of this series

Inadequate testing is one hypothesis

I’ve used AMD processors all my life and career and still that’s all I buy, but this seems really super sloppy, and I may have to reconsider my purchase decisions from now on. I agree that if a sensor stops reporting data, the CPU should shut itself down for safety reasons and report an error. This is unacceptable performance from AMD. I will never recommend this series of processor to anybody for any reason.

For the affected people they should replace the CPU with one that doesn’t have these problems and also reimburse them for their motherboard and anything else that got damaged. It’s the least they could do.

Shutting down and giving an error is not really much better because you would still have an unusable system.
It should fall back to base 100% safe settings so you have at least a working system even if it is a bit slower.
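
A toy sketch of that fallback idea, where a missing reading leads to conservative settings rather than a shutdown. Everything here is hypothetical (the `Action` names, `choose_action`, and using `None` to mean "no reading"); the 95 C default just echoes the figure quoted elsewhere in this thread, and none of this reflects actual AMD firmware behavior.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RUN_NORMAL = "run with current settings"
    THROTTLE = "throttle"
    SAFE_DEFAULTS = "fall back to base settings"


def choose_action(temp_c: Optional[float], limit_c: float = 95.0) -> Action:
    """Pick an action from the latest temperature reading.

    `None` stands for "the sensor gave no reading" (an assumption of this
    sketch, not a documented interface).
    """
    if temp_c is None:
        # No trustworthy reading: drop to conservative settings so the
        # system stays usable, instead of shutting down.
        return Action.SAFE_DEFAULTS
    if temp_c >= limit_c:
        return Action.THROTTLE
    return Action.RUN_NORMAL


for reading in (70.0, 96.0, None):
    print(reading, "->", choose_action(reading).value)
```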

TerryLaze · Apr 25, 2023

Math Geek said:
i wondered how long it would take to start seeing failures. once they said "95 degrees is normal operating temp" it was only a matter of time.

to just let it keep ramping it up until it hits the max allowable temp is just stupid no matter who is making it. 95 may still be temporarily safe-ish but to force it to stay there all the time is asking for this type of thing.

this is happening with pushing it even further than it already is which makes me wonder if there is any long term damage from "normal" 95 degree all the time even at 1.25-1.35 volts. may not happen this fast but is it slowly happening anyway and these extreme oc people just helped it along so we see it faster? are we a couple years away from a mass of dead and dying chips even though they ran within specs?

The temp itself is not the main issue, the lack of temp reporting just makes it go into a feedback loop with ever increasing Voltages/clocks and that over voltage is what is causing the blow ups.
Max allowed temp at max allowed vcore should be fine for many years because the allowed limit is a bunch below the real limits.

Dr3ams · Apr 25, 2023

I don't understand why users want to overclock their hardware. If I want something to run at a certain speed, then I purchase the hardware that does what I want.

Also...if AMD wants a high performance CPU that natively runs at higher temps, then invent one. Don't tease users into thinking that it's OK to run a processor (designed to run safely below 90 degrees) over the specified limits and then leave it there.

RichardtST · Apr 25, 2023

Ouch. That's gonna hurt.... but wait... what??? Using EXPO voids the warranty?!?! What kind of absolute nonsense is that? AMD had a hand in creating that as an answer to Intel's XMP, so they had better own up to it and get with the memory vendors to fix their profiles or find another way to protect their sensitive processors ASAP. But you can't void warranties on something you yourself designed and blessed. No, that's just wrong. Everyone uses XMP and EXPO. Fix it.

rluker5 · Apr 25, 2023

Dr3ams said:
I don't understand why users want to overclock their hardware. If I want something to run at a certain speed, then I purchase the hardware that does what I want.

Also...if AMD wants a high performance CPU that natively runs at higher temps, then invent one. Don't tease users into thinking that it's OK to run a processor (designed to run safely below 90 degrees) over the specified limits and then leave it there.

Your opinion is wrong.

TerryLaze · Apr 25, 2023

RichardtST said:
Ouch. That's gonna hurt.... but wait... what??? Using EXPO voids the warranty?!?! What kind of absolute nonsense is that? AMD had a hand in creating that as an answer to Intel's XMP, so they had better own up to it and get with the memory vendors to fix their profiles or find another way to protect their sensitive processors ASAP. But you can't void warranties on something you yourself designed and blessed. No, that's just wrong. Everyone uses XMP and EXPO. Fix it.

XMP also voids warranty, anything above stock does so for both amd and intel, and nvidia and anybody else as well.

XMP Warranty void

Hello, I'm running on an I5 9600k and I wanted to ask if my CPU warranty is voided even though i've been running on XMP since the day i bought it. I have been in no way informed of XMP being "overclocking" and something that voids the warranty, all I was informed of is that it's the way of...

community.intel.com

btmedic04 · Apr 25, 2023

Buildzoid does not think it's the SoC. There's simply not enough amperage on that rail to cause the kind of damage we're seeing here. It's all speculation at this point, but I'm sure someone will get to the bottom of it soon.

View: https://youtu.be/DP-PqRduunw

InvalidError · Apr 25, 2023

-Fran- said:
And I agree: I hope they realize that the "on paper" restriction of EXPO/XMP invalidating warranties is stupid. If you won't warranty it, then don't advertise it as part of the platform, you stupid people from marketing.

I agree. Advertising a feature that voids the warranty makes no sense. Either limit a "standard" feature's scope and range to something covered under warranty or don't implement it at all.

InvalidError · Apr 25, 2023

btmedic04 said:
Buildzoid does not think it's the SoC. There's simply not enough amperage on that rail to cause the kind of damage we're seeing here. It's all speculation at this point, but I'm sure someone will get to the bottom of it soon.

Traces within the chip are only a few nm apart. A catastrophic failure on stuff powered by Vsoc could cause a short on stuff powered by Vcore and things go downhill from there.
