News AMD Ryzen 7000 Burning Out: Root Cause Identified, EXPO and SoC Voltages to Blame


punkncat

Polypheme
Ambassador
The comment in the article about warranty is the aspect I am interested in. Since this was a failing on the part of AMD and its board partners and (as mentioned above), in my opinion, a result of inadequate testing, I hope all these consumers are made whole.
 
Reactions: RichardtST
I agree. Advertising a feature that voids the warranty makes no sense. Either limit a "standard" feature's scope and range to something covered under warranty or don't implement it at all.
As long as they also "advertise" the voiding to a degree that satisfies the law then it's only fair.
Intel for example shows it on the same page and you only have to read the headlines to get it.
You are warned and you know what you get.
 
Reactions: hannibal and KyaraM

RichardtST

Respectable
May 17, 2022
242
268
1,960
XMP also voids the warranty. Anything above stock does so for AMD, Intel, Nvidia, and anybody else as well.
Interesting. I did not know that. That information should be made blatantly clear on all the products in big bold letters. It should also be printed in big bold letters on the CPU/GPU boxes. I find it very misleading that it is not. I have been led to believe that it was all perfectly sanctioned and safe. Here I was thinking that "stock" was super-safe-ultra-conservative mode and that XMP/EXPO was still safe, just a little less so. Apparently not....
 
Reactions: DookieDraws

RichardtST

Respectable
May 17, 2022
242
268
1,960
As long as they also "advertise" the voiding to a degree that satisfies the law then it's only fair.
Intel for example shows it on the same page and you only have to read the headlines to get it.
You are warned and you know what you get.
Where does that say "voids warranty"?
My browser doesn't seem to display fine print that small....
 

btmedic04

Distinguished
Mar 12, 2015
486
383
19,190
Traces within the chip are only a few nm apart. A catastrophic failure on stuff powered by Vsoc could cause a short on stuff powered by Vcore and things go downhill from there.
Vsoc rails do not have the amperage necessary to create the heat needed to bubble the substrate. That sort of heat is more likely to come from the Vcore rails, which have far more amperage available to them. If a temperature probe is failing due to excessive Vsoc voltage, then the CPU has no way of knowing whether it's within spec, which causes it to pull as much amperage through Vcore as the VRM will supply, which in turn causes the physical damage we're seeing.
 
I've been wondering if it's possible that OVERTIGHTENING of the CPU cooler could also be playing a part in causing this issue? Maybe too much force is causing some of the pins to bend, or causing damage to the temperature probe(s). Maybe these newer chips just aren't as durable. Just a thought.
 

setx

Distinguished
Dec 10, 2014
264
237
19,060
I'm running my cheap Samsung 32GB 4800 x2 memory at 6200 with 1.25 V on memory and SoC. Yes, it probably can be pushed higher with more voltage but there is no point.

It's quite sad to see people burning their CPUs just because they left voltages on "auto". I put the blame 100% on motherboard makers, who just love to stealthily raise voltages.
 
Reactions: btmedic04
Where does that say "voids warranty"?
My browser doesn't seem to display fine print that small....
It says overclocking right in the title, but if that's not enough of a clue for you, they also explain that it's overclocking in the same line where they explain what XMP is, and if that's still not enough of a clue, they also give you a footnote.
QbqiFpp.jpg

If somebody doesn't know what overclocking is, reading the FAQ on the same page makes it even clearer.
tuCUlkw.jpg
 

InvalidError

Titan
Moderator
Vsoc rails do not have the amperage necessary to create the heat needed to bubble the substrate.
That doesn't matter.

You only need a couple of mA to burn CMOS transistors inside an IC. All SoC voltage needs to do is have enough power to melt stuff inside the silicon and cause a short on something else that does have the power to melt the entire socket like Vcore.

That is called cascade failure. The initial point of failure can be several steps removed from the spectacular outcome.
 

kal326

Distinguished
Dec 31, 2007
1,230
109
20,120
I have issues with EXPO and sleep, so I only turn EXPO on if I'm going to play a game, which happens quite seldom.

I currently have mine disabled to confirm that my latest build is actually stable on wake. Crash dumps pointed to crappy Gigabyte software primarily causing my on wake crashes. I'll probably wait a few more weeks and then turn it back on for day to day usage. Go from there. I'd like to set it on and keep it on, but stability is a higher concern.
 
Apr 25, 2023
5
3
15
Wow, what a bunch of whiners here! Intel and AMD both support various overclocking features because a lot of their customers asked them to. But vendors do not deliberately leave performance on the floor for overclockers to pick up, and it's always been the case that your overclocking results will vary, and you stand a decent chance of decreasing the life of your CPU, possibly severely. If the vendors reacted to the complaints about lack of warranty support for advertised overclock features by removing every feature that was possibly unsafe, people would be right back here in the forums whining about that instead.

The simple fact is that CPU vendors do a lot of careful testing to establish safe operating limits for a chip. I've worked on the engineering team for a CPU, and have personal experience with this. If you want warranty support, and don't want to risk your CPU, don't overclock. Simple as that.
 

sitehostplus

Honorable
Jan 6, 2018
404
163
10,870
It's a lesson for AMD and its partners to conduct deeper testing when launching a brand-new platform, including overclocking capabilities and limits (both memory and CPU), to ensure the safety of their customers.

People who experience such an issue might never come back. If it were me, I'd never touch their products again for a long time, because it's really painful and discouraging to have something burn, even if they replace it for free.
Sorry, but it's not up to them.

All any manufacturer is required to do is ensure the equipment they make works as designed.

We as overclockers by our nature push equipment beyond what it is designed to do.

When you do that, there are risks involved, and as you have found out things can and will break. When it happens, that is not their fault in the least, it's ours.

The only reason companies like AMD are in our corner is that it's good PR. Showing off how much abuse these computers can withstand and still work is nothing more than proof of how well built the stuff is, and how much the average person can trust the product to work as designed.

At the end of the day, it's risky. And if that risk bothers you, then don't overclock.
 

InvalidError

Titan
Moderator
If you want warranty support, and don't want to risk your CPU, don't overclock. Simple as that.
Problem is most people don't view running DRAM at its intended XMP/EXPO settings as overclocking. It makes no sense for AMD and Intel to introduce standards for automating high-speed memory setup if it isn't covered under warranty. The range of whatever parameters automatic setup can play with should be limited to safe values so people don't automatically fry their components through no real fault of their own.
 

rluker5

Distinguished
Jun 23, 2014
911
594
19,760
Please...elaborate.
My 13600K runs 5.5 GHz on the P-cores and 4.4 GHz on the E-cores at stock volts. That's +400 MHz and +500 MHz, respectively. But I run it at an undervolted 5.3 GHz so the fan on my $50 air cooler never spins up. Also, the Micron RAM is taken from 4800 CL40 to 5500 CL28 at 1.25 V, and latency is dropped by 13 ns.

A lot of people like to overclock. There are locked chips you can buy if you don't want the option.
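
For anyone who wants to sanity-check memory timings like the ones quoted above, here is a minimal sketch of the usual CAS latency arithmetic. It assumes standard DDR behavior (the memory clock is half the transfer rate), and the function name is made up for illustration. It only covers the CAS portion of latency, not the full round-trip figure a benchmark reports, so it is not expected to reproduce the 13 ns number.

```python
def cas_latency_ns(transfer_rate_mts: float, cas_cycles: int) -> float:
    """Estimate CAS latency in nanoseconds for DDR memory.

    Assumption: DDR transfers twice per clock, so the clock in MHz
    is half the transfer rate in MT/s.
    """
    clock_mhz = transfer_rate_mts / 2
    # cycles / MHz gives microseconds; multiply by 1000 for nanoseconds
    return cas_cycles / clock_mhz * 1000


stock = cas_latency_ns(4800, 40)
tuned = cas_latency_ns(5500, 28)

print(f"4800 CL40: {stock:.2f} ns")               # 16.67 ns
print(f"5500 CL28: {tuned:.2f} ns")               # 10.18 ns
print(f"CAS difference: {stock - tuned:.2f} ns")  # 6.48 ns
```

The remaining gap between this CAS-only difference and a measured latency drop would come from the other timings and the memory controller, which this sketch does not model.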
 
Wow, what a bunch of whiners here! Intel and AMD both support various overclocking features because a lot of their customers asked them to. But vendors do not deliberately leave performance on the floor for overclockers to pick up, and it's always been the case that your overclocking results will vary, and you stand a decent chance of decreasing the life of your CPU, possibly severely. If the vendors reacted to the complaints about lack of warranty support for advertised overclock features by removing every feature that was possibly unsafe, people would be right back here in the forums whining about that instead.

The simple fact is that CPU vendors do a lot of careful testing to establish safe operating limits for a chip. I've worked on the engineering team for a CPU, and have personal experience with this. If you want warranty support, and don't want to risk your CPU, don't overclock. Simple as that.
This is a bad thermal sensor blowing up at potentially as low as 1.35 V. Overclocking might speed it up, but it's pretty safe to say that it could happen on 100% safe settings as well.
Every time you boot up, the CPU gets a pretty increased amount of Vcore, and that could be enough to fry the sensor over time.

What job did you do on the engineering team to not know something as basic as that?!
 

sitehostplus

Honorable
Jan 6, 2018
404
163
10,870
I wondered how long it would take to start seeing failures. Once they said "95 degrees is normal operating temp", it was only a matter of time.

To just let it keep ramping up until it hits the max allowable temp is just stupid, no matter who is making it. 95 may still be temporarily safe-ish, but to force it to stay there all the time is asking for this type of thing.

This is happening when pushing it even further than it already is, which makes me wonder if there is any long-term damage from "normal" 95 degrees all the time, even at 1.25-1.35 volts. It may not happen this fast, but is it slowly happening anyway, and these extreme OC people just helped it along so we see it faster? Are we a couple of years away from a mass of dead and dying chips even though they ran within specs?

I haven't even started any overclocking, and my temps max out at 170 degrees Fahrenheit (I don't know exactly what that is in Celsius, but I'm pretty sure that isn't close to 95 degrees). I'm using Open Hardware Monitor btw.

I'll hazard a guess that the average user (the people who don't oc) is safe.
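
For reference, the Fahrenheit figure above converts with the standard formula. A quick sketch (the function names are just illustrative):

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


print(f"170 F = {fahrenheit_to_celsius(170):.1f} C")  # 76.7 C
print(f"95 C = {celsius_to_fahrenheit(95):.1f} F")    # 203.0 F
```

So a 170 °F reading is roughly 77 °C, well short of the 95 °C figure discussed in the thread.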
 
My 13600k runs 5.5 on the p cores and 4.4 on the E cores at stock volts. That's+400,500mhz. But I run it at an undervolted 5.3 so the fan on my $50 air cooler never spins up. Also the micron ram is taken from 4800c40 to 5500c28@1.25v and latency is dropped by 13ns.

A lot of people like to overclock. There are locked chips you can buy if you don't want the option.
And yet do you get any practical performance gains from this? Or are these only measurable if you run benchmarks?

<10% gains don't sound impressive when a decade ago 25%+ gains were commonplace.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,455
997
20,060
And yet do you get any practical performance gains from this? Or are these only measurable if you run benchmarks?

<10% gains don't sound impressive when a decade ago 25%+ gains were commonplace.
That just shows you how much performance manufacturers were leaving on the table in the past.

Nowadays, they don't leave that much headroom for OC-ing.

So any OC-ing done today is really climbing up that hockey stick for power consumption while having minimal gains in performance.
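
To put a rough number on that hockey stick, here is a sketch using the textbook approximation that dynamic CMOS power scales with voltage squared times frequency (P ≈ C·V²·f). The voltages and clocks below are hypothetical, picked only for illustration, and the model ignores leakage and temperature effects, so real chips will differ.

```python
def relative_dynamic_power(base_v: float, base_ghz: float,
                           oc_v: float, oc_ghz: float) -> float:
    """Return overclocked power as a multiple of baseline power.

    Assumption: dynamic power ~ C * V^2 * f with the same switched
    capacitance C in both cases, so C cancels out of the ratio.
    """
    return (oc_v / base_v) ** 2 * (oc_ghz / base_ghz)


# Hypothetical example: 1.20 V at 5.0 GHz pushed to 1.35 V at 5.4 GHz
ratio = relative_dynamic_power(1.20, 5.0, 1.35, 5.4)
clock_gain = 5.4 / 5.0 - 1

print(f"Clock gain: {clock_gain:.0%}")        # 8%
print(f"Power increase: {ratio - 1:.0%}")     # 37%
```

Under these assumed numbers, an 8% clock bump costs about 37% more dynamic power, because the extra voltage needed for stability enters the formula squared.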
 

UWguy

Commendable
Jan 13, 2021
68
54
1,610
I assume these changes will result in a decrease in performance?

Time to redo the benchmarks with the safe voltage.

Also, AMD should either replace all potentially affected processors or extend the warranty. I’m sure even with a new BIOS there are chips out there that will meet an early demise by being partially cooked.
 
That just shows you how much performance manufacturers were leaving on the table in the past.

Nowadays, they don't leave that much headroom for OC-ing.

So any OC-ing done today is really climbing up that hockey stick for power consumption while having minimal gains in performance.
If we go back another decade, while you could still potentially overclock some processors by, say, up to 1.0 GHz in the latter half of the decade, the problem is that the hardware around that CPU was likely not up to par at the time. Heck, when I last saw the invoice for my 2005 build, the motherboard was $80, but the CPU was $350. That computer had no real issue running said CPU all day, but even $80 seems kind of on the cheaper end back then, plus the thing was still built with mostly electrolytic caps and large ferrite core chokes.

In any case, overclocking created a lot of headaches for people back then, if this is anything to go by: https://devblogs.microsoft.com/oldnewthing/20050412-47/?p=35923

Besides that, clock speed wasn't really the only thing to have performance gains in. I think the only reason why boosting pushes the CPUs up against the wall in terms of their limits is because we've simply run out of easy ways to get performance, on top of the infrastructure to support it being readily available for cheap.
 

Dr3ams

Reputable
Sep 29, 2021
255
280
5,060
My 13600k runs 5.5 on the p cores and 4.4 on the E cores at stock volts. That's+400,500mhz. But I run it at an undervolted 5.3 so the fan on my $50 air cooler never spins up. Also the micron ram is taken from 4800c40 to 5500c28@1.25v and latency is dropped by 13ns.

A lot of people like to overclock. There are locked chips you can buy if you don't want the option.
I'm not sure what you are getting at. What you stated above is not a reason why; all it says is that you overclocked something.

In my original post I said "I don't understand why users overclock".
 
Reactions: STbob