Too Hot to Last? Investigating Intel's Claims About Ryzen Reliability

kinggremlin

Distinguished
Jul 14, 2009
Please share the article that explores this issue and its relationship to longevity. Particularly Intel's claims.

Well, your response makes it clear why this site is continuing on its downward spiral. I'm not sure how much more clearly I could have stated in my last post, that the problem was the title of the article and not the article itself.

The title of the article: "Too Hot to Last? Investigating Intel's Claims About Ryzen Reliability"

Third paragraph of the article:
"Interestingly, Intel then drove further on the issue, citing a report that claims reliability is behind AMD's apparent, but not proven, reasons for reducing its chips' frequencies. "

Wait... So it's not actually Intel that came up with this hypothesis? They're just quoting someone else.

Little further down:
"We chose to look into the matter further based on a comment made by legendary overclocker and Asus engineer Shamino on the Overclock.net forums, which is the same comment that spurred the article Intel cited in the slide above. "

Come again? The title says Intel spurred you to write this article, now you're claiming to be investigating this issue because an Asus engineer hypothesized it.
Shouldn't the article be titled: "Too Hot to Last? Investigating an Asus engineer's Claims About Ryzen Reliability"?

Anyone who has been on this site for more than 10 minutes knows the vast majority of posters are pro-AMD. It was no accident that "Asus engineer" was replaced in the article title with "Intel." If you want your articles read, why don't you try accurately titling them? Is inciting the AMD mob with clickbait titles the only way for you to drive traffic now? Keep up the outstanding trolling, guys.
 

vaughn2k

Distinguished
Aug 6, 2008
It is a semiconductor limitation.
The process node is already at 7nm, so the dielectric could be 5nm thick or less.
A temperature increase, meanwhile, limits charge carrier mobility and thus limits performance.
If there were a material that stays super cool, or a superconducting material, then noobs would not complain.
 
Well, your response makes it clear why this site is continuing on its downward spiral. I'm not sure how much more clearly I could have stated in my last post, that the problem was the title of the article and not the article itself.

The title of the article: "Too Hot to Last? Investigating Intel's Claims About Ryzen Reliability"

Third paragraph of the article:
"Interestingly, Intel then drove further on the issue, citing a report that claims reliability is behind AMD's apparent, but not proven, reasons for reducing its chips' frequencies. "

Wait... So it's not actually Intel that came up with this hypothesis? They're just quoting someone else.

Little further down:
"We chose to look into the matter further based on a comment made by legendary overclocker and Asus engineer Shamino on the Overclock.net forums, which is the same comment that spurred the article Intel cited in the slide above. "

Come again? The title says Intel spurred you to write this article, now you're claiming to be investigating this issue because an Asus engineer hypothesized it.
Shouldn't the article be titled: "Too Hot to Last? Investigating an Asus engineer's Claims About Ryzen Reliability"?

Anyone who has been on this site for more than 10 minutes knows the vast majority of posters are pro-AMD. It was no accident that "Asus engineer" was replaced in the article title with "Intel." If you want your articles read, why don't you try accurately titling them? Is inciting the AMD mob with clickbait titles the only way for you to drive traffic now? Keep up the outstanding trolling, guys.

Really good argument there. We land on the same side regarding editorial power. Tom's has a responsibility to scrutinize. If they really wanted a fair and unbiased investigation, they would have included the 9900K in the melee.

And you are right, the title is of YouTube-influencer caliber.

It should have been renamed:

Thermal investigation of silicon degradation over time: an AMD and Intel analysis.

Instead, the title says that Intel made the allusion and that Tom's made its move, spreading Intel's dirty propaganda scheme. The goal was to plant doubt in people's minds to attack AMD's mindshare, and by having Tom's post this article, they clearly succeeded.

So basically, your mom is telling everyone the house is dirty. You understand that the real message is to clean up the mess, and you do it so she will shut up. Basically, you have been manipulated into doing something you were never asked to do. This is EXACTLY what happened here. If Intel was not the one directly telling Tom's this, then the title is totally false, which makes this a personal initiative that reeks of bias. And even if it was, the fact that this is directed entirely at AMD is downright wrong.

Tom's should just post videos on YouTube now. Just trying to be popular and stuff, people are just going to watch it!
 

CerianK

Distinguished
Nov 7, 2008
I am glad to not be the only one understanding what is at odds here.
Best case: Ensure AMD does not go down the same dark path in pursuit of profits that Intel has taken in the past (e.g. PIII).
Worst case: Use FUD to remind AMD to stay on the correct path, which will not detract from AMD's current offerings in the long run.
Either way, thank you Intel and Tom's for effective (if not intentional) use of 'tough-love', which is really just news, like any, to be critically considered.
[disclosure]I have a vested interest in AMD, but historically have used mostly Intel products, which I am not vested in.[/disclosure]
 

logainofhades

Titan
Moderator
So Ryzen is clocking down at temps most users will never see? Sounds correct. Still deciding on AMD vs. Intel for my next build. Currently on an i5 4690k overclocked to 4.6GHz, my typical max, full load temps are just below 60 C. So Intel living up to their old reputation is definitely swaying me towards AMD. Worst part is, part of me wants to hold out to see if this Ryzen behaviour is revealing of any underlying issues, which means Intel's release is working. I suspect it's just tweaking on AMD's part, and I'd like to see a comparison with competing Intel designs.

In nearly 20 years of building PCs, I have only experienced one dead CPU, and that was because the Rosewill PSU that I had at the time died and took the motherboard and CPU with it. I was just fortunate that it was a cheap Sempron 145 system that was only used as an HTPC. I am currently using a 3700X myself, and have had no complaints.
 
Most experienced PC enthusiasts know about temperature, voltage and electromigration. This is why the simplest and most popular mitigation for longevity, especially when combined with overclocking, has been high-end cooling.

I can say from the past 2+ years with 2 Ryzen 1700 CPUs pushed to 3.95 GHz and 4.0 GHz that I have had no performance degradation thanks to highly effective cooling. Of course, YMMV. But the point is, if you keep it well below the thermal limits your longevity will definitely improve. And in my 20+ years of experience with multiple families of CPUs, this has borne out to be true.

So personally, I am not concerned. If this was a real issue, the community would already be reporting a large number of CPU failures during normal operation. My opinion is that Intel is just trying to recover some of the market share they keep bleeding out in the face of further 10nm delays.
 
Best case: Ensure AMD does not go down the same dark path in pursuit of profits that Intel has taken in the past (e.g. PIII).
Worst case: Use FUD to remind AMD to stay on the correct path, which will not detract from AMD's current offerings in the long run.
Either way, thank you Intel and Tom's for effective (if not intentional) use of 'tough-love', which is really just news, like any, to be critically considered.
[disclosure]I have a vested interest in AMD, but historically have used mostly Intel products, which I am not vested in.[/disclosure]

AMD is a company. They have share holders. If they see a path to profit or a path to help the little guy and lose money, they will go down the path to profit.

In nearly 20 years of building PCs, I have only experienced one dead CPU, and that was because the Rosewill PSU that I had at the time died and took the motherboard and CPU with it. I was just fortunate that it was a cheap Sempron 145 system that was only used as an HTPC. I am currently using a 3700X myself, and have had no complaints.

This may become more common though as smaller transistors have less resistance to the electricity flowing through them.

I am not saying they will start failing overnight, but we may start to see more failures than we used to on smaller process nodes.

Most experienced PC enthusiasts know about temperature, voltage and electromigration. This is why the simplest and most popular mitigation for longevity, especially when combined with overclocking, has been high-end cooling.

I can say from the past 2+ years with 2 Ryzen 1700 CPUs pushed to 3.95 GHz and 4.0 GHz that I have had no performance degradation thanks to highly effective cooling. Of course, YMMV. But the point is, if you keep it well below the thermal limits your longevity will definitely improve. And in my 20+ years of experience with multiple families of CPUs, this has borne out to be true.

So personally, I am not concerned. If this was a real issue, the community would already be reporting a large number of CPU failures during normal operation. My opinion is that Intel is just trying to recover some of the market share they keep bleeding out in the face of further 10nm delays.

Cooling won't stop electromigration. Cooling just helps to dissipate the heat from the processor. If a uArch is leaking a lot, cooling will just help to keep the heat off but won't stop any damage that may do.

That said, your Ryzen is not included in this. They are specifically talking about Zen 2 on 7nm. Considering that Intel has talked about cores failing before, it's not something to just brush off, especially since Intel is one of the top process technology companies out there.


https://spectrum.ieee.org/semiconductors/processors/transistor-aging


There are plenty of issues faced with smaller transistors. Intel and all the big process technology companies have been looking for solutions to many of them for years: different materials, new ways to produce them, etc. This is where EUV came from, as it makes 7nm and below much easier to produce and more cost effective.
 

martel80

Distinguished
Dec 8, 2006
Thanks for the perfect review and analysis of the current boost clock frequencies etc., which will hopefully be improved in the new AGESA released next week. Anyway, I see a much more serious issue with Windows Hardware Errors (WHEA), which makes my Ryzen 3800X setup a totally unreliable piece of hardware, and from what I have found searching discussion forums, many users are affected and it is still not solved. My story, as already posted on the nVidia forum, is the following:

My configuration is a Ryzen 7 3800X, a Gigabyte X470 AORUS ULTRA GAMING motherboard, and a Gigabyte 1080Ti Extreme that recently replaced an Asus 1070. With the 1080Ti I'm not able to pass 3DMark, and my only game, PUBG, also crashes due to WHEA issues. The Gigabyte 1080Ti Extreme has already been replaced once, with exactly the same result and crashing. I'm running the latest BIOS with AGESA 1.0.0.3 ABB, the latest nVidia drivers, the latest AMD chipset drivers, the latest everything, and a clean Windows installation as well. It's super annoying, it's driving me crazy, and it's not acceptable. Hopefully AMD will fix it in the next AGESA, which should mainly fix boost frequencies, but who cares about those when WHEA errors are a really serious issue, as I see in many discussions, and should have TOP priority for AMD!!! The strange thing is that I didn't have the issue with the Asus 1070, but with two 1080Ti cards I do. On top of that I tried all the tips: resetting the BIOS, changing PCI-E from Auto to Gen4 to Gen3 to Gen2, and underclocking Infinity Fabric along with the memory clocks, with no success at all. The PSU is also sufficient, a 750W Corsair. Any advice on a temporary fix is highly appreciated, and hopefully AMD will fix it within the coming days!

Have you tried checking your memory for errors with e.g. memtest86?
 

TJ Hooker

Titan
Ambassador
Cooling won't stop electromigration. Cooling just helps to dissipate the heat from the processor. If a uArch is leaking a lot, cooling will just help to keep the heat off but won't stop any damage that may do.
Superior cooling will lower the operating temperature of the CPU. Operating temperature has a significant impact on electromigration.
 
Heat does, yes. However, it is not as major a factor as, say, voltage. You could keep a 9900K cool with a Peltier cooler at 1.55V, but that is above the maximum of 1.52V stated by Intel and well above the 1.4V maximum recommended by many overclockers and tech sites. Even running cool, the voltage will speed up transistor degradation compared to running hotter at a much lower voltage.

https://community.cadence.com/caden...ectromigration-what-ic-designers-need-to-know

I was only stating that keeping it cool does not stop it. It helps to mitigate but not as much as running a lower voltage would.
 

TJ Hooker

Titan
Ambassador
Nowhere in your source does it say that voltage has a bigger impact than temperature...

Nothing "stops" electromigration (while your CPU is running anyway). Mitigation is the name of the game.
 
One of the main mitigations listed is lowering the supply voltage.

Voltage absolutely is a much more major factor than heat. Excessive heat, yes. But cooling will not stop it if you push more voltage through than the CPU itself is meant to handle. As I said, you can use methods to lower extreme heat easily, but the only mitigation for voltage is lowering it. Cooling also won't stop it if the process itself is prone to current leakage. It will mitigate the heat, but it will not stop electromigration and transistor degradation as well as running at optimal or lower voltages.
 

TJ Hooker

Titan
Ambassador
Yes, and it also says "EM is worse at higher temperatures. "
Whether a change in voltage or a change in temperature will have a larger effect on EM depends on what your current operating condition is. If you look at Black's equation for estimating MTTF from EM, temperature appears in the exponent (compared to voltage being in the denominator, as current density is proportional to voltage), meaning changes in temp will generally have a larger effect. https://en.wikipedia.org/wiki/Black's_equation

Edit: I ran some numbers, and I was wrong. For a realistic operating scenario for a CPU, a change in voltage would have a larger effect on EM than change in temp, at least if you compare on the basis of a change of 1V to a change of 1 degree C. Math below!
Assume current density is proportional to voltage, i.e. J=B*V (B is some constant). If we combine B with the constant A from the equation to form C (C=A/B), the equation is:

MTTF= (C/V)e^[Q/(kT)]

We can then take partial derivatives of that equation with respect to V and T to see the rate of change of MTTF to a change in V and T (d/dV and d/dT), respectively. We can then divide these two derivatives, to form a ratio of the rate of change with respect to T to the rate of change with respect to V. We get:

(d/dT)/(d/dV) = (V/T^2)(Q/k)

If we set the equation equal to one, we can see at what point a change in T will have an equal effect as a change in V. If we take a typical value for Q as 0.7 electronvolts (from the Wikipedia page on EM), and assume voltage is 1V, we get:

(1/T^2)*8126 = 1

Solve for T, we get 90.1 K. So with the above assumptions, a change in voltage would have a greater effect than a change in temp for temp > -183 C.

If we pick a more realistic temperature (e.g. 70 C), we get a ratio of 0.07. This means a change of 1 degree C would have an equivalent effect to a change of 70mV.
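
For anyone who wants to check the arithmetic above, here is a minimal Python sketch of the same calculation. It assumes the simplified model used in this post, MTTF = (C/V)e^[Q/(kT)], with current density proportional to voltage and Q = 0.7 eV. The function names are made up for illustration, and the Boltzmann constant is the standard value in eV/K.

```python
import math

# Boltzmann constant in eV/K (standard CODATA value).
BOLTZMANN_EV_PER_K = 8.617333262e-5


def sensitivity_ratio(voltage_v, temp_k, activation_ev=0.7):
    """Return (dMTTF/dT) / (dMTTF/dV) in volts per kelvin.

    Derived from the simplified model MTTF = (C/V) * exp(Q / (k*T)),
    which assumes current density is proportional to voltage.
    """
    return (voltage_v / temp_k ** 2) * (activation_ev / BOLTZMANN_EV_PER_K)


def crossover_temp_k(voltage_v, activation_ev=0.7):
    """Temperature (K) at which the ratio above equals 1 V/K."""
    return math.sqrt(voltage_v * activation_ev / BOLTZMANN_EV_PER_K)


if __name__ == "__main__":
    t_cross = crossover_temp_k(1.0)
    print(f"Crossover at 1 V: {t_cross:.1f} K ({t_cross - 273.15:.0f} C)")
    ratio = sensitivity_ratio(1.0, 70.0 + 273.15)
    print(f"Ratio at 1 V, 70 C: {ratio:.3f} V/K")
```

With these inputs the crossover lands near 90 K and the ratio near 0.07 V per degree, in line with the figures above. Changing `activation_ev` or the voltage shows how sensitive the conclusion is to the assumed values.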
 
Except, again, you can control the temperature. Of course runaway temps will be bad. But you can control the temperature while sustaining excessive voltage to the chip. Running a chip with good cooling but excessive voltage will still eventually cause transistor degradation.

That's my point. I'm not saying that you can use crap cooling, run it hot, and all will be fine. Just that superior cooling will mitigate heat, but that does not always mean electromigration is under control. Unless you think keeping a chip cool while running excessive voltage will keep it from occurring?
 

TJ Hooker

Titan
Ambassador
Yes, if you run excessive voltage it is still possible to mitigate EM back to stock levels through controlling temp. Although depending on the exact scenario it could be impractical, and/or require extreme cooling, e.g. LN2.

I edited my previous comment, based on some quick and dirty math it looks like voltage would have a bigger impact than temperature on EM for a typical scenario (e.g. a CPU running at ~1V in a 20 C room). At least if you compare a change of 1V to a change of 1 degree C. Based on my rough calculations, if your CPU is running at 1V and 70 C, an increase of 70mV would have an approximately equivalent effect on EM as a change of 1 degree C.
 
From the article, indicating that Tom's doesn't approve of Intel's claims:


Ice Lake is no paper launch (yet). It takes a couple of months for devices with new CPUs to appear on the market, especially when they require a brand new platform.

Pitchforks aside, the article does give a decent insight into what may be the cause of Ryzen 3000's frequency "problem". And indeed, AMD would not have changed its boost bins for no reason. The company does seem a little concerned about reliability.

Even then, the maximum boost frequency often falls 25-75MHz shy of the advertised speed, which appears to be a legitimate problem with how Precision Boost handles SenseMI data. I will put the claims of differences up to 300MHz down to confirmation bias. Many people have a lot of processes running in the background that they're not aware of, which can throw the Windows scheduler off and make it target multiple cores at low usage. Those are also the people who wouldn't normally check their CPU frequency, unless they saw a forum post saying some people are experiencing issues.

Nah, this article was clearly the best outcome Intel was hoping for: a story that gets media coverage on a baseless assumption and does their dirty work. Congrats, because it worked.

Today, der8auer and Tom's are laughing stocks as far as credibility goes. I called BIOS issues 3 weeks after the launch, and Steve from Hardware Unboxed did too. But best of all, he did a real little investigation with DIFFERENT motherboards, something that Tom's or der8auer didn't do before spreading their FUD.

Since Tom's stuck to the MSI Godlike X570, they were just displacing the issue from one CPU to the other. That is fault-finding troubleshooting 101. I am calling out the so-called tech press on this.

So today we have two options:

  1. Tom's and der8auer are ignorant and incompetent; or
  2. They both have agendas.
Both are extremely worrying.

In the meantime, how it is supposed to be done.

View: https://youtu.be/PB72OrnSeQo
 
Except multiple users with different hardware configurations across multiple sites and multiple countries had the issue. If it was just a few people using similar hardware, sure. But since it was not and was found from multiple platforms you are just being your normal negative self.

I have yet to see you post anything positive about any article Toms posts. Especially if it doesn't praise AMD and rather calls out issues that are present. AMD is not perfect. They can make mistakes and if no one says anything they, like any company, will let it go.
 

spikey in tn

Distinguished
May 14, 2009
Why are you posting Intel's garbage propaganda? You are doing them a favor by spreading their FUD! That sounds like Intel's 5GHz 28-core demo all over again... and their 10nm Ice Lake paper launch... haven't you learned anything yet???!!!
FUD, FUD, and more FUD is Intel's trademark.
 
One of the biggest issues we face with silicon today is the degradation of the material as it gets smaller and runs at higher temperatures. It's why a lot of smaller process technology has lower temperature thresholds, although we still push them.

Intel actually did at one point have an idea to have a CPU designed with reserve cores so that if a core died or had issues it could be activated and the dying/dead core could be brought back.

Another solution that Intel, IBM and any company involved in process technology are looking for is alternative materials to Silicon that can survive the stresses better.

I doubt we will see a Ryzen CPU just die any more than an Intel would die in its useful lifetime. So it's not a major issue, but I do wonder if AMD knows there is potential there.
It doesn't really matter as much as some of us might think and this is why:
"Google says an advanced computer has achieved "quantum supremacy" for the first time, surpassing the performance of conventional devices. The technology giant's Sycamore quantum processor was able to perform a specific task in 200 seconds that would take the world's best supercomputer 10,000 years to complete. "
https://www.bbc.com/news/science-environment-50154993

Google's 53-qubit quantum computer essentially demonstrated that one day we will look at silicon-based digital electronics the same way we see ENIAC today. To give you an idea, consider the feeling you get from seeing all those old analogue electronic devices in Unigine's Superposition.
 
This is the same company that made the Core2Duo and Core2Quad by connecting two and then four CPUs together in a single CPU and then chided AMD's chiplet strategy for Zen as "glued together".

Core 2 Duo was not 2 dies; it was a monolithic dual core. Core 2 Quad was two dual-core C2D dies connected in the same package. The dual core you are thinking of is the Pentium D. That was two single-core dies in the same package.

And to be fair, AMD gave Intel all kinds of hell for Core 2 Quad stating their monolithic design was superior. Yet we all remember how well Phenom I did against Core 2.

I find it ironic that AMD went the MCM approach after giving Intel hell about it but if it works, it works.
 
You could be right about the C2Q, I could just have it bass-ackwards. I do also find it funny that AMD proclaimed themselves "The First TRUE Quad-Core" because their design was monolithic and Intel's wasn't. Now history is repeating itself in the opposite direction (crazy, eh?). In any case, negative criticism from the other side is almost never valid because they're like opposing political parties. Everything they do is right and everything the other does is wrong.

Bottom Line: It doesn't matter how a CPU is put together, just so long as it works and works well.
 
It doesn't really matter as much as some of us might think and this is why:
"Google says an advanced computer has achieved "quantum supremacy" for the first time, surpassing the performance of conventional devices. The technology giant's Sycamore quantum processor was able to perform a specific task in 200 seconds that would take the world's best supercomputer 10,000 years to complete. "
https://www.bbc.com/news/science-environment-50154993

Google's 53-qubit quantum computer essentially demonstrated that one day we will look at silicon-based digital electronics the same way we see ENIAC today. To give you an idea, consider the feeling you get from seeing all those old analogue electronic devices in Unigine's Superposition.
Quantum computers are only capable of doing very specific things. Those things they will do in 200 seconds where it would take the world's best supercomputer 10,000 years, but a quantum computer can't do normal stuff at all; if you want to edit a simple text file, quantum computing can't do it.
Conventional computers will be around for many years to come; quantum computers will become the GPUs of future computers.