Anandtech: The truth of CPU degradation

Page 3 - Tom's Hardware community discussion.


There is nothing to understand about the reading.
They did not perform any studies.
They did not reference any studies.
They did not have any significant anecdotal evidence upon which to base their graphs.

They took a very basic truism and extrapolated upon it until whatever truth existed was lost.

The basic fact is so long as you operate your PC within design specifications, your CPU will last a very long time.
Much longer than it will remain a serviceable CPU.
 



Either your motherboard was overheating all that time, or your chip was a lemon, or you did something wrong as far as the settings are concerned, or all of the above.

I have had a B3 Q6600 overclocked to 3.2 GHz for almost a year now, with no signs of slowdown or wear.
My computer has been solid as a rock since the overclock.

My 3DMark06 CPU score has either stayed the same or improved throughout the year (the score is always close; I never wrote it down).

Some people forget that jacking up the FSB puts strain on the motherboard, and sometimes more aggressive cooling is needed. I have a fan blowing on my northbridge, RAM, and voltage regulators. I'm confident this CPU will be around until my next upgrade in 2010 or 2011.


 


I concur.

The E6400 is built on the exact same process as the E6800, which runs at 3.0 GHz by default.
By that logic, the same degradation should be seen in E6800 chips after a few months of use at stock speed, forcing them to run underclocked in order to function.

This simply is not happening.
There is also huge tolerance built into the chips to allow for such things.

 

You and I have made several assumptions here. Let's take my example of the E8xxx CPUs. Theoretically, the E8500 should be a higher-binned chip than the E8400 (very unlikely in reality, I know, hence theoretically), in which case it should be of higher quality and therefore as likely to go pop as the E8400 under normal operation (i.e. not likely!). But in reality, the 8500 is just an 8400 with a bumped multiplier. Now, what you said (your first point) is true, assuming ONLY the multi is bumped at the same voltage. The only way to test that would be to run two EEs at different multis 24/7 and see which goes bang first (hoping, obviously, that manufacturing tolerances are the same).
The problem is that, as most people can't afford an EE with an unlocked multi, they bump the FSB, which as we know pretty much OCs the rest of the system, and that is where the problems the article is trying to describe start!
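To make the FSB point concrete, here is a rough sketch (in Python, with illustrative Core 2-era numbers I've picked for the example, not measured values) of how raising the FSB drags the memory and chipset clocks up along with the CPU:

```python
# Sketch: how a raised FSB overclocks the rest of the system.
# Numbers are illustrative (roughly Core 2 era); real boards vary.

def system_clocks(fsb_mhz, cpu_multi, mem_ratio):
    """Return the clocks that all derive from the front-side bus."""
    return {
        "cpu_mhz": fsb_mhz * cpu_multi,   # core clock
        "mem_mhz": fsb_mhz * mem_ratio,   # DRAM clock from the FSB:DRAM divider
        "fsb_effective": fsb_mhz * 4,     # quad-pumped FSB rating
    }

stock = system_clocks(333, 9, 1.2)   # e.g. an E8400-like chip at stock
oced  = system_clocks(400, 9, 1.2)   # FSB bumped; multiplier untouched

# The CPU gains ~600 MHz, but the RAM and chipset links get
# overclocked by the same 20% ratio, whether you wanted that or not.
print(stock["cpu_mhz"], oced["cpu_mhz"])   # 2997 3600
print(stock["mem_mhz"], oced["mem_mhz"])
```

The multiplier-only route avoids this entirely, which is the appeal of an unlocked chip.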
I personally have no problem paying more for a stock CPU and not OCing, though I would probably dabble with a cheaper machine that I'm not going to care about if it fries itself! :lol:
 
And buying a faster stock chip is not going to change the rate of decay.

The two chips are physically the same.
Simply because Intel sets one to a higher default speed does not mean it will decay any slower than a different CPU set to lower defaults but then configured in the BIOS to run at higher speeds.
No two chips coming off any fab are physically the same.

A low end chip may or may not reach the stock frequencies of a high end chip at the stock voltage. A high end chip, however, is warranted to run at its stock frequency at stock voltage. You pay for that assurance over the risk/hassle of repeatedly generating excuses to RMA non-defective low-end chips.

If running the low end chip at high-end speeds requires a bump in voltage, the guarantee that it'll last as long as the high-end one goes out the window. Whether it lasts its useful life depends on the specific voltage and the quality of the manufacturing.

PC 1: place it somewhere hot, resulting in a much higher overall system temperature.
PC 2: overclock the CPU a little (say, 500 MHz to 1 GHz, with maybe a little voltage added), and place it in an air-conditioned room with a good CPU cooler.

Let's say that PC 1 has CPU idle-to-load temps of 35 to 50 C,
and PC 2 has CPU idle-to-load temps of 25 to 40 C.

Here comes my question: which one will last longer?
Given that your "hot place" doesn't create anything near out-of-spec temperatures, we're talking about a very long lifespan for any decent chip manufacturer. However, a blanket statement on the relative longevity of pc 2 is not possible because the voltage/lifespan and temperature/lifespan curves are very far apart.

Loosely, a 10C rise in temperature halves the life expectancy of a processor - that's the basis of many electronics burn-in tests. However, depending on what the initial voltage was, a light bump of 0.05V on a modern CPU can affect longevity by an order of magnitude or very little. That's because many electronic properties are based on voltage thresholds, not typical exponential scaling.
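A quick sketch of that rule of thumb (the factor-of-two-per-10 C figure is the loose approximation quoted above, not an exact law):

```python
# Rule of thumb from above: every 10 C rise roughly halves life
# expectancy. Purely illustrative; real curves are Arrhenius fits,
# not an exact factor of two per 10 C.

def relative_life(temp_c, ref_temp_c=50.0):
    """Life expectancy relative to a reference temperature."""
    return 2.0 ** ((ref_temp_c - temp_c) / 10.0)

# PC 1 loads at 50 C, PC 2 at 40 C (the example above):
print(relative_life(50))  # 1.0  (baseline)
print(relative_life(40))  # 2.0  (about twice the life)
print(relative_life(70))  # 0.25 (a 20 C rise quarters it)
```

The voltage side has no such tidy rule, which is exactly the point made above about thresholds.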

They will both last until you don't want them any longer.
Most likely until about 2023-2025.
That sounds arbitrary unless you have some insider fab info. Anandtech's graphs were arbitrarily drawn, too, suggesting that CPUs would fail soon after their warranties expired when running at a margin of error past spec.

Occasionally I see a leak on reliability info, but I don't have current information, and I would certainly be more likely to doubt the reliability of a new process than a mature one. From several years ago, during the Pentium 4 days:

"Intel’s internal goal is that the failure rates of systems in service be less than 1% cumulative at 7 years and less than 3% cumulative at 10 years" -- http://www.eng-tips.com/viewthread.cfm?qid=35348

A far better reliability figure than suggested by Anandtech's graphs.
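As a rough illustration of how low those targets are, the cumulative percentages can be converted to annualized rates under a constant-hazard assumption (a big simplification of real failure curves, but enough for a back-of-the-envelope):

```python
import math

# Convert Intel's quoted cumulative failure targets ("<1% at 7 years,
# <3% at 10 years") into annualized rates, assuming a constant hazard
# (exponential failure model). Illustrative only.

def annual_hazard(cum_fail_frac, years):
    """Constant hazard rate implied by a cumulative failure fraction."""
    return -math.log(1.0 - cum_fail_frac) / years

r7  = annual_hazard(0.01, 7)    # ~0.14% of units per year
r10 = annual_hazard(0.03, 10)   # ~0.30% of units per year
print(f"{r7:.4%} / year, {r10:.4%} / year")
```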
 


But you can control your voltage, and since they are identical from a manufacturing point of view, degradation can occur either way.

Now, Binning can be real.
You may need different voltages to reach different speeds on different chips.
It could very well be the case that you are more likely to hit higher speeds with an E8500 than with an E8400 at the same voltage. Therefore, if a given chip takes more voltage to hit a certain clock, it will degrade faster.

But when holding voltage, heat, and speed constant between the two, neither is more likely to suffer degradation than the other.

Also, there is very little empirical evidence to prove that chips degrade within their lifespan.
There is absolutely no scientific evidence to show it.
Now, it is possible to over-volt and overheat a CPU and cause damage, but that is not what we are discussing.
 




CoolIT is trash - you need a 5000 RPM Tornado fan just to reach air-cooling capability.

To match a water-cooling setup with CoolIT, you need two 5000 RPM fans.

We modified these with dual 120mm fans and funnels, and they still failed to cool an OCed quad.

This is a class-action-lawsuit product - they must have big financial backing. I am tempted to try the new version - has anyone? Start a post if so.
 


It's somewhat arbitrary.
I don't have any studies, just personal observations.

Each model of chip is different, so each may last a different length of time.

There are not any chips being made today that closely resemble chips made 10-15 years ago in regards to materials and manufacturing process. So any current estimates are just that.

However, I've worked for very large organizations that have 1,000s and even 10,000s of devices running 24x7.
Some at high workloads for years straight.

I can't recall ever needing to replace a CPU, even for systems running 24x7 for upwards of 10 years.

HDDs, PSUs, Motherboards, RAM, Cables, etc... many times. But never just the CPU.
Now, it's possible the CPU may have been part of a larger swap.
And I'm sure that CPUs go bad sometimes, but I've just never seen it and I have quite a large base to look at.

But based on your link, with only a 3% failure rate at 10 years, it would seem that a 12-15 year life expectancy is probably right on the nose.
 
Not only can different chip models behave differently, but different "copies" of a particular model in the same batch will vary in precise voltages and maximum stable frequency. Both AMD and Intel program different chips across the same model line with different base operating voltages - the VID.

However, I've worked for very large organizations that have 1,000s and even 10,000s of devices running 24x7.
Some at high workloads for years straight.

I can't recall ever needing to replace a CPU, even for systems running 24x7 for upwards of 10 years.
That is likely because these CPUs were running well within stock specifications and were produced on mature processes. The mentioned 1% failure rate target is based on a simulated 7 years of borderline stock conditions; actual failure rate is likely much lower.

But these lifespans do not compare with what happens when you run at out-of-spec temperatures and especially voltage. There were many consistent reports of dead Northwood P4s from electromigration at 1.7V; I think the same is happening with Wolfdales at 1.4x or even 1.3x V. A decades-long lifespan can be reduced to days when you encounter electromigration.
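For reference, electromigration lifetime is usually modeled with Black's equation. The exponent and activation energy below are textbook-typical guesses, not Intel data, so only the relative numbers mean anything:

```python
import math

# Sketch of Black's equation for electromigration median-time-to-failure:
#   MTTF = A * J**(-n) * exp(Ea / (k*T))
# A, n, and Ea are process-specific and not public, so absolute numbers
# are impossible; the relative MTTF between two operating points is
# still instructive. n=2 and Ea=0.9 eV are textbook-typical assumptions.

K_BOLTZ_EV = 8.617e-5  # Boltzmann constant, eV/K

def relative_mttf(j_ratio, temp1_c, temp2_c, n=2.0, ea_ev=0.9):
    """MTTF at condition 2 relative to condition 1.

    j_ratio: current density at condition 2 / condition 1
             (rises roughly with voltage and clock).
    """
    t1 = temp1_c + 273.15
    t2 = temp2_c + 273.15
    return j_ratio ** (-n) * math.exp(ea_ev / K_BOLTZ_EV * (1 / t2 - 1 / t1))

# A ~15% current-density rise plus a 15 C hotter die cuts the
# remaining life to a small fraction of the original:
print(relative_mttf(1.15, 55, 70))
```

The steep exponential in temperature and the power law in current density are why a decades-long lifespan can collapse so quickly once you cross into electromigration territory.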
 


Yes, they were running inside of specs.
But that is exactly the point: they were running within specs.

1.365 V is within spec for the 45nm E8xxx series (lower than the E6xxx).

1.5 V is within spec for the E6xxx series.

I don't recommend running outside of specs.
I just don't want people to be concerned about running chips within design specs.

I would never try pumping 1.5 or 1.6v through an E8xxx chip and expect it to last a long time.
I don't mind if folks do it, as long as they know and accept the risks.
 
I don't dispute that Anandtech's conclusions are concise, but I think anyone with any build experience who worries about shortening the life of their CPU by overclocking is a nitwit. I have been overclocking since the days of the Celeron 300A, and have never had a CPU go bad. I do know people who have pushed Northwoods too far and fried them, but most CPUs will give you a warning to let you know they are too close to the limit. I have had many processors that were overclocked close to the limit last far beyond their useful life. My current E6400 has been running @ 3.7 ghz for over a year and a half now. I recently sold a P4 2.4 on Ebay that had been running nearly 24 hours a day @ 3.2 ghz since 2003. The bottom line is that if you don't have a clue what you are doing, yes overclocking can significantly shorten the life of your CPU. Otherwise if you keep it cool, and use modest voltage increases, you can overclock near the limit and your CPU will be obsolete many years before it goes bad.
 
I think it depends on more than build experience. It depends on how conservative you are as an overclocker versus how well tested the process is. I see a substantial percentage of people on these forums with slight to moderate out-of-spec voltages in the same relative region as those who reported frying their Wolfdales. To top it off, many Wolfdale owners report no stability problems at the same voltages as those who popped theirs.

most CPUs will give you a warning to let you know they are too close to the limit.
That is true, but this warning level varies greatly among manufacturing processes. A 90nm SOI-based Athlon64 will probably reach a quick ceiling at voltages safe from electromigration, whereas the first 65nm Conroes were mostly limited by heat. I own both of the aforementioned and have noticed degradation twice on the Conroe due to several months of load at 3.74G/1.4875v (BIOS) and briefly at 3.81G/1.55v. The Athlon operated at nothing close to these extremes, so I encountered no issues. The 65nm Conroe was quite solid, though, as it held through periods where I intentionally tested throttling (thus, accelerated aging) at 1.4875v. But as soon as I verified the CPU had degraded (maximum stable frequency down to 3.67G despite low FSB on a backup m/b), I immediately shut down and reset it to run at 3.2G/stock, which it has done so stably for over a year.

Part of saving any CPU from burnout is, I think, to closely monitor stability and react immediately on errors. I get the impression those with fried Wolfdales auto-restarted their systems after BSOD to troubleshoot and continued to operate for some time with unstable settings, possibly even bumping up their voltage further. While electromigration can cause sudden "power off" failure, when operating in borderline conditions, the symptoms tend to appear gradually over minutes to weeks.
 


Considering you are always so gung-ho to bash Intel, read this:

The new design, called a Digital Thermal Sensor (DTS), no longer relied on the use of an external biasing circuit, where power conditioning tolerances and slight variances in sense line impedances can introduce rather large signaling errors. Because of this, many of the reporting discrepancies noted using the older monitoring methods were all but eliminated. Instead of relying on each motherboard manufacturer to design and implement this external interface, Intel made it possible for core temperatures to be retrieved easily, all without the need for any specialized hardware. This was accomplished through the development and documentation of a standard method for reading these values directly from a single model-specific register (MSR) and then computing actual temperatures by applying a simple transformation formula. This way the complicated process of measuring these values would be well hidden from the vendor.

And here is what another reader stated earlier and took away from it as well:

This confirms Intel's documented errata about motherboards being unable to measure the E8400's DTS accurately, thereby causing thermal interrupts.

Basically, some mobos are having problems reading the DTS sensor because they are not calibrated for it. Thus a BIOS update may be the way to fix it.

If it were truly an erratum in the E8400, then every last one would be affected, and not all of them are. Some people have been able to OC to 4GHz, and the temps read normally and stay well below the thermal trip.
 


Using my E6400 as an example... It ran @ 3.5 GHz at stock voltage. I have it overvolted to 1.375 volts to get 3.7 GHz out of it, and it stays cool under load. By going to 1.5 volts I was only able to get a maximum stable speed of 3816 MHz, and temps went through the roof. Common sense tells you where the comfort zone is. As a rule of thumb, backing the speed and voltage down just slightly from the maximum stable settings will result in a very long-lasting CPU. I am not an AMD fan, but I have owned plenty of them. While I agree they have different characteristics, the safe/reliable overclock limits are just as easy to find.
 
Wow, a lot of research and knowledge in this forum.

So it goes to say,

1) If you overclock your processor, either through the FSB method (placing strain on other system components) or via an unlocked multiplier, whilst bumping up the voltage to achieve maximum speed, you will shorten the life of said processor or possibly even blow it, lol

Most of it is luck of the draw. I ran a 130nm P4 2.5 GHz chip with a 0.4v bump to get a stable 3.2, which ran for three years - FSB method, with the 800MHz FSB at over 980MHz - and that s478 board is still going strong five years on. I blew a Socket A Duron with a 100MHz increase and no voltage bump, lol. Intel make fine chips of high quality; if you are worried, buy a faster chip so you don't need to overclock.

(Personally i like to game with an apple in my throat tearing at the edge of voltage extremes)

2) The 45nm processors are more prone to 'electromigration', and overvolting them can cause damage more easily than on 65nm processors, lol

Of course this is the case; they could not even get 45nm processors working without electromigration until they got high-k gates to work. Think about it: 45nm - that's one itsy-bitsy tiny transistor. This problem will manifest itself further the smaller the manufacturing process gets.

3) Cooling your processor is the key to longevity whilst performing the dark art of overclocking, lol

Of course the cooler the processor, the longer it will live; this is true of all electrical components, right down to a simple electric motor. This is why Tom's Hardware should get a thumbs up for their cooling guide, helping 'NOOBS!' choose a bloody cooler for that new 45nm processor whose performance they will happily near-on double by changing a few googlable settings in their BIOSes.

Finally, once Nehalem hits, and onward to the 32nm revision, the need to overclock will eventually dwindle back to the hardcore who do it for the love of it, because they can. The power to do everything else will be there in the stock-clocked chips in any case, and this will be because the software we run will catch up and start using the power on offer correctly.

(very few 4 core threaded applications out there) :sol:
 

More interesting points from yourself :)
Personally, from a physics standpoint, I believe a CPU will degrade over time, whether at its 'stock' voltage or not; but either way, that is my opinion.
But I certainly think that, since I feel I don't understand enough about OCing and can't really be bothered with an exotic cooling method, I will be quite happy to pay extra for something faster :)
Looking to the future, surely with more die and process shrinks, OCing is going to become a dark art once more as it becomes more and more difficult to get something stable. In my opinion, the Conroe was probably the sweet spot for OCing, as it has history behind it, whereas the Wolfdale doesn't. Once again I add the caveat that this is MY OPINION, so don't anyone start a flame war. :)
 
Actually it was a good article.

I agree the forums there are not so good ... but a few threads are outstanding nevertheless.

We also have the smartest mod ... but If I tell you his name he is likely to post another essay ... LOL ... <groan>.

Thanks for the info about the temperature measurement Yo.

The point about the sweet spot ... where you get a very good return for the minimum overvolt is a good point ... and well worth remembering.

Pushing excess voltage into a given chip just for that extra 200MHz is not worth it in the end.

Plus the odd bit of flaming here isn't taken so seriously ...



 
Using my E6400 as an example... It ran @ 3.5 GHz at stock voltage. I have it overvolted to 1.375 volts to get 3.7 GHz out of it, and it stays cool under load. By going to 1.5 volts I was only able to get a maximum stable speed of 3816 MHz, and temps went through the roof. Common sense tells you where the comfort zone is. As a rule of thumb, backing the speed and voltage down just slightly from the maximum stable settings will result in a very long-lasting CPU. I am not an AMD fan, but I have owned plenty of them. While I agree they have different characteristics, the safe/reliable overclock limits are just as easy to find.
I agree at the core, but my problems actually stemmed from the differences between the AMD and Intel cores. I had been using AMD for almost 6 years before getting the Conroe.

The A64 hit a pretty hard ceiling for stability, so it was easy to figure out where to set frequency and voltage. The Conroe, on the other hand, reached successively higher frequencies at gradually increasing Vcore bumps. I thought I had backed down the voltage from maximum stable, but not enough for this chip, as symptoms manifested after a few months of nonstop use.

Your E6400 is a more mature bin than my 6800, so while I also encountered the same stability ceiling at ~3816 with air, it appears my safe frequency range, dictated by required Vcore, is lower than yours, about 3.6G at ~1.375V.
 
Well, I'm fairly positive that Intel bins their processors for certain speeds and cache levels. AMD has just recently started trying to go that way, sort of how the Toliman tri-core is an Agena with a weak core disabled. This is also how Intel has such high yields; they only waste the completely fragged chips and the ones at the edges of the wafers.

I can safely use the SL2YK P2 300 MHz chip as an example here. For those who don't know, the SL2YK retail / SL2W8 OEM P2 300 was the P2 version of the Celeron 300A. That particular run of P2 300s used the Deschutes core, which was the exact same core, with the exact same multiplier lock, as the Deschutes-based P2 450 MHz; the only difference was the default FSB the chip was set to auto-detect.

Those P2 300s could be installed on a 440BX-based board, such as the old legendary Abit BH6, and easily be OCed to 450 MHz by simply changing the FSB to 100 MHz in the BIOS. Doing this, you weren't OCing the rest of the components in your PC, as long as you had PC100 RAM. But some of those SL2YKs needed a bit of extra voltage to run at 450 completely stable, even though they were technically capable of doing so without it.

I had one of the few that needed a bit more voltage to be stable. The only reason I knew I wasn't completely stable was that I'd run for a few days just fine, then one day I'd have a hell of a time with lockups and BSODs. But that was back before I started monitoring temps and all that stuff. Did I shorten the life of the chip? Possibly, but I doubt it was by enough to matter for that die process. P2s were damn resilient chips as far as that goes. I still have an old Dell Latitude CPi A with a P2 366 in it that runs fine, and that thing's about 10 years old; I use it for playing around with things that won't run on XP or run better on old systems, like the old MechWarrior 2 games.
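The arithmetic behind that trick, for anyone following along (the 66 MHz figure is the nominal bus; the real clock was 66.6 MHz):

```python
# Illustrative math behind the SL2YK trick described above: the P2 300
# and P2 450 shared the Deschutes core and a locked 4.5x multiplier;
# only the auto-detected FSB differed.

MULTIPLIER = 4.5  # locked on both parts

print(66 * MULTIPLIER)    # 297.0 -> nominally sold as the P2 300
print(100 * MULTIPLIER)   # 450.0 -> sold as the P2 450

# On a 440BX board with PC100 RAM, setting the FSB to 100 MHz turned
# the "300" into a 450 without overclocking anything else.
```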

You also have to take into account that Intel probably does a lot more than just stability testing when they bin processors. They probably do electrical current and power draw testing, binning the cores that don't meet the standards for a particular FSB or clock multiplier based on the amount of power or current it takes to run stable. Unless someone's gone out of their way to test it, we don't know, for example, what the difference in power usage or CPU current (amps) is between, say, an E2160 overclocked to 3GHz and an E6850 whose base speed is 3GHz. Notice that both are rated at 65W at their default clocks.

There are some things we don't know, though. Maybe certain multiplier settings in the processors have different resistance or impedance values associated with them, which could be changed during the binning process for lower-performing chips. If that's the case, then even though two processors may have the same multi, one will require more voltage to run at a given speed than the other, simply because it takes more voltage to push the same amount of current through the higher-resistance circuits on the lower-binned processor. This may also be why a lot of people consider the 9x-multiplier Intel chips to be the sweet spot.

This may also be why, even though the Extreme Edition processors all have unlocked multipliers, they're still speed-binned based on electrical or maximum theoretical speed characteristics. The processors with the best characteristics are binned for the highest profit margin and cherry-picked for that purpose. So even though almost all Intel processors are based on the same die, there may be imperfections in a given number of cores that cause the CPU to need more voltage than what Intel considers a safe limit in order to reach the desired level of performance.

@Zenmaster: You have some valid points about processor longevity, but those processors are all likely based on a manufacturing process that is a lot more resilient than today's 65nm and 45nm CPUs. My guess is that once they went below 90nm, they probably expected much shorter lifespans for anything running outside its bin specs. So while a 90nm or larger processor may well run for 20 years or so - 15 if pushed with OCing - a 65nm or 45nm CPU may not. I'd say probably 10-15 years at manufacturer spec on the 65nm CPUs, shortened down to about 5-7 on anything that requires voltage raises to run at higher clocks. And while the base lifespan at 45nm may stay the same as 65nm, the overclocked life may be even shorter.

And before you say that if you aren't raising the voltage, the OC shouldn't affect the life of a processor - that's not likely true. Even though you haven't raised the voltage, processor A, while OCed, will use more power and dissipate more heat than processor B, which isn't OCed but runs at the same speed. Even though we use HSFs on our CPUs, all they do is move the generated heat away from the processor as fast as physically possible, which in reality means all they really do is keep heat from building up on the die and prevent hotspots from forming. Hotspots on a processor die are bad, much like Frankenstein's monster's little saying: "Mmmm, fire BAAD." You've got to remember that as the die shrinks, the cores have to dissipate an ever-rising amount of heat per square cm or mm, which is exactly why both AMD and Intel use IHSes on their current CPUs: to spread the heat out, making the processor die slightly easier to cool and draw heat off of.
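A sketch of the first-order physics here: dynamic power scales roughly as C x V^2 x f, so even a stock-voltage overclock raises heat output in proportion to the clock bump. The numbers below are arbitrary; only the ratios matter:

```python
# Sketch of why an overclock raises power and heat even at stock
# voltage: dynamic power scales roughly as P = C * V**2 * f. The
# capacitance constant is arbitrary; only the ratios matter.

def dynamic_power(volts, freq_ghz, c=10.0):
    """Relative dynamic power (arbitrary units)."""
    return c * volts**2 * freq_ghz

stock     = dynamic_power(1.25, 2.4)
oc_same_v = dynamic_power(1.25, 3.0)   # +25% clock, stock voltage
oc_more_v = dynamic_power(1.35, 3.0)   # same clock with a voltage bump

print(oc_same_v / stock)   # 1.25 -> 25% more heat at stock voltage
print(oc_more_v / stock)   # ~1.46 -> voltage hurts quadratically
```

(Leakage power behaves differently again, and per-chip leakage variation is one reason two "identical" chips can draw different power at the same settings.)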
 
And don't forget to add that as dies shrink, the electric field also becomes more concentrated, which also contributes to susceptibility to electromigration with increases in heat or voltage.
 

Great, but you should have tried to understand what you asked me to read.
The DTS and the MSR are a great combination, but only if they transmit an understandable temp to the mobo.
What they tell the BIOS is simply a negative number: the number of degrees below Tjunction the processor is sitting at. By itself that gives no usable information. It doesn't even work right on Intel mobos.
Sort of like having the weatherman say "It's ten degrees cooler." Well, yeah, right - it's always ten degrees cooler than something.
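For illustration, here is roughly what monitoring software has to do with that raw reading; the TjMax value is an assumption (it varied by model and stepping and wasn't always published at the time):

```python
# Sketch of the DTS reading described above: the sensor reports an
# offset below Tjunction(max), so an absolute temperature is only
# meaningful if you know TjMax for your chip. The 100 C default is an
# assumption for illustration, not a documented value for any model.

def dts_to_celsius(dts_offset, tj_max=100):
    """Convert a DTS 'degrees below TjMax' offset to an absolute temp."""
    return tj_max - dts_offset

# A reading of "55 below TjMax" means 45 C only if TjMax really is 100:
print(dts_to_celsius(55))             # 45
print(dts_to_celsius(55, tj_max=95))  # 40 -- same raw reading, 5 C cooler
```

Which is exactly why two tools can show different "temperatures" from the same sensor: they disagree about TjMax, not about the reading.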
There are no errata here. The 8xxx are great chips. If I'm going to build some noob a system with one, though, I want to know that an alarm is going to go off when the fan on the heatsink is so clogged with dust that it can hardly spin.