Inside Intel's Secret Overclocking Lab: The Tools and Team Pushing CPUs to New Limits

bit_user

Polypheme
Ambassador
@PaulAlcorn , thanks for the awesome piece!

I'm still making my way through it, but wanted to draw special attention to this bit:
the engineers told us they feel perfectly fine running their Coffee Lake chips at home at 1.4V with conventional cooling, which is higher than the 1.35V we typically recommend as the 'safe' ceiling in our reviews. For Skylake-X, the team says they run their personal machines anywhere from 1.4V to 1.425V if they can keep it cool enough, with the latter portion of the statement being strongly emphasized.

At home, the lab engineers consider a load temperature above 80C to be a red alert, meaning that's the no-fly zone, but temps that remain steady in the mid-70’s are considered safe. The team also strongly recommends using adaptive voltage targets for overclocking and leaving C-States enabled. Not to mention using AVX offsets to keep temperatures in check during AVX-heavy workloads.
Thanks for that!
 
Aug 29, 2019
35
7
35
Someone should do a comparison between different vendors' die sizes, like Intel 10nm vs. AMD 7nm, to see if there is actually a performance gain. I would use per-core speed and not take multiple cores into account.
 

bit_user

Polypheme
Ambassador
@PaulAlcorn , uh oh. Now that I just finished heaping praise, I've got a gripe. In the penultimate paragraph:

... assures that the learnings lessons and advances made in the overclocking realm ...

I was saddened to see the "learnings" virus infecting your otherwise admirable writing.

I think "learnings" is one of those pseudo-jargon words that MBAs and other B-school types like to throw around, out of jealousy for practitioners of real professions. Everyone from auto mechanics to accountants, lawyers, and doctors needs jargon to adequately and efficiently express concepts and constructs central to their work. However, common sense pervades business to such a degree that I think they're embarrassed by how easily understandable it'd be, if they didn't inject some fake jargon to obscure the obvious. The resulting assault on the English language is disheartening, at best.

Yes, if you've ever heard of her, you probably guessed I'm a fan of Lucy Kellaway, former journalist of the Financial Times and BBC. Worth a read:


 

Gurg

Distinguished
Mar 13, 2013
515
61
19,070
AMD CTO Mark Papermaster: "you can't rely on that frequency bump from every new semiconductor node." AMD's future outlook of very limited frequency bumps, performance increases only from more cores and expensive software modifications to use more cores.
Versus
Intel Ragland: "People who think this the end of the world for overclocking because our competitors' 7nm has very little headroom, that's not true. Intel is all about rock-solid reliability; our parts aren't going to fail...you can count on your part running at spec, so there's so much inherent margin that we will always have overclocking headroom...I think users will be happy with the margin we can offer in the future."

Ouch! Intel's Ragland really "punked" AMD's negative outlook.

PS Great fascinating article
 

nofanneeded

Respectable
Sep 29, 2019
1,541
251
2,090
In the past, OC made a huge difference. Today we can easily hit 4.4GHz on all cores without OC, and this is more than enough for me.

For me, OCing is dead, and I don't care about missing 5 fps.

I put the price difference in a better GPU ...
 

CompuTronix

Intel Master
Moderator
Outstanding article! Thank you, Paul! I would love to have been there. I have a few dozen questions that the Team may or may not have been allowed to answer.

However, like bit_user, I found it of particular interest that the Team was forthcoming regarding specific voltage and temperature values they're comfortable with running on their personal home rigs, which max out at 1.425V and 80°C. With respect to electromigration and longevity, every day in the forums we see many overclockers express their concerns over these very issues.

On their website, Silicon Lottery shows Historical Binning Statistics that include the Core voltages used to validate their overclocked 14 and 22nm processors. For 22nm the maximum is 1.360. For 14nm the maximum is 1.456. While Intel's warranty is 3 years, Silicon Lottery's warranty is 1 year, which suggests at least one reason for the voltage difference between Intel's Team and Silicon Lottery.

Here's a forgotten link to a revealing Tom's Hardware video interview of July, 2016, with Intel's Principal Engineer (Client Computing Group), Paul Zagacki, where Intel Discusses i7-4790K Core Temperatures and Overclocking. The video coincides with the formation of Intel's Overclocking Lab, also in 2016. In the video, Intel points out that overclocking abilities begin to "roll off" above 80°C, which agrees with the value the Team revealed in your article.

While Core temperatures, overclocking and Vcore are often highly controversial and hotly debated topics in at least the overclocking forums, the term "electromigration" is closely related to a much less known term, which is "Vt (Voltage threshold) Shift". With respect to voltage and temperature, the two terms describe the causes and effects of processor and transistor "degradation" at the atomic level.

In the Intel Temperature Guide, in Section 8 - Overclocking and Voltage, I created a table for Maximum Recommended Vcore per microarchitecture from 2006 to the present. For 22 and 14nm, those values are 1.300 and 1.400 respectively. I also created a graph showing the Degradation Curves for 22 and 14nm processors. The table and graph help overclockers get a better perspective of the degradation and longevity issue:

[Image: iQuLSzu.jpg, Maximum Recommended Vcore table and Degradation Curves graph]

Sparing our members and visiting readers the deep dive, Vt Shift basically represents the potential for permanent loss of normal transistor performance. Excessively high Core voltage drives excessively high current, power consumption and Core temperatures, all of which contribute to gradual Vt Shift over time. Core voltages that impose high Vt Shift values are not recommended. The 14nm curve suggests 1.425'ish is the practical limit, which also agrees with the value the Team revealed in your article. The curve also suggests that Silicon Lottery might be pushing the edge of the envelope a bit.

The concern here is that when novice overclockers casually glance around the computer tech forums, where conflicting and misleading numbers get flung around like gorilla poo in a cage, many don't realize through the fog of all the confusion that one size Vcore does not fit all. Aside from high Core temperatures, Vcore that might be reasonable for one microarchitecture can degrade another. So 22nm Haswell users now wanting to overclock their aging processors to keep up with today's games need to heed the degradation curves, which apply as well to 14nm Skylake and Kaby Lake users.
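To make the "one size Vcore does not fit all" point concrete, here is a minimal Python sketch that looks up the maximum recommended Vcore by process node. It uses only the two values quoted in this post (1.300 for 22nm, 1.400 for 14nm); the function names are hypothetical and chosen for illustration, and this is a sanity check, not a substitute for the full table or the degradation curves.

```python
# Maximum recommended Vcore per process node, using only the two values
# quoted in this post. Other nodes are deliberately left out.
MAX_RECOMMENDED_VCORE = {"22nm": 1.300, "14nm": 1.400}


def vcore_headroom(node: str, vcore: float) -> float:
    """Volts remaining below the recommended maximum (negative if over it)."""
    if node not in MAX_RECOMMENDED_VCORE:
        raise KeyError(f"no recommended Vcore listed for {node!r}")
    return round(MAX_RECOMMENDED_VCORE[node] - vcore, 3)


def is_within_recommendation(node: str, vcore: float) -> bool:
    """True if the given Vcore does not exceed the node's recommended maximum."""
    return vcore_headroom(node, vcore) >= 0
```

For example, `is_within_recommendation("14nm", 1.35)` is true, while the same 1.35 on `"22nm"` is over the recommended maximum, which is exactly the trap described above.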

CT :sol:
 
AMD CTO Mark Papermaster: "you can't rely on that frequency bump from every new semiconductor node." AMD's future outlook of very limited frequency bumps, performance increases only from more cores and expensive software modifications to use more cores.
Versus
Intel Ragland: "People who think this the end of the world for overclocking because our competitors' 7nm has very little headroom, that's not true. Intel is all about rock-solid reliability; our parts aren't going to fail...you can count on your part running at spec, so there's so much inherent margin that we will always have overclocking headroom...I think users will be happy with the margin we can offer in the future."

Ouch! Intel's Ragland really "punked" AMD's negative outlook.

PS Great fascinating article

To be fair, Intel has total control over their process and designs it for their needs specifically. AMD has no control over it, and TSMC will design their process to meet multiple customers' designs. I can even say that AMD probably will not be their largest customer, so while TSMC may work with them, they won't focus solely on AMD's products.
 
The comment about overclocking not being dead and TSMC's process was exactly what I assumed. I think the same could be said about Intel's current 10nm process since it's also using quad patterning. TSMC's current 7nm process is using quad patterning and not pushing densities very far. I would argue that was a good choice vs. Intel's aggressive 10nm densities. TSMC's 7nm+ should hopefully provide more headroom as it ditched quad patterning for EUV.
 

Math Geek

Titan
Ambassador
i am 100% insanely jealous you got to take this tour.

i am also 100% thankful you got to and knew the right questions to ask to get some info they may not have really wanted shared. getting them to talk about their home pc's was pretty much genius as that info obviously leads to some great insight into the true question they could not answer.

i'm gonna stop typing now and go back and read the article again :)
 
getting them to talk about their home pc's was pretty much genius as that info obviously leads to some great insight into the true question they could not answer.

Not necessarily.

What THEY do at home, considering they are specialized Intel employees, and can not only afford to upgrade every single cycle, but likely do it for free anyhow given the perks of the job, is probably almost certainly not in line with what SHOULD be recommended for the average person or even some of the outliers riding just on the edge of what we'd consider "daily drive-able".

To be clearer, these guys don't care if their CPUs will last beyond the next cycle or even make it TO the next cycle, in terms of degradation from electromigration and Vt shift, because they have an inside track to resources nobody else has, and the obvious financial ability to swap out faulty hardware at ANY time they wish to. So the fact that THEY are ok with a given voltage or configuration does not put me at ease that it is safe or wise to imitate for anybody who needs those parts to last a while.
 

Karadjgne

Titan
Ambassador
That's funny. That whole article, all the pages about LN2 and world records and help and feedback from the best and brightest OC'rs and what hit me the most is a couple of paragraphs on the next to last page.

I'm still running 3rd gen Intel, with only dreams of upgrade, no plans or even budget for plans, so my pc needs to last a while longer yet. Worry about cpu lifetime? Bet your bupkiss I do.

Is OC dead? Nope, it's in a wheelchair with busted legs and AMD stock performance isn't helping. We've been at 5GHz since the FX and Ivy Bridge generations, and OC has been losing out ever since. The performance gains over stock honestly are no longer worth the bother to anything but a benchmark. If Intel wants to bring it back to life, John Q. Public needs to be looking at the possibility of 6GHz with currently available cooling. Not peltier and/or LN2 application. On a cpu that'll last longer than 1-2 generations before electromigration tears it apart.
 
Not necessarily.

What THEY do at home, considering they are specialized Intel employees, and can not only afford to upgrade every single cycle, but likely do it for free anyhow given the perks of the job, is probably almost certainly not in line with what SHOULD be recommended for the average person or even some of the outliers riding just on the edge of what we'd consider "daily drive-able".

To be clearer, these guys don't care if their CPUs will last beyond the next cycle or even make it TO the next cycle, in terms of degradation from electromigration and Vt shift, because they have an inside track to resources nobody else has, and the obvious financial ability to swap out faulty hardware at ANY time they wish to. So the fact that THEY are ok with a given voltage or configuration does not put me at ease that it is safe or wise to imitate for anybody who needs those parts to last a while.

They probably get paid well but not quite as well as some people think. As for replacing parts, from friends who worked at Intel they stated they can get a discounted price (very good) on a current model CPU once a year. That may have even changed as last I heard of that was quite a while ago.

I would expect them, while stating personal feelings, to still remain on the safe side. Mainly because, even though these are their personal opinions, this came out in an article about their jobs, so they could still get in hot water.

That's funny. That whole article, all the pages about LN2 and world records and help and feedback from the best and brightest OC'rs and what hit me the most is a couple of paragraphs on the next to last page.

I'm still running 3rd gen Intel, with only dreams of upgrade, no plans or even budget for plans, so my pc needs to last a while longer yet. Worry about cpu lifetime? Bet your bupkiss I do.

Is OC dead? Nope, it's in a wheelchair with busted legs and AMD stock performance isn't helping. We've been at 5GHz since the FX and Ivy Bridge generations, and OC has been losing out ever since. The performance gains over stock honestly are no longer worth the bother to anything but a benchmark. If Intel wants to bring it back to life, John Q. Public needs to be looking at the possibility of 6GHz with currently available cooling. Not peltier and/or LN2 application. On a cpu that'll last longer than 1-2 generations before electromigration tears it apart.

Even without overclocking, electromigration is an issue as we delve into smaller process nodes. Intel even patented an idea to have "reserve" cores for just this case:

https://www.tomshardware.com/news/intel-patent-many-core-processor-multicore,14205.html

That was a long time ago too. But it's still something that we face with smaller nodes at stock voltages and temperatures, unless we move away from silicon to some miracle material.
 

Karadjgne

Titan
Ambassador
Hmm, dunno how well that patent applies though, SSDs do it all the time with redundancy. The only way I see it happening as imagined is if they somewhat drastically increase die size to accommodate the extra Tx necessary. But the whole thing is kinda moot except in specific application, given the general working lifespan of a cpu. Software generally renders a cpu dead long before electromigration does.
 
Hmm, dunno how well that patent applies though, SSDs do it all the time with redundancy. The only way I see it happening as imagined is if they somewhat drastically increase die size to accommodate the extra Tx necessary. But the whole thing is kinda moot except in specific application, given the general working lifespan of a cpu. Software generally renders a cpu dead long before electromigration does.

I don't disagree; it just shows that a company as involved with this as Intel is already considering it without overclocking in mind. It is an issue that they know will rear its head as they push towards smaller process nodes. The question is when it will, and whether they and others will have an alternative material worked out in time to move to it.

Software is absolutely the killer of CPUs, but electromigration is still something that's on their minds.
 
friends who worked at Intel they stated they can get a discounted price (very good) on a current model CPU once a year.

Yeah, but you're not talking about people who work WITH the CPUs, in a department specifically intended for punishing and killing them. To think these guys don't have the ability to source parts or take them home, is not very realistic. I can understand for other employees, but I don't think we're talking about the same kind of scenario with people who are IN this department and likely have unlimited access to as many CPUs as they need or want for testing.
 
Yeah, but you're not talking about people who work WITH the CPUs, in a department specifically intended for punishing and killing them. To think these guys don't have the ability to source parts or take them home, is not very realistic. I can understand for other employees, but I don't think we're talking about the same kind of scenario with people who are IN this department and likely have unlimited access to as many CPUs as they need or want for testing.

IDK. The guy I knew worked in silicon development and showed me the tiers. For Intel employees it was a pretty damn good deal, such as top-end CPUs for a couple hundred vs. the $1K-plus they cost, and third parties qualified for some decent discounts. It is much like car sales: direct workers get a very nice discount, while directly involved third parties and dealers get decent discounts.

I can't speak for them just taking parts although I would assume they have to log all the parts used.

I still would hope they wouldn't push anything beyond what is safe based on their testing. It can easily be seen as an official statement, you know how people construe things.
 

Karadjgne

Titan
Ambassador
I still would hope they wouldn't push anything beyond what is safe based on their testing.
Ahh, but that's just it. They can't. Not unless they've somehow figured out the decay rate for their CPUs vs. expected lifespan. 1.4V might be perfectly 'safe' if the expected usage isn't going to span much more than 2-3 years, there's enough redundancy and Tx to handle that at least, but what about the general public that is like me and will hang onto the same CPU for 7-8 years or more if it can deal with the software? Might need to cut that back to 1.35V if you plan on longevity vs. right-now performance.

I mean you can do a quick burnout in your car, and the tires will look and act like they did prior, but in reality you just lost 5000 miles worth of wear.
 
Yes, I still don't believe that the guys working in the overclocking department are intending to say that the "safe" voltage they run their CPUs at, overclocked, at home, is the same "safe" they'd recommend for people who want their CPU to last five years or more. I just don't believe that.

Like I say, a pro gas engine builder thinks it's "safe" to run dual shot NOS in his engine or straight alcohol fuel, because he knows he's going to be rebuilding it every ten to twenty runs if not sooner. I doubt he's going to recommend his weekend warrior young neighbor should do the same thing when he helps him rebuild and configure the streetable Camaro in his garage that he drives to school and hotrods around town in on weekends.
 

CompuTronix

Intel Master
Moderator
From page 6, paragraph 6:

" ... This is definitely among the most secret of Intel's tech in the lab, so there was quite a bit of trepidation from the lab team about exactly what they could show us, and what we could show you. After several clarifying conversations between the lab crew and the PR team assigned to our tour, and some negotiation on our part, the team allowed us ... "

From page 1, paragraph 2:

" ... To our knowledge, we are the first tech news outlet to visit and report on the facility ... "

No doubt the overclocking lab team was well briefed and prepared by Intel's PR team prior to Tom's visit, so I would think that the lab team was cautioned specifically against disclosing any information that would violate their NDA's, or could be construed as misleading to the public, which would potentially be grounds for dismissal in many companies.

Also, it's interesting to note that the maximum Vcore value the OC lab team stated they're comfortable with running on their personal "home rig" is 1.4 to 1.425, emphasizing "if they can keep it cool enough" (page 7, paragraphs 8 & 9) which is <80°C.

Moreover, considering that the graph in my previous post for Vt Shift Degradation Curves does indeed support the OC team's home rig maximum Vcore of 1.425, I still maintain that a maximum of 1.4 for a daily driver is reasonable, provided that <80°C is observed, and Adaptive Vcore, SpeedStep, Speed Shift and all C States remain enabled, as recommended by the OC team (page 7, paragraph 9).
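As a rough illustration of that daily-driver rule, here is a minimal Python sketch of the checklist: Vcore at or below 1.4, load temperature under 80°C, and Adaptive Vcore, SpeedStep, Speed Shift and C-States all enabled. The `RigSettings` class and `daily_driver_issues` function are hypothetical names invented for this example, and the values have to be entered by hand from your own BIOS and monitoring readings; nothing here reads hardware.

```python
from dataclasses import dataclass


@dataclass
class RigSettings:
    """Hypothetical, hand-entered snapshot of one machine's overclock settings."""
    vcore: float          # Core voltage under load, in volts
    load_temp_c: float    # peak Core temperature under load, in °C
    adaptive_vcore: bool
    speedstep: bool
    speed_shift: bool
    c_states: bool


def daily_driver_issues(s, max_vcore=1.4, max_temp_c=80.0):
    """Return a list of rule violations; an empty list means all checks pass."""
    issues = []
    if s.vcore > max_vcore:
        issues.append(f"Vcore {s.vcore:.3f} V exceeds {max_vcore:.3f} V")
    if s.load_temp_c >= max_temp_c:
        issues.append(
            f"load temperature {s.load_temp_c:.0f}°C is not below {max_temp_c:.0f}°C"
        )
    # The power-management features the OC team recommends leaving enabled.
    for name in ("adaptive_vcore", "speedstep", "speed_shift", "c_states"):
        if not getattr(s, name):
            issues.append(f"{name} is disabled")
    return issues
```

Calling `daily_driver_issues(RigSettings(1.38, 74, True, True, True, True))` returns an empty list. The defaults are the 14nm figures discussed here, so pass a lower `max_vcore` for an older node.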
 
I still maintain that a maximum of 1.4 for a daily driver is reasonable, provided that <80°C is observed, and Adaptive Vcore, SpeedStep, Speed Shift and all C States remain enabled, as recommended by the OC team (page 7, paragraph 9).

This is where I'm at. Under 80 and less than 1.4v, for any system you care to have last beyond the next gen release.