News Intel Rocket Lake Six-Core CPU Shows Off 4.2 GHz Boost Clock

Impressive that the same boost clock of the 6700K from 4 years ago has been achieved...again ;/
To be fair, it's a new architecture. Skylake had 300MHz lower clocks than Devil's Canyon but delivered about 15% more performance thanks to the architectural updates. It could be Intel will have some tweaks that make the lower clocks sensible -- the high-end Zen 2 CPUs still operate at all-core frequencies of around 4.2GHz as well.

The other option, which is possibly more likely, is that these ES chips are intentionally set to lower clocks. There could be 5.0GHz and higher retail models. Not sure if that will happen if the new architecture boosts IPC, though, as that would typically mean more transistors and more complexity, and sticking with 14nm Rocket Lake could be pretty toasty.
 

InvalidError

Titan
Moderator
The other option, which is possibly more likely, is that these ES chips are intentionally set to lower clocks. There could be 5.0GHz and higher retail models.
Since Comet Lake is barely getting to 5GHz stock boost and Willow Cove has two architectural generations worth of extra stuff crammed in its cores, I wouldn't expect Rocket Lake to get anywhere near 5GHz. For *Cove to reach 5GHz, Intel will need 10nm++(+) or 7nm.
 
The other option, which is possibly more likely, is that these ES chips are intentionally set to lower clocks. There could be 5.0GHz and higher retail models. Not sure if that will happen if the new architecture boosts IPC, though, as that would typically mean more transistors and more complexity, and sticking with 14nm Rocket Lake could be pretty toasty.
I agree. Architectural upgrades have always brought bigger dies. Plus, backporting to 14nm is always going to pose extra issues with die size, thermals, and power if cores and clocks stay the same. Don't expect anything above 10 cores and a 4.5 GHz boost. But since it's a new architecture, even the 8-core version can beat 10-core Comet Lake in multi-threaded performance if configured properly, so Intel did the best they could here.

AMD is also going for a 20% IPC uplift in the next gen. Competition is great.
 

Deicidium369

Permanently banned.
BANNED
Mar 4, 2020
390
61
290
To be fair, it's a new architecture. Skylake had 300MHz lower clocks than Devil's Canyon but delivered about 15% more performance thanks to the architectural updates. It could be Intel will have some tweaks that make the lower clocks sensible -- the high-end Zen 2 CPUs still operate at all-core frequencies of around 4.2GHz as well.

The other option, which is possibly more likely, is that these ES chips are intentionally set to lower clocks. There could be 5.0GHz and higher retail models. Not sure if that will happen if the new architecture boosts IPC, though, as that would typically mean more transistors and more complexity, and sticking with 14nm Rocket Lake could be pretty toasty.

Not expecting to see 5GHz clocks for quite a while if at all - 3.5GHz base will prob be 4-4.5GHz on the high spec product... can't see much more than that initially.
 

Deicidium369

Permanently banned.
BANNED
I agree. Architectural upgrades have always brought bigger dies. Plus, backporting to 14nm is always going to pose extra issues with die size, thermals, and power if cores and clocks stay the same. Don't expect anything above 10 cores and a 4.5 GHz boost. But since it's a new architecture, even the 8-core version can beat 10-core Comet Lake in multi-threaded performance if configured properly, so Intel did the best they could here.

AMD is also going for a 20% IPC uplift in the next gen. Competition is great.
They have been saying they are increasing IPC with every gen - yet they still have not caught up. So take that 20% (more than likely fanboy sourced, not AMD) with a grain of salt - it is from Another Marketing Deception after all.

8 cores on Rocket Lake S - and something around a 24EU Xe LP IGP. Clocks will be at least 4GHz base - and maybe a little later 4.5GHz.
 

InvalidError

Titan
Moderator
They have been saying they are increasing IPC with every gen - yet they still have not caught up. So take that 20% (more than likely fanboy sourced, not AMD) with a grain of salt - it is from Another Marketing Deception after all.
Just look at Ryzen 3300X vs 3100, AMD is getting ~15% higher performance in gaming simply from having a single-CCX layout. Zen 3 has single-CCX CCDs, so that may be the bulk of Zen 3's IPC gains right there.
 
They have been saying they are increasing IPC with every gen - yet they still have not caught up. So take that 20% (more than likely fanboy sourced, not AMD) with a grain of salt - it is from Another Marketing Deception after all.
No. You don't have a clue. It's from Retired Engineer, who is connected with the most reliable industry sources from Taiwan. They have caught up pretty well, especially in non-latency-sensitive workloads.
 

MasterMadBones

Distinguished
They have been saying they are increasing IPC with every gen - yet they still have not caught up. So take that 20% (more than likely fanboy sourced, not AMD) with a grain of salt - it is from Another Marketing Deception after all.
AMD is currently in the lead with IPC on desktop and the advertised improvements have been pretty accurate since Zen. My only issue with the 20% is that it is the upper boundary of what is supposed to be somewhere between 15% and 20%. I prefer to use the lower boundary when talking about rumours.
 

InvalidError

Titan
Moderator
AMD is currently in the lead with IPC on desktop
IPC is workload-dependent. AMD's IPC takes a nosedive whenever the workload involves significant communication between cores on different CCXes, which is why Intel will continue dominating gaming at least until Zen 3.

I looked at some i5-10400 vs Ryzen 5 3600 reviews/benchmarks and the i5 looks surprisingly good for gaming and power efficiency despite its 14nm-manyplusses handicap, and it only costs $20 more. Not much info on H470 motherboard pricing yet.
 

MasterMadBones

Distinguished
IPC is workload-dependent. AMD's IPC takes a nosedive whenever the workload involves significant communication between cores on different CCXes, which is why Intel will continue dominating gaming at least until Zen 3.
True, but we generally use geomean SPEC performance as the IPC benchmark. It sounds like the same applies to Zen 3, since some of the engineers said the IPC gain felt much larger than 15-20%.
 

spongiemaster

Admirable
Dec 12, 2019
2,273
1,277
7,560
Intel is going to need to get the clocks a bit higher before launch, maybe up to 4.7GHz. Hopefully Rocket Lake will see about 20% IPC improvement over Skylake: about 15% moving to Sunny Cove and another 5% to Willow Cove. At 4.2GHz, the 5GHz boost of the 9900K gives it a roughly 20% higher clock speed (5.0/4.2 is about 1.19), so the overall per-core performance is still treading water, though hopefully with somewhat lower power despite still being on 14nm.
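
To put rough numbers on the "treading water" point, here is a minimal sketch that assumes per-core performance scales as IPC times clock (a first-order simplification that ignores memory and latency effects). The function name is made up for illustration; the inputs are the figures quoted in the post.

```python
def relative_per_core_performance(ipc_gain, new_clock_ghz, old_clock_ghz):
    """Return new-chip per-core performance relative to the old chip (1.0 = parity).

    Assumes performance scales as IPC x clock, which is only a
    first-order estimate.
    """
    return (1.0 + ipc_gain) * (new_clock_ghz / old_clock_ghz)


# Figures from the post: ~20% IPC gain, 4.2GHz ES boost vs. the 9900K's 5.0GHz.
es_vs_9900k = relative_per_core_performance(0.20, 4.2, 5.0)
# Same IPC assumption at the hoped-for 4.7GHz boost.
hoped_vs_9900k = relative_per_core_performance(0.20, 4.7, 5.0)

print(f"4.2GHz ES vs 9900K: {es_vs_9900k:.3f}x")
print(f"4.7GHz vs 9900K:    {hoped_vs_9900k:.3f}x")
```

Under those assumptions the 4.2GHz part lands within about 1% of the 9900K, while 4.7GHz would be roughly 13% ahead.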
 
Since Comet Lake is barely getting to 5GHz stock boost and Willow Cove has two architectural generations worth of extra stuff crammed in its cores, I wouldn't expect Rocket Lake to get anywhere near 5GHz. For *Cove to reach 5GHz, Intel will need 10nm++(+) or 7nm.
I'm not expecting all-core boost clocks on a presumably more complex architecture to match Comet Lake. However, a tweak to pipeline length could make that possible at least, and boost clocks of close to 5GHz on lighter workloads are at least plausible. We need more concrete details on the Cove CPU architecture to say for sure. But I still think if the current boost clock is limited to 4.2GHz, that's because of ES rather than the final plans for Rocket Lake.
 

InvalidError

Titan
Moderator
However, a tweak to pipeline length could make that possible at least, and boost clocks of close to 5GHz on lighter workloads are at least plausible.
The last time Intel lengthened the pipeline in pursuit of clocks, we got Netburst, and that did not end well: the first generation got destroyed by the P3 in most benchmarks despite the P4 having a 500-1000MHz clock advantage, and the 3+GHz P4 got its ass handed to it again by the Core 2 even though the Core 2 was at a 500-1000MHz handicap. The last time AMD tried a deep pipeline in pursuit of high clocks, we got Faildozer, which got its ass handed to it by an i3 even when OC'd to 5GHz. Trading pipeline depth for higher clocks rarely works.

It makes no sense to make the pipeline longer when the IPC penalty from having higher execution latency (can't pack execution units as tightly when dependencies take one or two extra clocks to become available) is greater than the clock gain. Deeper pipelines also waste more power and silicon on clock distribution and clocking data. Splitting the pipeline in smaller stages also means more total cycle time wasted on setup-and-hold times between data latches that cannot be used for useful work.
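
The setup-and-hold point can be illustrated with a toy timing model. This is only a sketch: it assumes the logic splits evenly across stages and every stage pays the same fixed latch overhead, and the 3.0ns and 0.05ns figures are hypothetical, not measurements of any real CPU.

```python
def cycle_time_ns(total_logic_ns, stages, latch_overhead_ns):
    """Clock period when logic is split evenly and each stage pays a fixed latch cost."""
    return total_logic_ns / stages + latch_overhead_ns


def clock_ghz(total_logic_ns, stages, latch_overhead_ns):
    """Achievable clock in GHz for the toy model (1 / period in ns)."""
    return 1.0 / cycle_time_ns(total_logic_ns, stages, latch_overhead_ns)


def overhead_share(total_logic_ns, stages, latch_overhead_ns):
    """Fraction of each cycle spent on latch overhead rather than useful logic."""
    return latch_overhead_ns / cycle_time_ns(total_logic_ns, stages, latch_overhead_ns)


TOTAL_LOGIC_NS = 3.0       # hypothetical total logic delay through the pipeline
LATCH_OVERHEAD_NS = 0.05   # hypothetical setup-and-hold cost per stage

for stages in (15, 20, 30):
    ghz = clock_ghz(TOTAL_LOGIC_NS, stages, LATCH_OVERHEAD_NS)
    share = overhead_share(TOTAL_LOGIC_NS, stages, LATCH_OVERHEAD_NS)
    print(f"{stages} stages: {ghz:.2f}GHz, {share:.0%} of each cycle is latch overhead")
```

In this model doubling the stage count from 15 to 30 raises the clock by less than 2x, and the share of each cycle lost to latching grows with depth.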

Intel has been refining the Core architecture for 14 years, pretty sure Willow Cove is on a knife's edge between IPC potential and clocks where adding an extra pipeline stage is practically guaranteed to cause more harm than good.
 
The last time Intel lengthened the pipeline in pursuit of clocks, we got Netburst, and that did not end well: the first generation got destroyed by the P3 in most benchmarks despite the P4 having a 500-1000MHz clock advantage, and the 3+GHz P4 got its ass handed to it again by the Core 2 even though the Core 2 was at a 500-1000MHz handicap. The last time AMD tried a deep pipeline in pursuit of high clocks, we got Faildozer, which got its ass handed to it by an i3 even when OC'd to 5GHz. Trading pipeline depth for higher clocks rarely works.

It makes no sense to make the pipeline longer when the IPC penalty from having higher execution latency (can't pack execution units as tightly when dependencies take one or two extra clocks to become available) is greater than the clock gain. Deeper pipelines also waste more power and silicon on clock distribution and clocking data. Splitting the pipeline in smaller stages also means more total cycle time wasted on setup-and-hold times between data latches that cannot be used for useful work.

Intel has been refining the Core architecture for 14 years, pretty sure Willow Cove is on a knife's edge between IPC potential and clocks where adding an extra pipeline stage is practically guaranteed to cause more harm than good.
The thing is, NetBurst didn't have a bunch of other stuff in place to help it remain viable. In 2000, when NetBurst came out, it went from the Pentium 3's 12 stage pipeline to a 20 stage pipeline ... and Prescott jumped the shark with 31 stages. The lengthy pipeline allowed for higher clockspeeds, which at 130 and 90 nm was too hot to handle. At 10nm, 5GHz is a different story.

Plus, if you look at pipeline length, early NetBurst isn't necessarily that much longer than modern Skylake. There are all sorts of tradeoffs to make, and we don't know yet precisely what Intel is going to do. But two to as many as four extra stages combined with better branch prediction isn't a huge deal. 10-15 extra stages? Yeah, that leads quickly to bad things usually.

Modern pipelines are usually around 15 stages, and Intel has been in the 14-16 range since Nehalem. But Ice Lake is already apparently 14-20 stages, depending on which pipeline and instruction are being executed. There's a lot of wiggle room, depending on what else is done, and no absolutely universal answer as to what is best.

If four extra pipeline stages allow clocks to be 20-25% higher and only cause a 5% loss in performance due to branch mispredictions, it could be a net win. Power and efficiency also come into play, naturally. Bulldozer had many other issues that caused problems beyond the long pipeline -- like the unusual "2 partial cores" approach, and a lot of "edge cases" that ended up being more like the typical case and tanked performance.
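
As a quick check on that trade-off, a minimal sketch under the assumption that performance is clock times IPC and that the misprediction penalty shows up purely as an IPC loss (function names are illustrative only):

```python
def net_speedup(clock_gain, ipc_loss):
    """Overall speedup when clocks rise by clock_gain and IPC falls by ipc_loss."""
    return (1.0 + clock_gain) * (1.0 - ipc_loss)


def break_even_ipc_loss(clock_gain):
    """IPC loss at which a given clock gain is exactly cancelled out."""
    return 1.0 - 1.0 / (1.0 + clock_gain)


for clock_gain in (0.20, 0.25):
    speedup = net_speedup(clock_gain, 0.05)
    limit = break_even_ipc_loss(clock_gain)
    print(f"+{clock_gain:.0%} clock, -5% IPC: {speedup:.4f}x "
          f"(break-even IPC loss {limit:.1%})")
```

With a 5% IPC loss, a 20-25% clock gain still nets roughly 14-19%; the deeper pipeline only becomes a loss once the IPC penalty passes about 17-20%.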

Keep in mind, Willamette only had 42 million transistors -- you could make it the equivalent of 168 million for a quad-core variant, or 336 million for 8-core. Northwood was 55 million (~220 million equivalent for 4-core, 440 million for 8-core). Even Prescott with its 31-stage pipeline was only 125 million, and a big chunk of that went to the L2 cache at the time. So 1 billion transistors for 8-core with an 8MB L2 equivalent, maybe.
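
The multi-core equivalents above are plain multiplication of the single-core counts. A small sketch, assuming naive linear scaling with no shared logic (the helper name is hypothetical):

```python
# Single-core transistor counts in millions, as quoted in the post.
SINGLE_CORE_MILLIONS = {"Willamette": 42, "Northwood": 55, "Prescott": 125}


def scaled_millions(single_core_millions, cores):
    """Naive multi-core equivalent: one full copy of the chip per core."""
    return single_core_millions * cores


for name, count in SINGLE_CORE_MILLIONS.items():
    quad = scaled_millions(count, 4)
    octo = scaled_millions(count, 8)
    print(f"{name}: {quad}M for 4 cores, {octo}M for 8 cores")
```

A real multi-core design would share some logic and cache between cores, so these are rough upper-end estimates rather than predictions.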

With Comet Lake, Intel is already sitting at around ... well, it's not saying, but 2-core + GT2 Skylake was 1.75 billion, so probably at least 3-4 billion for the full 10-core chip seems likely. (SKL-X is 8.33 billion for 18-core with no GPU, for example. And a big chunk is the L3 cache, naturally.) Point being, with a budget of at least 3-4 times as many transistors per core, a lot can be done that makes a change in pipeline length not entirely out of the question.

Anyway, I'm not saying Intel is or even should have a longer pipeline than Skylake, but until it does a deep dive on Willow Cove or Golden Cove or whichever cove is inside Rocket Lake, we won't know what has changed.

Pipeline length is a lot like execution width. We can't really go much wider on designs -- 6-wide fetch and dispatch is already so wide that often most of the execution slots end up unused. I mean, what's the actual IPC for any given program on a modern AMD or Intel CPU? I've heard it ranges from about 0.7 to 2.0, with an average of maybe 1.4. That's out of a theoretical IPC of 6. If Intel went with an 8-wide design, it would probably only improve average IPC from 1.4 to 1.45 or something minuscule. And yet, we have to do something to get faster chips.
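
For a sense of how many issue slots go unused, here is a sketch using the hearsay IPC figures above. The 1.45 IPC for an 8-wide design is the post's guess, not a measurement.

```python
def slot_utilization(achieved_ipc, issue_width):
    """Fraction of theoretical issue slots actually filled per cycle."""
    return achieved_ipc / issue_width


six_wide = slot_utilization(1.4, 6)     # average IPC quoted for a 6-wide design
eight_wide = slot_utilization(1.45, 8)  # guessed IPC for a hypothetical 8-wide design

print(f"6-wide at 1.40 IPC: {six_wide:.1%} of slots used")
print(f"8-wide at 1.45 IPC: {eight_wide:.1%} of slots used")
```

Going wider adds slots faster than typical code can fill them, so utilization drops even as IPC creeps up.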
 

Deicidium369

Permanently banned.
BANNED
Just look at Ryzen 3300X vs 3100, AMD is getting ~15% higher performance in gaming simply from having a single-CCX layout. Zen 3 has single-CCX CCDs, so that may be the bulk of Zen 3's IPC gains right there.
If all the IPC uplifts were true then 3GHz AMD would be outperforming 5GHz Intel - and that is just not the case. No major new architecture is coming in Zen 3 - it's just the latest iterative update to the Ryzen left by Keller. Zen 1/Zen 1 Refresh - small iterative update - Zen 2 moved to TSMC's 10nm class process and multi chip modules - new process, new packaging - still not a huge departure from Zen 1/R. Zen 3 is more of the same, with a slightly more optimized layout - which is typical in a product line. Intel has been doing that with Skylake and 14nm for ages - do you think the latest 14nm is the same as the 1st 14nm product? - not even close. The last flagship on that process drops this year in the form of Rocket Lake S - if they can get to a frequency optimized design on the Willow Cove cores it will be hard to beat. Maybe 4GHz initially - but the flagship should see 4.5GHz relatively quickly - and with the 30% IPC uplift in Sunny Cove - Willow Cove should see another IPC uplift - likely in the 15% range over Ice Lake. I do not expect to see 5GHz stock clocks on RLS. I hope Alder Lake is not some big.LITTLE thing - I am hoping it was some crossed wires somewhere - if it is big.LITTLE, I will pass over Alder Lake like I passed over Comet Lake.

Any way you look at it - for people like us - this is an exciting time. Whichever you choose to build with - everyone comes out a winner. Now if we can just get that massive breakthrough in programming for making code parallel easily - then all the extra cores we have will truly be useful. But we have been waiting on that breakthrough since the first MP computers. A lot of the benefits were realized with virtualization - in business - but some way of making a game use all the available power will be great.
 

Deicidium369

Permanently banned.
BANNED
The thing is, NetBurst didn't have a bunch of other stuff in place to help it remain viable. In 2000, when NetBurst came out, it went from the Pentium 3's 12 stage pipeline to a 20 stage pipeline ... and Prescott jumped the shark with 31 stages. The lengthy pipeline allowed for higher clockspeeds, which at 130 and 90 nm was too hot to handle. At 10nm, 5GHz is a different story.

Plus, if you look at pipeline length, early NetBurst isn't necessarily that much longer than modern Skylake. There are all sorts of tradeoffs to make, and we don't know yet precisely what Intel is going to do. But two to as many as four extra stages combined with better branch prediction isn't a huge deal. 10-15 extra stages? Yeah, that leads quickly to bad things usually.

Modern pipelines are usually around 15 stages, and Intel has been in the 14-16 range since Nehalem. But Ice Lake is already apparently 14-20 stages, depending on which pipeline and instruction are being executed. There's a lot of wiggle room, depending on what else is done, and no absolutely universal answer as to what is best.

If four extra pipeline stages allow clocks to be 20-25% higher and only cause a 5% loss in performance due to branch mispredictions, it could be a net win. Power and efficiency also come into play, naturally. Bulldozer had many other issues that caused problems beyond the long pipeline -- like the unusual "2 partial cores" approach, and a lot of "edge cases" that ended up being more like the typical case and tanked performance.

Keep in mind, Willamette only had 42 million transistors -- you could make it the equivalent of 168 million for a quad-core variant, or 336 million for 8-core. Northwood was 55 million (~220 million equivalent for 4-core, 440 million for 8-core). Even Prescott with its 31-stage pipeline was only 125 million, and a big chunk of that went to the L2 cache at the time. So 1 billion transistors for 8-core with an 8MB L2 equivalent, maybe.

With Comet Lake, Intel is already sitting at around ... well, it's not saying, but 2-core + GT2 Skylake was 1.75 billion, so probably at least 3-4 billion for the full 10-core chip seems likely. (SKL-X is 8.33 billion for 18-core with no GPU, for example. And a big chunk is the L3 cache, naturally.) Point being, with a budget of at least 3-4 times as many transistors per core, a lot can be done that makes a change in pipeline length not entirely out of the question.

Anyway, I'm not saying Intel is or even should have a longer pipeline than Skylake, but until it does a deep dive on Willow Cove or Golden Cove or whichever cove is inside Rocket Lake, we won't know what has changed.

Pipeline length is a lot like execution width. We can't really go much wider on designs -- 6-wide fetch and dispatch is already so wide that often most of the execution slots end up unused. I mean, what's the actual IPC for any given program on a modern AMD or Intel CPU? I've heard it ranges from about 0.7 to 2.0, with an average of maybe 1.4. That's out of a theoretical IPC of 6. If Intel went with an 8-wide design, it would probably only improve average IPC from 1.4 to 1.45 or something minuscule. And yet, we have to do something to get faster chips.
Sunny Cove is Ice Lake. Willow Cove is Tiger Lake. Golden Cove is Alder Lake. There is no question whatsoever. Rocket Lake S is Willow Cove - whether it is the exact same core as the Willow Cove in Tiger Lake remains to be seen.

The key is being able to use all the capabilities of the CPU - whether it be AMD or Intel - and that relies on a major breakthrough in programming that provides an easy and efficient method to make code more parallel. That problem has existed since the first Cray MP. Modern supercomputers get a TON of hand-optimized libraries to take advantage of whatever resources that system has available. Silicon Graphics called it "Desktops to Teraflops" - code that would run on a single-socket desktop system all the way up to their 512-socket to 2048-socket Origin systems. I was a sysadmin on SGI since the Challenge L/M days - all the way through the Origin 2000 and ending with the Origin 300 (300, not the 3000). We need a Manhattan-style project to make the changes we need - maybe our current paradigm isn't up to the task. It was unthinkable, sitting in front of my Atari ST, to be talking about potentially hundreds of cores in a single socket. The paradigm is changing in the data center: the disaggregation of monolithic servers into pools of CPUs, pools of GPUs, pools of traditional DRAM and non-volatile memory, pools of AI processors, pools of FPGAs - all connected with CXL over PCIe 5 and 6. This trend started with SANs - storage was no longer housed in a server, but in a standalone system (which was basically a server itself). Combined with the open / OCP designs for network switches and other things coming out of those projects, it will be cool to watch - in 10 years we will look over the new way and barely remember how it used to be.
 

InvalidError

Titan
Moderator
If all the IPC uplifts were true then 3GHz AMD would be outperforming 5GHz Intel
AMD started Ryzen from 50+% behind Intel clock-for-clock, so it had huge handicaps to catch up on. AMD's IPC gains have been demonstrated in productivity and video editing software, where Ryzen actually has managed to take the lead over Intel. However, Ryzen's relatively high inter-core latency due to its CCX arrangement is still AMD's Achilles' heel in gaming, which is why Intel keeps dominating gaming and other applications where low latency is king.
 

InvalidError

Titan
Moderator
The thing is, NetBurst didn't have a bunch of other stuff in place to help it remain viable. In 2000, when NetBurst came out, it went from the Pentium 3's 12 stage pipeline to a 20 stage pipeline ... and Prescott jumped the shark with 31 stages.
And how much extra clock did Intel get from increasing the pipeline length and power by 50%? Less than 20%.

The worst part is that clock-for-clock, Prescott ended up losing to Northwood in most games and latency-sensitive applications by 10-20%, rendering the clock and power draw increases largely pointless. Clocks were officially dead as a primary design objective on Intel's side.
 

spongiemaster

Admirable
Maybe 4GHz initially - but the flagship should see 4.5GHz relatively quickly - and with the 30% IPC uplift in Sunny Cove - Willow Cove should see another IPC uplift - likely in the 15% range over Ice Lake.

12114920848l.jpg

Intel claims Sunny Cove has on average about 18% higher IPC than Skylake. If you look at the slide above, Intel is not targeting single threaded performance with Willow Cove. Maybe 5%. We'll have a better idea when Tiger Lake is released in a few months. Willow Cove is not going to have 50% better IPC than Skylake. You need to either share your crack with the rest of us, or put the pipe down.
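
Where the 50% figure comes from: stacked IPC gains multiply rather than add. A minimal sketch using the percentages in this exchange (the helper name is illustrative):

```python
from math import prod


def compound_gain(gains):
    """Total fractional gain from applying each generational gain in sequence."""
    return prod(1.0 + g for g in gains) - 1.0


claimed = compound_gain([0.30, 0.15])   # 30% for Sunny Cove, then 15% for Willow Cove
expected = compound_gain([0.18, 0.05])  # Intel's ~18%, then a guessed ~5%

print(f"Claimed uplift over Skylake:  {claimed:.1%}")
print(f"Expected uplift over Skylake: {expected:.1%}")
```

The claimed numbers compound to just under 50% over Skylake, versus roughly 24% if Intel's 18% figure and a 5% Willow Cove bump hold.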

You also need to stop using single core performance and IPC interchangeably. They are not the same thing. AMD surpassed Intel on average in IPC with the 3000 series. Due to a significant clock advantage, which has nothing to do with IPC, Intel still maintains the per core performance title.
 

spongiemaster

Admirable
And how much extra clock did Intel get from increasing the pipeline length and power by 50%? Less than 20%.

The worst part is that clock-for-clock, Prescott ended up losing to Northwood in most games and latency-sensitive applications by 10-20%, rendering the clock and power draw increases largely pointless. Clocks were officially dead as a primary design objective on Intel's side.

Intel thought they could get the Netburst architecture to 10GHz. Kind of baffling how wrong the engineers got that one. Had they gotten anywhere close, the performance would have been pretty impressive.
 
And how much extra clock did Intel get from increasing the pipeline length and power by 50%? Less than 20%.

The worst part is that clock-for-clock, Prescott ended up losing to Northwood in most games and latency-sensitive applications by 10-20%, rendering the clock and power draw increases largely pointless. Clocks were officially dead as a primary design objective on Intel's side.
That's the whole problem of going too far. Tejas got canned after Prescott because the clock speed scaling Intel had expected wasn't realized. And that happened because the shrink from 130nm to 90nm to 65nm had some unforeseen consequences, which led to the whole (now aborted) tick-tock cadence of Intel CPU architectures. Right now, Intel is already basically back into the too much power phase (Comet Lake), but architectural tweaks may help. Or they may not.

If Rocket Lake and its Willow Cove cores (tuned/tweaked Willow Cove, whatever that ends up meaning, as the cores likely won't be identical to the Tiger Lake chips -- which is what I was getting at earlier Deicidium) have an IPC uplift of 15% relative to Comet Lake but end up clocked 500MHz slower, the net gains will potentially be quite small. I'm really not expecting much from Rocket Lake at this point -- what we really want is the 10nm or 7nm desktop successor to Rocket Lake, not another 14nm part in 2021!
 

InvalidError

Titan
Moderator
Right now, Intel is already basically back into the too much power phase (Comet Lake), but architectural tweaks may help. Or they may not.
Netburst's problem wasn't so much a "too much power" issue as a "too many sacrifices for clocks" one, since Intel sacrificed 40-50% of its IPC relative to Coppermine/Tualatin to get there. Netburst needed to clock almost twice as high out of the gate to beat the P3 under all circumstances, which is kind of awkward when your 2GHz top-end new part barely beats your 1-1.3GHz previous-gen parts.

As for 10nm and beyond, hitting the physical node dimensions (or at least something close enough to call it as such) is only half the battle; Intel still has to get the process to yield the expected performance and quantities. As far as we can tell from available Ice Lake SKUs, 10nm+ does not appear to be there yet. Hopefully things will go more smoothly for 7nm in 2021.

I'm not expecting much out of 10nm. At this point, 10nm is mainly about Intel having to prove to investors that the billions it spent getting 10nm to work as intended were not completely wasted, after telling them for years in a row that 10nm was "on track" while delaying 10nm products due to setbacks.
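
The "almost twice as high" figure follows directly from the IPC loss. A short sketch, again assuming performance is simply IPC times clock (function name is illustrative):

```python
def required_clock_ratio(ipc_sacrificed):
    """Clock multiple needed to break even after giving up a fraction of IPC."""
    return 1.0 / (1.0 - ipc_sacrificed)


for loss in (0.40, 0.50):
    ratio = required_clock_ratio(loss)
    print(f"{loss:.0%} IPC sacrificed: needs {ratio:.2f}x the clock to match")
```

So a 40-50% IPC deficit needs roughly 1.67x to 2x the clock just to reach parity, before any actual gain.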