News Intel Teases Rocket Lake: Double-Digit IPC Gains, Cypress Cove Architecture

Shadowclash10

Prominent
May 3, 2020
184
46
610
0
You know, I just don't feel super excited about Rocket Lake. Intel just hasn't done anything great lately. I was excited for Ampere because it's a new GPU generation. I was excited for Big Navi because of something that lots of people overlook: on bang-for-the-buck, the RX 5700 XT was even better than the 2060 Super and 2070 Super. I'm excited for Ryzen 5000 because Ryzen 3000 on desktop was pretty good, Ryzen 4000 in mobile was great, and Big Navi looks to be very good. What has Intel done lately that would make me believe Rocket Lake is gonna be really good?
 

spongiemaster

Reputable
Dec 12, 2019
2,119
1,150
4,560
0
You're right. There have been so many delays, and it's been so long since Intel has had anything really new on the desktop, that we're at the point of "I'll care when I can actually buy it, and not before." The other issue is that Rocket Lake definitely feels like a compromise/stopgap, not a polished, best-foot-forward product. Alder Lake looks like a true next-generation platform with a new Jim Keller architecture, but that's at least a year off if it isn't delayed. So again, wake me up when it gets here.
 

shady28

Distinguished
Jan 29, 2007
385
244
19,090
8
I don't know, I am pretty excited about Rocket Lake. Look at Tiger Lake: it got really good reviews, but they were almost grudgingly given. And the comparisons almost universally pitted the 4C/8T Tiger Lake against 8C/8T and 8C/16T AMD mobile systems; nevertheless, Tiger Lake came out on top in anything not able to use all 8 cores (which is 90% of real use cases). I'm not even talking about the excellent Xe iGPU here, just the CPU benchmarks.

If you translate those Tiger Lake benchmarks and reviews into an 8C/16T desktop part running at 5GHz, it could be quite amazing.
 

spongiemaster

Intel just confirmed that Cypress Cove is based on Sunny Cove, which Ice Lake uses, not on Willow Cove, which Tiger Lake uses.
 

jeremyj_83

Distinguished
While Tiger Lake is very good, almost all of its performance gains over Ice Lake come from clock speed. Also, Tiger Lake only really looked very good in its 28W configuration. At 15W it did provide a nice boost in performance, but the boost wasn't sustainable. Both architectures have IPC similar to Zen 2's, so when Intel can boost 20% higher, I would expect performance to be a lot higher in lightly threaded applications.
 

spongiemaster

Tiger Lake also gets a boost from 2.5x more L2 cache as well as additional last-level cache. But as I said above, Rocket Lake isn't using Tiger Lake's architecture as we thought; it's using the previous one, Sunny Cove.
 

shady28

I'll just throw this out there. This is SPEC2006 integer and floating point, single thread.

The 1185G7 is a max 4.7GHz part, tested here at both 15W and 28W.
The 4800U is a max 4.2GHz part, rated at 25W.
That is an 11.9% difference in clock speed.

In integer operations, the 15W Tiger Lake is 49% faster than the 25W 4800U.

In floating point operations, the 15W Tiger Lake is 48.6% faster than the 25W 4800U.

An 11.9% higher clock speed resulting in 49% and 48.6% higher performance would imply one heck of a lot more IPC than the chip is being given credit for. That assumes both are hitting their max clocks, and at these power ratings I bet neither Renoir nor Tiger Lake is sustaining those clocks.
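Here is that napkin math spelled out as a small Python sketch. It takes the post's figures at face value and assumes single-thread performance scales as IPC × clock, with both chips actually sitting at their rated max boost, which, as the post itself says, probably isn't happening at these power levels.

```python
# Napkin math: if single-thread performance ~ IPC x clock, the implied
# IPC ratio is the performance ratio divided by the clock ratio.
# Assumes both chips run at their rated max boost for the whole test.

TIGER_MAX_GHZ = 4.7   # i7-1185G7 max boost, as given in the post
RENOIR_MAX_GHZ = 4.2  # Ryzen 7 4800U max boost


def implied_ipc_ratio(perf_ratio: float, clock_ratio: float) -> float:
    """Per-clock (IPC) advantage implied by a performance advantage."""
    return perf_ratio / clock_ratio


clock_ratio = TIGER_MAX_GHZ / RENOIR_MAX_GHZ  # ~1.119, i.e. 11.9% higher

# Speedups of the 15W Tiger Lake over the 25W 4800U, from the post.
for name, perf_ratio in [("integer", 1.49), ("float", 1.486)]:
    ipc = implied_ipc_ratio(perf_ratio, clock_ratio)
    print(f"{name}: clock +{clock_ratio - 1:.1%}, implied IPC +{ipc - 1:.1%}")
```

Under those assumptions both the integer and float results work out to roughly 33% higher IPC for Tiger Lake. If Tiger Lake was really running below 4.7GHz, the implied IPC gap would be even larger; if Renoir was throttling harder, it would be smaller.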

And yes, the big difference Tiger Lake brought over Ice Lake was higher frequency combined with lower power consumption. Think about that for a moment.

High clocks in this case came with lower power consumption. That is a total win-win.

I am really interested to see what this architecture can do at 95W or even 125W. Will Rocket Lake come out as a 5.5GHz 125W part? It might, given what Tiger Lake did. Even if it doesn't, what will a 5GHz all-core turbo look like on this thing? Look at how close that 28W part comes to the single-thread performance of a 5.3GHz 10900K.

 

InvalidError

Titan
Moderator
shady28 said:
High clocks in this case came with lower power consumption. That is a total win-win.
That "total win" only happened because 10nm+ was still a disappointment after four years' worth of delays, and last time I read about it, it still wasn't up to par with what Intel was hoping for.

The most important thing to keep in mind, though, is that Rocket Lake is on 14nm-many-pluses, not 10nm, so there will be a significant power penalty, and Intel may also need to lighten some circuitry (sacrificing some IPC relative to Sunny Cove) to maintain clocks.
 
The funny thing about "double-digit improvement" (*) is that it could mean anything from 10% to 99%, and given Intel's record on benchmark numbers, I would take these claims with a grain of salt. Don't get me wrong, Intel has worked wonders with its 14nm iterations, so anything is possible.

(*) From their newsroom (https://newsroom.intel.com/news/intels-11th-gen-processor-rocket-lake-s-architecture-detailed/#gs.jg6r2d), footnote 3: "Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex."
 

jeremyj_83

A better comparison than 15W Tiger Lake vs the 4800U would be 28W Tiger Lake vs the Ryzen 9 3950X. Both have the same boost clock, and the 28W mode allows a 50W turbo (enough to reach max clocks), so it's a good apples-to-apples comparison. That gives Tiger Lake a 10.5% lead at the same boost clock. With Ryzen 5000 supposedly adding 19% IPC, Zen 3 would get an estimated score of 59.52 at 4.7GHz, or 7.7% higher than Tiger Lake. By that math, Rocket Lake needs to boost over 5GHz to equal a 4.7GHz Zen 3. Note this is all napkin math, so we really won't know until Rocket Lake is released.
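The same napkin math as a Python sketch, using only the ratios quoted in the post. It assumes Rocket Lake matches Tiger Lake's per-clock performance, which is a stretch given that Cypress Cove is derived from Sunny Cove rather than Willow Cove, and it ignores any difference between the chips' real sustained clocks.

```python
# Napkin math using only the ratios quoted in the post.
TIGER_28W_VS_ZEN2 = 1.105  # 28W Tiger Lake vs Ryzen 9 3950X at the same boost clock
ZEN3_IPC_GAIN = 1.19       # AMD's claimed Zen 3 IPC uplift over Zen 2
TIGER_BOOST_GHZ = 4.7

# Zen 3 relative to Tiger Lake when both run at 4.7GHz.
zen3_vs_tiger = ZEN3_IPC_GAIN / TIGER_28W_VS_ZEN2  # ~1.077

# Assumption: Rocket Lake's per-clock performance ~ Tiger Lake's.
# Clock it would need to match a 4.7GHz Zen 3:
required_ghz = TIGER_BOOST_GHZ * zen3_vs_tiger

print(f"Zen 3 lead at equal clocks: {zen3_vs_tiger - 1:.1%}")
print(f"Rocket Lake clock needed: {required_ghz:.2f} GHz")
```

With these inputs Zen 3 comes out about 7.7% ahead at equal clocks, so a Tiger-Lake-class core would need roughly 5.06GHz to catch a 4.7GHz Zen 3, which is where the "over 5GHz" estimate comes from.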
 

shady28

InvalidError said:
That "total win" only happened because 10nm+ was still a disappointment after four years' worth of delays, and last time I read about it, it still wasn't up to par with what Intel was hoping for.

The most important thing to keep in mind, though, is that Rocket Lake is on 14nm-many-pluses, not 10nm, so there will be a significant power penalty, and Intel may also need to lighten some circuitry (sacrificing some IPC relative to Sunny Cove) to maintain clocks.
You might want to check that chart again. There's an Ice Lake on it, and it's higher than the 7nm Zen 2 Renoir in performance and lower in power draw. That doesn't do much to support your comments about 10nm+ being a disappointment (quite the opposite). The main disappointment was that they couldn't make more of them.
 

shady28

jeremyj_83 said:
A better comparison than 15W Tiger Lake vs the 4800U would be 28W Tiger Lake vs the Ryzen 9 3950X. Both have the same boost clock, and the 28W mode allows a 50W turbo (enough to reach max clocks), so it's a good apples-to-apples comparison. That gives Tiger Lake a 10.5% lead at the same boost clock. With Ryzen 5000 supposedly adding 19% IPC, Zen 3 would get an estimated score of 59.52 at 4.7GHz, or 7.7% higher than Tiger Lake. By that math, Rocket Lake needs to boost over 5GHz to equal a 4.7GHz Zen 3. Note this is all napkin math, so we really won't know until Rocket Lake is released.
Well, that comparison doesn't state how long the chip can maintain that boost. The lower-power chips won't hold full turbo on a single core for very long, and of course with all cores loaded they drop off very quickly.

I guarantee you that Tiger Lake didn't maintain 4.7GHz on one core for the entire test. It was probably more like 4.5-4.6GHz for 28 seconds. A higher-power chip, such as a desktop chip, will maintain higher boost for longer. Power does matter, because you don't actually reach or maintain max speeds without it, and a 15W or 28W part doesn't have much power.

I think people here are really reaching to explain away a massive difference in power/performance.
 

jeremyj_83

From AnandTech:
(Note that in the single threaded test, the power limits ultimately should not apply because one core should not consume all the power of the chip. For the Tiger Lake processor, because this is a nominal 15 W TDP part with a 50 W turbo, this actually does go above the power limit with one core active, as it scores 554. As a result, the 50 W mode with a 28 W TDP was used and scores 595. This is more akin to a desktop processor anyway.)
This is from their Zen 3 preview, discussing AMD's stated Cinebench score.
 

InvalidError

shady28 said:
You might want to check that chart again. There's an Ice Lake on it, and it's higher than the 7nm Zen 2 Renoir in performance and lower in power draw.
Well, duh. Zen 2 is no competition for Sunny Cove, since Zen 2 is barely even with Skylake.

The big disappointment I had in mind is Ice Lake's performance compared with Coffee Lake and Comet Lake CPUs: new architecture, new process, yet similar or worse performance in most cases, since most of Ice Lake's ~18% IPC gain is offset by its ~10% lower clocks and the loss of two cores in the i7 tier. That's why performance-oriented Intel laptops use Coffee/Comet Lake CPUs, not Ice Lake.
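A rough Python sketch of that trade-off, assuming performance scales as IPC × clock and that multithreaded throughput scales perfectly with core count, which real workloads rarely do. The six-core figure for the Coffee/Comet Lake i7 is an assumption read from "loss of two cores", and applying the same 10% clock deficit to all-core loads is also a simplification.

```python
# Rough estimate: performance ~ IPC x clock (x cores, for ideal MT scaling).
ICE_LAKE_IPC_GAIN = 1.18     # ~18% IPC gain for Sunny Cove, per the post
ICE_LAKE_CLOCK_RATIO = 0.90  # ~10% lower clocks than the Coffee/Comet Lake parts

CORES_ICE_LAKE_I7 = 4
CORES_COMET_LAKE_I7 = 6      # assumed from "loss of two cores in the i7 tier"

net_single_thread = ICE_LAKE_IPC_GAIN * ICE_LAKE_CLOCK_RATIO
# Assumes perfect scaling with cores and the same clock ratio under all-core load.
net_all_core = net_single_thread * CORES_ICE_LAKE_I7 / CORES_COMET_LAKE_I7

print(f"Net single-thread change: {net_single_thread - 1:+.1%}")
print(f"Net all-core change (ideal scaling): {net_all_core - 1:+.1%}")
```

That leaves roughly a 6% single-thread gain and, under ideal core scaling, about 29% less all-core throughput, which fits the "similar or worse performance in most cases" description.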
 
InvalidError said:
The most important thing to keep in mind, though, is that Rocket Lake is on 14nm-many-pluses, not 10nm, so there will be a significant power penalty, and Intel may also need to lighten some circuitry (sacrificing some IPC relative to Sunny Cove) to maintain clocks.
In theory, yes, but in practice there will be a bunch of power-saving measures, like turning off superfluous circuitry when it's not in use and/or applying a clock offset for heavy-IPC loads, like the existing AVX offset. It will draw more power, but it won't be crazy.
Of course, all the benchmarks will once again be run with everything disabled, and the super-high peak power draw will be reported as if it were the normal average.
 
The funny thing about "double-digit improvement" (*) is that it could mean anything from 10% to 99%, and given Intel's record on benchmark numbers, I would take these claims with a grain of salt. Don't get me wrong, Intel has worked wonders with its 14nm iterations, so anything is possible.
We already know Sunny Cove has 20% more execution units and, according to Intel, about 18% better IPC, assuming there are no other improvements... or regressions.
https://en.wikichip.org/wiki/intel/microarchitectures/sunny_cove
 
Synthetic benchmarks aren't indicative of real-world performance. Look at SLI: it performs great in 3DMark and pretty much nothing else.
 

InvalidError

IPC increases are averages and are usually fairly accurate overall. It makes sense that a CPU with 25% more execution ports would aim to achieve nearly that much extra throughput per clock on average; otherwise the 25% wider architecture wouldn't be worth the trouble. Likewise, you'd expect a GPU with 25% more shaders to be around 20% faster on average at a given clock frequency.

SLI and CF do not scale well because they have always been kludges that require everything to line up perfectly for decent results. That practically never happens unless the game implements some degree of explicit support, which almost no developer bothers with because SLI/CF users are far too small an audience to be worth any development time. It's somewhat analogous to how most games get handicapped on platforms with non-uniform core-to-core latency: most PC developers can't be bothered to work around the CCX-to-CCX/CCD-to-CCD latency penalty much beyond the effort already required to keep their games from breaking on Zen 1/1+/2.
 
But programs need to take advantage of it. You'd expect clock speed to have a linear effect on software, and in some instances it gets fairly close, but in others it just starts to flatline. It depends on what you're doing, and synthetics are very well optimized to let the hardware put its best foot forward. I'd rather see a real workload than a synthetic one.
 

InvalidError

But programs need to take advantages of it.
Advantage of what? The wider architecture? No. You can run the exact same code and the instruction scheduler will shuffle things around to put the extra execution units to use, regardless of what the code was originally optimized for. It may not be quite as efficient as code optimized specifically for how a given CPU distributes instructions across its 10-wide architecture vs. another CPU's 8-wide one, but it'll get most of the way there, especially when SMT gives the scheduler a whole extra thread to help it find something for every execution unit to work on.

The whole point of out-of-order execution in CPUs is to decouple low-level internals, which can vary drastically between CPUs, from software development, so developers don't need dedicated code paths for every CPU model and variant in existence to achieve remotely decent performance.
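A toy model can make this concrete. The Python sketch below is not how any real Intel or AMD scheduler works: it assumes every instruction takes one cycle, any execution unit can run any instruction, and the scheduler simply issues up to `width` ready instructions per cycle, skipping over stalled ones. It feeds the same unmodified instruction streams to an 8-wide and a 10-wide core; independent work gets the full 25% speedup without recompiling, while a serial dependency chain gets nothing, which is the kind of workload dependence the earlier post was pointing at.

```python
# Toy model, not a real CPU: every instruction takes one cycle, any
# execution unit can run any instruction, and each cycle the scheduler
# issues up to `width` instructions whose inputs are ready, skipping
# over stalled ones (that skipping is the "out-of-order" part).

Program = list[tuple[str, list[str]]]  # (instruction name, names it depends on)


def cycles_to_finish(program: Program, width: int) -> int:
    done_at: dict[str, int] = {}  # name -> cycle its result becomes available
    pending = list(program)
    cycle = 0
    while pending:
        issued: list[str] = []
        for name, deps in pending:
            if len(issued) == width:
                break  # all execution units busy this cycle
            if all(d in done_at and done_at[d] <= cycle for d in deps):
                issued.append(name)
        for name in issued:
            done_at[name] = cycle + 1
        pending = [(n, d) for n, d in pending if n not in issued]
        cycle += 1
    return cycle


# The same unmodified instruction streams are fed to both cores.
independent: Program = [(f"i{k}", []) for k in range(40)]
chain: Program = [(f"c{k}", [f"c{k - 1}"] if k else []) for k in range(40)]

for label, program in [("independent", independent), ("dependency chain", chain)]:
    narrow = cycles_to_finish(program, width=8)
    wide = cycles_to_finish(program, width=10)
    print(f"{label}: 8-wide {narrow} cycles, 10-wide {wide} cycles, "
          f"speedup {narrow / wide:.2f}x")
```

The independent stream finishes in 5 cycles on the 8-wide core and 4 on the 10-wide one (the full 1.25x), while the 40-instruction dependency chain takes 40 cycles on both. Real code sits somewhere in between, which is why per-clock gains from a wider core are quoted as averages.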
 
