News Intel Lunar Lake CPU gets die annotation — four Skymont E-cores slightly bigger than one Lion Cove P-core

I'm surprised they've seemingly managed to keep the same E-core proportions when compared to the P-cores given the performance boost. It'll be interesting to see how they perform on ARL given the direct L3 access. Also hoping that they do a new N series even though I don't need to buy more of them and this would likely make me want to.
 

bit_user

Titan
Ambassador
The article said:
Lunar Lake also has a quad-core cluster of Skymont-based E-cores, and the die annotation indicates that the entire cluster is just a little larger than a single Lion Cove core. This isn’t surprising since Meteor Lake’s Redwood Cove P-core is about the same size as its quad-core Crestmont E-core cluster. However, it is notable that Intel was able to keep E-cores small despite Skymont packing 38% higher integer and 68% higher floating point IPC.
That's because cache plays a big part in it. Without L2 cache, Skymont is 33.2% as big as Lion Cove. Gracemont was about 29.6% as big as Golden Cove, after excluding L2 cache.

In Lunar Lake, both the P-cores and the E-cores got bigger. It's just that the rate of increase in the E-cores' size & complexity was a little higher than that of the P-cores.
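To put a rough number on "a little higher", here is a minimal sketch using only the two ratios quoted above (29.6% and 33.2%, the latter being an estimate). The function name `relative_growth` is made up for illustration. If the E-core to P-core area ratio moves from r0 to r1, then the E-core grew by a factor of r1/r0 relative to whatever the P-core did.

```python
def relative_growth(old_ratio: float, new_ratio: float) -> float:
    """Fraction by which the E-core outgrew the P-core between two generations.

    Both arguments are E-core area divided by P-core area (L2 excluded).
    """
    return new_ratio / old_ratio - 1.0


# Ratios as quoted in this thread; the Lunar Lake figure is an estimate.
GRACEMONT_VS_GOLDEN_COVE = 0.296
SKYMONT_VS_LION_COVE = 0.332

extra = relative_growth(GRACEMONT_VS_GOLDEN_COVE, SKYMONT_VS_LION_COVE)
print(f"E-core grew about {extra:.1%} more than the P-core")
```

Note this spans two generations on the P-core side (Golden Cove to Lion Cove) and says nothing about absolute areas, which also depend on the node.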

Regarding those IPC figures, Chips & Cheese found that Skymont really can't stretch its legs in Lunar Lake. This is due to the entire cluster being implemented as a low-power island, rather than as a proper peer of the P-cores. Here's how they put it:

"Despite massive architecture improvements, Skymont’s performance is hit or miss compared to Crestmont. Lunar Lake’s different cache hierarchy plays a large role in this, and highlights the difficulties in having one core setup play both the low power and multithreaded performance roles. It also highlights the massive role caches play in CPU performance. Even a dramatically improved core can struggle to deliver gains if the cache subsystem doesn’t keep up. That’s especially important with LPDDR5X, which has high latency and can be a handicap in low core count workloads."

 

bit_user

Titan
Ambassador
I'm surprised they've seemingly managed to keep the same E-core proportions when compared to the P-cores given the performance boost.
Well, not quite. Excluding L2 cache, Golden Cove was 5.37 mm^2 and Gracemont was 1.59 mm^2. So, that's a ratio of 29.6% in Alder Lake and Raptor Lake, whereas I think Lunar Lake has a ratio of 33.2%.

https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/ae57d01d-12a5-4bbb-84d1-61d5275bf970_1185x636.jpeg


Source: https://locuza.substack.com/p/die-walkthrough-alder-lake-sp-and
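For anyone who wants to check the Alder Lake arithmetic, a quick sketch follows. The two areas are the ones quoted above (L2 excluded). The helper name `area_ratio` and the four-core cluster comparison are my own additions for illustration.

```python
def area_ratio(e_core_mm2: float, p_core_mm2: float) -> float:
    """E-core area as a fraction of P-core area."""
    return e_core_mm2 / p_core_mm2


# Areas excluding L2 cache, as quoted from the Alder Lake die walkthrough.
GOLDEN_COVE_MM2 = 5.37
GRACEMONT_MM2 = 1.59

single = area_ratio(GRACEMONT_MM2, GOLDEN_COVE_MM2)
cluster = area_ratio(4 * GRACEMONT_MM2, GOLDEN_COVE_MM2)

print(f"one Gracemont vs Golden Cove:  {single:.1%}")   # 29.6%
print(f"four Gracemont vs Golden Cove: {cluster:.1%}")  # 118.4%
```

The cluster figure here ignores the shared L2, so it is not directly comparable to the annotated cluster size in the article.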

Given how area-intensive cache is, I think we could expect an even greater disparity if we could compare them after excluding L1 (and L0, in Lion Cove's case).

The other thing to keep in mind is that Lion Cove also increased in IPC and complexity. Because Redwood Cove was so big and complex, the relative improvement wasn't as big. Another way of looking at it is that Redwood Cove is probably a lot further down the path of diminishing returns. So, just to get a ~15% IPC increase they had to increase gate count by more than what it would take to make a comparable improvement to Crestmont.

It'll be interesting to see how they perform on ARL given the direct L3 access. Also hoping that they do a new N series even though I don't need to buy more of them and this would likely make me want to.
I think Intel would probably rather use its own nodes for the low cost, low margin stuff. So, I'd expect we'll get N-series based on Crestmont and using the Intel 3 node. That's just a guess.
 
Well, not quite. Excluding L2 cache, Golden Cove was 5.37 mm^2 and Gracemont was 1.59 mm^2. So, that's a ratio of 29.6% in Alder Lake and Raptor Lake, whereas I think Lunar Lake has a ratio of 33.2%.
The client Lion Cove cores also don't include HT, which saves some area, comparatively speaking. It's also plausible they don't have AVX-512, which definitely increased GC/RC core size. I mostly just expected Skymont to end up being larger than it is, since Intel is making the E-cores more fully featured.
 

bit_user

Titan
Ambassador
The client Lion Cove cores also don't include HT, which saves some area, comparatively speaking. It's also plausible they don't have AVX-512, which definitely increased GC/RC core size.
I hope it doesn't have AVX-512, since the server cores are already deviating in areas like hyperthreading.

It seems to me the main reason why client cores have had AVX-512, since Ice Lake, was because Intel actually wanted to support it on them. When you consider how the server P-cores already differed from client P-cores in the number of AVX-512 ports, it does seem kind of pointless to integrate it if you really have no intention of enabling it.

I mostly just expected Skymont to end up being larger than it is, since Intel is making the E-cores more fully featured.
TBH, I did actually expect them to be closer in size. If it turned out that Skymont were a little more than half the size of Lion Cove, I think I wouldn't have been surprised.

However, upon reflection, it does seem to me that probably a lot of the size difference is simply due to supporting higher clock frequencies in Lion Cove. Consider how much smaller AMD's C-cores are, despite having the exact same microarchitecture as the full-sized ones: Zen 4c is only 64.6% as big as regular Zen 4 (excluding L3). When you consider the additional differences and longer critical paths of Skymont, I guess I really should've expected it to be more in the range of 30 to 40% as big as Lion Cove.
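As a back-of-envelope thought experiment only: suppose the area cost of designing for high clocks were the same for Intel as the Zen 4c vs Zen 4 gap (a big assumption, and not something Intel or AMD has stated). Dividing the observed Skymont ratio by that factor gives a rough idea of how big Skymont might look relative to Lion Cove if clock targets were taken out of the picture. The name `normalize_for_layout` is hypothetical.

```python
def normalize_for_layout(observed_ratio: float, layout_factor: float) -> float:
    """Scale an observed area ratio by an assumed dense-layout shrink factor.

    observed_ratio: small core area / big core area, as measured.
    layout_factor:  area of a density-optimized core / area of the same
                    microarchitecture optimized for clocks (assumed here).
    """
    return observed_ratio / layout_factor


# Figures quoted in this thread; applying AMD's factor to Intel is an assumption.
ZEN4C_VS_ZEN4 = 0.646
SKYMONT_VS_LION_COVE = 0.332

adjusted = normalize_for_layout(SKYMONT_VS_LION_COVE, ZEN4C_VS_ZEN4)
print(f"clock-adjusted Skymont vs Lion Cove: {adjusted:.1%}")
```

Under that assumption the adjusted figure comes out a little over half, which would put the remaining gap down to the microarchitectural differences rather than clock targets.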
 
However, upon reflection, it does seem to me that probably a lot of the size difference is simply due to supporting higher clock frequencies in Lion Cove. Consider how much smaller AMD's C-cores are, despite having the exact same microarchitecture as the full-sized ones: Zen 4c is only 64.6% as big as regular Zen 4 (excluding L3). When you consider the additional differences and longer critical paths of Skymont, I guess I really should've expected it to be more in the range of 30 to 40% as big as Lion Cove.
The ARL leaks have all shown the E-cores hitting 4.6 GHz, which is higher than Gracemont on any of the RPL SKUs. This was somewhat unexpected, given that LNL caps out at 3.7 GHz on even the 288V (MTL U series went up to 3.8 GHz). It makes me even more curious about the Skymont efficiency curve than I was before, especially with what we know about how Intel sacrificed Gracemont's efficiency to gain MT performance on desktop.
 

bit_user

Titan
Ambassador
The ARL leaks have all shown the E-cores hitting 4.6 GHz, which is higher than Gracemont on any of the RPL SKUs. This was somewhat unexpected, given that LNL caps out at 3.7 GHz on even the 288V (MTL U series went up to 3.8 GHz). It makes me even more curious about the Skymont efficiency curve than I was before, especially with what we know about how Intel sacrificed Gracemont's efficiency to gain MT performance on desktop.
That's interesting. If that were on a higher-density node (20A), it would make more sense to me, as you'd get some additional frequency pretty much for free (design-wise).

In terms of perf/W, it does seem like Intel might be interested in trying to juice the Skymont cores to help offset the loss of hyperthreading, when it comes to MT performance. Too bad we're not going to get the rumored versions with 32 E-cores, as that really would've been something to behold!