News Intel's next-gen CPU boosts to 2.8 GHz without Hyper-Threading — Lunar Lake chip with eight cores, eight threads has a bigger L2 cache than the L3...

Status
Not open for further replies.

usertests

Distinguished
Mar 8, 2013
969
857
19,760
The design looks like a good low-power successor to Alder Lake-U and Meteor Lake-U, and far superior to gimped chips like Alder Lake-N (0+8 cores, but typically quad-core, 16/24/32 EUs, limited to single-channel memory).

Compare 4+4, 8 threads with no hyperthreading to:

Alder/Raptor Lake-U, with 2+8 cores, 12 threads.
Meteor Lake-U, with 2+8+2 cores, 14 threads, the last two cores being the new LP E-cores.

Increasing the P-cores, decreasing the E-cores seems like a good development which should increase its gaming performance. You probably won't miss the extra threads in a low-power device, and there should be at least some minor IPC gains from Lion Cove and Skymont. It's a little unusual to see LP E-cores thrown out, but who cares?

The YuuKi-AnS leak from November revealed some other details: a choice of only 16 GB or 32 GB memory-on-package, 16 GB being a good minimum these days. The memory speed is apparently fixed at LPDDR5x-8533. The iGPU has 7/8 Xe2 cores, which is 112/128 execution units instead of the 64 EUs maximum in Meteor Lake-U. The NPU is faster, for what that's worth. There's H.266 video decode and Wi-Fi 7 support.

It just looks like a good x86 low-power laptop chip when you look at it in full. The only thing I'll be complaining about is the pricing most likely.
 
Last edited:
This seems squarely aimed at the same market Qualcomm seems to be trying to break into so at least Intel will have a very real threat to deal with which might keep pricing under control. Of course Qualcomm isn't exactly known for their rational pricing so maybe it won't be such an issue.

It's going to be very interesting to see how LNL and ARL turn out to be designed in the end.
 
Last edited:

rluker5

Distinguished
Jun 23, 2014
914
595
19,760
The design looks like a good low-power successor to Alder Lake-U and Meteor Lake-U, and far superior to gimped chips like Alder Lake-N (0+8 cores, but typically quad-core, 16/24/32 EUs, limited to single-channel memory).

Compare 4+4, 8 threads with no hyperthreading to:

Alder/Raptor Lake-U, with 2+8 cores, 12 threads.
Meteor Lake-U, with 2+8+2 cores, 14 threads, the last two cores being the new LP E-cores.

Increasing the P-cores, decreasing the E-cores seems like a good development which should increase its gaming performance. You probably won't miss the extra threads in a low-power device, and there should be at least some minor IPC gains from Lion Cove and Skymont. It's a little unusual to see LP E-cores thrown out, but who cares?

The YuuKi-AnS leak from November revealed some other details: a choice of only 16 GB or 32 GB memory-on-package, 16 GB being a good minimum these days. The memory speed is apparently fixed at LPDDR5x-8533. The iGPU has 7/8 Xe2 cores, which is 112/128 execution units instead of the 64 EUs maximum in Meteor Lake-U. The NPU is faster, for what that's worth. There's H.266 video decode and Wi-Fi 7 support.

It just looks like a good x86 low-power laptop chip when you look at it in full. The only thing I'll be complaining about is the pricing most likely.
It does seem focused on low power responsiveness. Probably isn't intended for serious gaming. Just good responsiveness for the non gamer/workstation crowd which is most people and businesses.

I wonder how it will compare to Apple's offerings on battery life.

And that max boost clock is questionable, even at 33% load. Under a power-saving power plan, which is the default for mobile, clocks fluctuate a lot. Also, hopefully Intel has found a way to downclock better than what the chips have been doing lately. They could save more power.
 
  • Like
Reactions: usertests

usertests

Distinguished
Mar 8, 2013
969
857
19,760
It does seem focused on low power responsiveness. Probably isn't intended for serious gaming. Just good responsiveness for the non gamer/workstation crowd which is most people and businesses.
I'm not sure about the graphics, but it looks promising.

The leak says "8 Xe2 cores, 64 Vector Engines (16 wide)".

Wikipedia says Meteor Lake-H has 8 Xe cores, 128 Vector Engines. Meteor Lake-U has 4 Xe cores, 64 Vector Engines. So a newer generation of Arc/Xe graphics but some aspects may be halved since it's efficiency-focused. Or maybe that's just Vector Engines being wider and more powerful in Xe2-LPG, I don't know.

The apparent default speed of LPDDR5x-8533 is 14% higher than LPDDR5x-7467 top speed in Meteor Lake, and that's only in the top models.
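
For anyone who wants to check that 14% figure, here is a quick sketch of the arithmetic. It only uses the two transfer rates mentioned above; the function name is just something I made up for illustration.

```python
# Transfer rates in MT/s, taken from the figures quoted in this post.
LUNAR_LAKE_MTS = 8533       # LPDDR5x-8533 (leaked Lunar Lake speed)
METEOR_LAKE_TOP_MTS = 7467  # LPDDR5x-7467 (top Meteor Lake models)


def percent_increase(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100


uplift = percent_increase(LUNAR_LAKE_MTS, METEOR_LAKE_TOP_MTS)
print(f"{uplift:.1f}% higher")
```

This only compares raw transfer rates. Actual bandwidth also depends on bus width, which the leak details quoted here don't state.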

So it would either be faster than only Meteor Lake-U, or faster than Meteor Lake-H/Phoenix which would actually be kinda serious to me at least. Older games, emulators, and some newer games would be fine with that level of iGPU for 1080p and the 4+4 cores, 8 threads. Microsoft may try to harness the NPU for its unannounced upscaling technique. The leak says it will run at 8W in fanless designs, or 17-30W with a fan, which would be the better choice for clocks and gaming performance.
 
Last edited:

purposelycryptic

Distinguished
Aug 1, 2008
51
67
18,610
It does seem focused on low power responsiveness. Probably isn't intended for serious gaming. Just good responsiveness for the non gamer/workstation crowd which is most people and businesses.

I wonder how it will compare to Apple's offerings on battery life.

And that max boost clock is questionable, even at 33% load. Under a power-saving power plan, which is the default for mobile, clocks fluctuate a lot. Also, hopefully Intel has found a way to downclock better than what the chips have been doing lately. They could save more power.
I think you mean productivity crowd. Workstation processors would be the Xeon-W series and AMD's Threadripper PRO series processors.

Workstations generally have the meanest processors around, with the highest balance of clock speed and core/thread count of any production CPU line. Gaming processors may have higher clock speeds, server processors may have more cores, but workstation processors are the true all-rounder bruisers. You can definitely game on them - if you have the money. They are far on the other end of the price spectrum from this processor - you can easily spend $5k-$10k+ on one.
 
  • Like
Reactions: bit_user

bit_user

Titan
Ambassador
Critical specifications for an early sample of Intel's upcoming Lunar Lake CPU have [u]leaked out[/u]
Saying "leaked out" is redundant, as the "out" is implicit. The first place I noticed using such a construction is WCCFTech, which doesn't surprise me as I think that site is based in Pakistan, hence their English is sometimes a bit odd.

Meteor Lake was on stepping C0 (with the C meaning the third major revision), Raptor Lake was B0, and Alder Lake was C0. So, this Lunar Lake chip is probably not the final product.
I also thought the letters were simple iterations, but then I learned that the Alder Lake-S H0 stepping is a fundamentally different die than the C0 stepping. The H0 die is the one with only 6 P-cores and no E-cores, in its fully-enabled form. It's actually different silicon than the C0 stepping, which is the one with 8P + 8E. I don't know if these dies have other names, but the only distinction I've seen is the C0 vs. H0 stepping.

the Lunar Lake sample has only 12MB of L3 cache, lower than the 14MB of L2 cache. Usually, a higher level of cache means more capacity, and often significantly more, so it's very unintuitive that Lunar Lake should have less L3 than L2 cache. This directly contradicts an earlier leak that showed 16MB of L3 cache for Lunar Lake but has identical specifications otherwise.
Not weird. First, Intel CPUs tie L3 cache to the core tile. As cores are disabled, so are their L3 cache slices. So, if this isn't a fully-enabled sample, then that could explain certain L3 differences. For instance, Raptor Cove had a 3 MB slice of L3 per core, in which case 4 P-cores would yield 12 MB of L3. 6 P-cores would give you 18 MB.
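
The slice arithmetic can be sketched like this. It is only a back-of-the-envelope model: it assumes the 3 MB-per-P-core Raptor Cove figure carries over, which nothing in the leak confirms, and the names are made up for illustration.

```python
# Assumption: each enabled P-core brings one 3 MB L3 slice with it,
# as on Raptor Cove. Whether Lunar Lake works this way is not confirmed.
L3_SLICE_MB_PER_P_CORE = 3


def l3_total_mb(enabled_p_cores, slice_mb=L3_SLICE_MB_PER_P_CORE):
    """Total L3 if every enabled P-core contributes one slice."""
    return enabled_p_cores * slice_mb


for cores in (4, 6):
    print(f"{cores} P-cores -> {l3_total_mb(cores)} MB of L3")
```

So a 4 P-core part landing at 12 MB fits that pattern.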

The other thing to know about L3 cache is that Intel has been implementing it as exclusive of L2 contents since Skylake-SP, making it somewhat complementary. That makes a lot of sense if you've got big L2 caches, and it's how they avoid needing so much L3.
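
To show why exclusivity matters when L3 is smaller than L2, here is a deliberately idealized sketch. It treats L2 and L3 as two single pools and ignores associativity, sharing between cores, and replacement policy. The 14 MB and 12 MB figures are the leaked totals from the article, and the function is hypothetical.

```python
def unique_capacity_mb(l2_mb, l3_mb, policy):
    """Upper bound on distinct data held across L2 + L3 in an idealized model."""
    if policy == "exclusive":
        # A line lives in L2 or in L3, never both, so the capacities add up.
        return l2_mb + l3_mb
    if policy == "inclusive":
        # L3 must hold a copy of everything in L2, so it cannot be smaller.
        if l3_mb < l2_mb:
            raise ValueError("an inclusive L3 cannot be smaller than the L2 it mirrors")
        return l3_mb
    raise ValueError(f"unknown policy: {policy}")


# Leaked Lunar Lake totals: 14 MB of L2, 12 MB of L3.
print(unique_capacity_mb(14, 12, "exclusive"))
```

In this toy model a 12 MB L3 behind 14 MB of L2 only works if the L3 is not inclusive, and in the exclusive case it still adds capacity on top of the L2.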

Just look at the L3 cache in Alder Lake & Raptor Lake. Here's a plot I made, but note that the Y-axis is logarithmic.

dw83PlR.png

 
  • Like
Reactions: Nyara

jp7189

Distinguished
Feb 21, 2012
532
303
19,260
Saying "leaked out" is redundant, as the "out" is implicit. The first place I noticed using such a construction is WCCFTech, which doesn't surprise me as I think that site is based in Pakistan, hence their English is sometimes a bit odd.


I also thought the letters were simple iterations, but then I learned that the Alder Lake-S H0 stepping is a fundamentally different die than the C0 stepping. The H0 die is the one with only 6 P-cores and no E-cores, in its fully-enabled form. It's actually different silicon than the C0 stepping, which is the one with 8P + 8E. I don't know if these dies have other names, but the only distinction I've seen is the C0 vs. H0 stepping.


Not weird. First, Intel CPUs tie L3 cache to the core tile. As cores are disabled, so are their L3 cache slices. So, if this isn't a fully-enabled sample, then that could explain certain L3 differences. For instance, Raptor Cove had a 3 MB slice of L3 per core, in which case 4 P-cores would yield 12 MB of L3. 6 P-cores would give you 18 MB.

The other thing to know about L3 cache is that Intel has been implementing it as exclusive of L2 contents since Skylake-SP, making it somewhat complementary. That makes a lot of sense if you've got big L2 caches, and it's how they avoid needing so much L3.

Just look at the L3 cache in Alder Lake & Raptor Lake. Here's a plot I made, but note that the Y-axis is logarithmic.
dw83PlR.png
Initially I thought the same re:L3, but that explanation doesn't make sense considering the L2 size. If cores were disabled, then L2 would be lost too.

I have a hard time imagining a use case for a cache that's both smaller and slower, even if it's exclusive. I guess maybe if L2 is core-specific and L3 is a shared pool, there could be some single-thread cases...
 

bit_user

Titan
Ambassador
Initially I thought the same re:L3, but that explanation doesn't make sense considering the L2 size. If cores were disabled, then L2 would be lost too.
You see the graph in my post, no? The per-core L2 amount has been creeping up, while they've been holding per-core L3 constant, for the past 3 generations. I suppose I should update it with Meteor Lake.

Edit: it looks like the per-core specs on Redwood Cove's caches are the same as Raptor Cove's. The biggest change seems to be that the L3 cache is no longer shared with the iGPU. Interestingly, the CPU die's quad-Crestmont tiles appear to have dropped back to having a 2 MB slice of shared L2, each (Raptor Lake increased this to 4 MB). In spite of this, the E-cores are the one area of Meteor Lake featuring higher IPC than Raptor Lake, from what I've seen.

I have a hard time imagining a use case for a cache that's both smaller and slower, even if it's exclusive. I guess maybe if L2 is core-specific and L3 is a shared pool, there could be some single-thread cases...
That's exactly how it works. L2 unifies code + data, for a single core (L1 separates them). Except in the case of E-cores, where the L2 is shared across the quad-core cluster. On Intel CPUs, L3 is global. On AMD CPUs, L3 only unifies the specific compute die, which makes L3 comparisons between 12+ core AMD CPUs and Intel CPUs artificially lopsided in AMD's favor.

BTW, note that current lithography techniques have reached a point where SRAM size has virtually stopped improving with new nodes. So, that's an argument for why Intel might be preferring to spend more of its area-budget on L2 cache than L3. In other words, they might get a better improvement in perf/mm^2 by enlarging L2 cache by some amount, rather than enlarging L3 by the same amount. Transistor-wise, the cost is about the same.

The traditional reason to keep L2 smaller is that lookups take longer the larger it gets, and L2 is typically more latency-sensitive than L3. However, that cost increases logarithmically vs. size, so it's not obvious that holding L2 back beats enlarging it a little bit more.
 
Last edited:

jp7189

Distinguished
Feb 21, 2012
532
303
19,260
You see the graph in my post, no? The per-core L2 amount has been creeping up, while they've been holding per-core L3 constant, for the past 3 generations. I suppose I should update it with Meteor Lake.

Edit: it looks like the per-core specs on Redwood Cove's caches are the same as Raptor Cove's. Interestingly, the Crestmont tiles appear to have dropped back to having a 2 MB slice of shared L2, each.


That's exactly how it works. L2 unifies code + data, for a single core (L1 separates them). On Intel CPUs, L3 is global. On AMD CPUs, L3 only unifies the specific compute die, which makes L3 comparisons between 12+ core AMD CPUs and Intel CPUs artificially lopsided in AMD's favor.

BTW, note that current lithography techniques have reached a point where SRAM size has virtually stopped improving with new nodes. So, that's an argument for why Intel might be preferring to spend more of its area-budget on L2 cache than L3. In other words, they might get a better improvement in perf/mm^2 by enlarging L2 cache by some amount, rather than enlarging L3 by the same amount. Transistor-wise, the cost is about the same.

The traditional reason to keep L2 smaller is that lookups take longer the larger it gets, and L2 is typically more latency-sensitive than L3. However, that cost increases logarithmically vs. size, so it's not obvious that holding L2 back beats enlarging it a little bit more.
I was responding to your point of a small L3 being the result of disabled cores. If disabled cores were the cause, then we'd see a reduction of L2 also.

Otherwise, I agree with your thoughts.
 