News Intel unwraps Lunar Lake architecture: Up to 68% IPC gain for E-cores, 16% IPC gain for P-Cores

cyrusfox

Distinguished
Fascinating the disparity in improvement between Floating point vs integer single thread uplift. Huge FP uplift! Eye popping improvements!!!
3E5Ne7Zamn98gvcP2rYhy7.jpg

That is lame though, comparing to the gimped e-cores...
but the comparisons are once again being made to the low-power Meteor Lake E-core instead of the full E-core
iGPU uplift is looking really solid; this seems to check all the boxes of what I need in a device. For toting around I really don't need 24+ cores; 8 is plenty, especially with this much GPU and a hopefully maturing platform. Excited to see the products and where the pricing lands. Hope Framework gets a flavor of this.
 

jenci8888

Distinguished
A 68% IPC gain for the E-core? That doesn't seem right... I think they meant a 68% E-core gain on SPECfp. It should be around 30% IPC.
 
Well, it seems we have an answer regarding HT, and that is: it depends. It'll be interesting to see which version of Lion Cove desktop ARL uses. I wouldn't be surprised if mobile ARL went without HT, but it seems like desktop could probably keep it, even though they emphasize hybrid when referring to dropping it.

Will be looking forward to seeing real world performance on LNL and ARL.
That is lame though, comparing to the gimped e-cores...
Just a guess, but based on the testing Chips and Cheese did on the MTL LP E-cores, being removed from the ring bus, and thus cut off from the L3 cache, can have an outsized impact on performance. This would in theory be the closest comparison to an existing product.
 

Giroro

Splendid
I think cheap mini PCs using Intel's all-E-core N100 processor are perfectly usable office machines for many people, and I would love to see those upgraded with the new E-cores.

That said, I would never buy one, because they would definitely spec those machines with a worthless amount of non-upgradeable memory.
One of the major value-adding features of the N100 machines is that you can usually upgrade the RAM.

Now for the high-end Lunar Lake products... There is no high end. If Intel doesn't convince manufacturers to keep ultrabooks with the highest available configuration under $1200, then they're going to have a problem.
 

bit_user

Polypheme
Ambassador
The article said:
38% and 68% IPC gains in the new Skymont architecture.
This is based on a somewhat biased performance comparison (see below).

Fascinating the disparity in improvement between Floating point vs integer single thread uplift. Huge FP uplift! Eye popping improvements!!!
3E5Ne7Zamn98gvcP2rYhy7.jpg
That is lame though, comparing to the gimped e-cores...
Initially, I missed what you probably meant by "gimped". As I see @thestryker has pointed out, comparing to the LP E-cores is indeed quite lame of them, since its lack of L3 cache has been shown to disadvantage it relative to the Crestmont cores on Meteor Lake's CPU tile.

Dang. I was really excited for a minute, there.
: (
 

bit_user

Polypheme
Ambassador
I think cheap mini PCs using Intel's all-E-core N100 processor are perfectly usable office machines for many people, and I would love to see those upgraded with the new E-cores.
Yeah, they seem to be somewhere around the performance of a Sandy Bridge or Haswell i5, which is still pretty usable. Of course, their iGPU is much better than those CPUs'.

That said, I would never buy one, because they would definitely spec those machines with a worthless amount of non-upgradeable memory.
One of the major value-adding features of the N100 machines is that you can usually upgrade the RAM.
You can find some that take a DDR5 SO-DIMM. I have 32 GB in my N97 machine. It doesn't need that much, but I did it just to get dual-rank, for the small performance boost it provides.
 

bit_user

Polypheme
Ambassador
The article said:
Surprisingly, intel turned to TSMC for its leading-edge 3nm N3B process node for its compute tile, ... It also uses the TSMC N6 node for the platform controller tile... In fact, the only Intel-fabbed silicon on the chip is the passive 22FFL Foveros base tile
This is a lovely ad for Intel's foundry business.
/s

The article said:
Lunar Lake’s new microarchitectures pave the way for ... even its Xeon 6 lineup, too.
Uh, no. Granite Rapids uses Redwood Cove, not Lion Cove, and Sierra Forest uses Crestmont, not Skymont.

The article said:
Intel places two stacks of LPDDR5X-8500 memory directly on the chip package, in 16GB or 32GB configurations, to reduce latency
Citation needed, @PaulAlcorn . The distances involved aren't big enough to impact latency. However, using LPDDR5(X), instead of regular DDR5, will definitely increase latency!

The article said:
The compute tile ... incorporates a new 8MB ‘side cache’ that can be shared among all the various compute units to improve hit rates and reduce data movement, thus saving power. However, it doesn’t technically fit the definition of an L4 cache because it is shared between all of the units.
The industry lingo for this is a "System-Level Cache", sometimes abbreviated SLC.

The article said:
fueling an up to 2x increase in single-threaded performance and up to 4x more peak performance in multi-threaded workloads than the Meteor Lake LP E-cores.
The end notes confirm that the 2x single-threaded and 4x MT both reference a regular quad-Skymont cluster vs. the dual-Crestmont block in the Low Power island on Meteor Lake's SoC tile - a misleading comparison, if there ever was one!

The article said:
The Meteor Lake design employed two E-cores placed in the SoC tile for extreme low-power workloads, with four additional E-cores on the compute tile along with the P-cores.
That should be eight additional Crestmont cores, on the CPU tile.

The article said:
so any core in the new design can sustain nine instruction decodes per clock.
No, I think it means the peak decode throughput is 9 instructions. Intel hasn't been very clear about the set of circumstances for achieving parallelism between decoder blocks. At one point, I saw some speculation that each decoder block could only work on a separate branch target, though I'm not sure if that's accurate.

The article said:
Intel targeted a 2X improvement in vector performance, made by going from the two 128-bit FP and SIMD vector pipes to four with Skymont.
Probably motivated, in part, to keep up with the FP throughput enhancements in Lion Cove.
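For anyone wondering where the 2X figure comes from, here is a rough sketch of the arithmetic. It only counts SIMD lanes per cycle from the pipe counts quoted in the article; the function name and the choice of 32-bit elements are my own assumptions for illustration, and it says nothing about which operations each pipe can actually issue.

```python
def simd_lanes_per_cycle(pipes, pipe_width_bits=128, element_bits=32):
    """Peak number of elements processed per clock across all vector pipes.

    Illustrative only: assumes every pipe can accept a full-width
    operation every cycle, which real workloads rarely sustain.
    """
    return pipes * (pipe_width_bits // element_bits)


# Pipe counts as quoted from the article: two 128-bit pipes before, four with Skymont.
previous_lanes = simd_lanes_per_cycle(pipes=2)
skymont_lanes = simd_lanes_per_cycle(pipes=4)

print(f"previous E-core: {previous_lanes} FP32 lanes/cycle")
print(f"Skymont:         {skymont_lanes} FP32 lanes/cycle")
print(f"ratio:           {skymont_lanes / previous_lanes:.1f}x")
```

The ratio is the same for any element size, since only the number of pipes changes.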

The article said:
Previous E-core clusters had a shared 2MB L2 cache, but that has now been expanded to 4MB
Except in Raptor Lake, where it was also 4 MiB.

The article said:
Intel also provided comparisons for Skymont vs Raptor Lake’s P-core, which uses the Raptor Cove architecture. Intel claims a 2% advantage for Skymont in integer and floating point.
Read the slide carefully, @PaulAlcorn . In the lower left, it says this is iso-frequency!

TiYbegfgjonfifxe84JFc8.jpg
On the next slide, Intel is clear that Raptor Cove still offers better peak performance.

SCtAd7LHV9me82igNmQej8.jpg
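To illustrate why the iso-frequency footnote matters: delivered single-thread performance is roughly IPC times clock speed, so a 2% IPC lead can easily be outweighed by a clock deficit. In the sketch below, only the +2% IPC figure comes from Intel's slide. The clock speeds are placeholder values I made up for the example, not announced specifications.

```python
def relative_performance(ipc_a, freq_a_ghz, ipc_b, freq_b_ghz):
    """Performance of core A relative to core B, modeled as IPC x frequency.

    A deliberately simple model: it ignores memory latency, cache
    differences, and anything else that makes IPC vary with clock speed.
    """
    return (ipc_a * freq_a_ghz) / (ipc_b * freq_b_ghz)


# From the slide: Skymont is about 2% ahead of Raptor Cove at the SAME frequency.
SKYMONT_IPC = 1.02
RAPTOR_COVE_IPC = 1.00

# Hypothetical peak clocks, chosen only for illustration.
SKYMONT_PEAK_GHZ = 4.5
RAPTOR_COVE_PEAK_GHZ = 5.5

iso_frequency = relative_performance(SKYMONT_IPC, 4.0, RAPTOR_COVE_IPC, 4.0)
at_peak_clocks = relative_performance(
    SKYMONT_IPC, SKYMONT_PEAK_GHZ, RAPTOR_COVE_IPC, RAPTOR_COVE_PEAK_GHZ
)

print(f"iso-frequency:  {iso_frequency:.2f}x")
print(f"at peak clocks: {at_peak_clocks:.2f}x")
```

With any clock gap larger than about 2%, the second number drops below 1.0, which is consistent with the next slide showing Raptor Cove ahead on peak performance.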
 

Giroro

Splendid
You can find some that take a DDR5 SO-DIMM. I have 32 GB in my N97 machine. It doesn't need that much, but I did it just to get dual-rank, for the small performance boost it provides
I wish I had the DDR5 version. Single-channel DDR4 turns out to be a problem.
The iGPU could be really good at encoding: I can usually put 3 webcams with an overlay in an OBS stream and go live 24/7 on YouTube at a high bitrate... but enabling even basic color or exposure correction on a single webcam in OBS pegs the GPU at 95%-100% utilization and the stream falls apart. I think the bottleneck is memory bandwidth, which is a bummer.
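As a rough sanity check on the bandwidth theory, theoretical peak bandwidth is just the transfer rate times the bus width. The sketch below assumes DDR4-3200 and DDR5-4800 speeds on a single 64-bit channel. Those speed grades are my assumption, not something stated in the posts above, and real sustained bandwidth will be lower than these peak figures.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    bandwidth = transfers per second x bytes moved per transfer
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Assumed speed grades for a single-channel (64-bit) mini PC.
ddr4_single = peak_bandwidth_gbs(3200)
ddr5_single = peak_bandwidth_gbs(4800)

print(f"DDR4-3200, single channel: {ddr4_single:.1f} GB/s")
print(f"DDR5-4800, single channel: {ddr5_single:.1f} GB/s")
print(f"DDR5 advantage:            {ddr5_single / ddr4_single:.2f}x")
```

The iGPU shares that bandwidth with the CPU cores, so filters that read and write every frame more than once can eat into it quickly.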
 

Notton

Prominent
LNL's on-package memory will be a significant limitation for adoption. It works for Apple, because Apple owns its walled garden. I understand the rationale, which is to lower power use by any means to compete against ARM, but OEMs will want to have more diverse memory configurations. Along with low core-count, LNL will only fit into the premium ultramobile niche, while Arrow Lake mobile will fill out the rest of the segments.
I think LNL not coming with an 8GB option is a win.
But it does need a 24GB option.
Also, the 4P+4E core config is the same as what I am using right now, an i5-12450H with 32GB of DDR4-3200.
 

usertests

Distinguished
That said, I would never buy one, because they would definitely spec those machines with a worthless amount of non-upgradeable memory.
One of the major value-adding features of the N100 machines is that you can usually upgrade the RAM.
16 GB soldered is common with Alder Lake-N and enough for many people. I won't entertain the arguments about this so don't bother, there was a whole thread for arguing about that recently. I will definitely buy non-upgradeable systems, if the price is right. Of course, you can pair up to 48 GB SO-DIMM with the ODROID-H4 and probably others, and possibly 64 GB in the future after 32 Gb RAM chips hit the market.

Lunar Lake is clearly designed for laptops/handhelds. I'm not sure that Intel would be quick to copy the approach for an Alder Lake-N successor using Skymont. Alder Lake-N is used in a wide variety of devices including the embedded market. If they did, it could be like Meteor Lake-U, some SKUs with/without memory-on-package. Except those MoP models (164U/134U) are missing in action. 🤷‍♂️

What we really need to see is dual-channel (128-bit) memory support, which was found in Jasper Lake and prior generations. Maybe Intel will be forced to do it given those (apparently) massive gains for Skymont, and newer Intel Graphics.

As noted in the Ars piece, the issue facing LNL is that it has 4P+4E cores, while its predecessor MTL has 6P+8E+2LPE cores. Despite all the talk of per core improvement, and undoubtedly better single-core scores, multi-core perf will likely take a major hit, especially now sans HT. There was no direct comparison with MTL in the presentation.
It may take a hit in multi. It will likely shine in mobile gaming, where a quad-core can be enough (Steam Deck).

Also, Lunar Lake is more of a successor or sidegrade to the smaller MTL-U, which only has 2 P-cores. Lunar gains 2 P-cores vs. that and loses 4 E, 2 LPE, but with big IPC gains.

There will be an Arrow Lake-H and probably Arrow Lake-U that should have the usual core counts.

Also, credit to Ars for parsing the hype.
No credit for them. They goofed on some of their parsing, like the unfixed caption: "But when compared to full-fat E-cores from a 13th/14th-gen Raptor Lake CPU, they roughly break even, albeit with slightly lower power usage."

No, Intel is comparing Skymont to Raptor Cove, the 14th gen P-core, with Skymont apparently coming +2% ahead in IPC (albeit with +/- 10% CYA margin of error).

I think a lot of people and outlets don't know what to make of this presentation. Intel has made some huge claims, but hasn't made all of the information available. How many Skymont cores fit in the area of a single Lion Cove core, for example.
 

bit_user

Polypheme
Ambassador
As noted in the Ars piece, the issue facing LNL is that it has 4P+4E cores, while its predecessor MTL has 6P+8E+2LPE cores.
I missed that. Very good point.

LNL's on-package memory will be a significant limitation for adoption.
Disagree. They're starting a fair bit higher than Apple's baseline. Also, the use of memory compression can make it feel like even more than that (which is one of the tricks Apple uses).

That said, ARM CEO's throwaway claim of 50% of PC market share by 2029 is so ludicrous that I'm surprised that THW gave it mention in two separate pieces. The smell of clickbait is strong.
Disagree. Yes, it's newsworthy, even if you find it unbelievable.
 
AMD gets out Zen2 with 1 or 2 CPU chiplets and an IO chip : Intel, "feh - a glued together CPU. That's ridiculous."
Intel gets Lunar Lake out : the whole CPU is literally glued together, "revolutionary".
And there have been plenty of articles about how high the cross ccx lag is, still to this day people go for single ccx dies for this reason, because it's still high.

I don't know enough about Foveros, but I haven't seen any article saying that it has bad latency either.
 
AFAIK, MTL-U didn't make it into any laptop in the US.
Dell, Lenovo and HP all sell laptops using them (others probably do too, but I just looked at the first that came to mind). LNL is absolutely aimed at the same market as the U series of processors.
As said, LNL is niche, and is basically a taster for the coming ARL mobile. That it exists at all this year is a direct response to MS' adoption of ARM for its halo Surface line. Intel & AMD had to scramble, which is probably why both LNL & Ryzen 300 only have 2 products in their respective line-ups.
This assessment is really far off the mark and shows a distinct lack of market knowledge. LNL shares core architecture with ARL and that's about it. It doesn't share the same tile configuration, core configuration or graphics technology. Now, one could make the argument that it was fast-tracked due to Qualcomm bringing Nuvia cores to market, but that's about it.
We'll see. One handheld design win for MTL was MSI Claw, and it was savaged in reviews--to wit, The Verge's "The MSI Claw is an embarrassment." Maybe the Claw successor based on LNL will fare better. Then again, maybe not.
There's plenty of evidence that this was due to MSI shoving it out the door without actually optimizing the firmware. It was slower than MTL laptops using the same chip at the same power levels and had crazy idle when it launched. It sounds like it's just now getting to where it should have launched at. While I suspect LNL will be a great chip for handhelds I certainly wouldn't be an early adopter.
I missed that. Very good point.
Disagree completely about it being a good point. That would be like comparing it to an N305 because they both have 8 cores, or comparing it to a 14900HX because they're both laptop CPUs. LNL is clearly intended to fit into the same market as the U series processors and beat them at performance efficiency if not raw performance.
 

bit_user

Polypheme
Ambassador
And there have been plenty of articles about how high the cross ccx lag is, still to this day people go for single ccx dies for this reason, because it's still high.
Cross-CCD communication isn't very common, though, especially with NUMA-aware scheduling, which operating systems are becoming more adept at.

Also, most inter-thread communication involves synchronization primitives which aren't themselves very cheap.
 

bit_user

Polypheme
Ambassador
Disagree completely about it being a good point. That would be like comparing it to an N305 because they both have 8 cores, or comparing it to a 14900HX because they're both laptop CPUs. LNL is clearly intended to fit into the same market as the U series processors and beat them at performance efficiency if not raw performance.
Sorry, I meant that Lunar Lake being 4 + 4 was the good point. Comparing it to the 6/8/2 configuration of Meteor Lake was obviously comparing across different market segments, but if you compare it to the 2/8/2 configuration, then we see a 14-thread CPU being replaced by an 8-thread one. I expect there will be some multithreaded benchmarks where Lunar Lake comes up short.
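For reference, the thread counts work out as follows. This is a minimal sketch that assumes only the P-cores run two threads each, and only when Hyper-Threading is present; the function name and parameters are made up for the example.

```python
def hardware_threads(p_cores, e_cores, lp_e_cores=0, hyperthreading=True):
    """Count hardware threads for a hybrid CPU.

    Assumes E-cores and LP E-cores are always single-threaded, and that
    each P-core exposes two threads only when Hyper-Threading is enabled.
    """
    threads_per_p_core = 2 if hyperthreading else 1
    return p_cores * threads_per_p_core + e_cores + lp_e_cores


# Core configurations as discussed in this thread.
mtl_u = hardware_threads(2, 8, 2, hyperthreading=True)   # Meteor Lake 2/8/2
mtl_h = hardware_threads(6, 8, 2, hyperthreading=True)   # Meteor Lake 6/8/2
lnl = hardware_threads(4, 4, 0, hyperthreading=False)    # Lunar Lake 4+4, no HT

print(f"Meteor Lake 2/8/2: {mtl_u} threads")
print(f"Meteor Lake 6/8/2: {mtl_h} threads")
print(f"Lunar Lake 4+4:    {lnl} threads")
```

Thread count alone doesn't decide multithreaded results, of course, since the per-core gains claimed for Lion Cove and Skymont pull in the other direction.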
 