News Intel launches Lunar Lake: claims Arm-beating battery life, world’s fastest mobile CPU cores

E-cores don't do much for games that don't use a lot of cores, but extra P-cores don't do much for those games either. That's not an E-core problem; that's a "game doesn't scale with cores" problem. Still, if you don't get any performance regression in those games, while you get a huge boost in the games that do use the extra cores, what are we even talking about?

I'm not running any background tasks, btw; I'm running a Ghost ultra-light Windows build with all the Windows fluff removed (Store, Xbox, Windows Defender, cloud, what have you).
You realise you just explained why, in your test, disabling E-cores doesn't help?

Because you removed all the background tasks except Windows' own stuff and the FPS monitor!

In a normal gaming session, people don't just turn on the PC, disable everything else, and start the game. They have antivirus, recording/streaming apps, Discord for voice chat in online multiplayer, maybe music playing through a USB DAC. When a game uses, say, 6 cores / 12 threads, the higher-clocked P-cores with more cache run it best; but if startup things like RGB software, antivirus, and HWiNFO are holding P-core threads, the game gets pushed onto 4 P-cores and has 4 of its threads put on 2 E-cores, and the FPS can drop a lot. That's where APO gets its massive FPS gains in those games. Before APO was released, or before APO had been tested and tailored with a thread schedule for a given game, disabling the E-cores forced all threads onto the P-cores; even sharing a P-core performed better than letting background apps or YouTube occupy the P-cores while the game got scheduled onto the E-cores.
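For illustration only: instead of a BIOS toggle, you can get a similar effect by pinning the game to the P-cores with CPU affinity. A rough sketch in Python with psutil, assuming (my assumption, it varies by chip) that logical CPUs 0-15 are the P-cores:

```python
# Rough sketch (not a recommendation): pin a running game to the P-cores only,
# so background apps can land on the E-cores without stealing the game's P-cores.
# Assumes psutil is installed and that logical CPUs 0-15 are the P-cores with HT
# (a made-up mapping; check your own topology, e.g. with Task Manager or lscpu).
import psutil

P_CORE_CPUS = list(range(16))  # hypothetical P-core logical CPU IDs

def pin_to_p_cores(process_name: str) -> None:
    """Set the CPU affinity of every process with this name to the P-core CPUs."""
    for proc in psutil.process_iter(["name"]):
        if (proc.info["name"] or "").lower() == process_name.lower():
            proc.cpu_affinity(P_CORE_CPUS)
            print(f"Pinned PID {proc.pid} to CPUs {P_CORE_CPUS}")

pin_to_p_cores("game.exe")  # placeholder executable name
```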
 
You realise you just explained why, in your test, disabling E-cores doesn't help?

Because you removed all the background tasks except Windows' own stuff and the FPS monitor!

In a normal gaming session, people don't just turn on the PC, disable everything else, and start the game. They have antivirus, recording/streaming apps, Discord for voice chat in online multiplayer, maybe music playing through a USB DAC. When a game uses, say, 6 cores / 12 threads, the higher-clocked P-cores with more cache run it best; but if startup things like RGB software, antivirus, and HWiNFO are holding P-core threads, the game gets pushed onto 4 P-cores and has 4 of its threads put on 2 E-cores, and the FPS can drop a lot. That's where APO gets its massive FPS gains in those games. Before APO was released, or before APO had been tested and tailored with a thread schedule for a given game, disabling the E-cores forced all threads onto the P-cores; even sharing a P-core performed better than letting background apps or YouTube occupy the P-cores while the game got scheduled onto the E-cores.
Good grief, you are correct. I'm out.

I'd be willing to test with 50 background tasks as well, but you'll just come up with a new excuse about why E-cores-on performs better, so this is tiresome.
 
To be fair, that's not exactly what APO does, at least going by the testing that analyzed threads around its initial release. It optimized where the threads were going, so fewer clusters of E-cores lit up, and the ones that did were used more consistently. If Intel could figure out how to do that without a software layer, hybrid architectures would basically have zero negatives beyond pure capability.

They've optimized Thread Director again with LNL, so I wonder if APO will still be a thing for ARL and beyond (it's only the desktop CPU die right now).
My bad, I was on my phone this morning, so I typed a few words in a rush and missed things. What I meant was that it limits a game's main threads to the P-cores and uses the E-cores only after those are scheduled, which in turn stops the unimportant background stuff from occupying the much higher-performance P-cores.

And that's why it needs per-game optimization; in the early ADL days, some people found that disabling the E-cores got them more FPS and less stuttering.

Newer games should figure out thread scheduling better, since they have to optimize for core parking on AMD as well, so there will likely be some sort of built-in layer to schedule the threads appropriately. But for older titles I believe APO will stick around, and those who enable/disable E-cores could still find it useful for those games.
 
Ironically, that's also kind of true for AMD. It's why the 8-core 7800X3D tops the gaming performance charts while the likes of the 7950X3D essentially have a "P-core" CCD with the X3D cache and an "E-core" CCD without it as far as games are concerned, with games sometimes landing on the wrong CCD.
IMO, the problem of scheduling the 7950X3D is actually more interesting than that. The complexity comes from the fact that one CCD has 3x the L3 cache, while the other has more frequency headroom. So, the thread scheduler needs to decide whether a given thread prefers higher clock frequency or more L3 cache.

I think it's not such a hard problem, in theory. You can collect statistics on L3 cache misses from each thread, periodically. This can tell you which threads are heavy L3 users. You could even see how much they appear to improve after being migrated to the X3D die, although it's harder to know if any observed change was due to the migration vs. unrelated changes in the code or data being processed by the thread, during the two different time intervals.
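As a rough sketch of the sampling part (assuming Linux, a working perf install, and permission to read the counters; the event name and the one-thread-at-a-time loop are simplifications I'm making for illustration):

```python
# Rough sketch, assuming Linux with `perf` installed. Samples LLC (L3) load
# misses for each thread of a process, one thread at a time, then ranks them.
import os
import subprocess
import sys

def llc_load_misses(tid: int, seconds: float = 1.0) -> int:
    """Count LLC-load-misses for one thread over a short window."""
    res = subprocess.run(
        ["perf", "stat", "-x,", "-e", "LLC-load-misses",
         "-t", str(tid), "--", "sleep", str(seconds)],
        capture_output=True, text=True,
    )
    for line in res.stderr.splitlines():      # perf stat reports on stderr
        if "LLC-load-misses" in line:
            count = line.split(",")[0]
            return int(count) if count.isdigit() else 0  # "<not counted>" -> 0
    return 0

pid = int(sys.argv[1])
misses = {int(t): 0 for t in os.listdir(f"/proc/{pid}/task")}
for tid in misses:
    misses[tid] = llc_load_misses(tid)
for tid, count in sorted(misses.items(), key=lambda kv: kv[1], reverse=True):
    print(f"tid {tid}: {count} LLC load misses")
```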

When you also consider there's a bit of a knapsack problem aspect to the decision about where to place which threads, it really does seem so much simpler to just keep them all on the CCD with the 3D cache (or just use a 7800X3D, in the first place).
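To make the knapsack point concrete, here's a toy, purely hypothetical placement rule that just fills the X3D CCD with the cache-heaviest threads. It ignores which threads communicate with each other, and that interaction is exactly what makes the real problem knapsack-like:

```python
# Toy heuristic: send the N cache-heaviest threads to the X3D CCD, the rest to
# the frequency CCD. Scores could come from LLC-miss sampling like the sketch
# above. The thread IDs and scores here are made up.

def place_threads(scores: dict[int, int], x3d_slots: int = 8) -> dict[int, str]:
    """Map each tid to a CCD label, cache-heaviest threads first onto the X3D CCD."""
    placement = {}
    for i, tid in enumerate(sorted(scores, key=scores.get, reverse=True)):
        placement[tid] = "CCD0 (X3D)" if i < x3d_slots else "CCD1 (frequency)"
    return placement

print(place_threads({101: 9_000_000, 102: 7_500_000, 103: 40_000, 104: 12_000}, x3d_slots=2))
```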

I would also observe that some parallel computing APIs take a rather different approach to this sort of problem, by having the programmer organize threads into workgroups. This specifies which threads are doing the most communication and data sharing. It also has the concept of local memory, which you have to explicitly allocate. Although cache doesn't really work like that, a runtime could track the local memory allocations, in order to get an idea of how heavily each workgroup will be hitting L3.

I'd imagine lots of programmers would hate to deal with such hassles, but I think many game engines would support it, if the benefits were great enough. Indeed, these are some of the same sorts of APIs they use for GPU programming, and they sure find the wherewithal to do that!
 
IMO, the problem of scheduling the 7950X3D is actually more interesting than that. The complexity comes from the fact that one CCD has 3x the L3 cache, while the other has more frequency headroom. So, the thread scheduler needs to decide whether a given thread prefers higher clock frequency or more L3 cache.

I think it's not such a hard problem, in theory. You can collect statistics on L3 cache misses from each thread, periodically. This can tell you which threads are heavy L3 users. You could even see how much they appear to improve after being migrated to the X3D die, although it's harder to know if any observed change was due to the migration vs. unrelated changes in the code or data being processed by the thread, during the two different time intervals.

When you also consider there's a bit of a knapsack problem aspect to the decision about where to place which threads, it really does seem so much simpler to just keep them all on the CCD with the 3D cache (or just use a 7800X3D, in the first place).

I would also observe that some parallel computing APIs take a rather different approach to this sort of problem, by having the programmer organize threads into workgroups. This specifies which threads are doing the most communication and data sharing. It also has the concept of local memory, which you have to explicitly allocate. Although cache doesn't really work like that, a runtime could track the local memory allocations, in order to get an idea of how heavily each workgroup will be hitting L3.

I'd imagine lots of programmers would hate to deal with such hassles, but I think many game engines would support it, if the benefits were great enough. Indeed, these are some of the same sorts of APIs they use for GPU programming, and they sure find the wherewithal to do that!
I think it would actually be better for AMD to offer either an all-X3D-CCD part or a normal version with higher frequencies. It would be much easier than working out whether a thread is L3-heavy or not, and having half the cores perform differently isn't good for coding. Just give buyers the choice of more cache or higher frequency and overclockability; either way it won't perform like S___ anyway. E-cores are, in theory, easier to schedule, as they always perform worse than P-cores.
 
Nice video: there are basically 5 games that perform worse with E-cores on (a 2% or greater difference) and 16 that perform better with them. And he hasn't even tested some heavy hitters like TLOU, Cyberpunk, Once Human, or games that use 16+ cores in general. It's even more impressive later in the video, when he tests 1% lows. I've always been saying that turning off E-cores destroys your lows, and that video shows it quite clearly.
I'll guarantee you that if we get benchmarks on those Bartlett Lake designs that are all P-core, with up to 12 P-cores, they'll perform better. The old tiny E-cores wouldn't have been replaced with cores wider than Redwood Cove in Lunar Lake and Arrow Lake if Intel considered them a success.
 
I think it would actually be better for AMD to offer either an all-X3D-CCD part or a normal version with higher frequencies. It would be much easier than working out whether a thread is L3-heavy or not, and having half the cores perform differently isn't good for coding,
Yeah, I'd tend to agree. The theory of being able to decide between frequency and cache at runtime is nice, but with the current state of thread scheduling technology, it would probably work better for the cores all just to have one or the other. I guess the exception to that is if the user explicitly uses a tool like Process Lasso to herd all of a program's threads onto one CCD or another, but most users probably won't want to bother with that.

E-cores are, in theory, easier to schedule, as they always perform worse than P-cores.
Well, hyperthreading does add another wrinkle. However, Intel is removing that extra complexity from their hybrid CPUs.
 
I'll guarantee you that if we get benchmarks on those Bartlett Lake designs that are all P-core, with up to 12 P-cores, they'll perform better.
Better for games, but not for multithreaded compute tasks, like rendering and compilation.

The old tiny E-cores wouldn't have been replaced with cores wider than Redwood Cove in Lunar Lake and Arrow Lake if Intel considered them a success.
Eh, they might've widened the E-cores to reduce scheduling hazards, but it also could've been due to issues of thermal density they encountered when shrinking down E-cores of Gracemont-level complexity.
 
I'll guarantee you that if we get benchmarks on those Bartlett Lake designs that are all P-core, with up to 12 P-cores, they'll perform better. The old tiny E-cores wouldn't have been replaced with cores wider than Redwood Cove in Lunar Lake and Arrow Lake if Intel considered them a success.
Maybe 12 P-cores will perform better in games than 8+16, but that doesn't change the fact that 8+16 performs better than 8+0.
 
The Core Ultra 9 288V has a nearly identical-looking spec sheet to the Core Ultra 5 268V. That extra GPU core and minor clock boost might make their top-tier product perform, at best, around 20% better than the entry-level part in integrated-graphics-tier games, and maybe 10% better elsewhere. It will probably be more significantly better in post-boost multi-core workloads (the kind you really shouldn't want to run on an ultrabook), thanks only to its much higher TDP. But it might even have worse battery life.
They didn't even bother (or weren't able) to give it an extra boost in memory capacity. 32 GB is not enough when you consider they expect this processor to end up in $2,000+ laptops.

If Intel thinks it will be able to demand "Core i9"-style premium pricing from customers for its top-tier product, I don't think it's going to work out for them this gen.

Plus, they need to radically overhaul how they put together these slides and present this kind of marketing information. It comes across as both unexciting and untrustworthy.
I agree that it's strange that Intel thinks it needs that many different SKUs while making such minor changes between models, especially Core 7 to Core 5, as you mentioned. Don't get me wrong, I appreciate that they didn't just completely gouge the bottom end, but it's apparent that Intel is mainly differentiating its market tiers on AI performance rather than on traditional CPU performance metrics.

I'd rather see Intel continue to manufacture at least some of their chip packages in their own foundries, as TSMC certainly needs the pressure of competition. Rather than shedding IFS, just slow down on bringing new production online, since we're seeing limited new-customer adoption. They need a sufficient amount of scale to make their R&D costs worthwhile, but I think they just projected way too high on production scaling. That said, yes, Intel is benefiting from fabbing at TSMC this go-around, relative to their own fab capabilities.
 
Intel wasn't joking; the battery life looks incredible.

[attached image]
 
I agree that it's strange that Intel thinks it needs that many different SKUs while making such minor changes between models, especially Core 7 to Core 5, as you mentioned. Don't get me wrong, I appreciate that they didn't just completely gouge the bottom end, but it's apparent that Intel is mainly differentiating its market tiers on AI performance rather than on traditional CPU performance metrics.

I'd rather see Intel continue to manufacture at least some of their chip packages in their own foundries, as TSMC certainly needs the pressure of competition. Rather than shedding IFS, just slow down on bringing new production online, since we're seeing limited new-customer adoption. They need a sufficient amount of scale to make their R&D costs worthwhile, but I think they just projected way too high on production scaling. That said, yes, Intel is benefiting from fabbing at TSMC this go-around, relative to their own fab capabilities.
Jumping into AI at this point in time seems quite wrong, since 99.99% of consumers, if not all of them, don't really have anything to do with their own AI at home, nor will a single CPU, or even a few dozen of them clustered together, generate anything useful for AI. All the genuinely useful AI features run on arrays of Nvidia GPUs costing more than the lifetime savings of a normal civilian, and the results are still mostly at the "it helps, but still needs fine-tuning" level. Wasting development resources on this when the bubble is close to bursting, if it hasn't already burst, sounds like a disaster waiting to happen. The same is true for AMD's AI line.
 
Jumping into AI at this point in time seems quite wrong, since 99.99% of consumers, if not all of them, don't really have anything to do with their own AI at home, nor will a single CPU, or even a few dozen of them clustered together, generate anything useful for AI. All the genuinely useful AI features run on arrays of Nvidia GPUs costing more than the lifetime savings of a normal civilian, and the results are still mostly at the "it helps, but still needs fine-tuning" level. Wasting development resources on this when the bubble is close to bursting, if it hasn't already burst, sounds like a disaster waiting to happen. The same is true for AMD's AI line.
I agree, and I think most of us, whether "average" consumers, prosumers/enthusiasts, or what have you, agree too. That's why there's just more facepalming and cringing as this over-marketing of AI marches on and adds more confusion for the customer on top of the weird model naming. Yes, AMD is mostly just as guilty, although their specs actually have a good spread between models.
 
Still no apology for claiming a 10 GHz Pentium 4, plus a bunch of other nonsense, including the "4 cores is all you need" statement?

Nah hun, I'm with AMD now. Go away.
 
Still no apology for claiming a 10 GHz Pentium 4
It's not like they ever sold a product which they claimed to run that fast. What they said was that they expected the architecture to scale up to such speeds. That might actually have been true, but the performance of their successive manufacturing process nodes became the limiting factor.

I sometimes wonder how well Pentium 4 would run on more recent process nodes. I'll bet it would have no trouble reaching 10 GHz on the Intel 7 node, if not also 14 nm or 22 nm (the best they actually tried was 65 nm and overclockers were able to get them to about 7 GHz). Still not terribly efficient, but at least it'd hit their design target.
 
I sometimes wonder how well Pentium 4 would run on more recent process nodes. I'll bet it would have no trouble reaching 10 GHz on the Intel 7 node, if not also 14 nm or 22 nm (the best they actually tried was 65 nm and overclockers were able to get them to about 7 GHz). Still not terribly efficient, but at least it'd hit their design target.
The first CPUs to break 8 GHz on HWBOT were 65 nm Celeron Ds.
 
Jumping into AI at this point in time seems quite wrong, since 99.99% of consumers, if not all of them, don't really have anything to do with their own AI at home,
I expect AI-enhanced apps to become commonplace, if not the norm, before the next upgrade cycle. So, for a lot of people, it probably does make sense to get a PC with such acceleration.

Have you seen demos of things like AI-based in-painting?

In a client PC, AI is also useful for much less glamorous things, like enhanced noise removal and background replacement, in video conferencing apps.

Even source code and document editors are starting to integrate AI features.

nor will a single CPU, or even a few dozen of them clustered together, generate anything useful for AI. All the genuinely useful AI features run on arrays of Nvidia GPUs costing more than the lifetime savings of a normal civilian,
You're thinking of model training. For inference, you don't need nearly so much compute power.
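For a sense of what client-side inference looks like, here's a minimal sketch with ONNX Runtime; the model file and its "input" tensor name/shape are placeholders I'm assuming, not anything shipped by Intel:

```python
# Minimal sketch of client-side inference with ONNX Runtime. "model.onnx" and
# the "input" tensor name/shape are placeholders for some small exported model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")  # CPU by default; GPU/NPU via execution providers
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy 224x224 RGB input
outputs = session.run(None, {"input": x})
print(outputs[0].shape)
```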
 
I expect AI-enhanced apps to become commonplace, if not the norm, before the next upgrade cycle. So, for a lot of people, it probably does make sense to get a PC with such acceleration.

Have you seen demos of things like AI-based in-painting?

In a client PC, AI is also useful for much less glamorous things, like enhanced noise removal and background replacement, in video conferencing apps.

Even source code and document editors are starting to integrate AI features.


You're thinking of model training. For inference, you don't need nearly so much compute power.
I personally use photo-generative AI, but AFAIK it has to be used online, and it seems to run at the same speed on a decade-old i5 as on a 14900K. I suspect it just runs over the internet and does the AI server-side (while also training the model based on the options you reject or accept), so client-side AI capability on the chip is useless.
 
I personally use photo-generative AI, but AFAIK it has to be used online, and it seems to run at the same speed on a decade-old i5 as on a 14900K. I suspect it just runs over the internet and does the AI server-side (while also training the model based on the options you reject or accept), so client-side AI capability on the chip is useless.
Probably because not enough PCs have the horsepower to inference locally and doing so would require downloading an AI model several GB in size.

Eventually, I expect these app companies would rather reduce their costs and offload the processing to clients. It could also result in a more responsive user experience and less sensitivity to connection issues.
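Purely as a hypothetical sketch of that hand-off, something like this would prefer a local ONNX model when a hardware execution provider is present and otherwise fall back to a made-up cloud endpoint:

```python
# Hypothetical sketch: run locally when a hardware execution provider and a
# downloaded model exist, otherwise fall back to a cloud API. The model file,
# input name, and endpoint URL are all placeholders.
import os
import numpy as np
import onnxruntime as ort
import requests

MODEL_PATH = "models/inpainting.onnx"          # placeholder local model
CLOUD_URL = "https://example.com/api/inpaint"  # placeholder endpoint

def run(image: np.ndarray) -> np.ndarray:
    providers = ort.get_available_providers()
    hw_accelerated = any(p != "CPUExecutionProvider" for p in providers)
    if hw_accelerated and os.path.exists(MODEL_PATH):
        session = ort.InferenceSession(MODEL_PATH, providers=providers)
        return session.run(None, {"image": image.astype(np.float32)})[0]
    # Fall back to the server: no big model download, but higher latency and cost.
    resp = requests.post(CLOUD_URL, files={"image": image.tobytes()})
    resp.raise_for_status()
    return np.frombuffer(resp.content, dtype=np.float32)
```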
 
Probably because not enough PCs have the horsepower to inference locally and doing so would require downloading an AI model several GB in size.

Eventually, I expect these app companies would rather reduce their costs and offload the processing to clients. It could also result in a more responsive user experience and less sensitivity to connection issues.
It could be, but personally I think they would love to have millions of data inputs daily to let the AI model essentially evolve itself for free, and to claim your agreement to let them use the data you enter so they can issue revisions quickly and cheaply, rather than letting the client side handle it locally. And as a consumer, I'd rather not pay extra for an AI-focused chip just to use it.

The only exception might be phones having AI capabilities, so that, for example, they could help us find our way when travelling.
 
It could be, but personally I think they would love to have millions of data inputs daily to let the AI model essentially evolve itself for free, and to claim your agreement to let them use the data you enter so they can issue revisions quickly and cheaply, rather than letting the client side handle it locally. And as a consumer, I'd rather not pay extra for an AI-focused chip just to use it.
Data privacy is one thing I thought would be a selling point of client-side processing. A software/service provider wanting to collect user data can probably reach a point of diminishing returns rather quickly, at which point the economics favor letting people run the models client-side. They still have options for collecting your data without doing the processing server-side, such as offering you a discount if you allow them to upload some of your data for training.

Another example of a client AI use case is video scaling: like DLSS, except general-purpose. For games, you could use the NPU to relieve the GPU from doing it (or maybe the GPU can't even handle it, if it's an iGPU). However, you could also use it for non-game applications, like YouTube and other videos.

For document editing, MS and Google already offer predictive text. Being able to use a full LLM for that or code editing would be another case where the economics would heavily favor doing it client-side.
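As a sketch of how small that can be on the client, here's a minimal local-completion example with llama-cpp-python; the GGUF model path and the prompt are placeholders:

```python
# Sketch: local predictive text / code completion with llama-cpp-python.
# The model path is a placeholder; any small GGUF-format model would do.
from llama_cpp import Llama

llm = Llama(model_path="models/small-model.gguf", n_ctx=2048)
result = llm("def parse_config(path):", max_tokens=48, temperature=0.2)
print(result["choices"][0]["text"])
```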

I think the argument that we should wait until there's a killer app is a bit like saying that iGPUs shouldn't do 3D rendering until it's needed for something the average user can't live without. TBH, I still couldn't really argue that iGPUs need to accelerate 3D, but having it there opens the door for more apps to take advantage of it and lets more users play lighter-weight games, should they so choose.
 
Intel 7, 4, and 3 are already used in shipping parts, so that makes 3 of the 5.
20A is cancelled since (according to Intel) 18A is healthy.
It remains to be seen if PTL-P parts shipping on Intel 18A will indeed improve on the stellar performance per watt of the N3B parts in Lunar Lake.
If there is a significant jump in performance per watt on Panther Lake, then 1) it is going to be a scary good processor, and 2) Intel will be back at manufacturing parity with TSMC, if not leadership. PTL parts are scheduled for around this time next year.
 
20A is cancelled since (according to Intel) 18A is healthy.
That's a little simplistic. The plan of "5 nodes in 4 years" didn't seem to treat 20A as a contingency for problems with 18A. They really didn't plan to have all the measures in place for it to be treated as such (i.e., a complete cell library for 20A and the tool support needed to make it usable by external customers).