News: Intel Panther Lake processors could pack up to 16 cores, maximum of four performance cores, according to leak


rluker5

Distinguished
Sorry, but what does the number of P-cores have to do with ST performance? We've seen plenty of examples of CPUs with low base clocks, which is the lower bound on all-core clocks, and much higher single-core maxes (plus turbo tables with high boost frequencies for smaller numbers of cores).


For whom, though? Look at Meteor Lake, where Intel decided to use a 6P + 8E + 2LPE configuration, yielding 22 threads. This Panther Lake will have 4P + 8E + 4LPE, yielding 16 threads (not 20, due to the lack of HT in the P-cores). Intel has a recent track record of delivering more than 16 threads for laptop users, even in their mid-tier models.

A lot of software developers would tell you they can use more than 16 threads for building and testing their code. My dev box at work has 24 threads and I could certainly use more.


Sure, not every laptop CPU needs a ton of threads, but just because you have a bunch of cores doesn't mean it's inefficient. Look at this perf/W curve for Redwood Cove vs. Lion Cove:
[Image: perf/W curves for Redwood Cove vs. Lion Cove]

The data is not presented with units, but perf/W curves like this are the norm. Performance always tapers off, as you near the high end of the power envelope. In this case, their hypothetical curve has the new P-cores delivering 84.3% as much performance at half of the peak power. So, if you start with a number of P-cores that can all reach peak GHz within the power envelope and then double the number, without increasing the power budget, you end up with 68.6% more performance. That's an artificial example, I know, but the phenomenon is real and explains how server CPUs are able to scale performance to hundreds of cores while only consuming a few times more power than the TDP of performance desktop CPUs.

I want to clarify that I'm not taking a firm stand that "6 P-cores good; 4 P-cores bad" (with apologies to George Orwell), but I think it's interesting and will be looking closely at their scaling data to see whether they really did it because the marginal benefit of more P-cores would be negligible, or maybe more for area/cost reasons. For one thing, I'm dying to know just how the area of Skymont cores compares to Lion Cove.

P.S. a thought that's been kicking around my head, for a while, is that maybe one of the reasons they decided to grow Skymont cores was for thermal density reasons. It'll be really interesting to see how much power they can each burn, once we know their area, and compare that to other CPUs on a similar node.
You generally get more single-core performance at the same power consumption with fewer of the same cores. Cases where mobile chip users prioritize maximum total compute over single-thread performance are the exception rather than the rule, and your own chart shows that higher single-thread performance can be reached by consuming more of the LIMITED power budget. Don't most people that need a lot of compute go with desktops when they need their stuff to get done quickly?

I'm guessing Intel wanted to prioritize the performance of things like MS Office and Teams while not sacrificing too much multithread to sell to the most people. It isn't going to be perfect for everyone, it just has to be the best choice for most.

Edit: They will still sell the less popular laptops with desktop CPUs and/or dGPUs for those that want them. Not all of the silicon needs to be the same.
 
I understand the argument about electricity cost or heat output. But that can be resolved by choosing a lower CPU SKU (something less powerful).

I don't agree with the rest of your argument. CPU cores are not infinitely powerful. An E-core will hit its limit well before a P-core does. You are not limited by the power sent to the CPU but by what the CPU cores are capable of.

It doesn't really matter if your cores are extremely efficient if they can only run at 1 MHz and achieve little (reductio ad absurdum).
Find a single CPU from Intel or AMD that runs at maximum boost during all-core loads. This is of course impossible, because neither company sells one that will without being fed a lot more power. This is entirely the point being made: the CPU's real-world performance is what matters here. Unless you're doing an AVX-512 workload, there isn't a single thing Intel 12th Gen+ can't do that AMD can with their desktop SKUs. E-cores scale significantly better within the available power envelope and die area, which is likely why Intel made the shift.
Computers are really running either at < 5% load (more like 1%), at 100% load on a single core, or at 100% load overall. In the first case there is no need for a hybrid architecture; in the second case you would only need one P-core; and in the last case, following your logic, you would only need a ton of E-cores.
You literally just explained why a hybrid architecture can be beneficial... rarely do people specifically only do a single thing with their system.
 

TheHerald

Respectable
BANNED
It makes no sense on a desktop.

Why not then just pack the die with E-cores? Why bother with P-cores at all? If you get more performance per area, that's what you should do. Obviously, they don't.
It doesn't make sense to have a faster CPU on a desktop? How so?

They don't pack a CPU with E-cores because that CPU would be slower on workloads that use fewer than 8 cores. If all of your workloads scaled to n threads then a full E-core CPU would be better, but that is rarely a desktop use case.
 

MacZ24

Proper
BANNED
Find a single CPU from Intel or AMD that runs at maximum boost during all-core loads. This is of course impossible, because neither company sells one that will without being fed a lot more power. This is entirely the point being made: the CPU's real-world performance is what matters here. Unless you're doing an AVX-512 workload, there isn't a single thing Intel 12th Gen+ can't do that AMD can with their desktop SKUs. E-cores scale significantly better within the available power envelope and die area, which is likely why Intel made the shift.

You literally just explained why a hybrid architecture can be beneficial... rarely do people specifically only do a single thing with their system.

I run workloads at 100% CPU utilisation every day (12 cores).

You keep coming back to efficiency, which I explained is irrelevant on a desktop.

You skipped all my arguments.

If E-cores are that much better, there is no reason not to use all E-cores.

The fact is E-cores are efficient but less powerful versions of regular cores, and their efficiency is their only benefit. As I said, you will encounter a lot of workloads where the threads are not completely independent and must communicate among themselves. Fast threads will end up waiting for slow threads and it will act as if you had only E-cores. Or you don't use the E-cores at all, and then you'd wish you had all fast cores.

In most use cases I described, there is no need for a hybrid architecture.

The need for a hybrid architecture comes only from trying to improve battery life so as not to look ridiculous vis-à-vis ARM (i.e., Apple).

People are not multitaskers, even if their computer may be. They are mostly doing a single task on their computer, and other threads are largely inconsequential in computational needs.
 

MacZ24

Proper
BANNED
Of course there is, 3 people have already explained it to you.

People have repeated Intel marketing for sure.

And I explained several times why it is BS.

There is almost no workload in which a hybrid architecture makes sense (other than saving your battery).

I guess Intel just discovered the benefits of a hybrid architecture, years after it was used on mobile phones. Probably because they didn't know before, they didn't check, or they just couldn't do it themselves, for some reason.

Please ...

Also, you skipped all my arguments.
 
I run workloads at 100% CPU utilisation every day (12 cores).
I'm sure it's running at 100% utilization, but that has literally nothing to do with the CPU clockspeeds.
You keep coming back to efficiency, which I explained is irrelevant on a desktop.
Explain how it's irrelevant then. You don't seem to understand power limits or boost clocks, but by all means I'd love to see your explanation.
The fact is E-cores are efficient but less powerful versions of regular cores, and their efficiency is their only benefit.
Yes, space efficiency and absolute power consumption are the benefits of E-cores (they're not more power-efficient at the clock speeds they tend to run at).
As I said, you will encounter a lot of workloads where the threads are not completely independent and must communicate among themselves. Fast threads will end up waiting for slow threads and it will act as if you had only E-cores. Or you don't use the E-cores at all, and then you'd wish you had all fast cores.
Find even one example of this as I'd love to see it.
 

bit_user

Titan
Ambassador
It makes no sense on a desktop.
I don't know how to explain it any better. Maybe someone else can try.

Why not then just pack the die with E-cores? Why bother with P-cores at all? If you get more performance per area, that's what you should do. Obviously, they don't.
Intel and AMD are doing this with some of their server CPUs. Intel also has some E-core-only laptop CPUs (which can also be found in some mini-PCs).

However, the reason not to just use E-cores everywhere is that single- and lightly-threaded performance is still very important for client workloads.

Computers are really running either at < 5% load (more like 1%), at 100% load on a single core, or at 100% load overall. In the first case there is no need for a hybrid architecture; in the second case you would only need one P-core; and in the last case, following your logic, you would only need a ton of E-cores.
It's not like apps are either single-threaded or heavily multi-threaded. Plenty of them use a few threads, and that's why having a few P-cores makes sense. Particularly games, which are latency-sensitive and most aren't really designed to utilize E-cores very effectively.

Having a hybrid architecture really means that your CPU can't be fully utilized when you want it to be. If you have more time-sensitive threads than you have P-cores, your P-cores will act as E-cores because they will be waiting for them. Or you don't use the E-cores at all in this case.
It's not as if most apps will go out of their way to avoid using E-cores. If they don't then the OS will just schedule them to run on whatever core is available, typically preferring a P-core (unless this is a background task or your power-saving settings are maxed out).

The reason this exists on desktops is that they are not designing SKUs for desktops but for laptops, which are a much bigger market. The rest is mostly marketing.
It's the other way around. If we take the example of Alder Lake, Intel made three laptop dies and two desktop dies.

First, the desktop dies are commonly referred to as C0 (8+8) and H0 (6+0) steppings:

[Image: Alder Lake desktop die configurations, C0 (8+8) and H0 (6+0)]


Next, the main laptop dies are P (6+8) and U (2+8), but they also reused the C0-stepping desktop die for H-class processors (don't confuse this with the stepping number, mentioned above):

[Image: Intel 12th-gen mobile processor lineup slide]


Finally, there's the N-series budget laptop die (0+8), which consists of only E-cores:

[Image: Alder Lake-N (E-core-only) block diagram]

 

MacZ24

Proper
BANNED
I'm sure it's running at 100% utilization, but that has literally nothing to do with the CPU clockspeeds.

Explain how it's irrelevant then. You don't seem to understand power limits or boost clocks, but by all means I'd love to see your explanation.

Yes, space efficiency and absolute power consumption are the benefits of E-cores (they're not more power-efficient at the clock speeds they tend to run at).

Find even one example of this as I'd love to see it.

My CPU is running at 100% utilization, all 12 cores.

I DON'T CARE WHAT POWER IT USES

I JUST WANT IT TO GO AS FAST AS POSSIBLE IN THAT CASE

The rest is irrelevant.

Threads have been communicating for a long time, since well before the start of the millennium.
 

bit_user

Titan
Ambassador
People are not multitaskers, even if their computer may be. They are mostly doing a single task on their computer, and other threads are largely inconsequential in computational needs.
Many people will have a few web browser windows open at any given time. These often chew up some CPU time.

My work PC also runs varying amounts of annoying services in the background. Stuff like security scans and I have no idea what else, but it's not uncommon to see several cores get consumed that way.

My CPU is running at 100% utilization, all 12 cores.

I DON'T CARE WHAT POWER IT USES

I JUST WANT IT TO GO AS FAST AS POSSIBLE IN THAT CASE
If those cores can each use 35 W, then I already said you're talking about 420 W. You don't have a 420 W-capable CPU cooler. I'm sure of that.

So, that means it's power-limited (or thermally limited, which isn't much different in practice). If it's power-limited, then you could get more performance, in the same power budget, by having more cores. That's because performance scales sub-linearly with power. So, you would benefit from having a 16-core CPU, even if the TDP were unchanged. However, 16 P-cores would cost a lot of money, if we're talking Intel. It's cheaper to make some of them E-cores, even though they're individually slower.
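
To put rough numbers on that, here's a back-of-the-envelope sketch in Python, using the hypothetical 84.3%-at-half-power figure from the perf/W curve discussed earlier in the thread (illustrative values only, not measured data):

Code:
# Illustrative values only: normalized per-core performance at full vs. half
# of peak per-core power, taken from the hypothetical perf/W curve above.
PERF_FULL_POWER = 1.000
PERF_HALF_POWER = 0.843

def throughput(cores_at_full: int, cores_at_half: int) -> float:
    """Total multi-threaded throughput for a mix of per-core operating points."""
    return cores_at_full * PERF_FULL_POWER + cores_at_half * PERF_HALF_POWER

# Same total power budget in both cases: 8 cores each at peak power, or
# 16 cores each held to half of peak power.
print(throughput(8, 0))    # 8.000
print(throughput(0, 16))   # 13.488 -> ~69% more throughput for the same watts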
 

MacZ24

Proper
BANNED
I don't know how to explain it any better. Maybe someone else can try.


Intel and AMD are doing this with some of their server CPUs. Intel also has some E-core-only laptop CPUs (which can also be found in some mini-PCs).

However, the reason not to just use E-cores everywhere is that single- and lightly-threaded performance is still very important for client workloads.


It's not like apps are either single-threaded or heavily multi-threaded. Plenty of them use a few threads, and that's why having a few P-cores makes sense. Particularly games, which are latency-sensitive and most aren't really designed to utilize E-cores very effectively.


It's not as if most apps will go out of their way to avoid using E-cores. If they don't then the OS will just schedule them to run on whatever core is available, typically preferring a P-core (unless this is a background task or your power-saving settings are maxed out).


It's the other way around. If we take the example of Alder Lake, Intel made three laptop dies and two desktop dies.

First, the desktop dies are commonly referred to as C0 (8+8) and H0 (6+0) steppings:
[Image: Alder Lake desktop die configurations, C0 (8+8) and H0 (6+0)]

Next, the main laptop dies are P (6+8) and U (2+8), but they also reused the C0-stepping desktop die for H-class processors (don't confuse this with the stepping number, mentioned above):
[Image: Intel 12th-gen mobile processor lineup slide]

Finally, there's the N-series budget laptop die (0+8), which consists of only E-cores:
[Image: Alder Lake-N (E-core-only) block diagram]


You may notice that I am talking about desktop CPUs, not about laptops (which need efficiency) or servers (which are multithreaded by nature).

There is no use, on a desktop PC, for a hybrid architecture.

Because you are either:
- at idle, or almost
- at 100% on a single thread
- running a multi-threaded application, where there is no need for different cores (either slow or fast).

People are not multitaskers.

They use only one application at a time.

If your background tasks take more than 1% of your CPU, you're doing it wrong and that's on you. Also, with all the computing power available today, you shouldn't need tons of extra cores to run your OS. Just saying.
 

TheHerald

Respectable
BANNED
You may notice that I am talking about desktop CPUs, not about laptops (which need efficiency) or servers (which are multithreaded by nature).

There is no use, on a desktop PC, for a hybrid architecture.

Because you are either:
- at idle, or almost
- at 100% on a single thread
- running a multi-threaded application, where there is no need for different cores (either slow or fast).

People are not multitaskers.

They use only one application at a time.
Surely you do realize that the majority of desktop workloads do not scale to n cores, right? Games, browsing, Excel spreadsheets, all of the content creation stuff from Adobe (Photoshop, Lightroom, Premiere), etc. only use one or a few cores.

So for those tasks, having P-cores is better than having E-cores. Since those are the majority of workloads on a desktop chip, not having any P-cores on a desktop chip would be stupid.

For the rest of the workloads that do in fact scale to n cores, filling the die with E-cores is better than filling it with P-cores, since performance per mm² is higher.

If we take the 12900K as an example, it could either be a 10+0 chip or an 8+8. The only scenario where a 10+0 chip would be faster than an 8+8 would be a workload that uses more than 10 cores but less than 12. Do workloads like that exist, and how many are there? In every other scenario 8+8 is faster.

Since I have a 12900K, I can simulate a 6+8 chip vs. an 8+0. Can you give me some workloads where the 8+0 would be faster, so I can try and clarify it for you?
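
For anyone who wants to try that comparison themselves, here's a rough sketch of one way to do it with process affinity. This is just an illustration, not anything from Intel: it assumes the psutil package is installed, and it assumes a 12900K exposes its P-core SMT threads as logical CPUs 0-15 and its E-cores as 16-23, which you should verify on your own machine first.

Code:
import psutil  # pip install psutil

# Assumed logical-CPU layout for a 12900K; check it before trusting results.
P_CORE_THREADS = list(range(0, 16))   # assumed: 8 P-cores x 2 SMT threads
E_CORES = list(range(16, 24))         # assumed: 8 E-cores

def pin_to(logical_cpus):
    """Restrict the current process to the given logical CPUs."""
    psutil.Process().cpu_affinity(logical_cpus)

# Approximate an "8+0" chip: P-core threads only.
pin_to(P_CORE_THREADS)
# ... run the benchmark here ...

# Approximate a "6+8" chip: drop two P-cores (four logical CPUs), add the E-cores.
pin_to(P_CORE_THREADS[:12] + E_CORES)
# ... run the same benchmark again and compare ...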
 
Threads have been communicating for a long time, since well before the start of the millennium.
I asked you for a single example of E-cores causing threads on P-cores to run slower and this is what you respond with? Hard to take you seriously.
My CPU is running at 100% utilization, all 12 cores.

I DON'T CARE WHAT POWER IT USES

I JUST WANT IT TO GO AS FAST AS POSSIBLE IN THAT CASE

The rest is irrelevant.
You don't seem to understand how anything regarding CPUs works, and don't seem to care to. CPUs have power limits, and if those are removed, thermal limits. All CPUs limit their boost clocks because of power/heat restrictions. If Intel had used all P-cores, the boost clocks would have been significantly reduced due to these restrictions.

Here's a real-world example from Intel and AMD (notice the clock speed going down as the core usage goes up):
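
If you want to see the effect on your own machine, here's a rough sketch (assuming the psutil package; how the OS reports per-core frequency varies, so treat the numbers as indicative only):

Code:
import multiprocessing as mp
import time

import psutil  # pip install psutil

def burn(seconds):
    """Spin on the CPU for roughly the given number of seconds."""
    end = time.time() + seconds
    while time.time() < end:
        pass

def reported_clock_mhz():
    """Highest currently reported core clock; reporting granularity varies by OS."""
    freqs = psutil.cpu_freq(percpu=True) or [psutil.cpu_freq()]
    return max(f.current for f in freqs)

if __name__ == "__main__":
    for n_busy in (1, 2, 4, psutil.cpu_count(logical=True)):
        workers = [mp.Process(target=burn, args=(6.0,)) for _ in range(n_busy)]
        for w in workers:
            w.start()
        time.sleep(3)  # let boost clocks settle under load
        print(f"{n_busy:>2} busy workers -> ~{reported_clock_mhz():.0f} MHz")
        for w in workers:
            w.join()

On a power- or thermally-limited part, the reported clock should fall as the number of busy workers goes up.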
 

bit_user

Titan
Ambassador
you will encounter a lot of workloads where the threads are not completely independent and must communicate among themselves. Fast threads will end up waiting for slow threads and it will act as if you had only E-cores.
There's a little truth to this point, but there's a lot you're glossing over.
  1. Developers try to minimize communication between threads, because it adds overhead and contention (where one thread is waiting on another) limits the amount of speedup available from multi-threading.
  2. When one thread is blocked on another, the CPU core it was running on becomes available. This means something else could use it, including the thread it's blocking on. Thread migration sounds expensive, but it's really not if you're not doing it millions of times per second.
  3. When one thread blocks on another and there's nothing else that can utilize its CPU core, the core idles and that frees up power budget that can potentially be used to run other cores at higher speed (including the one running the thread that the first core is blocking on).
  4. There could easily be a thread on an E-core that's waiting on a thread running on a P-core. In this case, the P-core thread would block the E-core thread for less time. This is the converse of what you said.
  5. Threads are often not symmetric, anyhow.

Expanding on point #1, if your threads are spending most of their time waiting on each other, then you're already getting terrible performance, even if they're all running on P-cores.
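
That's just Amdahl's law in action. A quick generic sketch of the arithmetic (textbook formula, nothing specific to P-cores or E-cores):

Code:
def speedup(n_cores, serial_fraction):
    """Ideal Amdahl's-law speedup when serial_fraction of the work cannot overlap."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

for f in (0.05, 0.25, 0.50):
    print(f"serial {f:.0%}: 8 cores -> {speedup(8, f):.2f}x, "
          f"16 cores -> {speedup(16, f):.2f}x")

Once half the work is spent waiting on other threads, it barely matters how many cores you throw at it, or what kind they are.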

We also haven't really touched on hyperthreading/SMT, but Intel said of Alder Lake that two E-cores are faster than putting those two threads on the same P-core.
 

bit_user

Titan
Ambassador
And I understand quite well that all threads have to synchronize eventually, and having different speeds of threads is stupid.
So, then you must have been disabling Hyper-Threading or SMT for quite some time now! Because that's exactly what it does. When a core is running 2 threads, each of those threads is slower than one running on a core by itself.

As a matter of fact, Intel said the performance discrepancy between P-cores and E-cores is less than between a P-core at half vs. full occupancy!
 

MacZ24

Proper
BANNED
Expanding on point #1, if your threads are spending most of their time waiting on each other, then you're already getting terrible performance, even if they're all running on P-cores.

We also haven't really touched on hyperthreading/SMT, but Intel said of Alder Lake that two E-cores are faster than putting those two threads on the same P-core.

Yeah, imagine that: Intel saying good things about itself. It's like we are in another dimension.
 

MacZ24

Proper
BANNED
I already asked you to give me the worst case scenario so I can test between 6+8 vs 8+0. You completely skipped over that. Clearly this is trolling now.

You are asking for work and I'm not going to work for you if you don't pay me.

Consider that trolling if you want. I DON'T CARE.
 

MacZ24

Proper
BANNED
Then maybe you should stop talking about things you don't understand.

PS: scheduling is a thing and since you claim to know software you should be aware of it.

I love you too.

A CPU in itself does nothing. Did you know that?

It's just a piece of inert material.

And none of what you said invalidated anything I said.

You are trying to spout an argument from authority.

And scheduling, yes ... but when your display thread is waiting for your physics thread, it has to wait ... because it doesn't know where the objects are. That the physics thread is whisked away is irrelevant: it still can't draw.
 

TheHerald

Respectable
BANNED
You are asking for work and I'm not going to work for you if you don't pay me.

Consider that trolling if you want. I DON'T CARE.
I'm going to do the work. I'm asking you for a workload where a 6+8 would be worse than an 8+0 so I can test it. Since you clearly know what you are talking about, it wouldn't hurt to give me a specific workload, would it?
 

MacZ24

Proper
BANNED
In most segments, of course it is.

Efficiency, like everything else, is measured ceteris paribus. It's either iso power or iso performance. In both methods Intel is lightyears ahead.

Efficiently going to the unemployment office. Lightyears ahead.
 

MacZ24

Proper
BANNED
I'm going to do the work. I'm asking you for a workload where a 6+8 would be worse than an 8+0 so I can test it. Since you clearly know what you are talking about, it wouldn't hurt to give me a specific workload, would it?

I have absolutely no confidence in you being truthful about the results.

I won't bother.

You miss the point of a forum. By a lightyear (ahead).
 

Thunder64

Distinguished
In most segments, of course it is.

Efficiency, like everything else, is measured ceteris paribus. It's either iso power or iso performance. In both methods Intel is lightyears ahead.

If you really believe that, I don't know what to say. Even the most realistic die-hard Intel people say the performance is there but not the efficiency. And they believe 20A/18A will fix that. We should have Lunar Lake news in a few hours. That will tell us something.
 