News: Core i9-14900K, Core i7-14700K CPUs Benchmarked


flagrantvagrant

There is Thermal Velocity Boost and there is Turbo Boost Max 3.0; they are not the same thing.
Also, if their CPUs could suddenly handle 20% more power, it would be a miracle of engineering and would result in their fabs being declared GOD level.
There are other fabs making CPUs that melt and/or blow up at ~250W and need water cooling even below 230W.
Yeah, totally doesn't suck 20% more power. The 12900K capped around 240-260 watts with whatever boosting algorithm it was using, TVB2 point-whatever; the 13900K pulls an extra fifty watts with TVB3 PL1/PL2 LOCKED (I don't care what stupid name they change it to, it's still the same coprolite on a shingle, just with extra coprolite, or less actual algorithmic finesse in the boosting algorithm). And when TEE VEE BEE THREE is set to unlocked, it goes to OVER 350 WATTS. Yes, very god-like. At least you won't need to buy a space heater for the cold months.

Because TEE VEE BEE THREE is a hyper-simplification of TVB2. Its only throttle in unlocked mode is COOLING DISSIPATIVE CAPACITY. In other words, its limit is dictated wholly by an array of on-die thermal sensors. More/better cooling capacity = more power dumped into the chip, until it hits a thermal safety wall and starts throttling.

Thus, with my 280x280 AIO (using 4x Noctua 140mm 3k RPM PPC industrial fans), driven by an outdoor fountain pump (because they have much higher water-column driving capacity) and given enough tubing length to suspend the monstrosity in another room (or, if you have a basement, drill holes in the floor to run the thermal exhaust into an already cooler environment and not use your HYPER ULTRA 560 POWER PUMP AIO [TM, I declare copyright on that name] as a secondary space heater), then hang it from a crossbeam and cinch the hoses through the holes in the floor, and you too will be able to drive a 400-watt circuit-breaker popper. Because let's not forget your 600-watt 4090 and the other 200 watts of overhead.

(I hope people out there expecting to build a monstrosity like this know basic house electrical wiring principles: how to run NM cable, pick the right circuit breaker, and knock out drywall to install an electrical socket for a dedicated circuit just for these machines. I know I did, a few years ago, when I saw where power consumption was going on high-end PC parts.)
 
Yeah, totally doesn't suck 20% more power. The 12900K capped around 240-260 watts with whatever boosting algorithm it was using, TVB2 point-whatever; the 13900K pulls an extra fifty watts with TVB3 PL1/PL2 LOCKED
Only, shockingly, it doesn't.
The 12900K was limited to 241 W while the 13900K is limited to 253 W.
 

M42

The introduction section, which I quoted from, discussed both 10.1 and 10.2. The discussion of the "Converged implementation" was not restricted in scope to 10.1. If you look at Table 1-3 (Feature Differences Between Intel® AVX-512 and Intel® AVX10), it even provides a 5-way comparison between AVX-512, AVX10.1/256, AVX10.2/256, AVX10.1/512, and AVX10.2/512. So, the document was clearly written with full knowledge of what's coming in AVX10.2.

If Intel were planning on enabling 512-bit on only some cores in P+E-hybrid CPUs, they would be crystal clear about that fact! The reason being that they would want software developers and their apps to be ready, which is why Intel goes to the trouble of circulating these specifications well in advance of any products reaching the market.

Instead, what we got was a very clear statement to the contrary, which I'll reiterate:
"This converged version will be supported on both P-cores and E-cores. While the converged version is limited to a maximum 256-bit vector length"

They're not going to say that, and then turn right around and contradict themselves a generation later. Especially when it would be an extremely nontrivial undertaking for apps to effectively use different vector widths on different cores. If that's what they're planning, then they would leave the door open to different vector widths, rather than taking such a clear stance.
Well, you can view it that way, but it is NOT cast in stone. Intel is leaving an opening for AVX10.2/512 on p-cores. Look at Figure 1.2 in section 1.5 at this link, which in the AVX10.2 column clearly indicates all p-cores and e-cores get the AVX10 instructions and there is an optional FP/Int 512-bit variant:
 
I'm not confusing anything.

Let's leave aside obsolete products and focus on modern cores, shall we?

In modern cores, HyperThreading typically is not a big win for floating-point workloads, and sometimes even a detriment, as this clearly shows:
[chart: 127436.png (SPEC2017 scores by core/thread configuration)]

Compare 73.92 (8P2T + 0E) vs. 72.38 (8P1T + 0E). That's a meager 2.1% benefit.
And 74 to 79 is a meager 6.5%, even though in the article they say that the e-cores are like 50% of the performance of the p-cores with HTT enabled. Even in the graph, the e-cores alone are at 38, but together with the p-cores they only add 5 and drop to adding only 6.5%.
At full load, power for each core drops significantly (for Ryzen as well), resulting in all kinds of messy numbers.
 

bit_user

Well, you can view it that way, but it is NOT cast in stone. Intel is leaving an opening for AVX10.2/512 on p-cores. Look at Figure 1.2 in section 1.5 at this link, which in the AVX10.2 column clearly indicates all p-cores and e-cores get the AVX10 instructions and there is an optional FP/Int 512-bit variant:
Yes, I had seen that diagram and indeed that document. Yes, the ISA allows it, and maybe someday there will be a hybrid CPU which implements AVX10.x/512 on all cores.

What seems clear to me is that, with their stance on 256-bit "Converged Implementation" being applied symmetrically to the P-cores and E-cores of hybrid CPUs, they are avoiding having 512-bit on P-cores until they can also enable it on E-cores.

What's disturbing is that Intel is creating a situation where app developers for client machines are incentivized only to write AVX10/256 code. So, there won't be much in the way of application code that stands to benefit from enabling 512-bit on client processors. That makes it less likely that they'd eventually do so.

In contrast, ARM's SVE (Scalable Vector Extension) supports variable-length implementations, so you can write code which automatically benefits from whatever vector width the given CPU supports. However, in the case of AVX10, the vector width is baked into the instruction opcode. The only way to support different implementation widths is to compile the code multiple times, once for each supported width, and dispatch to the proper version at runtime. While that's entirely feasible, developers tend to be lazy and few will bother to do it.
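To make that concrete, here's a minimal sketch of the compile-per-width-and-dispatch pattern. It's my illustration, not anything from Intel's document: the scale_* kernel names are made up, and it uses AVX-512 vs. AVX2 as the stand-in feature check, since compiler enumeration of AVX10 itself is still settling. The GCC/Clang target attribute and __builtin_cpu_supports() are real, though:

Code:
#include <cstddef>
#include <immintrin.h>

// 512-bit build of the kernel; the target attribute lets this one function
// use AVX-512 without compiling the whole file for it.
__attribute__((target("avx512f")))
static void scale_512(float* x, std::size_t n, float s) {
    const __m512 vs = _mm512_set1_ps(s);
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)   // 16 floats per iteration
        _mm512_storeu_ps(x + i, _mm512_mul_ps(_mm512_loadu_ps(x + i), vs));
    for (; i < n; ++i) x[i] *= s;  // scalar tail
}

// The same kernel compiled at 256-bit width.
__attribute__((target("avx2")))
static void scale_256(float* x, std::size_t n, float s) {
    const __m256 vs = _mm256_set1_ps(s);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)     // 8 floats per iteration
        _mm256_storeu_ps(x + i, _mm256_mul_ps(_mm256_loadu_ps(x + i), vs));
    for (; i < n; ++i) x[i] *= s;  // scalar tail
}

// Runtime dispatch: pick the widest version the running CPU supports.
// A real AVX10 dispatcher would query the AVX10 CPUID leaf for the maximum
// supported vector length instead (and would also keep a scalar fallback).
void scale(float* x, std::size_t n, float s) {
    if (__builtin_cpu_supports("avx512f")) scale_512(x, n, s);
    else                                   scale_256(x, n, s);
}

Multiply that duplication by every hot loop in an app, and you can see why most client developers will just ship the one 256-bit build.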
 

bit_user

And 74 to 79 is a meager 6.5%, even though in the article they say that the e-cores are like 50% of the performance of the p-cores with HTT enabled.
First, you should round consistently. In actual fact, 79.76 / 73.92 -> 7.9% faster.

Second, you're only looking at float. If we're trying to answer the general question of how much the E-cores help, then we can also look at int. The reason I previously restricted the discussion to float was to focus on what insights this data could shed on the hyperthreading aspect of the TechPowerUp efficiency test, which was based on CineBench. However, if we're looking at the general question of hybrid performance, then you should also acknowledge that the E-cores add 25.9% more integer performance!

Third, we know that Alder Lake tends to be power-limited, in heavily-threaded workloads. So, the E-cores are limited in what they can add, which is probably one of the bigger factors holding back the float workload (float workloads tend to use more power).

Fourth, we know that Alder Lake has a penalty for enabling the E-cores, in that the ring-bus speed is decreased to what the E-core clusters can handle. This has either been mitigated or eliminated in Raptor Lake, which is another reason it would be better to look at Raptor Lake.

Finally, without looking at how linear scaling is before you add the E-cores, we cannot know how well the workload would scale to more cores/threads of either kind.

Conclusion: you're reading far too much into a pair of datapoints. If CPU analysis were that simple, you wouldn't have blogs like Chips & Cheese writing at least five long, detailed articles to try to tease out all the nuances of Alder Lake.

Even in the graph, the e-cores alone are at 38, but together with the p-cores they only add 5 and drop to adding only 6.5%.
Yes, exactly. Their standalone performance should tell you that scaling is being artificially hampered by some of the factors I mentioned, above.
 
Third, we know that Alder Lake tends to be power-limited, in heavily-threaded workloads. So, the E-cores are limited in what they can add, which is probably one of the bigger factors holding back the float workload (float workloads tend to use more power).
Ahhh! But you think that HTT is magic and doesn't need more power, right?!
Also, you think that an idealized workload that pushes as many instructions through a CPU as possible will behave the same as a workload that actually has some purpose and is forced to use only as many instructions as it can actually use?!
(Leaving many more instruction slots empty for HTT to do its work.)
 
These tests seem to measure only core power, showing the E-cores do quite well for most of their frequency envelope:
[graph: image-17-1.png (x264 efficiency vs. frequency)]
[graph: image-18-1.png (7zip efficiency vs. frequency)]


But the P-cores have more absolute performance. So, when you're looking at perf/(W_cores + W_system) the numerator is big enough to overcome the W_system term of the denominator. I know the test isn't exactly defined that way, but just to illustrate my point.
So you show it yourself: the e-cores are only better if you run at up to 3-3.5 GHz, so at default speeds they DEcrease the efficiency of the whole CPU, especially in the libx264 one. How do you link a test without even reading it, and only link the pictures that you like?!
But then again you do this all the time; you just post a wall of text and hope that people just get bored with you...

"This looks bad for the Gracemont based E-Cores. According to Intel, they can’t beat the P-Cores at any power level, meaning the E-Cores are only efficient in terms of area. Anyway, let’s run our own tests.

With a vectorized workload, Gracemont only beats Golden Cove when running at ultrabook-throttlefest speeds and drawing under 6W. Remember that Gracemont isn’t optimized for 256-bit vectors. That’s over 17% of instructions in libx264, so Gracemont is not having a good time. It looks much better with a pure integer workload:"

" Golden Cove can be efficient too – just not at stock. Between 3 and 4 GHz, these P-Cores can give the E-Cores a run for their money. In an integer workload, Golden Cove consumes about the same amount of total energy while completing the task faster. With a vectorized load, Golden Cove finishes the task so much faster that it ends up using less total energy than Gracemont, even though Gracemont draws less power That means running Gracemont above 3.2 GHz is pointless if energy efficiency is your primary concern. Running the E-Cores at 3.8 GHz basically makes them worse P-Cores. But that’s exactly what Alder Lake does by default."

The e-cores are only good at extracting a bit more performance at extremely low power. As you can see, at 5.5 W per 4 cores the p-cores already start to dominate the e-cores.


[graph: image-151.png (FPS vs. package power)]
 

tamalero

The X3D parts are junk and prone to catching on fire or frying your motherboard. Check the video by Gamers Nexus where this problem with X3D parts is shown to be relatively easy to trigger with overclocking.

We appreciate your enthusiasm, but you might want to re-check the video you mention to get your facts straight.
Because it's the opposite: motherboard manufacturers giving way too much power, beyond spec, to the AMD chip.
 
Because it's the opposite: motherboard manufacturers giving way too much power, beyond spec, to the AMD chip.
That's been happening since forever; that's the only reason that Intel CPUs very often show up with 315-330 W power draw at "stock" in benches. Intel never had CPUs burning up, despite having them tortured at well over 30% above the warranted power levels.

If AMD can't make a product that can handle the available mobos then there is no reason to protect them; they failed at making a product that is up to spec and had to reduce the specs to keep it from happening.

Imagine if Intel had forced every mobo maker to release a BIOS that cuts you off at the warranted limit and then still called them K CPUs... no, actually, months after release they would reduce the warranted limit even more and force it upon everybody.
 
So you show it yourself: the e-cores are only better if you run at up to 3-3.5 GHz, so at default speeds they DEcrease the efficiency of the whole CPU, especially in the libx264 one. How do you link a test without even reading it, and only link the pictures that you like?!
But then again you do this all the time; you just post a wall of text and hope that people just get bored with you...

"This looks bad for the Gracemont based E-Cores. According to Intel, they can’t beat the P-Cores at any power level, meaning the E-Cores are only efficient in terms of area. Anyway, let’s run our own tests.

With a vectorized workload, Gracemont only beats Golden Cove when running at ultrabook-throttlefest speeds and drawing under 6W. Remember that Gracemont isn’t optimized for 256-bit vectors. That’s over 17% of instructions in libx264, so Gracemont is not having a good time. It looks much better with a pure integer workload:"

" Golden Cove can be efficient too – just not at stock. Between 3 and 4 GHz, these P-Cores can give the E-Cores a run for their money. In an integer workload, Golden Cove consumes about the same amount of total energy while completing the task faster. With a vectorized load, Golden Cove finishes the task so much faster that it ends up using less total energy than Gracemont, even though Gracemont draws less power That means running Gracemont above 3.2 GHz is pointless if energy efficiency is your primary concern. Running the E-Cores at 3.8 GHz basically makes them worse P-Cores. But that’s exactly what Alder Lake does by default."

The e-cores are only good at extracting a bit more performance at extremely low power. As you can see, at 5.5 W per 4 cores the p-cores already start to dominate the e-cores.


[graph: image-151.png (FPS vs. package power)]
You're both talking circles around each other at this point. Golden Cove has much higher IPC than Gracemont, so if it weren't winning when both are running outside the peak of the efficiency curve, something would be very wrong. Gracemont also doesn't have anywhere near the FP/vector scheduler capability of Golden Cove, so you have wider gaps in certain workloads. What you end up with is situationally better efficiency for each type of core.

Now, I certainly agree the only reason Intel uses the e-cores currently is area, as it has nothing to do with operating efficiency, or they'd be more concerned about running them in a better power envelope. This is a problem both Intel and AMD have made for themselves by chasing that top position. Hopefully around the time of Zen 5 things might have changed a bit, but honestly, at this point I doubt it.
 
What you end up with is situationally better efficiency for each type of core.
And the only way to extract that would be for the thread director to send each workload to the type of core that would get you more efficiency.

E-cores will never show any benefit to power efficiency in benchmarks that always run the same instructions all of the time, because they are never the most power-efficient unless the whole CPU is running at super low power. With the cutoffs at 5 W for transcode and 15 W for zip, per 4 cores, there is otherwise never so little power available that the p-cores aren't the more power-efficient choice.
 

bit_user

Ahhh! But you think that HTT is magic and doesn't need more power, right?!
Whether or not it's power-limited, HT added far less performance than the E-cores did. That performance question is ultimately what dictates efficiency. So, in this context, the reason HT didn't add significant performance is somewhat irrelevant.

In the abstract, it would be fair to revisit the question of how beneficial it is in non-power-limited floating-point workloads.

Also, you think that an idealized workload that pushes as many instructions through a CPU as possible will behave the same as a workload that actually has some purpose and is forced to use only as many instructions as it can actually use?!
I don't follow. SPEC2017 uses real-world apps and workloads. It's not synthetic.

(Leaving many more instruction slots empty for HTT to do its work.)
CPU pipeline occupancy isn't the only thing determining HTT's impact. Increasing the number of threads also increases cache contention. And, if memory bandwidth is really the bottleneck, then simply adding more threads probably won't help.

It's a complex subject. You won't find generalizations which apply in all cases. Even in the SPEC2017 fp scores, you can find some places where HTT provided worthwhile benefits, if you examine the subscores.
 

bit_user

So you show it yourself: the e-cores are only better if you run at up to 3-3.5 GHz, so at default speeds they DEcrease the efficiency of the whole CPU, especially in the libx264 one,
That's a mistaken assumption. What the graph is showing is that, if you have a single thread and want it to run with greatest efficiency for a given performance level, the point where you'd want to switch it over to a P-core is at 3.1 GHz.

However, that's not how these CPUs operate in practice. If you really cared about efficiency, you'd only run the E-cores at 1.1 GHz and the P-cores at 1.4 GHz.

Furthermore, the graph shows that if you run a P-core above 4.4 GHz, its efficiency is worse than the worst-case of the E-core. So, if you're running your P-cores at 4.4 GHz, then you can add more E-cores @ 3.8 GHz without hurting overall efficiency!! If the P-cores are running even faster or the E-cores are running even slower, then the E-cores would improve overall efficiency!

How do you link a test without even reading it, and only link the pictures that you like?!
This is getting pretty close to crossing the line to a personal attack. You don't know that I didn't read it. Until we discuss it, you can't know why our interpretations of it differ. Accusing me of not reading it is therefore a baseless accusation.

As I've explained, the data is complicated and nuanced. You have to think about what it's telling you. Even then, there's lots of assumptions all around.

But then again you do this all the time; you just post a wall of text and hope that people just get bored with you...
No, what I do is try to get to the heart of the matter. I try to be data-driven, not agenda-driven. I'm not someone who just says "brand X sucks". I want to look at the data and analysis and understand what it's telling us. If you push for a simplistic conclusion where one might not exist, then you're going to get frustrated. That's on you, not me.

"This looks bad for the Gracemont based E-Cores. According to Intel, they can’t beat the P-Cores at any power level, meaning the E-Cores are only efficient in terms of area. Anyway, let’s run our own tests.
As I explained above, they're only talking about peak-efficiency. That's not how either core is actually used, in practice.

If you look at the graph and think about what it's actually telling you, Gracemont's worst efficiency is about 0.22 FPS/W. However, there's a long part of its curve where it delivers 0.34 FPS/W. Golden Cove (x4) can't go past about 22.5 W before it gets less efficient than that! That's only about 5.6 W per Golden Cove core before the Gracemont cores start to become more efficient, at a decent performance level.

In the case of an integer workload - which you conveniently opted not to cite - Gracemont did even better.

Furthermore, that's where they started the article! If that pair of graphs was all there was to say about the matter, they would've ended the article right there!

"if energy efficiency is your primary concern"
This should've been your boldest quote. If energy efficiency were your primary concern, you'd just run the E-cores at 1.1 GHz and the P-cores at 1.4 GHz. Once you try to scale performance, the addition of E-cores can improve overall efficiency at all points on the scale. Whether they're actually operated in that way is another matter.
 
This is getting pretty close to crossing the line to a personal attack. You don't know that I didn't read it.
The e-core gets less than 6 FPS at ~27 W; the p-core gets more than 8 FPS at the same ~27 W. That is more than 33% better efficiency, which you were trying to discard with this article...
Actually, 33% is way more than the 20% that I was stating, but then again, different software, different results.
Or 25% less if you go down from 8 to 6; either way it's more than 20%.
It's impossible to go through the slides and not see that.
[graph: image-151.png (FPS vs. package power)]


That's a mistaken assumption. What the graph is showing is that, if you have a single thread and want it to run with greatest efficiency for a given performance level, the point where you'd want to switch it over to a P-core is at 3.1 GHz.

However, that's not how these CPUs operate in practice. If you really cared about efficiency, you'd only run the E-cores at 1.1 GHz and the P-cores at 1.4 GHz.
The graph you are talking about is still about a cluster of four cores, so not one thread but at least four.

And sure, OK, if you want to state it that way in the future, I'm 100% with you, because it's at least correct. But just saying that e-cores are more efficient is outright 100% wrong, unless you add "at below 5-15 W per 4 cores" or "at below 1.1 GHz" to it.
Furthermore, the graph shows that if you run a P-core above 4.4 GHz, its efficiency is worse than the worst-case of the E-core. So, if you're running your P-cores at 4.4 GHz, then you can add more E-cores @ 3.8 GHz without hurting overall efficiency!! If the P-cores are running even faster or the E-cores are running even slower, then the E-cores would improve overall efficiency!
Nope, the e-cores don't go to 4.4 GHz, so this is completely irrelevant.

Also, you are back to overall efficiency; as I said before, you are confusing the discussion by interchanging the whole CPU and the individual cores.
The e-cores might add efficiency to the CPU as a whole under some circumstances (changing the ratio of e- to p-cores), but the e-cores are not more power-efficient than the p-cores, unless you go below 1.1 GHz, which nobody ever does.
 

bit_user

The e-core gets less than 6 FPS at ~27 W; the p-core gets more than 8 FPS at the same ~27 W. That is more than 33% better efficiency, which you were trying to discard with this article...
As I carefully explained, the graph can be misleading if you don't think about the way cores in a hybrid CPU are used in practice. For instance, one thing that graph shows is that the P-cores @ max power get only 0.14 FPS/W, which is far worse than the worst-case 0.22 FPS/W the E-cores deliver. And you don't have to run the E-cores at their max power, either.

Hybrid core scheduling is complex. In the best case, some of these graphs (i.e. the core-power ones) are fairly raw data that you would feed into such a scheduler, so that it can dial in the optimal combination of core clocks for a given power target. That's all it is. Trying to read more into it will almost certainly lead you to bad conclusions.

Also, that graph is not a cores vs. cores comparison, but rather an artificial CPU vs. CPU comparison. Because it measures package power, it's comparing a hypothetical 4x Gracemont CPU (with all the extra baggage of Alder Lake-S) vs. a 4x Golden Cove CPU. That extra overhead is what's dragging down the apparent efficiency of Gracemont vs. the graphs I quoted that show core power.
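To illustrate that overhead effect with made-up but plausible numbers (mine, not Chips & Cheese's), assume a 4x Gracemont cluster produces 6 FPS at 10 W of core power, while the rest of the package burns a fixed 17 W:

core-only:  6 FPS / 10 W          = 0.60 FPS/W
package:    6 FPS / (10 W + 17 W) ≈ 0.22 FPS/W

A faster 4x Golden Cove setup amortizes that same fixed overhead over far more FPS, so a package-power comparison flatters it.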

Actually, 33% is way more than the 20% that I was stating, but then again, different software, different results.
The two are further apart than that. They're also measuring package vs. system power and groups of 4 cores.

The graph you are talking about is still about a cluster of four cores, so not one thread but at least four.
My statement roughly holds, as far as looking at cross-over points. If it were core vs. core, then we should expect to see a crossover at about the same spot.

saying that e-cores are more efficient is outright 100% wrong, unless you add "at below 5-15 W per 4 cores" or "at below 1.1 GHz" to it.
It's not wrong, if you let the P-cores go above 4.4 GHz. At that point, it becomes more efficient to add further performance via the E-cores running at any clock they support!

[graph: image-17-1.png (x264 efficiency vs. frequency)]


I think this part needs to be emphasized: it's not an either/or scenario. You can continue to operate E-cores in their efficiency sweet-spot, regardless of what the P-cores are doing. Doing so improves the overall efficiency of the CPU, because performance is cumulative, while efficiency is an average.
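To put numbers on "cumulative vs. average", using the two figures quoted above (0.14 FPS/W for P-cores at max power, 0.22 FPS/W worst-case for E-cores) and illustrative absolute values of my own:

P-cores alone:      28 FPS / 200 W              = 0.140 FPS/W
P-cores + E-cores: (28 + 11) FPS / (200 + 50) W = 0.156 FPS/W

Whenever the added cores' own ratio (here 11 / 50 = 0.22 FPS/W) beats the existing ratio, the combined average can only move up.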

Nope, the e-cores don't go to 4.4 GHz, so this is completely irrelevant.
Please re-read. I never said nor implied the E-cores run at 4.4 GHz. What I said was:

"if you're running your P-cores at 4.4 GHz, then you can add more E-cores @ 3.8 GHz without hurting overall efficiency!! If the P-cores are running even faster (than 4.4 GHz) or the E-cores are running even slower (than 3.1 GHz), then the E-cores would improve overall efficiency!"

the e-cores are not more power-efficient than the p-cores
Again, you're cherry-picking. You focus only on the x264 case, yet we know the E-cores aren't as good at floating-point and vectorized workloads as at integer. As a reminder, here are the core-power curves for 7zip:

[graph: image-18-1.png (7zip efficiency vs. frequency)]


Here, Gracemont is the same or more efficient across the entire range. The only thing that makes it worse is if you compare package-level energy usage, but then we're not comparing core-to-core, but instead measuring them in a somewhat artificial context.

I guess the real kicker is this: if the E-cores aren't more energy-efficient, then why is Intel packing the laptop CPUs with so many of them at the expense of fewer P-cores? If you were right, it wouldn't make sense!
 
As I carefully explained, the graph can be misleading if you don't think about the way cores in a hybrid CPU are used in practice. For instance, one thing that graph shows is that the P-cores @ max power get only 0.14 FPS/W, which is far worse than the worst-case 0.22 FPS/W the E-cores deliver. And you don't have to run the E-cores at their max power, either.
But you are forced to run the p-cores at their max power, right?! That's why this statement makes sense to you?!
Yes, overclocking, or even just clocking very high, wrecks efficiency; have you informed Scientific American yet?
Balls to the wall is not how anybody has ever measured power efficiency, EVER, in the whole history of the world.
It's not wrong, if you let the P-cores go above 4.4 GHz. At that point, it becomes more efficient to add further performance via the E-cores running at any clock they support!
[graph: image-17-1.png (x264 efficiency vs. frequency)]

I think this part needs to be emphasized: it's not an either/or scenario. You can continue to operate E-cores in their efficiency sweet-spot, regardless of what the P-cores are doing. Doing so improves the overall efficiency of the CPU, because performance is cumulative, while efficiency is an average.
Yes, you are not wrong as long as you use enough qualifiers; that's what I already proposed you do.

You can also operate both types of cores at their sweet spot, imagine that, crazy right?!
Again, you're cherry-picking. You focus only on the x264 case, yet we know the E-cores aren't as good at floating-point and vectorized workloads as at integer. As a reminder, here are the core-power curves for 7zip:
[graph: image-18-1.png (7zip efficiency vs. frequency)]

Here, Gracemont is the same or more efficient across the entire range. The only thing that makes it worse is if you compare package-level energy usage, but then we're not comparing core-to-core, but instead measuring them in a somewhat artificial context.
When you are accusing people of cherry-picking, you shouldn't be cherry-picking yourself... or I guess this is more of a goalpost shift.
The same is not better; there is a whole range of clocks where the e-cores are not more power-efficient, which is what you claimed.

The e-cores are only more power-efficient if you add a ton of qualifiers, but then you can do the same with the p-cores.
I guess the real kicker is this: if the E-cores aren't more energy-efficient, then why is Intel packing the laptop CPUs with so many of them at the expense of fewer P-cores? If you were right, it wouldn't make sense!
For the same reason they do it on the desktop: the fewer expensive p-cores they need to use per unit sold, the more money they make per unit, and it also allows them to make more units to make even more money.
Using twice as many p-cores in a product, for example, would cut the amount of product they can make in half.
 

bit_user

But you are forced to run the p-cores at their max power, right?!
Of course not. That's why I talked about core-scheduling and dialing in the optimal blend of clock speeds for a given power target and number of threads.

That's why this statement makes sense to you?!
My point about comparing efficiency at peak utilization was to show that the P-cores' efficiency becomes worse than the worst of the E-cores', past a certain point. Much worse. If you focus only on the overlap, you can easily lose sight of that.

On the x264 workload, that threshold seems to be the P-cores running at 4.4 GHz. You can see that in this graph, based on where they draw equal on the Y-axis:

[graph: image-17-1.png (x264 efficiency vs. frequency)]


Yes, you are not wrong as long as you use enough qualifiers; that's what I already proposed you do.
Complex data defies simple explanations. Nuances matter.

The picture is fairly complex, but not surprisingly so. We know the efficiency of just about every core in existence falls off a cliff once you push clock speeds high enough. So, it's almost a given there will be a crossover point. However, nobody is saying you have to run every core in the CPU at the same clock speed or power level. The OS driver that decides what clock speeds to run the different cores at needs to balance the performance demands (i.e., how many threads are running) against the efficiency curves for each of the cores.

You can also operate both types of cores at their sweet spot, imagine that, crazy right?!
That was just a thought experiment. In reality, a user expects a certain level of responsiveness or throughput, is running a certain number of threads, and is prepared for a certain amount of power/heat to be dissipated. If we take the simplistic case of a performance-oriented scheduler, it will nearly always push the cores past their "sweet spot" in order to maximize performance. That's often not a matter of simply running any of the cores at max speed/power, unless the number of threads is low or the power limit (and cooling capacity) is very high.

When you are accusing people of cherry-picking, you shouldn't be cherry-picking yourself... or I guess this is more of a goalpost shift.
I'm not. I just included a dataset you omitted. I've never shifted goalposts, either. My contention is the same as it always was: E-cores are a more energy-efficient way to scale performance.

The same is not better; there is a whole range of clocks where the e-cores are not more power-efficient, which is what you claimed.
The problem is that you're taking this raw data outside the context of how it's used. In practice, you're not running both P-cores and E-cores at the same frequency, which is the only way that graph shows parity. In practice, you're usually running the P-cores at significantly higher frequency than the E-cores, which is why adding E-cores to the mix helps improve the overall efficiency of the solution.

As I said, core scheduling is complex and this is merely raw data.

The e-cores are only more power-efficient if you add a ton of qualifiers, but then you can do the same with the p-cores.
That's a false equivalence. The precise interpretation of the data matters. The tagline of Chips & Cheese is: "The Devil is in the Details", which is why they do such in-depth micro-benchmarking, mixed with extensive analysis. Even when I don't agree with their analysis, the data is still very useful.

For the same reason they do it on the desktop: the fewer expensive p-cores they need to use per unit sold, the more money they make per unit, and it also allows them to make more units to make even more money.
Using twice as many p-cores in a product, for example, would cut the amount of product they can make in half.
Not to agree with you, but just to explore this line of reasoning: you're saying Intel intentionally builds a mobile product with worse battery-life, just to save money?

At certain price-points, I can see the argument. However, not if we're talking about even their premium, low-power models. Let's look at some examples, like the i7-1365U, which seems to be their highest-end U-series CPU and has 8 E-cores + 2 P-cores.

If what you're saying is right, you should be able to point to some Intel mobile CPU with the same or lower power-budget and more P-cores.
 

tamalero

That's been happening since forever; that's the only reason that Intel CPUs very often show up with 315-330 W power draw at "stock" in benches. Intel never had CPUs burning up, despite having them tortured at well over 30% above the warranted power levels.

If AMD can't make a product that can handle the available mobos then there is no reason to protect them; they failed at making a product that is up to spec and had to reduce the specs to keep it from happening.

Imagine if Intel had forced every mobo maker to release a BIOS that cuts you off at the warranted limit and then still called them K CPUs... no, actually, months after release they would reduce the warranted limit even more and force it upon everybody.
I have yet to see an Intel CPU being fed voltages 20% to 40% above the LIMITS (not recommended guidance... LIMITS) on both the memory side AND the core side.
 
The problem is that you're taking this raw data outside the context of how it's used. In practice, you're not running both P-cores and E-cores at the same frequency, which is the only way that graph shows parity. In practice, you're usually running the P-cores at significantly higher frequency than the E-cores, which is why adding E-cores to the mix helps improve the overall efficiency of the solution.

As I said, core scheduling is complex and this is merely raw data.
No, you are the one who takes the graphs with the fixed clock speeds and tries to use them to make your point...
I said multiple times that it's above ~5 W for vector and ~15 W for integer, per 4 cores; if you go above that, if you give your CPU more power than that, then the p-cores are more efficient until you hit the other side of the efficiency-curve cliff.


I'm talking about these graphs: at 15 W times 6 clusters of 4 cores, so from 90 W up to about 25 W per 4 cores, so up to 150 W for the whole CPU, the p-cores are more power-efficient.
For the transcode it's from 6*5 W = 30 W up to, again, around 25*6 = 150 W.

Processor base power for the 12900K is 125 W, so using the CPU the way you are supposed to, the p-cores will always be more power-efficient, because it will be working within that range, under 150 W.

Only if you use your overclocking allowance and go to maximum turbo power will the p-cores fall off a cliff.
[graph: image-16-1.png (efficiency vs. power)]

Not to agree with you, but just to explore this line of reasoning: you're saying Intel intentionally builds a mobile product with worse battery-life, just to save money?

At certain price-points, I can see the argument. However, not if we're talking about even their premium, low-power models. Let's look at some examples, like the i7-1365U, which seems to be their highest-end U-series CPU and has 8 E-cores + 2 P-cores.

If what you're saying is right, you should be able to point to some Intel mobile CPU with the same or lower power-budget and more P-cores.
I don't know much about the laptop landscape and I don't care much about it; this is just my guess as to why they are doing it.
I can only guess that their research has found that most people have their high-end devices always plugged in anyway.
 
I'm not. I just included a dataset you omitted. I've never shifted goalposts, either. My contention is the same as it always was: E-cores are a more energy-efficient way to scale performance.
But only if you believe the media and think that running a CPU not only at its limit but way above it is the normal way to use a CPU.
I have yet to see an intel cpu being fed 20% to 40% voltages above the LIMITS (not recommended guidance.. LIMITS) in both the memory side AND the core side.
Do you even know what the voltage limit is for Intel???
Because if you don't, I can just show you any level and say that it's 20-40-50% above the limit...
 

bit_user

No, you are the one who takes the graphs with the fixed clock speeds and tries to use them to make your point...
The energy vs. frequency plots show how efficiency changes over the cores' respective operational ranges.

I said multiple times that it's above ~5 W for vector and ~15 W for integer, per 4 cores; if you go above that, if you give your CPU more power than that, then the p-cores are more efficient until you hit the other side of the efficiency-curve cliff.

I'm talking about these graphs: at 15 W times 6 clusters of 4 cores, so from 90 W up to about 25 W per 4 cores, so up to 150 W for the whole CPU, the p-cores are more power-efficient.
For the transcode it's from 6*5 W = 30 W up to, again, around 25*6 = 150 W.

Processor base power for the 12900K is 125 W, so using the CPU the way you are supposed to, the p-cores will always be more power-efficient, because it will be working within that range, under 150 W.

Only if you use your overclocking allowance and go to maximum turbo power will the p-cores fall off a cliff.
You don't seem to understand how frequency scaling is managed in these CPUs. It's not like you just mash your foot on the gas and run a core at its maximum frequency, because if you have a multithreaded workload, you'll often be constrained by power or thermal limits.

[graph: image-16-1.png (efficiency vs. power)]

See, if I were designing a core frequency governor based on this data, what I'd do is run the E-cores up to 3.1 GHz, where they achieve 1.33 MB/s per W. Then, don't increase them until performance demands and power budgets push the P-cores beyond 31 W. At that point, the P-cores are only doing 0.90 MB/s per W, whereas the worst that the E-cores do is 0.96 MB/s per W. So, then it makes sense to boost the E-cores to their max frequency before devoting any more power to the P-cores.
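As a toy sketch of that policy (my code, not Intel's actual Thread Director logic; the constants are just the 7zip figures quoted above and the names are invented):

Code:
#include <cstdio>

// Efficiency figures quoted above (7zip, per 4-core group).
constexpr double kEcoreSweetGHz   = 3.1;   // E-core peak-efficiency clock
constexpr double kEcoreWorstEff   = 0.96;  // MB/s per W at E-core max clock
constexpr double kPcoreEffPast31W = 0.90;  // P-core MB/s per W beyond ~31 W
constexpr double kPcoreKneeW      = 31.0;  // where the P-cores hit that point

// Decide where the next slice of power budget should go.
bool boost_ecores_next(double pcore_watts) {
    // Past the knee, an extra watt on the P-cores returns less work than
    // even a worst-case E-core watt, so spend it on the E-cores instead.
    return pcore_watts >= kPcoreKneeW && kPcoreEffPast31W < kEcoreWorstEff;
}

int main() {
    std::printf("E-cores held at %.1f GHz; boost E-cores next: %s\n",
                kEcoreSweetGHz, boost_ecores_next(33.0) ? "yes" : "no");
}

A real governor would of course work from the full curves rather than two points, but the decision rule has the same shape.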

I don't know much about the laptop landscape and I don't care much about it; this is just my guess as to why they are doing it.
I can only guess that their research has found that most people have their high-end devices always plugged in anyway.
But they're using the same cores in a way contrary to how you describe. Can Intel really be so dumb or oblivious to battery life? Battery life is something every laptop review measures, and it's an issue for key market segments, like premium corporate laptops for execs who don't want to always be hunting for a place to plug in. It's also an area where Apple is excelling.

But only if you believe the media and think that running a CPU not only at its limit but way above it is the normal way to use a CPU.
No, I don't see how that's related to anything I said.
 
The energy vs. frequency plots show how efficiency changes over the cores' respective operational ranges.


You don't seem to understand how frequency scaling is managed in these CPUs. It's not like you just mash your foot on the gas and run a core at its maximum frequency, because if you have a multithreaded workload, you'll often be constrained by power or thermal limits.

[graph: image-16-1.png (efficiency vs. power)]

See, if I were designing a core frequency governor based on this data, what I'd do is run the E-cores up to 3.1 GHz, where they achieve 1.33 MB/s per W. Then, don't increase them until performance demands and power budgets push the P-cores beyond 31 W. At that point, the P-cores are only doing 0.90 MB/s per W, whereas the worst that the E-cores do is 0.96 MB/s per W. So, then it makes sense to boost the E-cores to their max frequency before devoting any more power to the P-cores.
Show me one benchmark that shows that happening, because for the last forever years every single benchmark has the clocks running pedal to the metal, either with unlimited power or with a certain power target, but always with all cores running as fast as the power allows.
Because every single benchmark is made to push as many instructions through every core as possible.
But they're using the same cores in a way contrary to how you describe. Can Intel really be so dumb or oblivious to battery life? Battery life is something every laptop review measures, and it's an issue for key market segments, like premium corporate laptops for execs who don't want to always be hunting for a place to plug in. It's also an area where Apple is excelling.
As I said, I don't know much about the laptop world; how is the battery life at very low power, where the e-cores are the most efficient?
Or are you going by Cinebench or similar numbers that run the laptop as inefficiently as possible?
Is the common laptop user after battery life in Cinebench, or in idle/browsing and so on?
[graph: image-16-1.png (efficiency vs. power)]


No, I don't see how that's related to anything I said.
Look at this picture: if you can allow your CPU to use more than 90 W, then unless you have to run every core in your CPU at full power all the time, you could replace every single e-core with an equal number of p-cores, make those replacement p-cores use the same amount of power as the e-cores do now, and you would get higher performance for the same total power. Hence, the e-cores are not more power-efficient for scaling performance,

unless you are forced to use your desktop PC below 90 W but still with all its cores enabled for some reason, or you are forced to run all available cores only at full speed all the time.
 