News Intel CEO says it's "too late" for them to catch up with AI competition — claims Intel has fallen out of the "top 10 semiconductor companies" as the firm lays off thousands across the world

Its attempt to modernize x86 with a hybrid big.LITTLE architecture, à la ARM, failed to make a meaningful impact
It had the impact of setting back AVX-512 adoption by 5+ years.

It seems to do well at boosting low-end multi-threading, e.g. 10-14-core chips vs. AMD's 6 cores. The next test for E-cores will be Wildcat Lake, finally bringing hybrid to the Atom lineup, with the main benefit being a huge increase in single-threaded performance.
Lunar Lake chips barely registering a response against AMD’s cache-stacked X3D lineup
Those products don't compete with each other. Lunar Lake does well against AMD's Krackan Point, which I guess is the competitor in configuration and even price.
Intel instead plans to shift its focus toward edge AI, aiming to bring AI processing directly to devices like PCs rather than relying on cloud-based compute.
By the time anyone cares about an NPU in their PC, it will be in every new AMD chip. Maybe starting with Zen 6.
 
The article said:
Its attempt to modernize x86 with a hybrid big.LITTLE architecture, à la ARM, failed to make a meaningful impact
Beyond what @usertests said, I'd characterize the impact of hybrid as yielding mixed results.

Hybrid definitely helped on the multithreading front and created some breathing room for Intel to double-down on its P-cores (which are significantly bigger and more complex than AMD's). That emphasis on making the P-cores as strong as possible has also helped their lightly-threaded performance.

On the negative side of the ledger, their hybrid CPUs have been beset with thread scheduling woes that have dimmed the view of gamers towards Intel's E-cores. ThreadDirector was their deus ex machina solution to these problems, but turned out to fall far short of the billing. I think there's no way Intel can truly solve these problems only on the backend. They need to work with both OS and application developers to find better solutions for hybrid thread scheduling.

The article said:
Intel wants to be more like AMD and NVIDIA, who are faster, meaner, and more ruthless competitors these days
Intel has always been mean. For instance, in its dirty dealings with OEMs to try and block AMD's access to the markets it dominated. In the modern chip industry, I think only Nvidia and Qualcomm are possibly meaner.
 

Intel CEO says it's "too late" for them to catch up with AI competition — claims Intel has fallen out of the "top 10 semiconductor companies" as the firm lays off thousands across the world​

This entire story was sourced from a 'leaked' memo. Your current headline makes it sound like a matter of fact, which it isn't. Tom's Hardware, like other sites, probably has editors pushing for view-grabbing headlines, but a leak or tip should require more careful wording. It puts you in the same clickbaity category as WCCFtech. Consider alternatives like 'reportedly', 'allegedly', 'purportedly', etcetera.
Its attempt to modernize x86 with a hybrid big.LITTLE architecture, à la ARM, failed to make a meaningful impact
How so? Skymont has been a key driver behind Lunar Lake's efficiency, where lightweight tasks can be offloaded to the LPE-cores without engaging the P-core ring. The deficits of axing HT from ARL/LNL have been more than compensated for by Skymont. Heck, even AMD is rumored to take the LPE-core approach with Zen 6 for efficiency purposes. The only downside to E-cores I can think of is the lack of AVX-512, which should be returning with Nova Lake?

Only made worse by last-gen's Lunar Lake chips barely registering a response against AMD’s cache-stacked X3D lineup

Lunar Lake is not "last-gen". And these are entirely different product stacks you're comparing. Lunar Lake is a mobile-only, efficiency-first chip, while the only X3D chip on mobile I'm aware of is Fire Range (Ryzen 9000HX3D); apples and oranges. They're not even comparable: one is designed for gaming laptops and portable workstations, while the other is tailor-made from the ground up for lightweight, low-power devices, as a Windows/Linux alternative to MacBooks.

The final nail may have come with Intel’s recent loss of contract manufacturing for its upcoming flagship 18A node

Missing keyword: reportedly

However, it's too early to speculate given that 18A, Intel's proposed savior, is still a year away, so until Nova Lake launches, we'll just be witnesses to a new Titanic.

The first 18A product you'll see is Panther Lake, not Nova Lake; the former is slated for an early 2026 launch, likely at CES, or in about 5-6 months.
 
The only downside to E-cores I can think of is the lack of AVX-512,
Then you haven't been following the matter in gaming circles. They often either use tools to prevent game threads from being scheduled on E-cores or just disable them in BIOS. That said, there are some games that manage to perform better with E-cores enabled, which almost makes the situation worse, since it means there's no blanket solution that applies to all games.

VR is another area where E-cores are viewed very negatively. Pretty much any realtime application is susceptible to performance detriments from critical-path threads being scheduled on E-cores.

Intel's latest solution to this mess it created is to make E-cores bigger and more powerful, thereby narrowing the gap between them and P-cores. However, this comes at the expense of the E-cores' traditional strengths. You'd get better performance density and efficiency by keeping the E-cores smaller, lower-clocking, and just adding more of them.
 
Then you haven't been following the matter in gaming circles. They often either use tools to prevent game threads from being scheduled on E-cores or just disable them in BIOS. That said, there are some games that manage to perform better with E-cores enabled, which almost makes the situation worse, since it means there's no blanket solution that applies to all games.

VR is another area where E-cores are viewed very negatively. Pretty much any realtime application is susceptible to performance detriments from critical-path threads being scheduled on E-cores.

Intel's latest solution to this mess it created is to make E-cores bigger and more powerful, thereby narrowing the gap between them and P-cores. However, this comes at the expense of the E-cores' traditional strengths. You'd get better performance density and efficiency by keeping the E-cores smaller, lower-clocking, and just adding more of them.
Most of these can be resolved through proper scheduling, which is easier said than done since we haven't achieved a proper solution, yet.

The Skymont LPE-core cluster on LNL measures 6.89mm^2 (N3B), compared to 5.90mm^2 on MTL (Intel 4). Not really apples to apples, since we're comparing LPE to E cores, but this increase in size does come with a massive bump to the IPC and performance, and that's important since Crestmont LPE-cores on MTL were too slow (and on an older process node) to run background tasks properly. It's a tradeoff, and I think it really shines in products like Lunar Lake, where the LPE-cores are strong enough to handle background applications without engaging the P-core ring bus.

At the same time, I get your point. You're pushing for density (smaller E-cores, where we might get 6-8 E-cores in place of a traditional cluster). The engineers at Intel likely know much more than us, and they must've considered this approach sometime during development. It's essentially a case of 4 powerful E-cores versus 8 less powerful ones.
 
Most of these can be resolved through proper scheduling, which is easier said than done since we haven't achieved a proper solution, yet.
Agreed, but it cannot be done solely by the CPU and kernel. It needs the involvement of userspace and this is where I've seen zero movement. Intel continues to steadfastly act as though it believes these problems can be solved entirely on the backend, but they cannot.

The Skymont LPE-core cluster
Why are you focused on LPE? Most Skymont cores aren't LPE-cores, they're regular E-cores. Lunar Lake is a fairly niche product. Arrow Lake is the volume product. I never restricted what I said to just LPE cores, either. I was talking about the inclusion of E-cores in mainstream products. If Intel had limited its use of E-cores to just LPE cores in laptops, they'd be much less controversial.

The engineers at Intel likely know much more than us, and they must've considered this approach sometime during development.
Their track record says otherwise. The only reason we're calling their decisions into question is precisely because they designed products with such significant tradeoffs.

I give them credit for their willingness to take the bold move of going hybrid. However, their execution was clearly not flawless and I think their efforts at damage-control have needlessly undermined their strategy. They should've worked the problem from all angles, but someone at Intel seems to have decided that touching threading APIs was a red line. The longer they refuse to go there, the more their solution will get watered down and the longer we'll go without a proper solution.
 
This entire story was sourced from a 'leaked' memo. Your current headline makes it sound like a matter of fact, which it isn't. Tom's Hardware, like other sites, probably has editors pushing for view-grabbing headlines, but a leak or tip should require more careful wording. It puts you in the same clickbaity category as WCCFtech. Consider alternatives like 'reportedly', 'allegedly', 'purportedly', etcetera.

I see "reportedly" right there in the headline. Regardless, leaked memos in the hands of skilled journalists have become some of the most reliable sources of insight into corporate and government opacity, whether you like what the memos say or not.
 
Then you haven't been following the matter in gaming circles. They often either use tools to prevent game threads from being scheduled on E-cores or just disable them in BIOS. That said, there are some games that manage to perform better with E-cores enabled, which almost makes the situation worse, since it means there's no blanket solution that applies to all games.
A lot of the people advocating and doing this aren't doing so for real reasons. There were certainly a lot of random issues at ADL launch, but currently not so much. I'm sure there are specific outliers, and anyone affected probably screams about it as is required on the internet, though.

This was the last writeup I remember seeing: https://www.techpowerup.com/review/...-i9-13900k-e-cores-enabled-vs-disabled/2.html
They need to work with both OS and application developers to find better solutions for hybrid thread scheduling.
With the Intel of 5-6 years ago, this had a chance of happening, but today? No. I do certainly agree that this would be the best way forward. Scheduling isn't some sort of mystery either, as CDPR was able to fix CP2077 (ARL performance was bad) in a patch that came ~1.5 months after ARL launch.
In the modern chip industry, I think only Nvidia and Qualcomm are possibly meaner.
Avago... I mean Broadcom is absolutely king of this hill. 🤣
 
Beyond what @usertests said, I'd characterize the impact of hybrid as yielding mixed results.

Hybrid definitely helped on the multithreading front and created some breathing room for Intel to double-down on its P-cores (which are significantly bigger and more complex than AMD's). That emphasis on making the P-cores as strong as possible has also helped their lightly-threaded performance.
Very true. The thermal density of a P-core-only chip would limit both its multi-core and single-core performance, compared to what they are doing with ARL:
[Image: 06_small.jpg]

Note how nicely they broke up those P core heat islands.

On the negative side of the ledger, their hybrid CPUs have been beset with thread scheduling woes that have dimmed the view of gamers towards Intel's E-cores. ThreadDirector was their deus ex machina solution to these problems, but turned out to fall far short of the billing. I think there's no way Intel can truly solve these problems only on the backend. They need to work with both OS and application developers to find better solutions for hybrid thread scheduling.
The E cores usually, but not always, help gaming performance. HT usually, but not always, hurts gaming performance, from what I've seen. Sometimes by a lot. Shame there isn't a ton of testing, but here is a big chunk, even if it is a bit old:
View: https://youtu.be/LcQUUmi3rWI?t=684

View: https://youtu.be/I8DJITHWdaA?t=637

Personally, I think taking care of the outliers with something like Intel APO is probably the best solution, but Intel seems to have stopped updating that after they made it available for 13th gen.
Intel has always been mean. For instance, in its dirty dealings with OEMs to try and block AMD's access to the markets it dominated. In the modern chip industry, I think only Nvidia and Qualcomm are possibly meaner.
Was Intel making deals with the OEMs to sell their products during or after AMD was selling products that were reverse engineered copies of Intel chips made from stolen IP?
And isn't this the worst example of any foundry ever stealing their client's IP, and isn't it used as an example of why you can't trust Intel as a fab when they were the victim?

The fact that Intel allowed AMD to continue to exist after this is an example of how they weren't always mean.
 
Very true. The thermal density of a P-core-only chip would limit both its multi-core and single-core performance, compared to what they are doing with ARL:
[Image: 06_small.jpg]

Note how nicely they broke up those P core heat islands.
I know what you mean, but traditionally the E-cores actually have higher thermal density. I haven't run the numbers for Arrow Lake, but I expect it's still true.

It's conceivable that both are actually true, but in different contexts. In gaming or other realtime tasks, where most of the action is in the P-cores, those would be the hotter cores. However, something like rendering might still heat up the E-cores more, especially now that their floating-point performance is much closer to the P-cores'.

Personally, I think taking care of the outliers with something like Intel APO is probably the best solution, but Intel seems to have stopped updating that after they made it available for 13th gen.
APO is another backwards solution. It's more effective because it has more specific knowledge about the apps, but imagine if the apps could be written in a way that told the scheduler what APO knows about them. Then, any such app could be as fast as the APO version (or faster, for reasons I won't go into), without Intel or any 3rd party having to know anything about it!

That's the power of a better threading API. But, for some reason, they seem to think everyone wants to keep multithreading apps like it's still the 1990's.
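To be fair, a small piece of this already exists: on Windows, a thread can at least flag itself as efficiency/background work (EcoQoS), which, as I understand it, the scheduler uses to bias that thread toward E-cores. A minimal sketch of my own (error handling omitted), and note how much coarser this hint is than what APO encodes per-app:

C++:
// Sketch: opt the current thread into EcoQoS ("efficiency mode") on Windows.
// Threads flagged this way are treated as background work; on hybrid CPUs the
// scheduler generally steers them toward E-cores. Error handling omitted.
#include <windows.h>

void markThreadAsBackgroundWork()
{
    THREAD_POWER_THROTTLING_STATE state = {};
    state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
    state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
    state.StateMask   = THREAD_POWER_THROTTLING_EXECUTION_SPEED;  // enable EcoQoS

    SetThreadInformation(GetCurrentThread(), ThreadPowerThrottling,
                         &state, sizeof(state));
}

What's still missing is a richer vocabulary: critical-path vs. best-effort, latency-sensitive vs. throughput, and so on.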
 
I know what you mean, but traditionally the E-cores actually have higher thermal density. I haven't run the numbers for Arrow Lake, but I expect it's still true.
That is very easy to empirically test on my stuff, but things may have changed with ARL and later. Here's what HWinfo says about my core's individual temps under a quick load:
[Image: Screenshot-206.jpg]

But ARL doesn't have HT and that changes the thermal density of the P-cores so I turned HT off and put on a quick load again:
[Image: Screenshot-207.jpg]

The temps are still higher on the P cores but they are closer.
ARL gets more IPS out of the E cores relative to its P cores so you may be right on ARL, but it would be harder to test since they aren't all grouped together by type like RPL. But the E cores still probably get more total MT perf/watt, if no longer putting out less heat/area.
It's conceivable that both are actually true, but in different contexts. In gaming or other realtime tasks, where most of the action is in the P-cores, those would be the hotter cores. However, something like rendering might still heat up the E-cores more, especially now that their floating-point performance is much closer to the P-cores'.


APO is another backwards solution. It's more effective because it has more specific knowledge about the apps, but imagine if the apps could be written in a way that told the scheduler what APO knows about them. Then, any such app could be as fast as the APO version (or faster, for reasons I won't go into), without Intel or any 3rd party having to know anything about it!
I guess my point was that APO had a better chance of being applied. It is backwards, and it would be better if either Windows and/or software developers optimized their stuff to use the better core/thread type for the job. Or, like you said, there could be some sort of Windows-based requirement for devs to put some sort of tag on work that Windows can recognize, so it can pass that to the chip's scheduler or properly schedule it to the right type of core/thread itself, but that isn't happening as much as I would like. We've had SMT for how many years, and Windows is still getting tripped up over that, so we may be waiting a while for them to get their act together.

My options seem to be: be ok with the perhaps reduced performance I'm getting, test and optimize myself, or hope that Intel does that for me and pushes through a fix that costs me very little effort.

I'd like best performance, don't trust Windows or devs to optimize when they have a mediocre record, and don't want to waste time testing and rebooting so that leaves my best hope for lazily reaping the best performance of my hybrid CPU being some intern at Intel who tests these things and reports the results to some guy who will update APO. Compared to what they have to do with GPU drivers it doesn't seem like too much to ask.
 
That is very easy to empirically test on my stuff, but things may have changed with ARL and later. Here's what HWinfo says about my core's individual temps under a quick load:
Pics got blocked. (I saw your update. I'll edit my post once they're working.)
: (

Temp monitoring doesn't directly tell us about thermal density, since you're drawing heat away from the die in a nonuniform pattern. Also, I wonder how you tested. You'd ideally want to hit the E-cores with a load that maxes both their clocks and also power consumption, if we're trying to measure peak thermal density. I know roughly how to do that on Linux, but not Windows. Probably something like SuperPi would be a good choice.
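For reference, the Linux recipe I'd use is roughly: pin one hot loop to each E-core's logical CPU ID and watch package power. A sketch of my own (the core IDs are placeholders, and a scalar loop like this won't reach peak power the way a wide vector load would, but the pinning part is the point):

C++:
// Sketch: pin one busy thread to each listed logical CPU on Linux. The IDs
// below are placeholders; find your E-core IDs with lscpu or /proc/cpuinfo.
// A scalar loop like this won't hit peak power (you'd want wide vector FMAs),
// but the pinning mechanism is the same.
#include <pthread.h>
#include <sched.h>
#include <thread>
#include <vector>

static void pinCurrentThread(int cpuId)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpuId, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main()
{
    std::vector<int> eCoreIds = { 16, 17, 18, 19 };  // placeholder IDs, not universal
    std::vector<std::thread> loads;
    for (int id : eCoreIds)
        loads.emplace_back([id] {
            pinCurrentThread(id);
            volatile double x = 1.0000001;
            for (;;) x = x * x + 0.5;  // crude spin load; stop with Ctrl-C
        });
    for (auto& t : loads) t.join();
}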

But ARL doesn't have HT and that changes the thermal density of the P-cores so I turned HT off and put on a quick load again:
Lion Cove (client) isn't the same as a HT-capable core with the feature fused off. Part of Intel's justification for dropping HT from clients is that it enables silicon changes that give them higher PPA for 1-thread-per-core use cases.

ARL gets more IPS out of the E cores relative to its P cores so you may be right on ARL, but it would be harder to test since they aren't all grouped together by type like RPL.
No, the way to test is quite easy. You find a workload that maximizes power consumption and then simply divide by core area, which has been derived from high-quality die shots.

But the E cores still probably get more total MT perf/watt, if no longer putting out less heat/area.
This is an interesting subject. Alder Lake's E-cores weren't more efficient, for much of the range, but that paradoxically doesn't mean you can't reach higher overall efficiency on a CPU which incorporates them, since there's nothing that says you need to run all cores at the same power level.

AFAIK, it's always the case that, when you push an E-core to the upper limit of its frequency envelope, it becomes less efficient than a P-core at the same power.

I guess my point was that APO had a better chance of being applied. It is backwards, and it would be better if either Windows and/or software developers optimized their stuff to use the better core/thread type for the job. Or, like you said, there could be some sort of Windows-based requirement for devs to put some sort of tag on work that Windows can recognize, so it can pass that to the chip's scheduler or properly schedule it to the right type of core/thread itself, but that isn't happening as much as I would like. We've had SMT for how many years, and Windows is still getting tripped up over that, so we may be waiting a while for them to get their act together.
I think the right solution would've been to do the API work so that games & other apps could classify threads appropriately, but simultaneously work on things like ThreadDirector and APO, knowing it's unrealistic that all or even most apps will be updated to use the new API. For games, a lot of the effort can be handled in the engine itself, leaving most games to benefit almost for free.

Intel isn't new to this stuff. They created the Thread Building Blocks (TBB) library, long ago, to help app developers add concurrency to their programs. Intel has open-sourced TBB, but I assume they've still been contributing to it somewhat recently. This would've given them a further way to help out apps with this stuff.
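For anyone unfamiliar, the appeal of something task-based like TBB is that you describe the work and the runtime decides how many worker threads to create and where tasks run, instead of the app hand-rolling one thread per core. A tiny sketch of my own (not taken from Intel's TBB docs):

C++:
// Tiny oneTBB sketch: describe the work as tasks and let the runtime decide
// how many worker threads to use, instead of hand-rolling one thread per core.
#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/task_group.h>
#include <cstddef>
#include <vector>

void update_particles(std::vector<float>& p)
{
    oneapi::tbb::task_group tg;
    tg.run([] { /* latency-tolerant side work, e.g. decompressing an asset */ });

    // The runtime picks the chunking and the number of threads here.
    oneapi::tbb::parallel_for(std::size_t(0), p.size(),
                              [&](std::size_t i) { p[i] *= 0.99f; });
    tg.wait();  // join the side task before returning
}

Attach a classification (per-frame vs. background) to tasks like those, and the runtime would have exactly the kind of information I'm talking about.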

Edit: hold on a minute, having issues with image filesize. let me shrink it
Okay, I'll go ahead and save my post now. Will update, once the pics are coming through.
 
I think the right solution would've been to do the API work so that games & other apps could classify threads appropriately, but simultaneously work on things like ThreadDirector and APO, knowing it's unrealistic that all or even most apps will be updated to use the new API. For games, a lot of the effort can be handled in the engine itself, leaving most games to benefit almost for free.

Intel isn't new to this stuff. They created the Thread Building Blocks (TBB) library, long ago, to help app developers add concurrency to their programs. Intel has open-sourced TBB, but I assume they've still been contributing to it somewhat recently. This would've given them a further way to help out apps with this stuff.
You guys are very very very off topic...so I can't not join in.

Thread director doesn't need any specific API, devs just need to code properly.
The issue is that devs code for the consoles and only do anything PC specific if the game totally and completely tanks on the PC.
Intel has pages up to teach people how to code games for PCs that run Windows; nobody gives a flying monkey.
If devs would follow the rules, then thread director would work a lot better. So even with a new API, games would still run like crap, because devs would still only code for consoles.

https://www.intel.com/content/www/u...imizing-threading-for-gaming-performance.html
 
You guys are talking about the tech while completely missing the point.

Intel's strength has always been in the public knowing that Intel=best. That's over. They're about 2 public evolutions behind AMD (X3D, and breaking the 7 GHz ceiling with the next gen). AMD is still innovating rapidly while Intel is saying "uh.. something like X3D probably? Maybe soon? Looks good compared to our older stuff...?"

To the techies, they are now made up almost entirely of broken promises and incompetence. "Next generation we'll be back on top!" and then, nah, for years now.

Gamers on social platforms (the root of public opinion about anything IT because we're very noisy) are already seeing Intel as second-rate. That being completely true doesn't help Intel's case.

They're already out of workstations and servers. The 40 year old assumption of "best" is almost completely dead. A larger percentage of people have gone to the competition and realized that their day to day is exactly the same.

Another thing to consider - when your average customer moves from old Intel to new AMD, it's so much faster! The techies say "wow, it's almost like you jumped 9 generations of CPUs or something" but the public never sees it that way. To them, it just means AMD=better. In the past, they would've only gone Intel to Intel, not giving them the option for this confused "revelation," but in 2025, people are flipping just because of FOMO.

Ironically, Intel's older CPUs still working fine is the reason for the disconnect. A 4th gen Intel is still fine for office work but oops! A wild Windows 11 has appeared! "I cannot upgrade! I'm running out of time! My trusty Intel hardware has failed me!"

Intel can't say they're objectively better without inviting a lawsuit from AMD or Nvidia. Intel's historically garbage methods of maintaining market share mean they have the eyes of regulatory bodies glued onto them. They aren't innovating enough to even fudge a superiority claim. Until they can do that, no one will look back at them. They're now firmly in the past.

Then there are the fab units. They seem to have wrapped them entirely into their ego. Financially, they should have been spun off or sold or demolished when they realized they missed the 7nm mark entirely. The last CEO knew that. The one before him suspected. The current one is fully aware that it's a big part of their overall failure. Yet it remains. They don't have anything innovative that they can fully make in their own fabs anymore. All they're doing is shopping it around while other companies just raise an eyebrow and say "lolno" and/or "but why?" They're allowing its weight to sink the boat because of... pride, I guess?

Wildest part of all of this is that they are still a larger company than AMD! They still have 60-70% market share in gaming and even more in laptops. That they're failing with such a high market share clearly indicates that there's a lot of old hardware still in use (5-15 years old), and that market share at least partially comes from people who are less likely than ever to default to Intel when they replace said hardware. Once Windows 10 security updates are truly done, we'll see a pretty big shift of market share. Once the first big Windows 10 exploit hits due to no security updates, we might see an 80/20 market split favoring AMD, especially if Intel is still so far behind. But even if they're not behind at all by then, that may not mean recovery, since the public no longer sees them as the default choice. If they continue changing chipsets every third generation, they will overall remain more expensive to run, and the beancounters will say "nope, cheaper to upgrade your AMD than an Intel. Keep using AMD," where five years ago they would've said "what's an AMD?" And that's always where the money comes from.
 
I will say the one thing Intel does better than AMD is platform level cohesiveness in OEM laptops. Dell, Lenovo, etc. We have so many weird bugs in AMD based laptops, (wifi, webcam, etc, etc, etc.) which just doesn't seem to happen in Intel based systems.

We've actually switched back to Intel-based laptops at my job, even though they run hotter and get lower battery life, simply to reduce support issues. Is this on Dell/Lenovo for not properly validating their AMD platforms like they do with their Intel platforms? Absolutely. But it still presents as an issue to the customers who are buying these laptops.
 
Thread director doesn't need any specific API, devs just need to code properly.
It's true that the Thread Director doesn't have an API. In fact, its very existence is to work around the lack of explicit knowledge about optimal thread scheduling. Unfortunately, it's a backwards solution to a (mostly) straightforward problem. It can sometimes complement what applications could specify about thread scheduling, but it's an inadequate substitute by itself.

The evidence of what I'm saying is quite clear. If Thread Director were a perfect solution, there would never be any need to disable E-cores or prevent apps from using them.

The issue is that devs code for the consoles and only do anything PC specific if the game totally and completely tanks on the PC.
No, it's really not even console vs. PC. The same problems exist with SMT/HT, though perhaps not to as great an extent.

Intel has pages up to teach people how to code games for PCs that run Windows; nobody gives a flying monkey.
If devs would follow the rules, then thread director would work a lot better. So even with a new API, games would still run like crap, because devs would still only code for consoles.

https://www.intel.com/content/www/u...imizing-threading-for-gaming-performance.html
Thanks for the link. But, for instance, it still highlights deficiencies in the concurrency model, if you have to statically size thread pools, maintain multiple of them, and sort your tasks accordingly. This forces app developers to compensate for the foundations of the concurrency model being stuck in 1990s-era technology.
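Just to illustrate how much of that burden lands on each app: here's roughly what the "count processors from higher/lower relative performance cores" step alone looks like with the Windows CPU sets API. This is my own sketch, not code from Intel's guide, and error handling is omitted:

C++:
// Rough sketch of the "count processors by relative performance" step on
// Windows, via the CPU sets API. On hybrid parts, P-cores report a higher
// EfficiencyClass than E-cores. Error handling and edge cases omitted.
#include <windows.h>
#include <vector>

void countByPerfClass(unsigned& higherPerf, unsigned& lowerPerf)
{
    ULONG len = 0;
    GetSystemCpuSetInformation(nullptr, 0, &len, GetCurrentProcess(), 0);
    std::vector<char> buf(len);
    GetSystemCpuSetInformation(
        reinterpret_cast<PSYSTEM_CPU_SET_INFORMATION>(buf.data()),
        len, &len, GetCurrentProcess(), 0);

    BYTE maxClass = 0;
    for (ULONG off = 0; off < len; ) {  // first pass: find the top class
        auto* e = reinterpret_cast<PSYSTEM_CPU_SET_INFORMATION>(buf.data() + off);
        if (e->Type == CpuSetInformation && e->CpuSet.EfficiencyClass > maxClass)
            maxClass = e->CpuSet.EfficiencyClass;
        off += e->Size;
    }

    higherPerf = lowerPerf = 0;
    for (ULONG off = 0; off < len; ) {  // second pass: bucket the logical CPUs
        auto* e = reinterpret_cast<PSYSTEM_CPU_SET_INFORMATION>(buf.data() + off);
        if (e->Type == CpuSetInformation)
            (e->CpuSet.EfficiencyClass == maxClass ? higherPerf : lowerPerf)++;
        off += e->Size;
    }
}

And that's before you've sized a single thread pool, accounted for cache sharing, or classified any tasks.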
 
Gamers on social platforms (the root of public opinion about anything IT because we're very noisy) are already seeing Intel as second-rate. That being completely true doesn't help Intel's case.
There's a lot in your post I don't necessarily agree with, but I'll just pick on the matter of their CPUs' gaming prowess.

ChipsAndCheese recently published an analysis of gaming performance on Arrow Lake, where they tried to find the bottlenecks holding it back. What they discovered/confirmed came as little surprise to those of us who'd been following their analysis since its launch, which is that Arrow Lake suffers mostly from its SoC architecture, and gaming workloads are disproportionately affected by this.

Lion Cove is significantly larger and more complex than Zen 5. When comparing on single-threaded workloads, in particular, it's capable of higher throughput than Zen 5. However, gaming workloads turn out to be relatively low-IPC and quite sensitive to memory latency. So, on the one hand, they don't benefit from Lion Cove's strengths while, on the other hand, they're susceptible to the weaknesses of Arrow Lake's SoC architecture (i.e. its high L3 and memory latency). This latter point also helps explain why 3D V-Cache is such a win for gaming.

Intel has a history of successes, just when it appears to be on the ropes. Core 2 Duo and Alder Lake both stand out as examples of this. If Nova Lake manages to incorporate the lessons they learned from Meteor Lake and Arrow Lake, it could indeed be quite good. I certainly wouldn't count them out, just yet. Not even for gaming.
 
Pics got blocked. (I saw your update. I'll edit my post once they're working.)
: (

Temp monitoring doesn't directly tell us about thermal density, since you're drawing heat away from the die in a nonuniform pattern. Also, I wonder how you tested. You'd ideally want to hit the E-cores with a load that maxes both their clocks and also power consumption, if we're trying to measure peak thermal density. I know roughly how to do that on Linux, but not Windows. Probably something like SuperPi would be a good choice.
I'm pulling heat from the die with an AIO that has a coldplate, and I'm using one of those anti-bend brackets, so the heat removal is probably as uniform as the die configuration will allow.
I also used the CPUZ stress test to load the cores because it is fast and easy, and it is indicated in the screenshots. This also places the E cores at 100% load, as indicated in the task manager. As far as the clocks, these are my daily clocks, and they are attained by finding the lowest stable volts for 5.5GHz on the P cores and then finding the maximum clocks on the E cores and cache that are stable with that. With my 13600ks, 4.4 on the E cores is stable with 5.5 on the P cores, but that should only make the E cores run a little hotter with the 13900kf in the screenshots.
Lion Cove (client) isn't the same as a HT-capable core with the feature fused off. Part of Intel's justification for dropping HT from clients is that it enables silicon changes that give them higher PPA for 1-thread-per-core use cases.


No, the way to test is quite easy. You find a workload that maximizes power consumption and then simply divide by core area, which has been derived from high-quality die shots.
Wouldn't measuring the temperature at roughly equivalent heat dissipation per area show you which section is putting in the most heat per area because of Fourier's law? I mean if the same amount of heat is flowing out per area, the hotter sections would have to generate more heat per area to have that higher temperature difference. Or they could also be on the end and have a bit more heat flowing out per area, but not in the middle.
This is an interesting subject. Alder Lake's E-cores weren't more efficient, for much of the range, but that paradoxically doesn't mean you can't reach higher overall efficiency on a CPU which incorporates them, since there's nothing that says you need to run all cores at the same power level.

AFAIK, it's always the case that, when you push an E-core to the upper limit of its frequency envelope, it becomes less efficient than a P-core at the same power.
My very simple test says that isn't even remotely true. But you need more numbers to do the arithmetic to show it:
So far I have a screenshot with all threads and all but HT threads. To complete the comparison I rebooted twice more and did the same with only changing the CPU configuration to E cores off, HT on, and E cores and HT off.
E-cores off:
[Image: Screenshot-208.jpg]

E cores and HT off:
[Image: Screenshot-210.jpg]

You can verify number of threads in the task manager, clockspeed, voltage and wattage in HWinfo, and you may note that the CPUZ stress test is giving the CPUZ multithreaded score in real time during the screenshot. Mind you the scores are slightly low because HWinfo takes some CPU to run and they are slightly variable.
But you can take the CPUZ multithreaded score and current IA core power from each screenshot to compare points/watt with more points/watt being more efficient.
For the different configurations: 1. Every thread enabled = 75.3 points/watt, 2. All but HT = 73.5, 3. All but E cores = 61.4

And by subtracting the numbers from the test with the missing chip threads from the whole you get: HT threads only = 91 points/watt, and E cores only = 100.1 points/watt which makes them easily more efficient than P cores with or without HT in CPUZ multithreaded.

As a check, I included a screenshot with no HT and no E cores, to see if the total points and total watts from this + the arithmetically derived HT + the arithmetically derived E cores added up, and they are pretty much margin-of-error close.
And the P cores alone, without HT, got only 55.5 points per CPU core watt. So not that efficient, even though they had 1.66x the points per thread of the E cores at their respective max clocks at that voltage.
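In code form, the derivation is just this (the inputs are the CPU-Z MT score and the IA core power read off each screenshot; the literals below are placeholders, not my actual readings):

C++:
// The subtraction method above, spelled out. Inputs are the CPU-Z MT score
// and the IA core power for each configuration; the literals below are
// placeholders, not the values from my screenshots.
#include <cstdio>

struct Run { double score, watts; };

static double pointsPerWatt(Run r) { return r.score / r.watts; }

// e.g. (all threads) minus (all but E cores) = the E cores' contribution
static Run minus(Run whole, Run part)
{
    return { whole.score - part.score, whole.watts - part.watts };
}

int main()
{
    Run allThreads = { 16000.0, 212.0 };  // placeholder score / watts
    Run noEcores   = { 10500.0, 171.0 };  // placeholder score / watts

    Run eCoresOnly = minus(allThreads, noEcores);
    std::printf("E cores only: %.1f points/watt\n", pointsPerWatt(eCoresOnly));
    std::printf("whole chip:   %.1f points/watt\n", pointsPerWatt(allThreads));
}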

This is only in CPUZ, but it is a good example of max clocks.
Also I don't know how this compares to Arrow Lake.
 
As an Intel employee or Board member, I'd be quite disturbed by Tan's comments, and it's already apparent this guy is the wrong person for the rebuild they need. Talking down your own company in public sends a great message to already demoralised staff in the wake of mass layoffs. It's really not clear what this guy wants Intel to become, but maybe he's going to let it fail so he can one day sell off the IP for a song, and make a personal killing and move on. Do not let foreigners run your company, they do not have your best interests at heart.
 
It's really not clear what this guy wants Intel to become, but maybe he's going to let it fail so he can one day sell off the IP for a song, and make a personal killing and move on.
However much you might dislike or disagree with what he's doing, this degree of cynicism is not called for. Not without any evidence of malfeasance.

Do not let foreigners run your company, they do not have your best interests at heart.
Oh yeah, AMD totally messed up big time, with Lisa Su! Not to mention Google and Microsoft. I guess you'd say Nvidia doesn't count, since Huang was a founder.
🙄

Meanwhile, look at what good ol' American-born leadership has done to industry stalwarts like GE.

Even in Intel's case, the seeds of its demise were sown in the 2010s. Its failure to plan and invest effectively is why Gelsinger had so much trouble saving it and set the stage for a hatchet man like Tan to be brought in to do what he's doing.
 
My very simple test says that isn't even remotely true. But you need more numbers to do the arithmetic to show it:
No, I wasn't clear enough in what I was saying. I actually did say that the E-cores benefited overall efficiency, as you've shown, but they did that while paradoxically having lower perf/W for much of their operating envelope. The subject was explored quite thoroughly by ChipsAndCheese:
The first is an example of a scalar integer workload, the second a vector workload. The only problem with that article is in the conclusion, where the author failed to connect the dots and see how you could still end up with more efficient multi-threaded performance.

I took the data from that article and extrapolated it into a few different scenarios, to demonstrate the benefit provided by E-cores:

[Image: bRJ9olV.png]


[Image: pEomQRf.png]


According to that, 8P + 8E was a definite win over a hypothetical 10P + 0E and even 12P + 0E on integer, but had only a narrow lead over the latter for float.
 
Thanks for the link. But, for instance, it still highlights deficiencies in the concurrency model, if you have to statically size thread pools, maintain multiple of them, and sort your tasks accordingly. This forces app developers to compensate for the foundations of the concurrency model being stuck in 1990s-era technology.
The "concurrency model" is how many cores the devs have available and what they do with them.... it doesn't matter if it's 3099 or 1970 a dev only has as many cores, each with as much compute, as they have.

Determining Maximum Concurrency​

The next stage is to determine the maximum concurrency the game can expect from the OS based on the hardware available:



  • Determine processor count based on processors’ relative performance.
    • A: Count processors from higher relative performance cores.
    • B: Count processors from lower relative performance cores.
  • Determine cache hierarchy.
    • Remove any processors that don’t share a suitable last level cache.
  • Your maximum concurrency is the sum of processors in A and B.

You just explained "lazy devs"... if devs have to do any work, then they won't do it. Well, in that case it doesn't matter how good the API is or how much groundwork MS or anybody else does.
 
The "concurrency model" is how many cores the devs have available and what they do with them.... it doesn't matter if it's 3099 or 1970 a dev only has as many cores, each with as much compute, as they have.
No, what I'm trying to say is that a thread-based concurrency model is part of the problem. The game should not be creating and managing statically-sized thread pools. That's the old way of thinking.

The problem, here, is that it pretty much requires groundwork at the OS level. I doubt that exists in Windows and I'm pretty sure it doesn't exist on Linux. Intel could at least do it in Linux and use that to show Microsoft what needs to happen in Windows.

It doesn't have to be Intel, either, but they're the ones who brought hybrid CPUs into the mainstream, so they stood to benefit most.

You just explained "lazy devs"... if devs have to do any work, then they won't do it. Well, in that case it doesn't matter how good the API is or how much groundwork MS or anybody else does.
Actually, if you make it easier for them to do the right thing than for them to do the wrong, a lot of devs will come over to the new way. Also, if the game engine does it the right way, then a lot of games will benefit with the game devs having to do little or nothing.
 
No, what I'm trying to say is that a thread-based concurrency model is part of the problem. The game should not be creating and managing statically-sized thread pools. That's the old way of thinking.
Yes, that's exactly what Intel says as well in that link...

Do not do this, i.e. workload : number of cores; that's what the static threads of console-based coding are.
Upon closer inspection, it’s evident that the games reviewed adapt their code based on the number of processors in the system, creating one thread for each physical core and attempting to distribute large parts of the workload evenly. This inadvertently results in games using a multitude of threads, each performing only a small amount of work and not actually reducing the work on the game’s critical path.
Instead do this, which is to make a main loop that handles all the serial stuff, and have the multithreaded stuff only use as many CPU cycles as it actually needs to stay in sync with the main thread, instead of just workload : number of cores (a rough code sketch follows the quoted list below).
    • Identify the critical path work (per-frame tasks that could impact the critical path) and background/asynchronous work.
      • Arrange separate thread pools based on per-frame and background work.
    • Limit concurrency and thread pool size to what your workload really needs.
      • This will have to be benchmarked (ideally across target system configurations).
        • Are you getting the expected gains on the overall/realistic workload when increasing thread pool size?
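And here's roughly what those last bullets look like in practice, using oneTBB arenas as the separate pools. This is my own sketch; the pool sizes are placeholders you'd have to benchmark per target configuration, exactly as the guide says:

C++:
// Sketch of the guidance above: one bounded pool for per-frame critical-path
// work and one small pool for background work, instead of one thread per core.
// The pool sizes are placeholders; the guide says to benchmark them per
// target configuration.
#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/task_arena.h>
#include <oneapi/tbb/task_group.h>
#include <cstddef>

oneapi::tbb::task_arena frame_arena(6);       // per-frame pool (placeholder size)
oneapi::tbb::task_arena background_arena(2);  // background pool, deliberately small
oneapi::tbb::task_group background_tasks;

void queue_background_work()
{
    // Asset streaming, decompression, etc.: capped at 2 workers so it can't
    // fan out across the whole CPU and stomp on the critical path.
    background_arena.execute([] {
        background_tasks.run([] { /* decompress an asset */ });
    });
}

void do_frame(std::size_t object_count)
{
    // Critical-path parallel work stays inside its own bounded arena.
    frame_arena.execute([&] {
        oneapi::tbb::parallel_for(std::size_t(0), object_count,
                                  [](std::size_t) { /* update one object */ });
    });
}
// (background_tasks.wait() belongs at a sync point or shutdown, not every frame)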