News: Intel's next-gen Arrow Lake CPUs might come without hyperthreaded cores — leak points to 24 CPU cores, DDR5-6400 support, and a new 800-series chipset

Wouldn't this also need to be fully supported by the OS? Since Win11 already craps the bed with hybrid architecture, is this just going to make it worse?
Win 11 doesn't crap the bed. Console games are made for consoles first and don't stick to proper Windows protocol, which is why we have so much trouble with them on any version of Windows.
That, and benchmarkers are sticking to benchmarking methods from 30 years ago...
You do realize that Golden Cove was the P-core used in Alder Lake & Sapphire Rapids, right?
Well, it will increase the IPC over whatever E-core by around 30%, because that's how far behind the P-cores they are, and because the heavy code will run on the P-cores instead of the E-cores.
Ask Intel; they decided to abandon Hyper-Threading and go for Rentable Units on their P-cores.
The leak decided that; we have zero idea what Intel is actually planning to do.
 
I think there is a typo in the article:

"DisplayPort output at a UHB20 rate and Thunderbolt 4 connectors,"

should probably read:

"DisplayPort output at a UHBR20 rate and Thunderbolt 5 connectors,"

Thunderbolt 4 is not fast enough to support UHBR20 (80 gigabits per second).
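For reference, the arithmetic: UHBR20 runs each of DisplayPort's four lanes at 20 Gbit/s, so

$$4 \text{ lanes} \times 20\,\mathrm{Gbit/s} = 80\,\mathrm{Gbit/s}$$

which is double Thunderbolt 4's 40 Gbit/s link rate, but matches Thunderbolt 5's 80 Gbit/s (120 Gbit/s in boost mode).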
 
Care to expand on this, please?
Hybrid is supposed to increase efficiency by taking on all the garbage background threads so that the P-cores can concentrate on running the main workload(s). The same goes for this Rentable Units thing, and even for hyperthreading, which is decades old now. Hyperthreading can add a 100% performance increase to certain things, but in benchmarks the most you see is ~30%, because of what and how they test.

Benchmarks run one single thing and only look at the overall throughput of all cores combined. That can't show you any benefit of hybrid or RU, because it only tests throughput, and you would get the exact same result by just adding more cores or making the existing cores run faster.
 
There's actually a very effective mitigation for SMT-related security issues: don't schedule threads from different VMs or processes on the same core! That still lets you attain most of the benefits of SMT without the associated risks. In Linux, this feature is known as "Core Scheduling", and has been in the main kernel for a while.
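For the curious, opting into it is a single prctl() call; here's a minimal sketch (Linux 5.14+ built with CONFIG_SCHED_CORE, error handling trimmed):

```c
/* Give this process its own core-scheduling "cookie": the kernel will
 * then refuse to co-schedule its threads on the same physical core as
 * threads carrying a different cookie. Requires Linux >= 5.14 with
 * CONFIG_SCHED_CORE enabled. */
#include <stdio.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <linux/prctl.h>

int main(void)
{
    /* Create a fresh cookie covering this entire thread group. */
    if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, getpid(),
              PR_SCHED_CORE_SCOPE_THREAD_GROUP, 0) != 0) {
        perror("prctl(PR_SCHED_CORE)");
        return 1;
    }
    printf("core-scheduling cookie created for pid %d\n", (int)getpid());
    /* ... run the SMT-sensitive workload here ... */
    return 0;
}
```

Children forked after this inherit the cookie, so wrapping a VM or container launcher this way keeps its threads from ever sharing a core with outsiders.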
I know very well about core scheduling, as I'm a VMware administrator and have had to enable the mitigations for Intel CPUs before.
 
Hybrid is supposed to increase efficiency by taking on all the garbage background threads so that the P-cores can concentrate on running the main workload(s).
We're never going to agree about this, but I know you agree that E-cores are more area-efficient. Hence, you should agree that they're a more cost-effective way to scale performance in highly-threaded workloads.

Otherwise, it wouldn't make any sense, whatsoever, for Intel to have gone up to 16 of them, in Raptor Lake - there are never so many low-priority background threads running on a system! If that's all they were for, you'd only need like 4 of them, at most. Plus, there'd be no need for Meteor Lake to have two separate classes of E-cores. If you just need them for low-priority background tasks, then the two LP E-cores on the SoC tile would probably be enough (or maybe they'd have increased those to 4 and gotten rid of the E-cores on the CPU tile).

In my opinion, offloading background tasks to E-cores is probably the least interesting aspect of them (other than for battery-powered devices).

Hyperthreading can add a 100% performance increase to certain things, but in benchmarks the most you see is ~30%, because of what and how they test.
The 100% increase scenarios are corner cases and don't generally reflect realistic workloads. Pretty much the first thing I did on a Pentium 4 with hyperthreading was to write a program designed to see if a 100% increase was truly possible, and it was. That doesn't mean it's typical, however.
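Something in the spirit of that old test (a hypothetical reconstruction, not the original program): each thread grinds through a serial dependency chain with almost no ILP, so two hyperthreads on one physical core barely compete for execution ports:

```c
/* Best-case-for-HT microbenchmark sketch: every iteration depends on
 * the previous result, so a single thread leaves most of the core's
 * execution resources idle and a sibling hyperthread rides along
 * almost for free. Linux-only (pthread_setaffinity_np).
 * Build: gcc -O2 -pthread ht_chain.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 2000000000ULL

static void *chain(void *arg)
{
    /* Pin this thread to the logical CPU id passed in. */
    int cpu = (int)(intptr_t)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    uint64_t x = (uint64_t)cpu + 1;
    for (uint64_t i = 0; i < ITERS; i++)   /* serial chain: no ILP */
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return (void *)(uintptr_t)x;           /* keep the loop from being optimized out */
}

int main(int argc, char **argv)
{
    /* Pass two logical CPU ids, e.g. the HT siblings listed in
     * /sys/devices/system/cpu/cpu0/topology/thread_siblings_list */
    int cpu_a = argc > 1 ? atoi(argv[1]) : 0;
    int cpu_b = argc > 2 ? atoi(argv[2]) : 1;

    pthread_t a, b;
    pthread_create(&a, NULL, chain, (void *)(intptr_t)cpu_a);
    pthread_create(&b, NULL, chain, (void *)(intptr_t)cpu_b);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

If the pair pinned to two HT siblings of one core finishes in about the same wall time as the pair pinned to two separate physical cores, the second hyperthread delivered its ~100% corner case.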

I spend a lot of time compiling software, and I promise you that you don't get a 100% speedup from hyperthreading on those workloads. It's still more than enough to be worthwhile, but I've never seen more than about a 50% speedup from using HT on compilation jobs. I'll have to check what sort of speedup it's giving me on Alder Lake...
 
We're never going to agree about this, but I know you agree that E-cores are more area-efficient. Hence, you should agree that they're a more cost-effective way to scale performance in highly-threaded workloads.

Otherwise, it wouldn't make any sense, whatsoever, for Intel to have gone up to 16 of them, in Raptor Lake - there are never so many low-priority background threads running on a system! If that's all they were for, you'd only need like 4 of them, at most. Plus, there'd be no need for Meteor Lake to have two separate classes of E-cores. If you just need them for low-priority background tasks, then the two LP E-cores on the SoC tile would probably be enough (or maybe they'd have increased those to 4 and gotten rid of the E-cores on the CPU tile).

In my opinion, offloading background tasks to E-cores is probably the least interesting aspect of them (other than for battery-powered devices).
Get some context clues; we are talking about possible ways to speed up how the cores run threads here...
and one way that the E-cores do that is by taking on lighter stuff so that the main cores can run the heavy stuff faster.
The 100% increase scenarios are corner cases and don't generally reflect realistic workloads. Pretty much the first thing I did on a Pentium 4 with hyperthreading was to write a program designed to see if a 100% increase was truly possible, and it was. That doesn't mean it's typical, however.

I spend a lot of time compiling software, and I promise you that you don't get a 100% speedup from hyperthreading on those workloads. It's still more than enough to be worthwhile, but I've never seen more than about a 50% speedup from using HT on compilation jobs. I'll have to check what sort of speedup it's giving me on Alder Lake...
They are only corner cases if all of your workload is fully parallel, like rendering, video encoding... or compiling.
That was my whole point: that everybody only tests based on extremely parallel workloads that don't show much, if any, benefit from new technologies.

For a normal user who does none of these things, or does them extremely rarely, HTT giving close to 100% is pretty normal.
 
Otherwise, it wouldn't make any sense, whatsoever, for Intel to have gone up to 16 of them, in Raptor Lake - there are never so many low-priority background threads running on a system! If that's all they were for, you'd only need like 4 of them, at most. Plus, there'd be no need for Meteor Lake to have two separate classes of E-cores. If you just need them for low-priority background tasks, then the two LP E-cores on the SoC tile would probably be enough (or maybe they'd have increased those to 4 and gotten rid of the E-cores on the CPU tile).
More cores will obviously help in tasks that can spread that wide. Power draw grows linearly with more cores, but roughly with the cube of the clock, since voltage has to rise along with frequency.
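To put rough numbers on that (textbook dynamic-power scaling; real silicon adds leakage and other terms):

$$P_{\mathrm{dyn}} \approx C\,V^{2}f, \qquad V \propto f \;\Rightarrow\; P_{\mathrm{dyn}} \propto f^{3}$$

So doubling the clock costs roughly 8× the power per core, while doubling the core count at a fixed clock costs roughly 2×.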

That is the reason we ended up with multiple cores in consumer systems to begin with.
And today's Intel chips are closing in on 6 GHz and 300 W, which is pretty close to what they aimed at with Tejas before cancelling it back in 2004/05-ish.

A render task on 16 E-cores at 3.0-3.5 GHz will run pretty close to how it does on 8 P-cores at 6 GHz, while needing half the power.

And for narrow tasks, SMT doesn't do that much; just have "enough" P-cores, and adding one more of them brings more performance than letting the already-running ones do double duty.

But I would like to see them offer more than 8 P-cores on home consumer platforms. The last one was the 10900K. (I know they weren't called P-cores back then, but today's P-cores pretty much come from there, while the E-cores are sort of an Atom offshoot.)
 
Get some context clues; we are talking about possible ways to speed up how the cores run threads here...
and one way that the E-cores do that is by taking on lighter stuff so that the main cores can run the heavy stuff faster.
Not 16 threads' worth. There's not that much "lighter stuff".

They are only corner cases if all of your workload is fully parallel, like rendering, video encoding... or compiling.
In order to even get the second thread on P-cores scheduled, you first have to achieve partial occupancy of all the P-cores. So, it's a given you won't be using hyperthreading unless your workload is highly-parallel.

Apart from that, the degree of parallelism has nothing really to do with it. In order to gain a 100% speedup from HTT, you need a fundamentally serial workload with extremely low ILP (instruction-level parallelism), which leaves enough execution resources free that the two threads don't interfere with each other. By definition, this is pretty rare in computing. Otherwise, it wouldn't make much sense to build wide, superscalar cores, in the first place.

In other words, to see a 100% speedup, you almost have to feed it artificial workloads designed for that purpose.

That was my whole point: that everybody only tests based on extremely parallel workloads that don't show much, if any, benefit from new technologies.
Extremely parallel workloads indeed show a benefit from hyperthreading. They also show a benefit from hybrid architectures.

For a normal user who does none of these things, or does them extremely rarely, HTT giving close to 100% is pretty normal.
No, for a user who does not use highly-parallel workloads, hyperthreading is basically something they can ignore, because they will almost never be using it.
 
If Intel entertains the idea of removing HT, I think it's more likely to be a security measure than a performance consideration. I'm not a chip engineer, but I think it would be quite hard to block all possible side-channel snoops as long as there's a shared pipeline.
 
If Intel entertains the idea of removing HT, I think it's more likely to be a security measure than a performance consideration. I'm not a chip engineer, but I think it would be quite hard to block all possible side-channel snoops as long as there's a shared pipeline.
There's actually a flip-side to it, though. It's not a great argument, but I believe that if you've got multiple threads scheduled on a core, that should make it more difficult for a thread running on a different core to infer what either one of them is doing (e.g. by monitoring the behavior of a shared cache or the core's clock speed).

I think the main reason SMT is still with us is probably that it's too valuable in fending off ARM server CPUs.
 
So instead of moving the technology further by allowing 3 threads per core, they are cancelling it altogether?
I wouldn't necessarily classify SMT as more advanced. None of the ARM chips are using SMT, and Apple's M series has best-in-class IPC. SMT was a nice way to increase MT performance on lower-core-count chips; its utility is questionable in a world where you can just spam small E-cores to fill this role more effectively.
 
I wouldn't necessarily classify SMT as more advanced. None of the ARM chips are using SMT, and Apple's M series has best-in-class IPC.
Here we go again: another one assumes that because Apple isn't doing it, it's not the best.

No, there's a different explanation for why ARM and Apple didn't embrace SMT. That's because SMT is about extracting more performance per silicon area (i.e. so-called "area efficiency"), but it doesn't really help you with energy efficiency. ARM and Apple have been prioritizing the mobile market (i.e. phones, tablets, and laptops), which results in them building low-clocking, high-IPC cores that prioritize energy-efficiency above all else.

Conversely, Intel and AMD are focused on delivering the most performance per mm^2 of silicon, because that provides the best performance per dollar. So, naturally, they would gravitate towards SMT (and let's recall that AMD was a relative latecomer, only adding SMT in Zen 1, circa 2017). It also helps explain why they gravitate towards relatively lower-IPC, higher-clocking designs.

SMT was a nice way to increase MT performance on lower-core-count chips; its utility is questionable in a world where you can just spam small E-cores to fill this role more effectively.
Sort of true. I think the area-impact of adding SMT to small cores is relatively greater, making it not as much of a net win for area-efficiency on E-cores.
 
which results in them building low-clocking, high-IPC cores that prioritize energy-efficiency above all else.
I wouldn't at all be surprised to see Intel trying to implement this going forward: improving IPC at the expense of higher clocks. It'll be beneficial to their mobile and datacenter products, which aren't clocking high to begin with.

There are really two possible scenarios here: Intel is intentionally omitting SMT for a net benefit, or Intel for some reason couldn't get SMT working and this is going to be a detriment.

I'm inclined to believe that they believe this is a net benefit: that they see value in having P-cores be even more focused on ST, with E-core clusters reserved for MT. Plenty of applications already benefit from disabling SMT: you don't have to worry about two threads fighting over the same L1, you simplify scheduling, you improve security, and in enough workloads, you improve performance.


That's because SMT is about extracting more performance per silicon area
Which is precisely the stated goal of E cores.
 
I wouldn't at all be surprised to see Intel trying to implement this going forward: improving IPC at the expense of higher clocks. It'll be beneficial to their mobile and datacenter products, which aren't clocking high to begin with.
Their E-cores already put them on that trajectory. It's largely a question of how far down that path they want to go. However, they do face more limitations on IPC with x86 than other ISAs, like ARM and RISC-V.

I assume you're aware of Sierra Forest, due out later this year?

Plenty of applications already benefit from disabling SMT,
IMO, this is due to archaic threading technology. Threading APIs aren't expressive enough to enable apps to figure out the optimal number of threads to spawn, or for the OS to know how best to schedule them. That's part of the reason Intel needed to create a hack like the Thread Director.

Unfortunately, Intel needs to deliver performance & efficiency improvements on a regular cadence of years, while fundamental changes in threading technology can take the better part of a decade to develop, deploy, and get applications to adopt. In this regard, Apple has a real advantage, due to having the silicon and OS all under one roof.
 
Not 16 threads' worth. There's not that much "lighter stuff".
Not if the one and only thing you are doing is compiling, no.
Also, what?! You think that HTT (and RU in the future) only exist on the 1x900K SKUs?
In order to even get the second thread on P-cores scheduled, you first have to achieve partial occupancy of all the P-cores. So, it's a given you won't be using hyperthreading unless your workload is highly-parallel.
OR, and bear with me here, this is really revolutionary: if you run more than one thing on your $600+ CPU,
AT THE SAME TIME... mind blown.
And again, there are CPUs with only 4 P-cores; heck, there are still dual-cores.
Apart from that, the degree of parallelism has nothing really to do with it. In order to gain a 100% speedup from HTT, you need a fundamentally serial workload with extremely low ILP (instruction-level parallelism), which leaves enough execution resources free that the two threads don't interfere with each other. By definition, this is pretty rare in computing. Otherwise, it wouldn't make much sense to build wide, superscalar cores, in the first place.
That's more than 90% (just my estimate) of what a normal user would run on a daily basis.
Basically anything that isn't being used as a benchmark falls into this category.
Plus almost all games.
Extremely parallel workloads indeed show a benefit from hyperthreading. They also show a benefit from hybrid architectures.
Yes, that's the argument I started with... that they do show a benefit, but the smallest amount possible.
No, for a user who does not use highly-parallel workloads, hyperthreading is basically something they can ignore, because they will almost never be using it.
Because everybody is stuck in the year 1999 alongside you, not comprehending that you can run many things at once on your PC.
(Or has shelled out for a 1x900K or x950X CPU.)
Hey, that's also an argument I started with.

If your argument is that people don't need the number of cores they buy, then I'm totally with you. But then it doesn't matter if it's HTT or RU or anything else: nothing will be used, not even all the main cores, if you have way too many cores for the work that you do.
 
Terry, Bit, Jeremy - my 2c:

If one considers what I believe to be the lion's share of Intel's profits (servers and enterprise laptops), HT makes a ton of sense. On data-warehouse servers (and DBs in general) we've observed a speedup of up to 80% with HT, zero slowdowns. Yes, enterprises are "migrating to the cloud!" but then the CPUs just go to the cloud providers.

If one thinks about what is on a typical enterprise laptop, there's a heck of a lot of background malware scanning and even more GDPR/POPIA monitoring, which _saps_ CPU grunt. Combine that with the (stupid, IMHO) idea of making laptops "as thin as possible" and there's a ton of heat that needs to be gotten rid of with a really small heatsink.

Also, on benchmarking from the '90s: as a consumer, the games I play still only max out 4 cores on my i7-6700. It's the AI workloads that I'm fiddling with that NAIL my CPU and make me want a 7950X3D. I occasionally stream while I play, and the streaming tool uses the hardware encoding on my CPU or my GPU, so it barely touches the CPU cores. Doing archival work and compressing a lot of LZMA2 workloads makes me want to reach for the 7950X3D again.

HT matters, so RU better blooming-well do a better job than HT.
 
HT is rather energy-inefficient, doubly so when it's not needed. This is why you can disable it on high-end Intel desktop CPUs like the 13900K/14900K and end up with better power consumption and not notice a difference (or even get better performance) in lightly threaded tasks like gaming.

With the way Intel's modern server CPUs boost, I'd bet that at the highest end, in a best-case scenario, you'd be lucky to get +50% perf out of HT, and likely not even that much. This behavior comes from being very TDP-limited in all-core loads.

All of this comes back to how schedulers aren't always doing a good job, though (as exposed by Intel Application Optimization). Thread Director sets the stage for the CPU being more involved with scheduling, and Rentable Units seems like it would be an evolution of that, to where the CPU is hands-on.
 
Not if the one and only thing you are doing is compiling, no.
Also, what?! You think that HTT (and RU in the future) only exist on the 1x900K SKUs?
You appear to have gotten yourself confused, here. That point was about E-cores.

OR, and bear with me here, this is really revolutionary: if you run more than one thing on your $600+ CPU,
AT THE SAME TIME... mind blown.
I never stipulated that "workload" corresponded to a single program.

The point remains that for hyperthreads to get used, you first need to have one thread on all the P-cores and thus there's by definition some significant parallelism happening. Either the user is running at least one multithreaded program, or they're running a lot of single-threaded ones (e.g. compiling multiple source files of a software project).

And again, there are CPUs with only 4 P-cores; heck, there are still dual-cores.
The only reason we're even talking about parallelism in this context is your unsupported claim that hyperthreading performs poorly because people benchmark it on highly-parallel workloads. Usually, such benchmarking happens on higher core-count CPUs, which is why I focused on that scenario, but the principle also applies to fewer-core CPUs: you need to achieve partial occupancy before you can sustain full occupancy on any cores.

Furthermore, speaking of unsupported claims, you should really provide some data showing hyperthreading providing such speedups in any real-world workload.

That's more than 90% (just my estimate) of what a normal user would run on a daily basis.
Basically anything that isn't being used as a benchmark falls into this category.
Plus almost all games.
I think you're very wrong about that. But, if you're right, then you should have absolutely no trouble finding data to support your claims about the speedup provided by hyperthreading.

So, no excuses now. Unless you're wrong about one or both claims!

Yes, that's the argument I started with... that they do show a benefit, but the smallest amount possible.
Most of the additional multi-threaded performance Raptor Lake provides vs. Alder Lake is due to its extra E-cores. Their collective impact isn't small, which is why Intel doubled their number.
 
On data-warehouse servers (and DBs in general) we've observed a speedup of up to 80% with HT, zero slowdowns.
Source? I've never seen anything like that.

If one thinks about what is on a typical enterprise laptop, there's a heck of a lot of background malware scanning and even more GDPR/POPIA monitoring, which _saps_ CPU grunt. Combine that with the (stupid, IMHO) idea of making laptops "as thin as possible" and there's a ton of heat that needs to be gotten rid of with a really small heatsink.
Yes, my experience matches the notion that there's lots of stuff running in the background. So, additional performance is appreciated, but then you're sort of arguing the other side by citing the waste heat (which correlates with energy use - another undesirable thing for laptops).
 