News Zen 5 SMT-focused testing suggests Intel made a mistake ditching Hyper-Threading on Lunar Lake

Status
Not open for further replies.
That's the silliest comparison I've ever seen, for so many reasons that it's unfathomable to me that a serious platform like Phoronix went ahead with this, lol.

1) Intel doesn't have HT on all of its cores. Turning off HT on e.g. a 14900K drops performance by 9-12%, because only 8 of its 24 cores have it in the first place. Common sense, right? On AMD it drops performance by 20-25% because all of its cores have it

2) Comparing HT on/off on a CPU whose silicon was designed with HT is ridiculous. Intel could be using that 5% of die space to add an extra small core, completely negating the impact of not having HT.
I think they are testing with what is available in the market now. Essentially, most retail CPUs will come with HT/SMT, especially from AMD, which has been offering SMT on all of its Ryzen CPUs. In any case, it won't be that long before we know how well Lunar Lake performs.
 
That's the silliest comparison I've ever seen, for so many reasons that it's unfathomable to me that a serious platform like Phoronix went ahead with this, lol.

1) Intel doesn't have HT on all of its cores.
They didn't compare Ryzen AI to any Intel CPU. It was the same CPU with/without HT.

Turning off HT on e.g. a 14900K drops performance by 9-12%, because only 8 of its 24 cores have it in the first place.
If one wanted to gauge the effectiveness of HT on Intel CPUs, you could either use a CPU without E-cores or simply disable them. That would make the experiment a pure one of HT on/off.

Intel could be using that 5% of die space to add an extra small core, completely negating the impact of not having HT.
E-cores come in clusters of 4 - you can't add just one.

Furthermore, in Alder Lake die shot analysis, E-cores have an amortized size of 29% of the area of a P-core. Given the massive IPC increases of Skymont, this ratio will almost certainly increase in Lunar Lake and Arrow Lake.
 
They didn't compare Ryzen AI to any Intel CPU. It was the same CPU with/without HT.
Yes? Which is a silly way to then use the data to theorize about whether Intel did well by removing HT or not.

If one wanted to gauge the effectiveness of HT on Intel CPUs, you could either use a CPU without E-cores or simply disable them. That would make the experiment a pure one of HT on/off.
But that's irrelevant. Intel chips have E-cores - so the decision to remove HT has to be weighed with E-cores in mind.
E-cores come in clusters of 4 - you can't add just one.

Furthermore, in Alder Lake die shot analysis, E-cores have an amortized size of 29% of the area of a P-core. Given the massive IPC increases of Skymont, this ratio will almost certainly increase in Lunar Lake and Arrow Lake.

Don't they have the smaller NPU cores or whatever the hell they are called?

It doesn't really matter; the point is HT takes some die space, and if that die space is used for something else (increased ST performance, increased clocks, more cores, or what have you) it might negate the drawbacks.
 
Yes? Which is a silly way to then use the data to theorize about whether Intel did well by removing HT or not.
According to the data I cited in post #9, Intel is claiming a 9.5% perf/W benefit from enabling HT on a core that supports it. You can compute that because their HT-on and HT-off stats are both relative to a core without HT.

The data cited in the Phoronix article shows a 20.2% efficiency benefit with HT on vs. off. This suggests that AMD's implementation of SMT favors power-efficiency more than Intel's, although clearly Intel didn't use the same workload profile for their statistics as that Phoronix benchmark suite.
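As a side note, the way such a perf/W delta falls out of raw numbers is simple to sketch. The throughput and power figures below are purely hypothetical, chosen only so the arithmetic lands near the Phoronix-style 20.2% figure:

```python
def perf_per_watt_gain(perf_on, watts_on, perf_off, watts_off):
    """Return the fractional perf/W improvement of SMT-on over SMT-off."""
    return (perf_on / watts_on) / (perf_off / watts_off) - 1.0

# Hypothetical example: SMT on yields 25% more throughput for 4% more power,
# which works out to roughly a 20.2% perf/W advantage.
gain = perf_per_watt_gain(perf_on=125.0, watts_on=26.0,
                          perf_off=100.0, watts_off=25.0)
print(f"{gain:.1%}")  # → 20.2%
```

The same formula applied to any review's raw throughput and package-power numbers gives a directly comparable efficiency delta.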

It doesn't really matter; the point is HT takes some die space, and if that die space is used for something else (increased ST performance, increased clocks, more cores, or what have you) it might negate the drawbacks.
Intel's own slides directly support the conclusion that HT is a win for MT perf/area. Again, see my post #9. This is cited as the reason why they retained it in their server P-cores.

Their claim was that HT is worse for ST perf/area and not a big win for MT power-efficiency. However, the Phoronix data on Zen 5 (mobile) is enough to suggest the latter isn't necessarily so.
 
According to the data I cited in post #9, Intel is claiming a 9.5% perf/W benefit from enabling HT on a core that supports it. You can compute that because their HT-on and HT-off stats are both relative to a core without HT.

The data cited in the Phoronix article shows a 20.2% efficiency benefit with HT on vs. off. This suggests that AMD's implementation of SMT favors power-efficiency more than Intel's, although clearly Intel didn't use the same workload profile for their statistics as that Phoronix benchmark suite.


Intel's own slides directly support the conclusion that HT is a win for MT perf/area. Again, see my post #9. This is cited as the reason why they retained it in their server P-cores.

Their claim was that HT is worse for ST perf/area and not a big win for MT power-efficiency. However, the Phoronix data on Zen 5 (mobile) is enough to suggest the latter isn't necessarily so.
Obviously, when you are comparing a P-core with HT on vs. a P-core with HT off, perf/watt and perf/area are always in favor of HT. No one disputes that; that's not the point I'm making. The point I'm making is that the saved die area can be used for something else.

On a CPU with 8 P-cores and 16 E-cores, the drop in performance in heavily MT scenarios is ~10%. If you can get 10% by using the extra die space that HT takes somewhere else, you might negate the negatives.
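To make that break-even argument concrete, here's a quick perf/area sketch. The ~5% area cost and ~10% MT uplift are assumptions taken from this thread, not measured values:

```python
def mt_perf_per_area(perf, area):
    """Multithreaded performance delivered per unit of die area."""
    return perf / area

# Hypothetical: an SMT core of area 1.00 delivers 1.10x the throughput of
# the same core without SMT, which occupies only 0.95 of the area.
with_smt = mt_perf_per_area(perf=1.10, area=1.00)
without_smt = mt_perf_per_area(perf=1.00, area=0.95)

# The SMT core's perf/area lead is what any reinvestment of the freed
# ~5% area has to beat for dropping SMT to break even.
breakeven_gain = with_smt / without_smt - 1.0
print(f"break-even uplift needed: {breakeven_gain:.1%}")  # → 4.5%
```

Under these assumed numbers, whatever replaces HT only has to recover about 4.5% of throughput from the freed area, which is why the trade-off isn't obviously one-sided.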

What could be the logical explanation, that they removed HT because they want their CPUs to be slower?
 
The point I'm making is that the saved die area can be used for something else.
I addressed this point. My post was not long. If you can't even be bothered to read the whole thing before replying, then I fail to see how this exchange can possibly be productive.

What could be the logical explanation, that they removed HT because they want their CPUs to be slower?
I already answered this.
 
Furthermore, in Alder Lake die shot analysis, E-cores have an amortized size of 29% of the area of a P-core. Given the massive IPC increases of Skymont, this ratio will almost certainly increase in Lunar Lake and Arrow Lake.
3 more weeks! I really hope Intel brings both LNL and ARL to Hot Chips, because I'm really curious how the die design is different. It sure feels like it's going to be a long time before we get Xeons with Lion Cove, but those will also be super interesting, to see how the P-cores look, since I imagine AVX-512 won't be on LNL/ARL and they'll also have HT, of course.
 
Okay, so we just stop caring about people who are trying to do everything with just a laptop? That includes a lot of students, you know? They might not appreciate being told they have to buy a gaming laptop or a desktop PC to do multithreaded work.
Intel has two product lines coming, Lunar Lake and Arrow Lake. If you want a laptop processor from Intel that has more than 4+4 cores, consider Arrow Lake instead.
 
It seems possible that a process could have some threads that would benefit from running on a core that does not have another thread running on it.

If so, what is the likelihood that undesirable scheduling would occur and what is the result of these situations?

I guess that you could say that boosting has a similar issue. Sometimes a CPU is boosting and sometimes it is not.

This is probably relevant for an application, such as a game, where performance is important, and low variation of performance is important.
 
and what about security?
I'm guessing you are talking about AMD's unfixed SQUIP vulnerability where processes in different threads across shared Zen cores leak information regardless of any other security measures taken?
Intel's got you there: the server version of Lion Cove (where side-channel attacks tend to be the biggest concern) will retain hyperthreading!

Security is pretty easily dealt with now that hypervisors and OS schedulers have the capability not to split a physical core between VMs or processes.
Intel's vulnerabilities in this regard have been patched. And one would hope that companies would do the right thing at the expense of their bottom line and not share physical cores, but I remain suspicious that their bottom line will come first.
I put in mitigations for that in ESXi 6.7 back in 2018. All new hypervisors have this put in by default.
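The rule those mitigations enforce is simple to state: sibling SMT threads of one physical core must never be split across tenants. A toy Python check of that invariant (the 2-core/4-thread topology and the tenant names are hypothetical):

```python
# Map each logical CPU to its physical core; CPUs that map to the same
# core are SMT siblings of each other.
SIBLINGS = {0: 0, 1: 0, 2: 1, 3: 1}  # hypothetical 2-core/4-thread layout

def placement_is_safe(assignment):
    """assignment: dict of logical CPU -> tenant ID. Safe only if no
    physical core is shared between two different tenants."""
    core_owner = {}
    for cpu, tenant in assignment.items():
        core = SIBLINGS[cpu]
        if core_owner.setdefault(core, tenant) != tenant:
            return False  # siblings split across tenants: SQUIP-style exposure
    return True

print(placement_is_safe({0: "vmA", 1: "vmA", 2: "vmB", 3: "vmB"}))  # → True
print(placement_is_safe({0: "vmA", 1: "vmB"}))                      # → False
```

Real hypervisors do the same bookkeeping at schedule time; the second placement is exactly the configuration the ESXi-era mitigations forbid.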
Does this include custom setups like AWS, Azure and Google cloud which make up the majority? If so, it is good to know that most, if not nearly all side channel attacks are little more than fearmongering hype.

That aside, I think it is well known that AMD's SMT is generally more performant than Intel's HT, and that AMD, with fewer cores per die, benefits more from doubling its threads to avoid die-to-die latency hits.
I would rather have E-cores, though. Even though they aren't as efficient as SMT, they are vastly more performant, especially the new ones. But if I had a Ryzen, I would also much rather have SMT enabled than disabled.
They're different chips with different cost/benefit ratios for SMT/HT. You can't really say AMD's SMT is as useless as Intel's HT, or that Intel would suffer as much as AMD would from losing SMT if they ditched HT. I really wish AMD would do something about SQUIP, though. I know some companies are using Epycs, and I've seen corporations be less than noble in the name of profits. Like AMD with SQUIP, for example.
 
Intel has two product lines coming, Lunar Lake and Arrow Lake. If you want a laptop processor from Intel that has more than 4+4 cores, consider Arrow Lake instead.
Arrow Lake is primarily a desktop CPU.

However, Intel usually repurposes their mainstream desktop die for the high-end (HX) laptop model. IIRC, the Arrow Lake-based HX laptop CPUs aren't due out until Q1 2025.
 
I'm guessing you are talking about AMD's unfixed SQUIP vulnerability where processes in different threads across shared Zen cores leak information regardless of any other security measures taken?
Either disable SMT or use the hypervisor features @jeremyj_83 mentioned, as well as Core Scheduling within your VM.

If so, it is good to know that most, if not nearly all side channel attacks are little more than fearmongering hype.
Oh, so you think side-channel attacks require SMT? "DOIT" (Data Operand Independent Timing) and RFDS (Register File Data Sampling) are two examples of side-channel attacks that affect Gracemont E-cores, which lack hyper-threading.

ARM also has CPU cores affected by some side-channel attack vulnerabilities, even though they lack SMT.

Heck, even some of Apple's SoCs are affected by some!


I really wish AMD would do something about SQUIP though.
Check to see if Zen 5 has it fixed.
 
Does this include custom setups like AWS, Azure and Google cloud which make up the majority? If so, it is good to know that most, if not nearly all side channel attacks are little more than fearmongering hype.
AFAIK those cloud providers patched their hypervisors right away. Considering they use automation for commands it would be quite simple to patch early on. Afterwards the newer versions of the hypervisor would include making sure that SMT threads aren't shared across different VMs/containers.
 
I think hyperthreading has proven its abilities over the past 10 years.
To go back to single threading would be a mistake in my opinion, because software has been optimized for hyperthreading over the past many years.
 
Either disable SMT or use the hypervisor features @jeremyj_83 mentioned, as well as Core Scheduling within your VM.


Oh, so you think side-channel attacks require SMT? "DOIT" (Data Operand Independent Timing) and RFDS (Register File Data Sampling) are two examples of side-channel attacks that affect Gracemont E-cores, which lack hyper-threading.

ARM also has CPU cores affected by some side-channel attack vulnerabilities, even though they lack SMT.

Heck, even some of Apple's SoCs are affected by some!


Check to see if Zen 5 has it fixed.
Once again, having the ability to mitigate by allocating more resources per VM and actually mitigating by providing more resources per VM are 2 different things. It sounds simple enough to just isolate users and processes on individual cores, but VMs can better use the resources by sharing what is available and cloud providers can charge a premium for dedicated compute resources.

Are they going to give out the premium service of going from vCPU-hour to CPU-hour for free because they decided to go with Epyc? Are they not going to share cores for the lower tiers? I don't know much about cloud computing, but sharing server compute resources to better utilize them seems like one of the main benefits. https://learn.microsoft.com/en-us/w...ation/hyper-v/manage/manage-hyper-v-cpugroups
They likely can dedicate these cores just some of the time but clean stops and starts of processes must cost more than no resources.

And I'm not arguing that the cloud service administrators can't mitigate for SQUIP, I'm arguing that I don't trust they always will because it costs them money. Even I sometimes ride right through a stop sign when I'm on my 10 speed and nobody is around. Nobody is hurt, the trees don't complain so why not? It is almost always safe.

As far as SQUIP affecting Zen 5: I haven't heard of mitigation and since not a lot are looking into it, we will just have to wait a year for Zen 5 to show up on the official list: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-1039.html like was the case with Zen 4.
 
I think hyperthreading has proven its abilities over the past 10 years.
To go back to single threading would be a mistake in my opinion, because software has been optimized for hyperthreading over the past many years.
Hyperthreading (HT) is Intel's product name for SMT, a technology that allows 2 threads to run on the same physical core. Getting rid of HT wouldn't stop software from being multithreaded. Instead, the extra threads would only be able to run on the physical cores rather than the logical cores (a 4C/8T CPU has 4 physical cores plus the 4 logical cores that SMT creates).
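On Linux, the physical/logical split is directly visible. A rough Python sketch (it relies on the sysfs topology files and falls back to `os.cpu_count()` where they're absent):

```python
import os
from pathlib import Path

def physical_core_count():
    """Count physical cores by de-duplicating SMT sibling groups; fall
    back to os.cpu_count() where Linux sysfs topology info isn't available."""
    base = Path("/sys/devices/system/cpu")
    sibling_files = list(base.glob("cpu[0-9]*/topology/thread_siblings_list"))
    if not sibling_files:
        return os.cpu_count()  # no topology info: assume one thread per core
    # Every logical CPU on the same physical core reports the same sibling
    # list, so the number of unique lists equals the number of physical cores.
    return len({f.read_text().strip() for f in sibling_files})

print(os.cpu_count(), "logical CPUs,", physical_core_count(), "physical cores")
```

On a 4C/8T part with SMT enabled this prints 8 logical CPUs and 4 physical cores; with SMT off, both counts match.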
 
Hyperthreading (HT) is Intel's product name for SMT, a technology that allows 2 threads to run on the same physical core. Getting rid of HT wouldn't stop software from being multithreaded. Instead, the extra threads would only be able to run on the physical cores rather than the logical cores (a 4C/8T CPU has 4 physical cores plus the 4 logical cores that SMT creates).

Yes, what I was saying is that 4C/8T would now need 8 cores to be 8T. That, in my opinion, is a step backwards.
 
Yes, what I was saying is that 4C/8T would now need 8 cores to be 8T. That, in my opinion, is a step backwards.
Out of curiosity, why would you care how many threads there are as long as the necessary performance is provided?

Realistically speaking, thread count doesn't matter any more than core count does. It's about getting the right level of performance for the application in question.

In the case of LNL the drive behind dropping SMT is all about maximizing space and power efficiency while maintaining good performance. If they've managed to do that without sacrificing performance then it's a win in my book.
 
What Intel is doing reminds me of AMD's Bulldozer architecture, which used more, smaller cores to make up for a lack of SMT. I disagree with Intel's design concept, because over the years I've learned that asymmetrical and inconsistent designs are much more difficult to deal with; it's almost universally better to design for consistency. The exceptions made by trying to work around different types of cores for different workloads will be a PITA forever. As successful as ARM has been, it too has a big.LITTLE design, which IMHO is not very good; they should have instead found another way to get the job done. I'm actually really pleased that AMD is not following ARM's concept to conserve energy; it's the less optimal choice. Intel has made a design-choice blunder, IMO.
 
Realistically speaking, thread count doesn't matter any more than core count does. It's about getting the right level of performance for the application in question.
You're right, but there's an extremely narrow argument one could make, such as where a pair of threads operate on shared state in a consumer/producer fashion. If it also involves relatively low ILP code, you could see a very nice speedup from SMT, when the threads are paired on the same cores.

However, threads are usually a lot more decoupled. Furthermore, floating-point workloads tend to benefit little from SMT, because CPUs can usually achieve good pipeline occupancy from one thread per core.

One reason software tends not to exploit the super low-latency and cache efficiency benefits of SMT is because threading APIs are generally somewhat archaic and make it difficult for apps to tell the operating system which of their threads do a lot of communication with each other.

This fits into a larger story about why the benefits of SMT aren't even greater and the downsides smaller, due mostly to operating systems lacking the kind of in-depth knowledge needed to make more intelligent scheduling decisions. For instance, some gamers experience higher framerates by disabling hyperthreading, but that probably only helps because the app naively spins up more threads than it really needs and then the OS ends up pairing some compute-intensive threads - at least one of which is on the critical path for frame rendering - on the same physical core. That should almost never happen, but threading APIs and apps never properly evolved to take SMT into account.

Also, app developers tend to act as if they have the whole CPU to themselves and create "thread pools" that are sometimes inappropriately sized. Too often, they're made bigger than necessary, with one thread per logical core. They're nearly always sized statically, and without care or knowledge of how heavily-loaded the rest of the system is.
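Python's standard library is a handy illustration of that sizing habit: `os.cpu_count()` counts logical CPUs (SMT siblings included), and the common pattern below sizes a pool statically from that number, with no regard for SMT pairing or current system load:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# The common (and often oversized) pattern: one worker per *logical* CPU,
# chosen statically, with no knowledge of SMT pairing or system load.
n_workers = os.cpu_count() or 1

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    results = list(pool.map(lambda x: x * x, range(8)))

print(results)  # → [0, 1, 4, 9, 16, 25, 36, 49]
```

On an 8C/16T machine this spins up 16 workers whether the work is two threads' worth or twenty, which is exactly the static, load-blind sizing being criticized.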

Of course, a similar story can be told about E-cores. There, AMD hasn't gone as far as Intel, while Intel appears to be moving closer to AMD's approach of having a smaller performance gap between P-cores and E-cores. I have to wonder how much this is really just a matter of the hardware catering to legacy software.
 
Last edited:
I disagree with Intel's design concept, because over the years I've learned that asymmetrical and inconsistent designs are much more difficult to deal with; it's almost universally better to design for consistency. The exceptions made by trying to work around different types of cores for different workloads will be a PITA forever.
App developers have a tendency both to presume they know best and to prefer simplicity - because dealing with hardware and thread scheduling is hard. The two are somewhat at odds with each other.

The unpleasant fact you really can't get around is that any hardware will have asymmetries at scale. Whether it's chiplets with distinct L3 cache slices or NUMA, there's some point at which it becomes increasingly costly to maintain a fiction of uniformity. IMO, the solution is for apps and operating systems to work more harmoniously and revisit some sacred cows of how multithreaded software is designed, implemented, and interacts with the OS.

Dynamic clock speeds create another wrinkle in the core-symmetry picture. Even when you have physically identical cores, they're often not all running at identical clock speeds (Intel Turbo Boost Max 3.0 even goes so far as to establish different per-core maximums, based on manufacturing quality). So the problem of optimal thread placement will often have complexities that belie any notion of perfect symmetry.
 
Of course, a similar story can be told about E-cores. There, AMD hasn't gone as far as Intel, while Intel appears to be moving closer to AMD's approach of having a smaller performance gap between P-cores and E-cores. I have to wonder how much this is really just a matter of the hardware catering to legacy software.
I think Intel is just trying to maximize performance within thermal constraints. The E-cores typically run significantly cooler than the P-cores. Raising the compute load/mm^2 raises temps, everything else being equal. The reverse can be said for losing HT.

New desktop CPUs are pretty thermally limited so raising the IPC where you have thermal headroom is kind of picking the low hanging fruit.
 