News Zen 5 SMT-focused testing suggests Intel made a mistake ditching Hyper-Threading on Lunar Lake

Status
Not open for further replies.
Trying to make any overall SMT efficiency conclusions based on a laptop platform seems useless. All you can really say is that when the CPUs are heavily power limited it's a net gain.

It's looking like Intel's strategy moving forward is using the E-cores on the client side of things in place of SMT. They're more space efficient and allow the P-cores to be smaller as well. It's a move that makes sense from a business perspective since they're leveraging the E-cores in enterprise and low power parts as well.
 
The article said:
The Phoronix benchmarks demonstrate that Zen 5 and Zen 5c benefit massively from multi-threading technology. In the case of the Ryzen AI 9 HX 370, AMD is only giving up 2% of its power to extract a very impressive 18% more performance from the chip, significantly improving efficiency.

Ironically, Intel removed Hyper-Threading in Lunar Lake to improve performance efficiency.
Mike Clark said the dual-decoder microarchitecture of Zen 5 works differently between single-threaded and SMT modes. When executing multiple threads, each thread gets its own decoder. He also said there are a couple other per-thread resources, but I forget which and I think the decoder is really the main one.

In other words, I think this doesn't invalidate what Intel said about Hyper-Threading in Lion Cove. It just says that while Intel went in the direction of eliminating SMT, AMD went in the direction of understanding its bottlenecks and optimizing them.
 
The article said:
Intel says that removing Hyper-Threading allowed its designers to squeeze a 30% improvement in performance per power per area out of the Lion Cove P-cores.
To be clear, the 30% number is some kind of weird metric they concocted. IMO, it's just an exercise in claiming bigger numbers, rather than anything terribly useful.

They said a single thread has 15% better perf/W and 10% better perf/area vs. a single thread running on an equivalent core that's HT-capable.

[Image: Intel slide on single-thread perf/W and perf/area]


They said multithreaded apps have 5% better perf/W vs. a comparable core running 2 threads. However, on perf/area, the non-HT core is 15% less efficient than a comparable hyperthreaded core.

[Image: Intel slide on multithreaded perf/W and perf/area]


It really would've been nice if you'd quoted these slides more precisely, or just posted them for people to see the key points I just outlined.

Anyway, whether HT makes sense is really a question of whether you're optimizing for lightly-threaded workloads and prioritizing power-efficiency, or focused mainly on multi-threaded workloads and optimizing for perf/$. This aligns with Intel's decision to remove it from the client version of Lion Cove, but to retain it in the server variant of the P-core.
 
and what about security?
Intel's got you there: the server version of Lion Cove (where side-channel attacks tend to be the biggest concern) will retain Hyper-Threading!

Security is pretty easily dealt with now that hypervisors and OS schedulers have the capability not to split a physical core between VMs or processes.
 
Trying to make any overall SMT efficiency conclusions based on a laptop platform seems useless.
Power does complicate the picture, but then multithreaded workloads tend to be power-limited (or thermally-limited - basically the same thing) almost no matter where they're run!

All you can really say is that when the CPUs are heavily power limited it's a net gain.
Intel is claiming SMT actually hurts power-efficiency. So, a result obtained on a more heavily power-limited platform is actually an even stronger endorsement of Zen 5's implementation!
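
For a rough sense of scale, the perf/W change implied by the article's figures (18% more performance for 2% more power) can be worked out directly. This is only a sketch: the function name is made up for illustration, and the inputs are the article's numbers, not my own measurements.

```python
def perf_per_watt_change(perf_gain: float, power_increase: float) -> float:
    """Fractional change in perf/W when performance and power both change.

    Both arguments are fractions, e.g. 0.18 for +18%.
    """
    return (1.0 + perf_gain) / (1.0 + power_increase) - 1.0


# Article's Ryzen AI 9 HX 370 figures: +18% performance for +2% power with SMT on.
change = perf_per_watt_change(0.18, 0.02)
print(f"perf/W change with SMT on: {change * 100:+.1f}%")
```

With those inputs the result is a perf/W improvement of roughly 16% with SMT enabled, which is the opposite sign from the 5% multithreaded perf/W advantage Intel claims for its non-HT core.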
 
I think this slide is meant to address 1-thread vs. 2-thread efficiency on the same HT-capable core:
It's still "projected" and "best-case" rather than an applicable overall statement and they're still prominently pushing area savings.

SMT is just one of those things where it can have big swings in efficiency and cost depending on the workload. I think the AMD enterprise chips will probably show similar gains to the laptop, but it'd be the desktop parts I'm most curious about as they have a lot of headroom vs the amount of cores.
 
AMD recently did an interview on Chips and Cheese where they specifically said they don’t add extra resources or larger structures for SMT. They design the architecture so that a single thread can use the full resources of the core in 1T mode, since every core gets loaded before any SMT is engaged.
 
Mike Clark said the dual-decoder microarchitecture of Zen 5 works differently between single-threaded and SMT modes. When executing multiple threads, each thread gets its own decoder. He also said there are a couple other per-thread resources, but I forget which and I think the decoder is really the main one.

In other words, I think this doesn't invalidate what Intel said about Hyper-Threading in Lion Cove. It just says that while Intel went in the direction of eliminating SMT, AMD went in the direction of understanding its bottlenecks and optimizing them.
He also specifically said in the same interview that the core can use all of its resources in 1T mode and they didn’t overbuild any structures to improve SMT performance. 1T still uses the full 8 wide decode, it just doesn’t go into 2 threaded operation. SMT is PURELY about trying to keep all the ALUs always occupied and never stalled waiting on data as that greatly improves efficiency.
 
One commenter on Phoronix said that turning off SMT on a Zen 5 doesn't actually power off its resources, and if that's true then this test does nothing to test the power savings that could be realized by removing SMT entirely.

Moreover AMD's SMT since Zen has usually been considered a bigger boost to threaded workloads than Intel's hyperthreading, so Intel has less to lose by turning it off.

Lastly, the Phoronix article doesn't touch the theory behind Intel's plan with Lunar Lake. If I open Task Manager on my Ryzen 1800X during a moderately-threaded workload, every other logical core will be busy. The OS assigns work to the 8 physical cores first and only then begins to assign work to the "logical cores". SMT is useless until that 9th thread is scheduled. On Meteor Lake, work will be assigned to the 6 big cores and 8 little cores and so hyperthreading is useless until the 15th thread is scheduled.
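
The "9th thread" and "15th thread" thresholds above can be sketched as follows. This assumes the simplified policy described in the post (every physical core gets one thread before any core gets a second one); real schedulers are more nuanced, and the function name is purely illustrative.

```python
def smt_siblings_in_use(runnable_threads: int, physical_cores: int) -> int:
    """Number of threads that must share a core with another thread,
    assuming the OS fills every physical core before using SMT siblings."""
    return max(0, runnable_threads - physical_cores)


# 8-core Ryzen 1800X: the 9th thread is the first to land on an SMT sibling.
for n in (8, 9):
    print(n, "threads ->", smt_siblings_in_use(n, 8), "on SMT siblings")

# Meteor Lake counted as 6 big + 8 little = 14 cores, as in the post.
for n in (14, 15):
    print(n, "threads ->", smt_siblings_in_use(n, 6 + 8), "on SMT siblings")
```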

Now Lunar Lake only has 4 big and 4 little cores, and the little cores aren't on the same ring bus nor L3 cache so the OS is going to try to keep related threads on only one type of core at a time, so the performance loss will probably be a little bigger for Lunar Lake than for Meteor Lake. But Lunar Lake's little cores will be a lot faster than Meteor Lake's. Intel said that multithreading improves performance by 20% and increases power consumption by 10%. Lunar Lake is going into low-power devices, where most users will spend 90% of their time running light workloads and will be wanting long battery life and quiet fans. Which is better, 10% less power consumption 90% of the time, or 20% more performance 10% of the time?
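
To put that closing question into numbers, here is a minimal sketch that averages each effect over total usage time. It takes the 10%/20% figures and the 90/10 usage split above at face value (they are assumptions, not measurements), and the helper name is hypothetical.

```python
def time_weighted(effect: float, share_of_time: float) -> float:
    """Average effect over all usage when it only applies part of the time."""
    return effect * share_of_time


# 10% less power during light use, assumed to be 90% of the time.
power_saved = time_weighted(0.10, 0.90)
# 20% more performance during heavy use, assumed to be 10% of the time.
perf_gained = time_weighted(0.20, 0.10)

print(f"average power saved without HT: {power_saved:.1%}")
print(f"average performance gained with HT: {perf_gained:.1%}")
```

Under those assumptions, dropping HT saves far more averaged over the day than keeping it gains, but the conclusion is only as good as the 90/10 split.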
 
One commenter on Phoronix said that turning off SMT on a Zen 5 doesn't actually power off its resources, and if that's true then this test does nothing to test the power savings that could be realized by removing SMT entirely.
That's a false experiment, because nearly all of a core's resources are shared. In fact, the whole point of SMT is resource-sharing.

Lastly, the Phoronix article doesn't touch the theory behind Intel's plan with Lunar Lake. If I open Task Manager on my Ryzen 1800X during a moderately-threaded workload, every other logical core will be busy. The OS assigns work to the 8 physical cores first and only then begins to assign work to the "logical cores". SMT is useless until that 9th thread is scheduled. On Meteor Lake, work will be assigned to the 6 big cores and 8 little cores and so hyperthreading is useless until the 15th thread is scheduled.
Phoronix tested this on a CPU with 12 cores and 24 threads. That's not much different than the 6P+8E scenario you outlined (although I think you forgot about the 2 LPE cores, but never mind them). For the benchmarks to have shown a benefit, the workloads he tested must use > 12 threads. That's not uncommon in tasks like compiling, rendering, video compression, etc.

Lunar Lake is going into low-power devices, where most users will spend 90% of their time running light workloads and will be wanting long battery life and quiet fans. Which is better, 10% less power consumption 90% of the time, or 20% more performance 10% of the time?
If users of thin-and-light laptops really had no need for more than 8 threads, I guess it would mean there are a whole lot of bad marketing departments out there, because that's not where thin-and-light laptops currently max out.
 
That's a false experiment, because nearly all of a core's resources are shared. In fact, the whole point of SMT is resource-sharing.
What is a false experiment?
Phoronix tested this on a CPU with 12 cores and 24 threads. That's not much different than the 6P+8E scenario you outlined (although I think you forgot about the 2 LPE cores, but never mind them). For the benchmarks to have shown a benefit, the workloads he tested must use > 12 threads. That's not uncommon in tasks like compiling, rendering, video compression, etc.
When I have video compression work to do, I use my desktop. Most compiling work I do (JS) only takes a few seconds on my 4-core Tiger Lake laptop, so if given a choice I would take faster and more efficient cores for work that involves compiling.
If users of thin-and-light laptops really had no need for more than 8 threads, I guess it would mean there are a whole lot of bad marketing departments out there, because that's not where thin-and-light laptops currently max out.
The smaller variant of Meteor Lake is 2+8 cores (12 threads), Phoenix 2 is 2+4 cores (12 threads), and the M3 in the MacBook Air is 4+4 cores (8 threads). In small, thin, and light laptops, there's not a lot of thermal or battery headroom to power more cores. This is approximately the market Lunar Lake is after. I know Lunar Lake's thread count is down from the smaller Meteor Lake, but it trades 2 little cores for big cores and promises a 50% IPC increase for the remaining little cores.
 
Clickbait article at its best Klotz. But the fools will debate it.
Why do you think it's clickbait? The data presented clearly shows Zen 5 & 5C getting significantly more performance for virtually the same power. In light of that, it's fair to question whether Intel made the right call to remove it, or took the right perspective on its downsides.

Also, let's not call people fools.
 
Given a few assumptions that may or may not be true:
  • Hyperthreading increases power by 10% in exchange for a 20% increase in performance. (As claimed by Intel.)
  • Clock speed changes affect performance and power 1:1. (e.g. 1GHz at 10W means 2GHz at 20W, not true but close to what happens in certain clock ranges.)
  • Low-power laptops under load adjust clock speed to fit an exact power or thermal limit.
Then with a little napkin math:
  • frequency * power/frequency = total power
  • frequency * power/frequency * power cost for hyperthreading = total power
  • 1.00GHz * 15W/GHz = 15W
  • 0.91GHz * 15W/GHz * 110% = 15W
The clock speed is only 91% of its non-hyperthreaded value, so final performance is 91% * 120%, or about 109%. Therefore:
  • Hyperthreading increases performance by about 9%.
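
The napkin math above can be sketched in a few lines of Python. The function name and defaults are just for illustration; the 10% power cost, 20% performance gain, and linear clock/power scaling are the assumptions listed above, not measured values.

```python
def smt_gain_at_fixed_power(perf_gain: float = 0.20,
                            power_cost: float = 0.10) -> float:
    """Throughput ratio (SMT on / SMT off) when total power is capped.

    Assumes power and performance both scale 1:1 with clock speed, so the
    clock must drop by the same factor that SMT adds in power.
    """
    clock_ratio = 1.0 / (1.0 + power_cost)   # about 0.91 for a 10% power cost
    return clock_ratio * (1.0 + perf_gain)   # slower clock times SMT speedup


if __name__ == "__main__":
    ratio = smt_gain_at_fixed_power()
    print(f"net performance with SMT at the same power: {ratio:.3f}x")
```

With the default assumptions this comes out to roughly 1.09x, i.e. the ~9% figure above. Other assumed values can be passed in to see how sensitive the result is to them.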
 
What is a false experiment?
I meant the suggestion that there's a significant number of SMT-specific structures that could be disabled.

BTW, if you've heard the term power gating, it turns out that lots of chip resources get dynamically powered down when not in use.

When I have video compression work to do, I use my desktop.
Okay, so we just stop caring about people who are trying to do everything with just a laptop? That includes a lot of students, you know? They might not appreciate being told they have to buy a gaming laptop or a desktop PC to do multithreaded work.

Most compiling work I do (JS) only takes a few seconds on my 4-core Tiger Lake laptop, so if given a choice I would take faster and more efficient cores for work that involves compiling.
Okay, but that's you. In my case, I'm typically waiting a couple of minutes for incremental builds, when I do pulls or switch branches. A fresh work tree takes me about a half hour to build, on a 12th gen i9. The build system uses ninja, which maxes all of the cores from the beginning until the very end.

I know Lunar Lake's thread count is down from the smaller Meteor Lake, but it trades 2 little cores for big cores and promises a 50% IPC increase for the remaining little cores.
Yeah, we'll see just how they compare on multithreaded workloads. If it can even match the MT performance of its predecessor, that'd be pretty good.
 
That's the silliest comparison I've ever seen, for so many reasons. It's unfathomable to me that a serious platform like Phoronix went ahead with this, lol.

1) Intel doesn't have HT on all of its cores. Turning off HT on e.g. a 14900K drops performance by 9-12%, because only 8 of its 24 cores have it in the first place. Common sense, right? On AMD it drops performance by 20-25% because all of its cores have it.

2) Comparing HT on/off on a CPU whose silicon was designed with HT is ridiculous. Intel could be using that 5% die space to add an extra small core, completely negating the impact of not having HT.
 
and what about security?
In my decades of using Intel or AMD chips with HT, I don't recall any security incidents. The sample is small, and there is little benefit in hacking my computer, but it suggests that whether HT/SMT is there or not, there's little reason to be concerned about security. Even without HT/SMT, plenty of vulnerabilities are being discovered elsewhere.
 