News Zen 5 SMT-focused testing suggests Intel made a mistake ditching Hyper-Threading on Lunar Lake

thestryker · Aug 6, 2024

TheSecondPower said:
Clearly Intel should've compared Skymont LP e-cores to Crestmont LP e-cores only when exploring power consumption. And Intel should've compared Skymont LP e-cores to the Crestmont regular e-cores when talking about performance, to ensure Skymont is presented in the worst-possible light.

Unfortunately it's not that simple because the presentation is about LNL and the Skymont cores in LNL aren't on the ringbus. That means the closest comparison that can be made is to the Crestmont LPE cores even if the comparison seems silly on the face of it. So while I don't think it's a great way to show performance it's the closest they could get. I suspect we'll see all the 1:1 comparisons when Intel dives into ARL.

TheSecondPower · Aug 6, 2024

TheSecondPower said:
Clearly Intel should've compared Skymont LP e-cores to Crestmont LP e-cores only when exploring power consumption. And Intel should've compared Skymont LP e-cores to the Crestmont regular e-cores when talking about performance, to ensure Skymont is presented in the worst-possible light.

Comparing Skymont LP e-cores to Crestmont LP e-cores was called "marketing jujitsu". So my point here was to take that claim to its logical extreme. I don't actually think that Intel should present their products in the worst-possible light.

bit_user · Aug 6, 2024

TheSecondPower said:
Comparing Skymont LP e-cores to Crestmont LP e-cores was called "marketing jujitsu". So my point here was to take that claim to its logical extreme. I don't actually think that Intel should present their products in the worst-possible light.

The thing about Lunar Lake is that Intel isn't just using Skymont to replace the LP E-cores, but also the regular E-cores. That's why I feel it would've been equally valid for them to compare against either the Crestmont E-cores or LP E-cores. ...that is, until you consider the fact that the Crestmont LP E-cores had nothing beyond their L2 cache, while Lunar Lake's Skymont cores still enjoy a LLC that fulfills a similar function as L3 (as you pointed out).

So, in my opinion, it would've made more sense for Intel to compare Lunar Lake's Skymont cores against Meteor Lake's regular Crestmont E-cores. Instead, Intel chose the more favorable point of comparison. Defensible, but still comes off looking shady, when you notice it.

stuff and nonesense · Aug 6, 2024

Quirkz said:
Intel: Removing SMT support freed up die space, and gave us a 30% performance per watt improvement.

Independent test: AMD SMT gives an 18% performance per watt improvement.

Tom's Hardware: "What were Intel thinking? Morons!"

😀

You can’t compare changes across architectures in that way. Yes they are similar features, yes they take up die space and they both improve performance somewhat.

The differing implementations have different cost/benefit analyses. Intel have decided for the next range of CPUs that the cost/benefit (space and power/throughput) indicates that HT isn’t worth it.

It’s their product, time will tell.

Pierce2623 · Aug 6, 2024

TheSecondPower said:
What is a false experiment?

When I have video compression work to do, I use my desktop. Most compiling work I do (JS) only takes a few seconds on my 4-core Tiger Lake laptop, so if given a choice I would take faster and more efficient cores for work that involves compiling.

The smaller variant of Meteor Lake is 2+8 cores (12 threads), Phoenix 2 is 2+4 cores (12 threads), and the M3 in the MacBook Air is 4+4 cores (8 threads). In small, thin, and light laptops, there's not a lot of thermal or battery headroom to power more cores. This is approximately the market Lunar Lake is after. I know Lunar Lake's thread count is down from the smaller Meteor Lake, but it trades 2 little cores for big cores and promises a 50% IPC increase for the remaining little cores.

Lunar Lake is NOT in the same market as little Phoenix. Little Phoenix only has 4CU and it’s a total budget chip. Lunar Lake has Intel’s biggest, most powerful iGPU ever and it’s just as big as the 16CU one AMD just released in Strix Point. So not the same market as little Phoenix AT ALL.

Pierce2623 · Aug 6, 2024

TheHerald said:
That's the silliest comparison I've ever seen, for so many reasons that it's unfathomable to me that a serious platform like Phoronix went ahead with this, lol.

1) Intel doesn't have HT on all of its cores. Turning off HT on e.g. a 14900K drops performance by 9-12%, because only 8 of its 24 cores have it in the first place. Common sense, right? On AMD it drops performance by 20-25% because all of its cores have it.

2) Comparing HT on/off on a CPU whose silicon was designed with HT is ridiculous. Intel could be using that 5% die space to add an extra small core, completely negating the impact of not having HT.

Where are you coming up with the idea that removing HT is a guaranteed savings of die space? You realize this architecture has HT and it’s just switched off on the consumer parts right?

Pierce2623 · Aug 6, 2024

TheHerald said:
But doesn't the "I expect Skymont will be even closer in size to Lion Cove" answer the whole thing? Removing HT allows them to make E-cores larger at the same die size, no?

No it doesn’t. HT doesn’t add structures to the CPU; it’s just adding another thread that competitively shares the same resources. Only operating one thread per core will ALWAYS leave some structures underutilized and waiting for data or instructions. AMD just did an interview saying Zen 5 can fully utilize its resources in 1T mode, but they still have SMT because it’s the best way of keeping your ALUs occupied so far.

Hartemis · Aug 6, 2024

Hyper-Threading increased performance by around 30% on the old 8th-generation Core (8000K) parts.
Perhaps this gain has diminished over the generations leading up to Lion Cove, but that would just be a design reason.
https://www.phoronix.com/review/intel-ht-2018

If the server Lion Cove has HT, I'm sure it will be tested on and off and against the desktop version, by Phoronix or other testers. We'll have more data on which to debate. 😆

Here is another review of AMD's SMT (Zen 3)
https://www.anandtech.com/show/1626...of-multithreading-on-zen-3-and-amd-ryzen-5000

The picture is consistent across reviews: HT/SMT is worth +20% to +30% in applications and 0% to -5% in gaming (in terms of pure performance, without regard to power consumption or area).

TheSecondPower · Aug 6, 2024

Pierce2623 said:
Lunar Lake is NOT in the same market as little Phoenix. Little Phoenix only has 4CU and it’s a total budget chip. Lunar Lake has Intel’s biggest, most powerful iGPU ever and it’s just as big as the 16CU one AMD just released in Strix Point. So not the same market as little Phoenix AT ALL.

I'm comparing CPU to CPU. They're not completely comparable, but this is largely because Lunar Lake doesn't have a direct predecessor or direct competition. The most similar competitor is probably the Apple M3 or M4, but I doubt many people will cross-shop the two given the operating system disparity.

Pierce2623 said:
Where are you coming up with the idea that removing HT is a guaranteed savings of die space? You realize this architecture has HT and it’s just switched off on the consumer parts right?

The Lion Cove design has hyperthreading, but the version of it put into Lunar Lake does not have those transistors. One of Intel's stated goals in removing it is improved performance per area in single-threaded workloads. Intel cannot get that benefit if the transistors are still there.

bit_user · Aug 6, 2024

stuff and nonesense said:
You can’t compare changes across architectures in that way. Yes they are similar features, yes they take up die space and they both improve performance somewhat.

Those numbers (or the ones @Quirkz probably intended to quote) aren't measuring the same thing, but saying you can't compare hyperthreading implementations is like saying you can't compare implementations of AVX2 or any other feature. In other words, of course you can!

stuff and nonesense said:
The differing implementations have different cost/benefit analyses. Intel have decided for the next range of CPUs that the cost/benefit (space and power/throughput) indicates that HT isn’t worth it.

Yes, but Intel has a more extreme hybrid strategy, where their E-cores provide a better perf/area improvement. Intel acknowledged that the perf/area of non-HT P-core is 15% less than that of a fully-occupied HT-capable P-core. If we use old Alder Lake data, the SPECint performance of a Gracemont E-core is 64.5% as high as a Golden Cove P-core. Using the area ratio of 29% yields a perf/area ratio of 2.22x for the E-core vs. P-core, which confirms that E-cores provide an obvious area-efficiency benefit for their hybrid CPUs (if I'm reading their slide correctly, fully-occupied HT should provide 1.18x perf/area).
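For anyone who wants to check the arithmetic, here is a minimal sketch using only the figures quoted in this post (64.5% relative SPECint performance, 29% relative area, and a 15% perf/area deficit without HT). The function and variable names are made up for illustration, and the 1.18x figure is derived here on the assumption that it is simply the reciprocal of the 15% deficit.

```python
def perf_per_area(relative_perf: float, relative_area: float) -> float:
    """Perf/area of a core relative to a baseline core (baseline = 1.0)."""
    return relative_perf / relative_area


# Gracemont E-core vs. Golden Cove P-core, using the figures quoted above.
e_core_ratio = perf_per_area(relative_perf=0.645, relative_area=0.29)

# Assumption: if a non-HT P-core has 15% lower perf/area than a fully
# occupied HT-capable P-core, the HT core's advantage is 1 / (1 - 0.15).
ht_ratio = 1 / (1 - 0.15)

print(f"E-core vs. P-core perf/area: {e_core_ratio:.2f}x")
print(f"HT vs. non-HT P-core perf/area: {ht_ratio:.2f}x")
```

Both results round to the numbers given above (2.22x and 1.18x), which is why the E-core route looks so much more attractive on area than HT does.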

Since server CPUs aren't hybrid, SMT obviously makes more sense in that context.

bit_user · Aug 6, 2024

Pierce2623 said:
Where are you coming up with the idea that removing HT is a guaranteed savings of die space? You realize this architecture has HT and it’s just switched off on the consumer parts right?

Intel's analysis is based on the premise of physically eliminating all supporting infrastructure for Hyper-Threading. See post #8, for details.

Intel's server cores have long had some physical differences to their client cores. Features like an extra AVX-512 FMA port, additional L2 cache, and now AMX are not physically present in their client cores. Varying the presence or absence of Hyperthreading would be a much more invasive change, but it's possible if they're willing to redo the layout:

https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/527b9476-2264-485e-8880-8d4b28c488a9_1185x1037.jpeg

Source: https://locuza.substack.com/p/info-snack-alder-lake-m-raptor-lake

Pierce2623 said:
HT doesn’t add structures to the CPU it’s just adding another thread that competitively shares the same resources.

They need additional tagging, queue high/low watermarks (at least, according to Mike Clark's description of AMD's implementation), soft partitioning of cache (i.e. so one SMT thread can't completely cache-starve the other), and some per-thread structures have likely been added, to counter information leakage between threads. That stuff might be cheap, but it's not free.

stuff and nonesense · Aug 6, 2024

bit_user said:
Yes, but Intel has a more extreme hybrid strategy, where their E-cores provide a better perf/area improvement. Intel acknowledged that the perf/area of non-HT P-core is 15% less than that of a fully-occupied HT-capable P-core. If we use old Alder Lake data, the SPECint performance of a Gracemont E-core is 64.5% as high as a Golden Cove P-core. Using the area ratio of 29% yields a perf/area ratio of 2.22x for the E-core vs. P-core, which confirms that E-cores provide an obvious area-efficiency benefit for their hybrid CPUs (if I'm reading their slide correctly, fully-occupied HT should provide 1.18x perf/area).

Curious, why is that a “but”?
That is the type of data which would go into a cost/benefit analysis.

bit_user · Aug 6, 2024

stuff and nonesense said:
Curious, why is that a “but”?

It could change their analysis, leading them down a different path. The data suggests that AMD sought to further optimize SMT, while Intel dropped it. If Zen 5C cores had a similar perf/area advantage as Intel's E-cores, maybe AMD would've gone the same route.

stuff and nonesense said:
That is the type of data which would go into a cost/benefit analysis.

The reason I used that conjunction is because you cited Intel's analysis as if it were only about HT and not incorporating the totality of their hybrid architecture.

I think we basically agree. Let's not lawyer the minutiae, cool?

Quirkz · Aug 7, 2024

stuff and nonesense said:
You can’t compare changes across architectures in that way. Yes they are similar features, yes they take up die space and they both improve performance somewhat.

That's exactly my point!

Tom's Hardware is comparing this, and implying that Intel made a mistake. But it's a poor comparison to make.

Quirkz · Aug 7, 2024

vanadiel007 said:
I think hyperthreading has proven its abilities over the past 10 years.
To go back to single threading would be a mistake in my opinion, because software has been optimized over the past many years for hyperthreading.

Software hasn't been optimised for hyperthreading: software has mostly been optimised for many cores.

Quirkz · Aug 7, 2024

bit_user said:
Since server CPUs aren't hybrid, SMT obviously makes more sense in that context.

Exactly. But you can bet that Intel wants some of that sweet, sweet perf-per-watt that the E-cores bring to the datacenter.
Which is why Xeons with E-cores were announced at Computex 2024 as part of a new efficiency line.

bit_user · Aug 7, 2024

Quirkz said:
you can bet that Intel wants some of that sweet, sweet perf-per-watt that the E-cores bring to the datacenter.
Which is why Xeons with E-cores were announced at Computex 2024 as part of a new efficiency line.

Sierra Forest is already shipping and it's good.

https://www.tomshardware.com/pc-com...rest-xeon-6-cpus-granite-rapids-follows-in-q3

It usually beats AMD's Bergamo on integer workloads, but it's comparatively weak on floating point. However, Bergamo is last year's news and it remains to be seen how the Zen 5C version of Turin will compare.

Hartemis · Aug 7, 2024

vanadiel007 said:
With the complexity of today's programs, and the emerging inclusion of AI-capable chips in workstations, I would say there are lots of opportunities to break the computing chain up into many small parallel pieces.

That is why I think more threads is better. Also there's a reason why it has been 2 threads per core, and not say 3 or 4.

It's a bit like having a winning horse and replacing it with a brand new horse.

SMT/HT is not intended to provide more threads to the OS or applications. It is intended to make more efficient use of the available resources of a CPU core.
Let's say 1T can occupy 70% of the compute units of a core, 2T can use 95%. 3T or 4T could only extract 3 or 4% more.
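To put rough numbers on that, here is a toy sketch of the diminishing returns. The utilization figures are the hypothetical ones from this post, not measurements, and the 3T/4T values assume the "3 or 4% more" is added on top of the 2T figure. The names are made up for illustration.

```python
# Hypothetical fraction of a core's compute units kept busy per thread count.
# 1T and 2T come straight from the post; 3T and 4T assume +3 and +4 points
# over the 2T figure.
utilization = {1: 0.70, 2: 0.95, 3: 0.98, 4: 0.99}


def marginal_gain(util: dict[int, float], threads: int) -> float:
    """Relative throughput gain from adding one more thread to a core."""
    return util[threads] / util[threads - 1] - 1


for threads in (2, 3, 4):
    gain = marginal_gain(utilization, threads)
    print(f"{threads - 1}T -> {threads}T: {gain:+.1%} throughput")
```

Under these made-up numbers, the second thread is worth roughly a third more throughput, while a third or fourth thread adds only a few percent, which would hardly justify the extra per-thread state.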
