A recent test of Intel's Core Ultra 7 155H reveals that Meteor Lake may have regressed in IPC compared to Raptor Lake. It's a limited view of performance, but indicates Intel may have prioritized other aspects of the chip's design.
No, it wasn't. That happens to be one of its big selling points, but I'm pretty sure their top priority was to improve efficiency & battery life. They're probably worried about competition from Apple, Qualcomm, and AMD on that front.
While raw CPU performance is likely a lower priority for Intel as it focuses on AI and graphics performance
The only variables can be power & thermal thresholds. If the author divided by max boost clock, and the CPU wasn't allowed to stretch its legs very much or for very long, then testing on a laptop with more generous power limits and better cooling could produce better results.
Edit: Using Google Translate, the original webpage has this to say about heat & power:
"The CPU frequency is all default, and will be marked when the power consumption and heat dissipation performance are low enough to affect single-thread performance."
Still, there could be a time limit on boosting. Otherwise, sounds like the power & thermal parameters of the test platform probably shouldn't have much effect.
BTW, in addition to the 155H's "P-core" entry, there's another weird entry for it, called "LP P-core". I have no idea what that means, because I think there's only one set of P-cores and that result is still well above the 155H's E-core result.
Interestingly, its E-core scored 5.55 (absolute), while the i7-13700H's E-core scored 6.0. This is notable because Intel claimed the Crestmont E-cores actually did improve IPC over Gracemont. Normalizing by the specified "Efficient-core Max Turbo Frequency", which is 3.8 GHz for the 155H and 3.7 GHz for the i7-13700H, yields IPC scores of 1.46 and 1.62 for Meteor Lake and the previous-gen CPU, respectively. So, far from an improvement, it looks like quite a regression - even slightly more than the P-cores (11.1% vs. 9.1%)!
He measured the 155H's LP E-core at 3.15, but remember that its frequency is capped at just 2.5 GHz. It should have nearly identical IPC to the regular E-cores, but when I use the specified frequencies, I get 1.26 and 1.46 for the LP and regular E-cores, respectively.
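The clock normalization above is just score divided by frequency; a minimal sketch of the arithmetic, using the scores and spec frequencies quoted in the two paragraphs above:

```python
# Clock-normalized "IPC" = benchmark score / max turbo frequency (GHz).
# Scores and frequencies are the ones quoted above; this is just a
# sanity check of the division, not a real IPC measurement.
def ipc(score, freq_ghz):
    return score / freq_ghz

mtl_e  = ipc(5.55, 3.8)   # 155H E-core (Crestmont), 3.8 GHz spec turbo
rpl_e  = ipc(6.0, 3.7)    # i7-13700H E-core (Gracemont), 3.7 GHz spec turbo
mtl_lp = ipc(3.15, 2.5)   # 155H LP E-core, capped at 2.5 GHz

print(round(mtl_e, 2), round(rpl_e, 2), round(mtl_lp, 2))  # 1.46 1.62 1.26
```

Of course, this all assumes each core actually sustained its specified turbo clock during the test, which is exactly the caveat discussed further down.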
I wouldn't go that far. Not least because its perf/W indeed appears to have improved. Being a laptop part, that + its low idle power are definitely relevant to its success.
I'd love to see some in-depth analysis of what's going on. I think it's mighty suspicious that Intel apparently didn't send out any review samples, which is probably why Tom's has yet to review it. Phoronix, whom they've been quoting in other articles, bought its own Meteor Lake laptop.
I doubt it. AMD disaggregated its cores and memory controllers, using far more primitive interconnect technology, and didn't seem to suffer much from it. The memory controller is located in the SoC die, so it's just one hop away from the CPU tile.
LPDDR5 does have meaningfully higher latency, though. David Huang was wise to point that out. He notes several other CPUs where LPDDR5 seems to have a notable performance impact.
Former Intel contract employee here... Meteor Lake was a hassle and a half. Samsung drives wouldn't work unless you dropped the PCIe speed to Gen 1. They were always damn slow, even the desktop CPUs (yes, they existed).
Rubbish. I'm sure they care enough to try and avoid regressions, even if they weren't planning on making outright improvements. Pretty outlandish claim, right there.
The next bit is important, and perhaps we should have emphasized this more but I didn't feel it was necessary: "It also features Foveros technology and multiple tiles manufactured on different processes."
We don't have a good way to measure "pure IPC" as caches, interfaces, etc. all affect instruction throughput. The cores on Meteor Lake could be identical or even improved over Raptor Lake, but all of the changes to implement the new multi-tile packaging could reduce the final real-world throughput.
Or if you want to read the statement in a different way: Intel seems to have cared more about getting Foveros working and proving it as a viable approach over just pure CPU IPC.
Thanks for considering my post and taking the time to reply. I always appreciate your dedication to the work and to us, Jarred. Also, Happy New Year!
: )
The next bit is important, and perhaps we should have emphasized this more but I didn't feel it was necessary: "It also features Foveros technology and multiple tiles manufactured on different processes."
I understand that, but I'm also well aware that this isn't Intel's first chiplet/tile rodeo. They had Lakefield, Ponte Vecchio, and Sapphire Rapids. By now, they should have at least as much experience with disaggregated architectures as AMD had, when it did Zen 3 (i.e. Ryzen 5000).
For a while, now, I think industry standard practice has basically been to use a benchmark like SPEC2006, SPEC2017, or GeekBench (CPU) and normalize by clock speed. It's not a strict count of the instruction rate, but different instructions have different throughputs and latencies anyhow. The key point is just to look at clock-normalized performance, as a measure of microarchitecture width and the efficacy of things like branch predictors and prefetchers.
If you really wanted to try and isolate different aspects of the microarchitecture, one could even do this with microbenchmarks designed to stress different parts of the CPU. And some people do.
The cores on Meteor Lake could be identical or even improved over Raptor Lake, but all of the changes to implement the new multi-tile packaging could reduce the final real-world throughput.
His data provides a couple interesting points of comparison.
Let's consider the Ryzen 5 3600 and Ryzen 7 4800U. Both are Zen 2, but the first is disaggregated and the latter is monolithic. Their raw scores were 6.57 and 5.95. As they both feature turbo frequencies of 4.2 GHz, we can just directly compare the scores, with the chiplet-based CPU coming in at 10.4% faster! Maybe there's some difference in boost behavior, but I found a review which confirms the 4800U can stay under 15 W with one thread at 4.2 GHz, so I'm assuming that's what happened. If they did both sustain 4.2 GHz, then that leaves only cache & perhaps memory latency as the distinguishing factors (both were tested using DDR4-3200). The Ryzen 5 3600 has 32 MB of L3 cache, while the Ryzen 7 4800U has only 8 MB. However, since the 3600 is subdivided into two CCXs, a single thread probably has access to only 16 MB of L3.
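Since both parts boost to the same 4.2 GHz, the comparison reduces to a ratio of the raw scores; a quick sketch using the numbers above:

```python
# Both CPUs boost to 4.2 GHz, so the raw scores compare directly
# without any clock normalization.
r5_3600  = 6.57   # Zen 2, chiplet-based (disaggregated)
r7_4800u = 5.95   # Zen 2, monolithic

pct_faster = (r5_3600 / r7_4800u - 1) * 100
print(round(pct_faster, 1))  # 10.4
```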
I think the main takeaway here is that there are some factors bigger than chiplet vs. monolithic.
Or if you want to read the statement in a different way: Intel seems to have cared more about getting Foveros working and proving it as a viable approach over just pure CPU IPC.
I don't believe Intel would've rolled out a technology it didn't think was ready for prime time in its bread-and-butter products. The stakes are too high for them just to "push it out the door" before it's ready. Intel has taken its time to migrate to chiplets, and I assume that was just so they could perfect the technology. Sapphire Rapids showed they can deliver solid IPC using chiplets, and Meteor Lake is a whole generation beyond that and actually has far fewer tiles.
I'd encourage folks to read some of what David Huang said about LPDDR5, because it stands out as one of the more plausible explanations for the regression. I don't understand a word of Chinese, but Google Translate seems to work pretty well on the page.
"Zen 3 desktop performance is 12% higher than the strongest Zen 3 mobile (DDR5), and more than 30% higher than the strongest LPDDR5 mobile performance"
"when the mainstream Alder Lake SoC i5-1240P is paired with LPDDR5-4800, the overall performance is only slightly better than desktop Skylake++."
Doesn't Meteor Lake have a couple of extra-small extra-weak (extra-misleading to customers) "platform" e-cores thrown in the mix?
Meteor Lake is like a weird big.LITTLE.TINY architecture as Intel tries everything to make their core count look larger, isn't it?
Maybe they're dragging down the averages, and I can't imagine today's schedulers are that great at simultaneously managing 3 different types of cores in a single processor, considering they still aren't the best at managing 2.
After seeing the Hardware Canucks preview, I'm not really trusting any early-release hardware results. Not to mention this reviewer's SPEC numbers appear to be off (and I don't mean just the raw numbers, but the differences between them) when compared to the AnandTech runs they standardized a long while ago, when Andrei was there.
After CES and when there are a lot more results to compare I think we'll get a better idea. Until then assumptions based on what little is available now are nebulous at best.
From everything Intel has indicated, IPC on the P-cores should be basically the same as RPL; it's the E-cores we'll want to take note of.
Just to follow up on the LPDDR5 latency issue, NotebookCheck tested the memory latency of the Core Ultra 7 155H and got 147.1 ns. Their testing of an i7-13700H of the same notebook brand (but different model line), showed a memory latency of 97.7 ns. That's 50.6% higher, for Meteor Lake (lower is better)!
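For clarity, the percentage is just the ratio of the two measured latencies; a quick sketch with the NotebookCheck figures above:

```python
# Measured single memory-latency figures from NotebookCheck, in nanoseconds.
mtl_latency = 147.1  # Core Ultra 7 155H (Meteor Lake, LPDDR5)
rpl_latency = 97.7   # i7-13700H (Raptor Lake)

pct_higher = (mtl_latency / rpl_latency - 1) * 100
print(round(pct_higher, 1))  # 50.6
```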
Doesn't Meteor Lake have a couple of extra-small extra-weak (extra-misleading to customers) "platform" e-cores thrown in the mix?
Meteor Lake is like a weird big.LITTLE.TINY architecture as Intel tries everything to make their core count look larger, isn't it?
The two LP E-cores reside in the SoC tile, so they can power down the CPU tile, when at or near idle. I think they're not about trying to inflate core count, nor should they come into play in single-threaded performance.
In fact, if you look at the original article (or just expand the image, in my above post), you'll see that he actually characterized the 3 different core types, separately.
Maybe they're dragging down the averages, and I can't imagine today's schedulers are that great at simultaneously managing 3 different types of cores in a single processor, considering they still aren't the best at managing 2.
For a single-threaded test, it should make no difference. However, given that he explicitly tested the different core types, I'd bet he explicitly set the affinity and didn't even rely on the OS to pick a P-core, automatically.
Just to follow up on the LPDDR5 latency issue, NotebookCheck tested the memory latency of the Core Ultra 7 155H and got 147.1 ns. Their testing of an i7-13700H of the same notebook brand (but different model line), showed a memory latency of 97.7 ns. That's 50.6% higher, for Meteor Lake (lower is better)!
So the thing with Foveros and multi-tile stuff is it has clearly been giving Intel a lot of trouble streamlining the process. I think Meteor Lake finally proves it's ready for the mainstream market, but Ponte Vecchio and Sapphire Rapids were both super delayed from Intel — and they're also more complex, which certainly didn't help.
Anyway, that big jump in latency to me says that caching and other aspects are absolutely having an impact. And that's expected at some level, because there are four tiles: Compute, SOC, GPU, and IO. So now for compute to access memory, it goes off chip via EMIB or Foveros or whatever to the IO tile, potentially directly but also potentially through the SOC tile first. The increase in latency makes me think it might be doing two hops (i.e. from compute to SOC then to IO).
This is why I say that measuring IPC via real-world applications is difficult, because everything has a different way of hitting the various caches and main memory. I don't know enough about SPECint 2017 to say how it's affected by those aspects.
Looking at the source article, I'm also concerned that the testing was simply done with the thought/hope that clocks would be whatever the specified boost clock says, rather than being measured for the tests. Because I know that motherboards and BIOS settings can absolutely result in CPUs not running at their supposed single core boost clocks.
In short, this is interesting data, but not something I would refer to as being fully reliable in terms of how it was produced. Dividing by potentially estimated clocks could skew various "IPC" figures by 5 to 10 percent quite easily.
If it wasn't about inflating core count, then Intel would be putting a stop to anyone reductively listing a 6P+8E+2e processor as a "16 cores" or 2P+8E+2e as "12 cores". But they didn't take a stand against combining core counts in the last 2 generations. So I think they like it when retail consumers are misled or simply get confused into thinking all the cores are the same old "big" cores, especially when compared to AMD processors with all big cores.
For a single-threaded test, it should make no difference. However, given that he explicitly tested the different core types, I'd bet he explicitly set the affinity and didn't even rely on the OS to pick a P-core, automatically.
Maybe not the OS interfering, but that still begs the question of "what is a LP P-core?"
Even then, Intel's new Thread Director is a black box in itself. There could be some conflict happening between it and the benchmark that we don't understand yet.
It's not just Meteor Lake where LPDDR5 has been shown to have exceptional latency. I once saw an explanation of why, though I'm foggy on the details. Basically, the interface protocol for LPDDR5 is more complex and involves more steps.
This ASUS Zenbook S 13 OLED features an i7-1355U with LPDDR5 and was benchmarked at 120.1 ns.
Well, that's just one datapoint, but it both supports the idea that LPDDR5 is inherently higher latency, while also allowing for the possibility that something about Meteor Lake's multi-die architecture is having a further impact on latency.
there are four tiles: Compute, SOC, GPU, and IO. So now for compute to access memory, it goes off chip via EMIB or Foveros or whatever to the IO tile, potentially directly but also potentially through the SOC tile first. The increase in latency makes me think it might be doing two hops (i.e. from compute to SOC then to IO).
This is why I say that measuring IPC via real-world applications is difficult, because everything has a different way of hitting the various caches and main memory. I don't know enough about SPECint 2017 to say how it's affected by those aspects.
You just pick a benchmark and run it. As long as you know the clock speed, you can just divide it out. Or, sometimes, people go out of their way to force multiple different CPUs they're comparing all to run at the same clock speed.
BTW, I've found that IPC tends to stay relatively flat, over the range of different clock speeds. Here's what I found from some of Chips & Cheese's data:
So, it seems just dividing it out should tend to give fairly similar results as running the CPU at a fixed speed towards the upper end of its frequency range.
The SPEC 2017 benchmarks are composed of applications. The integer portion has 10 different ones, while floating point has 13. You can see the list, here:
Looking at the source article, I'm also concerned that the testing was simply done with the thought/hope that clocks would be whatever the specified boost clock says, rather than being measured for the tests. Because I know that motherboards and BIOS settings can absolutely result in CPUs not running at their supposed single core boost clocks.
In short, this is interesting data, but not something I would refer to as being fully reliable in terms of how it was produced. Dividing by potentially estimated clocks could skew various "IPC" figures by 5 to 10 percent quite easily.
Yeah, it'd be nice to have more visibility into how the testing was done. In particular, if he had traces of the clock speed, to assure us of his methodology.
Even then, Intel's new Thread Director is a black box in itself. There could be some conflict happening between it and the benchmark that we don't understand yet.
Intel's Thread Director doesn't play an active role in thread scheduling. It merely collects statistics, which can be used to inform the kernel on where different threads are best-placed. Therefore, if you set the core affinity, the Thread Director can't do anything to override that.
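Setting affinity yourself is trivial, at least on Linux; a minimal sketch (CPU 0 is a placeholder — which logical IDs map to P-, E-, or LP E-cores varies by system, and this call is Linux-only):

```python
import os

# Pin the current process to logical CPU 0 (hypothetical choice;
# check /proc/cpuinfo or lscpu to find which IDs are P- vs. E-cores).
os.sched_setaffinity(0, {0})  # pid 0 = the calling process

# Verify: the kernel will now only schedule this process on CPU 0,
# regardless of any hints Thread Director feeds the scheduler.
print(os.sched_getaffinity(0))  # {0}
```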
One guess I have is that he tested a P-core with the system in some kind of "Low Power" mode. If so, it's sad that he used the same notation as the LP E-core, which is an actual thing and distinct from the other E-cores.
BTW, the LP E-cores aren't "tiny", as you called them. They're supposed to have the same Crestmont microarchitecture as the other E-cores, but they reside on the TSMC N6 SoC tile and have their turbo clock speed limited to 2.5 GHz, instead of the other E-cores' 3.9 GHz limit. Therefore, it's surprising to see them providing worse IPC. If they're truly the same microarchitecture, then their IPC should be almost identical to the other E-cores'.
I have a few old Intel Atom mini PCs. They all have horrible RAM latency. Maybe the low-power E-cores have too much of a role in communicating with the RAM.
Or maybe since MTL has got 3 different types of cores and can shut some of them down or wake them up depending on whatever the agreement between the thread director and Windows says, it is a complicated mess. And there may be a significant difference in both power consumption and performance depending on the Windows power plan settings. And there may also be a noticeable difference depending on whatever Windows updates you get down the line.
The whole setup of leaving an entire section of the CPU powered down is pretty new. There may be some changes down the line. In a review, I would like to hear whether things significantly change with different power plans, though. If they don't change that much, then I don't care, because that's more normal.
I have a few old Intel Atom mini PCs. They all have horrible RAM latency. Maybe the low-power E-cores have too much of a role in communicating with the RAM.
In Alder Lake, the only bad thing with the E-cores was that their ring bus interface ran at a lower clock speed. So, people found that the P-cores would perform better if you disabled the E-cores in BIOS.
I think that was supposed to have been fixed in Raptor Lake. And I'd be pretty surprised if it were still an issue in Meteor Lake.
Intel has been taking it on the chin from AMD in recent laptop generations on performance, iGPU, and power efficiency; I suspect this chip is an attempt at a design focused on better power efficiency and iGPU performance... hoping to close the gap on 2 of the 3 metrics important for laptop chips. But this reduced-IPC news has been well known for weeks in many circles of the tech community, so it isn't really a surprise. Intel played some games with the test units it sent out to reviewers: the ASUS Zenbooks they shipped came with RAM that isn't commercially available on any Zenbook on the market, RAM that is blistering fast, in order to hide some of the <Mod Edit> performance this chip has. Some reviewers noticed this fact and reported it; meanwhile, I never saw any article on this site acknowledging it.
Those reviewers were able to crunch the numbers with their "juiced", non-commercially-available review laptops and turn up the sad performance numbers for these chips. What was particularly noteworthy is that, even with those juiced samples, the Intel chips were still at best tied or even losing against last-gen Ryzen laptops with much slower RAM.
Intel played some games with the test units it sent out to reviewers: the ASUS Zenbooks they shipped came with RAM that isn't commercially available on any Zenbook on the market, RAM that is blistering fast, in order to hide some of the <Mod Edit> performance this chip has. Some reviewers noticed this fact and reported it; meanwhile, I never saw any article on this site acknowledging it.
Former Intel contract employee here... Meteor Lake was a hassle and a half. Samsung drives wouldn't work unless you dropped the PCIe speed to Gen 1. They were always damn slow, even the desktop CPUs (yes, they existed).
If you guys don't read your own news how do you expect any of the guests to do so?!
If this test is still with a bad bios then it's completely useless.
Thanks for the reminder. I'm still waiting for outlets who've posted up Meteor Lake reviews to revisit their measurements, in light of that. If you're aware of any post-update reviews or review-updates, please feel free to share them with us.
It's not just Meteor Lake where LPDDR5 has been shown to have exceptional latency. I once saw an explanation of why, though I'm foggy on the details. Basically, the interface protocol for LPDDR5 is more complex and involves more steps.
Latency is the unfortunate trade-off to achieve lower power consumption with LPDDR. Specifically, there's an additional step in the read/write protocol, where a command is issued to synchronize the DRAM to the high-speed clock, among other small differences from DDR. LPDDR5X and LPDDR5T variants are right around the corner, offering 8533 MT/s and 9600 MT/s respectively, which should lower latency accordingly.