A recent test of Intel's Core Ultra 7 155H reveals that Meteor Lake may have regressed in IPC compared to Raptor Lake. It's a limited view of performance, but indicates Intel may have prioritized other aspects of the chip's design.
No, it wasn't. That happens to be one of its big selling points, but I'm pretty sure their top priority was to improve efficiency & battery life. They're probably worried about competition from Apple, Qualcomm, and AMD on that front.
While raw CPU performance is likely a lower priority for Intel as it focuses on AI and graphics performance
The only variables can be power & thermal thresholds. If the author divided by max boost clock, and the CPU wasn't allowed to stretch its legs very much or for very long, then testing on a laptop with more generous power limits and better cooling could produce better results.
Edit: Using Google Translate, the original webpage has this to say about heat & power:
"The CPU frequency is all default, and will be marked when the power consumption and heat dissipation performance are low enough to affect single-thread performance."
Still, there could be a time limit on boosting. Otherwise, sounds like the power & thermal parameters of the test platform probably shouldn't have much effect.
BTW, in addition to the 155H's "P-core" entry, there's another weird entry for it, called "LP P-core". I have no idea what that means, because I think there's only one set of P-cores and that result is still well above the 155H's E-core result.
Interestingly, its E-core scored 5.55 (absolute), while the i7-13700H's E-core scored 6.0. This is notable because Intel claimed the Crestmont E-cores actually did improve IPC over Gracemont. Normalizing by the specified "Efficient-core Max Turbo Frequency", which is 3.8 GHz for the 155H and 3.7 GHz for the i7-13700H, yields IPC scores of 1.46 and 1.62 for Meteor Lake and the previous-gen CPU, respectively. So, far from an improvement, it looks like quite a regression - even slightly more than the P-cores (11.1% vs. 9.1%)!
He measured the 155H's LP E-core at 3.15, but remember that its frequency is capped at just 2.5 GHz. It should have nearly identical IPC to the regular E-cores, but when I use the specified frequencies, I get 1.26 and 1.46 for the LP and regular E-cores, respectively.
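The clock normalization above is just score divided by frequency; a minimal sketch of the arithmetic, using the scores and spec frequencies quoted in the two paragraphs above:

```python
# Clock-normalized "IPC" = benchmark score / max turbo frequency (GHz).
# Scores and frequencies are the ones quoted above; this is just a
# sanity check of the division, not a real IPC measurement.
def ipc(score, freq_ghz):
    return score / freq_ghz

mtl_e  = ipc(5.55, 3.8)   # 155H E-core (Crestmont), 3.8 GHz spec turbo
rpl_e  = ipc(6.0, 3.7)    # i7-13700H E-core (Gracemont), 3.7 GHz spec turbo
mtl_lp = ipc(3.15, 2.5)   # 155H LP E-core, capped at 2.5 GHz

print(round(mtl_e, 2), round(rpl_e, 2), round(mtl_lp, 2))  # 1.46 1.62 1.26
```

Of course, this all assumes each core actually sustained its specified turbo clock during the test, which is exactly the caveat discussed further down.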
I wouldn't go that far. Not least because its perf/W indeed appears to have improved. Being a laptop part, that + its low idle power are definitely relevant to its success.
I'd love to see some in-depth analysis of what's going on. I think it's mighty suspicious that Intel apparently didn't send out any review samples, which is probably why Tom's has yet to review it. Phoronix, whom they've been quoting in other articles, bought its own Meteor Lake laptop.
I doubt it. AMD disaggregated its cores and memory controllers, using far more primitive interconnect technology, and didn't seem to suffer much from it. The memory controller is located in the SoC die, so it's just one hop away from the CPU tile.
LPDDR5 does have meaningfully higher latency, though. David Huang was wise to point that out. He notes several other CPUs where LPDDR5 seems to have a notable performance impact.
Former Intel contract employee here... Meteor Lake was a hassle and a half. Samsung drives wouldn't work unless you dropped the PCIe speed to Gen 1. They were always damn slow, even the desktop CPUs (yes, they existed).
Rubbish. I'm sure they care enough to try and avoid regressions, even if they weren't planning on making outright improvements. Pretty outlandish claim, right there.
The next bit is important, and perhaps we should have emphasized this more but I didn't feel it was necessary: "It also features Foveros technology and multiple tiles manufactured on different processes."
We don't have a good way to measure "pure IPC" as caches, interfaces, etc. all affect instruction throughput. The cores on Meteor Lake could be identical or even improved over Raptor Lake, but all of the changes to implement the new multi-tile packaging could reduce the final real-world throughput.
Or if you want to read the statement in a different way: Intel seems to have cared more about getting Foveros working and proving it as a viable approach over just pure CPU IPC.
Thanks for considering my post and taking the time to reply. I always appreciate your dedication to the work and to us, Jarred. Also, Happy New Year!
: )
The next bit is important, and perhaps we should have emphasized this more but I didn't feel it was necessary: "It also features Foveros technology and multiple tiles manufactured on different processes."
I understand that, but I'm also well aware that this isn't Intel's first chiplet/tile rodeo. They had Lakefield, Ponte Vecchio, and Sapphire Rapids. By now, they should have at least as much experience with disaggregated architectures as AMD had, when it did Zen 3 (i.e. Ryzen 5000).
For a while, now, I think industry standard practice has basically been to use a benchmark like SPEC2006, SPEC2017, or GeekBench (CPU) and normalize by clock speed. It's not a strict count of the instruction rate, but different instructions have different throughputs and latencies anyhow. The key point is just to look at clock-normalized performance, as a measure of microarchitecture width and the efficacy of things like branch predictors and prefetchers.
If you really wanted to try and isolate different aspects of the microarchitecture, one could even do this with microbenchmarks designed to stress different parts of the CPU. And some people do.
The cores on Meteor Lake could be identical or even improved over Raptor Lake, but all of the changes to implement the new multi-tile packaging could reduce the final real-world throughput.
His data provides a couple interesting points of comparison.
Let's consider the Ryzen 5 3600 and Ryzen 7 4800U. Both are Zen 2, but the first is disaggregated and the latter is monolithic. Their raw scores were 6.57 and 5.95. As they both feature turbo frequencies of 4.2 GHz, we can just directly compare the scores, with the chiplet-based CPU coming in at 10.4% faster! Maybe there's some difference in boost behavior, but I found a review which confirms the 4800U can stay under 15 W with one thread at 4.2 GHz, so I'm assuming that's what happened. If they did both sustain 4.2 GHz, then that leaves only cache & perhaps memory latency as the distinguishing factors (both were tested using DDR4-3200). The Ryzen 5 3600 has 32 MB of L3 cache, while the Ryzen 7 4800U has only 8 MB. However, since the 3600 is subdivided into two CCXs, a single thread probably has access to only 16 MB of L3.
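Since both parts boost to the same 4.2 GHz, the comparison reduces to a ratio of the raw scores; a quick sketch using the numbers above:

```python
# Both CPUs boost to 4.2 GHz, so the raw scores compare directly
# without any clock normalization.
r5_3600  = 6.57   # Zen 2, chiplet-based (disaggregated)
r7_4800u = 5.95   # Zen 2, monolithic

pct_faster = (r5_3600 / r7_4800u - 1) * 100
print(round(pct_faster, 1))  # 10.4
```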
I think the main takeaway here is that there are some factors bigger than chiplet vs. monolithic.
Or if you want to read the statement in a different way: Intel seems to have cared more about getting Foveros working and proving it as a viable approach over just pure CPU IPC.
I don't believe Intel would've rolled out a technology it didn't think was ready for prime time in its bread-and-butter products. The stakes are too high for them just to "push it out the door" before it's ready. Intel has taken its time to migrate to chiplets, and I assume that was just so they could perfect the technology. Sapphire Rapids showed they can deliver solid IPC using chiplets, and Meteor Lake is a whole generation beyond that and actually has far fewer tiles.
I'd encourage folks to read some of what David Huang said about LPDDR5, because it stands out as one of the more plausible explanations for the regression. I don't understand a word of Chinese, but Google Translate seems to work pretty well on the page.
"Zen 3 desktop performance is 12% higher than the strongest Zen 3 mobile (DDR5), and more than 30% higher than the strongest LPDDR5 mobile performance"
"when the mainstream Alder Lake SoC i5-1240P is paired with LPDDR5-4800, the overall performance is only slightly better than desktop Skylake++."
Doesn't Meteor Lake have a couple of extra-small extra-weak (extra-misleading to customers) "platform" e-cores thrown in the mix?
Meteor Lake is like a weird big.LITTLE.TINY architecture as Intel tries everything to make their core count look larger, isn't it?
Maybe they're dragging down the averages, and I can't imagine today's schedulers are that great at simultaneously managing 3 different types of cores in a single processor, considering they still aren't the best at managing 2.
After seeing the Hardware Canucks preview, I'm not really trusting any early-release hardware results. Not to mention this reviewer's SPEC numbers appear to be off (and I don't mean just the raw numbers, but the differences between them) when compared to the AnandTech runs they standardized a long while ago, when Andrei was there.
After CES and when there are a lot more results to compare I think we'll get a better idea. Until then assumptions based on what little is available now are nebulous at best.
From everything Intel has indicated, IPC on the P-cores should be basically the same as RPL; it's the E-cores we'll want to take note of.
Just to follow up on the LPDDR5 latency issue, NotebookCheck tested the memory latency of the Core Ultra 7 155H and got 147.1 ns. Their testing of an i7-13700H of the same notebook brand (but different model line), showed a memory latency of 97.7 ns. That's 50.6% higher, for Meteor Lake (lower is better)!
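For clarity, the percentage is just the ratio of the two measured latencies; a quick sketch with the NotebookCheck figures above:

```python
# Measured single memory-latency figures from NotebookCheck, in nanoseconds.
mtl_latency = 147.1  # Core Ultra 7 155H (Meteor Lake, LPDDR5)
rpl_latency = 97.7   # i7-13700H (Raptor Lake)

pct_higher = (mtl_latency / rpl_latency - 1) * 100
print(round(pct_higher, 1))  # 50.6
```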
Doesn't Meteor Lake have a couple of extra-small extra-weak (extra-misleading to customers) "platform" e-cores thrown in the mix?
Meteor Lake is like a weird big.LITTLE.TINY architecture as Intel tries everything to make their core count look larger, isn't it?
The two LP E-cores reside in the SoC tile, so they can power down the CPU tile, when at or near idle. I think they're not about trying to inflate core count, nor should they come into play in single-threaded performance.
In fact, if you look at the original article (or just expand the image, in my above post), you'll see that he actually characterized the 3 different core types, separately.
Maybe they're dragging down the averages, and I can't imagine today's schedulers are that great at simultaneously managing 3 different types of cores in a single processor, considering they still aren't the best at managing 2.
For a single-threaded test, it should make no difference. However, given that he explicitly tested the different core types, I'd bet he explicitly set the affinity and didn't even rely on the OS to pick a P-core, automatically.
Just to follow up on the LPDDR5 latency issue, NotebookCheck tested the memory latency of the Core Ultra 7 155H and got 147.1 ns. Their testing of an i7-13700H of the same notebook brand (but different model line), showed a memory latency of 97.7 ns. That's 50.6% higher, for Meteor Lake (lower is better)!
So the thing with Foveros and multi-tile stuff is it has clearly been giving Intel a lot of trouble streamlining the process. I think Meteor Lake finally proves it's ready for the mainstream market, but Ponte Vecchio and Sapphire Rapids were both super delayed from Intel — and they're also more complex, which certainly didn't help.
Anyway, that big jump in latency to me says that caching and other aspects are absolutely having an impact. And that's expected at some level, because there are four tiles: Compute, SOC, GPU, and IO. So now for compute to access memory, it goes off chip via EMIB or Foveros or whatever to the IO tile, potentially directly but also potentially through the SOC tile first. The increase in latency makes me think it might be doing two hops (i.e. from compute to SOC then to IO).
This is why I say that measuring IPC via real-world applications is difficult, because everything has a different way of hitting the various caches and main memory. I don't know enough about SPECint 2017 to say how it's affected by those aspects.
Looking at the source article, I'm also concerned that the testing was simply done with the thought/hope that clocks would be whatever the specified boost clock says, rather than being measured for the tests. Because I know that motherboards and BIOS settings can absolutely result in CPUs not running at their supposed single core boost clocks.
In short, this is interesting data, but not something I would refer to as being fully reliable in terms of how it was produced. Dividing by potentially estimated clocks could skew various "IPC" figures by 5 to 10 percent quite easily.
If it wasn't about inflating core count, then Intel would be putting a stop to anyone reductively listing a 6P+8E+2e processor as a "16 cores" or 2P+8E+2e as "12 cores". But they didn't take a stand against combining core counts in the last 2 generations. So I think they like it when retail consumers are misled or simply get confused into thinking all the cores are the same old "big" cores, especially when compared to AMD processors with all big cores.
For a single-threaded test, it should make no difference. However, given that he explicitly tested the different core types, I'd bet he explicitly set the affinity and didn't even rely on the OS to pick a P-core, automatically.
Maybe not the OS interfering, but that still begs the question of "what is a LP P-core?"
Even then, Intel's new Thread Director is a black box in itself. There could be some conflict happening between it and the benchmark that we don't understand yet.
It's not just Meteor Lake where LPDDR5 has been shown to have exceptional latency. I once saw an explanation of why, though I'm foggy on the details. Basically, the interface protocol for LPDDR5 is more complex and involves more steps.
This ASUS Zenbook S 13 OLED features an i7-1355U with LPDDR5 and was benchmarked at 120.1 ns.
Well, that's just one datapoint, but it both supports the idea that LPDDR5 is inherently higher latency, while also allowing for the possibility that something about Meteor Lake's multi-die architecture is having a further impact on latency.
there are four tiles: Compute, SOC, GPU, and IO. So now for compute to access memory, it goes off chip via EMIB or Foveros or whatever to the IO tile, potentially directly but also potentially through the SOC tile first. The increase in latency makes me think it might be doing two hops (i.e. from compute to SOC then to IO).
This is why I say that measuring IPC via real-world applications is difficult, because everything has a different way of hitting the various caches and main memory. I don't know enough about SPECint 2017 to say how it's affected by those aspects.
You just pick a benchmark and run it. As long as you know the clock speed, you can just divide it out. Or, sometimes, people go out of their way to force multiple different CPUs they're comparing all to run at the same clock speed.
BTW, I've found that IPC tends to stay relatively flat, over the range of different clock speeds. Here's what I found from some of Chips & Cheese's data:
So, it seems just dividing it out should tend to give fairly similar results as running the CPU at a fixed speed towards the upper end of its frequency range.
The SPEC 2017 benchmarks are composed of applications. The integer portion has 10 different ones, while floating point has 13. You can see the list, here:
Looking at the source article, I'm also concerned that the testing was simply done with the thought/hope that clocks would be whatever the specified boost clock says, rather than being measured for the tests. Because I know that motherboards and BIOS settings can absolutely result in CPUs not running at their supposed single core boost clocks.
In short, this is interesting data, but not something I would refer to as being fully reliable in terms of how it was produced. Dividing by potentially estimated clocks could skew various "IPC" figures by 5 to 10 percent quite easily.
Yeah, it'd be nice to have more visibility into how the testing was done. In particular, if he had traces of the clock speed, to assure us of his methodology.
Even then, Intel's new Thread Director is a black box in itself. There could be some conflict happening between it and the benchmark that we don't understand yet.
Intel's Thread Director doesn't play an active role in thread scheduling. It merely collects statistics, which can be used to inform the kernel on where different threads are best-placed. Therefore, if you set the core affinity, the Thread Director can't do anything to override that.
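Setting affinity yourself is trivial, at least on Linux; a minimal sketch (CPU 0 is a placeholder — which logical IDs map to P-, E-, or LP E-cores varies by system, and this call is Linux-only):

```python
import os

# Pin the current process to logical CPU 0 (hypothetical choice;
# check /proc/cpuinfo or lscpu to find which IDs are P- vs. E-cores).
os.sched_setaffinity(0, {0})  # pid 0 = the calling process

# Verify: the kernel will now only schedule this process on CPU 0,
# regardless of any hints Thread Director feeds the scheduler.
print(os.sched_getaffinity(0))  # {0}
```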
One guess I have is that he tested a P-core with the system in some kind of "Low Power" mode. If so, it's sad that he used the same notation as the LP E-core, which is an actual thing and distinct from the other E-cores.
BTW, the LP E-cores aren't "tiny", as you called them. They're supposed to have the same Crestmont microarchitecture as the other E-cores, but they reside on the TSMC N6 SoC tile and have their turbo clock speed limited to 2.5 GHz, instead of the other E-cores' 3.9 GHz limit. Therefore, it's surprising to see them providing worse IPC. If they're truly the same microarchitecture, then their IPC should be almost identical to the other E-cores'.
I have a few old Intel Atom mini PCs. They all have horrible RAM latency. Maybe the low-power E-cores have too much of a role in communicating with the RAM.
Or maybe since MTL has got 3 different types of cores and can shut some of them down or wake them up depending on whatever the agreement between the thread director and Windows says, it is a complicated mess. And there may be a significant difference in both power consumption and performance depending on the Windows power plan settings. And there may also be a noticeable difference depending on whatever Windows updates you get down the line.
The whole setup of leaving an entire section of the CPU powered down is pretty new. There may be some changes down the line. In a review, I would like to hear whether things significantly change with different power plans, though. If they don't change that much, then I don't care, because that's more normal.
I have a few old Intel Atom mini PCs. They all have horrible RAM latency. Maybe the low-power E-cores have too much of a role in communicating with the RAM.
In Alder Lake, the only bad thing with the E-cores was that their ring bus interface ran at a lower clock speed. So, people found that the P-cores would perform better if you disabled the E-cores in BIOS.
I think that was supposed to have been fixed in Raptor Lake. And I'd be pretty surprised if it were still an issue in Meteor Lake.
Intel has been taking it on the chin from AMD in recent laptop generations on performance, iGPU, and power efficiency; I suspect this chip is an attempt at a design focused on better power efficiency and iGPU performance... hoping to close the gap on 2 of the 3 metrics important for laptop chips. But this reduced-IPC news has been well known for weeks in many circles of the tech community, so it isn't really a surprise. Intel played some games with the test units it sent out to reviewers: the ASUS Zenbooks they shipped came with RAM that isn't commercially available on any Zenbook on the market, RAM that is blistering fast, in order to hide some of the <Mod Edit> performance this chip has. Some reviewers noticed this fact and reported it; meanwhile, I never saw any article on this site acknowledging it.
Those reviewers were able to crunch the numbers with their "juiced", non-commercially-available review laptops and turn up the sad performance numbers for these chips. What was particularly noteworthy is that, even with those juiced samples, the Intel chips were still at best tied or even losing against last-gen Ryzen laptops with much slower RAM.
Intel played some games with the test units it sent out to reviewers: the ASUS Zenbooks they shipped came with RAM that isn't commercially available on any Zenbook on the market, RAM that is blistering fast, in order to hide some of the <Mod Edit> performance this chip has. Some reviewers noticed this fact and reported it; meanwhile, I never saw any article on this site acknowledging it.
Former Intel contract employee here... Meteor Lake was a hassle and a half. Samsung drives wouldn't work unless you dropped the PCIe speed to Gen 1. They were always damn slow, even the desktop CPUs (yes, they existed).
If you guys don't read your own news how do you expect any of the guests to do so?!
If this test is still with a bad bios then it's completely useless.
Thanks for the reminder. I'm still waiting for outlets who've posted up Meteor Lake reviews to revisit their measurements, in light of that. If you're aware of any post-update reviews or review-updates, please feel free to share them with us.
It's not just Meteor Lake where LPDDR5 has been shown to have exceptional latency. I once saw an explanation of why, though I'm foggy on the details. Basically, the interface protocol for LPDDR5 is more complex and involves more steps.
Latency is the unfortunate trade-off to achieve lower power consumption with LPDDR. Specifically, there's an additional step in the read/write protocol, where a command is issued to synchronize the DRAM to the high-speed clock, among other small differences from DDR. LPDDR5X and LPDDR5T variants are right around the corner, offering 8533 MT/s and 9600 MT/s respectively, which should lower latency accordingly.