News I tested Intel's Meteor Lake CPUs on AI workloads: AMD's chips sometimes beat them

Status
Not open for further replies.

xyster

Distinguished
Dec 6, 2005
233
8
18,695
More efficient power usage is very welcome in a laptop, as was the value gained from hardware video acceleration. And just like with HWA, freeing up more CPU capacity for other tasks is going to gain traction in some communities, like live streaming.

Imagine a green screen, blurred background, or digital background effect on your video streams that adds no additional CPU load. The AV1 encoding would also be handled by the chip, leaving the CPU wide open to keep OBS from crashing.

I always figured the GPU would be the main horse in handling AI workloads, but that perception is changing. It paints a concerning picture for companies like Nvidia, who might start to lose relevance in the AI space as they did with crypto. The era of being able to game, mine, and train on a single desktop gaming GPU is slowly fading away. The entire China sanctions thing might only accelerate this.

Anyways, I think Intel has done quite well steering its ship in the right direction of late, and OpenVINO is finally maturing into something that's a bit of a blessing for all PC users. If OpenCL got traction instead of CUDA, maybe we'd be here already long ago. I'll wait for more comprehensive tests, but I'm excited to buy a new laptop -- no more 14nm+++!
 

domih

Reputable
Jan 31, 2020
205
183
4,760
Readers can also refer to https://www.phoronix.com/review/intel-core-ultra-7-155h-linux

From the article:

<<In fact, out of 370 benchmarks run on both the Ryzen 7 7840U and Core Ultra 7 155H focused strictly on the processor performance, the Ryzen 7 7840U was the best performer 80% of the time!

When taking the geometric mean of all 370 benchmark results, the Ryzen 7 7840U enjoyed a 28% lead over the Intel Core Ultra 7 155H in these Linux CPU performance benchmarks. This was all the while the Ryzen 7 7840U was delivering similar or lower power consumption than the Core Ultra 7 155H with these tests on Ubuntu 23.10 with the Linux 6.7 kernel at each system's defaults. The Core Ultra 7 155H also had a tendency to have significantly higher power spikes than the Ryzen 7 7840U.>>
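
For anyone unsure how a geometric mean summarizes hundreds of benchmark results into one lead figure, here is a minimal Python sketch. The ratios below are made-up placeholders, not Phoronix data, and the function name is only for illustration.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-benchmark ratios (chip A score / chip B score).
# A value above 1.0 means chip A won that benchmark.
ratios = [1.50, 1.20, 0.90, 1.60]

lead = geometric_mean(ratios) - 1.0
print(f"geomean lead: {lead:.1%}")
```

Unlike an arithmetic mean, this treats a 2x win and a 2x loss as cancelling out, which is why it is the usual choice for averaging benchmark ratios.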


Chart: https://www.phoronix.com/benchmark/...-ryzen-7-7840u-linux-benchmarks/result-1.svgz

As the French say: "y a pas photo" (a slang expression meaning there's no need for a photo finish to pick the winner).

EDIT
Found after I posted this: Tom's Hardware covers the Phoronix article here: https://www.tomshardware.com/pc-com...u-and-intel-core-ultra-7-155h-go-head-to-head

EDIT
On the other hand, the Intel Core Ultra 7 155H, with its new iGPU based on Arc Graphics, shines and beats AMD's RDNA 3 in a majority of benchmarks.
Ref: https://www.phoronix.com/review/meteor-lake-arc-graphics
 

usertests

Distinguished
Mar 8, 2013
928
839
19,760
How much power does the NPU consume? Isn't it on the low power SOC?
I'm wondering the same about these NPUs, although I think we can safely assume it's low and more efficient.

It's worth noting that Hawk Point has its "NPU" apparently clocked up to 60% higher than Phoenix for about 40% more performance. It's probably sacrificing a bit of efficiency for more performance.
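
A quick back-of-the-envelope check of those two figures, as a sketch: if the clock really rises by the full 60% and performance by about 40%, then performance per clock drops. This says nothing about power draw, which isn't given here, and the helper name is hypothetical.

```python
def perf_per_clock_change(clock_gain, perf_gain):
    """Relative change in performance per clock, given fractional gains."""
    return (1.0 + perf_gain) / (1.0 + clock_gain) - 1.0


# Up to 60% higher clock for about 40% more performance (figures from this post).
change = perf_per_clock_change(clock_gain=0.60, perf_gain=0.40)
print(f"{change:+.1%}")  # prints -12.5%
```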

EDIT
On the other hand, the Intel Core Ultra 7 155H, with its new iGPU based on Arc Graphics, shines and beats AMD's RDNA 3 in a majority of benchmarks.
Ref: https://www.phoronix.com/review/meteor-lake-arc-graphics
Some of that is very obscure. It will be interesting to see how gaming actually shakes out.

Also, only Meteor Lake-H gets 8 Xe cores (128 EUs), while Meteor Lake-U will only come with 4 Xe cores (64 EUs). So comparing Intel's "U" to AMD's "U" won't go so well.
 

ThomasKinsley

Notable
Oct 4, 2023
385
384
1,060
I'm split on this. On the one hand, although Meteor Lake makes some gains, it comes up short in expected battery life and core performance. On the other hand, the NPU might eventually mature, and developers could use it in many more applications once they learn how to code for it.
 

bit_user

Titan
Ambassador
The OpenVINO plugins specifically recognized the Meteor Lake laptops' NPU and presented it as a device option. However, they didn't recognize the Ryzen chips' Ryzen AI accelerator and present it as an option. So we don't know if the AMD laptops, which we ran in CPU mode, utilized their AI accelerators at all during testing.
No, if you didn't see it, then Ryzen AI definitely wasn't used by the Ryzen 7840U (i.e. Phoenix). At best, the OpenVINO CPU backend was simply using AVX-512 on the CPU cores (at worst, just AVX2).
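
For anyone who wants to check what OpenVINO exposes on their own machine, something like the sketch below should list the devices the runtime detects. It assumes a recent OpenVINO release where `import openvino as ov` works (older releases used `openvino.runtime`). `pick_device` is a hypothetical helper, and the NPU, GPU, CPU preference order is only an example.

```python
def list_openvino_devices():
    """Return the device names OpenVINO reports, or [] if it isn't installed."""
    try:
        import openvino as ov  # assumes a recent OpenVINO release
    except ImportError:
        return []
    return list(ov.Core().available_devices)


def pick_device(available, preference=("NPU", "GPU", "CPU")):
    """Pick the first preferred device type that appears in `available`."""
    for wanted in preference:
        for name in available:
            # Device names may carry an index suffix such as "GPU.0".
            if name == wanted or name.startswith(wanted + "."):
                return name
    return None


devices = list_openvino_devices()
print("available:", devices)
print("selected:", pick_device(devices))
```

If an accelerator doesn't show up in that list, the runtime has no plugin for it and inference will run on whatever device was actually selected.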

Based on specs of both Meteor Lake's NPU and Phoenix' Ryzen AI, I get the sense that they are comparable and AMD might even have a slight edge. I look forward to a proper comparison between them. I hope AMD can facilitate this (e.g. by adding the proper support to OpenVINO).
 

rluker5

Distinguished
Jun 23, 2014
901
574
19,760
I'm wondering the same about these NPUs, although I think we can safely assume it's low and more efficient.

It's worth noting that Hawk Point has its "NPU" apparently clocked up to 60% higher than Phoenix for about 40% more performance. It's probably sacrificing a bit of efficiency for more performance.


Some of that is very obscure. It will be interesting to see how gaming actually shakes out.

Also, only Meteor Lake-H gets 8 Xe cores (128 EUs), while Meteor Lake-U will only come with 4 Xe cores (64 EUs). So comparing Intel's "U" to AMD's "U" won't go so well.
Intel's MTL "U" has a max TDP of 15w while AMD's "U" has a max TDP of 28w. Not as comparable as the same letter suggests. You would have to compare by TDP.
 

bit_user

Titan
Ambassador
@apiltch , the labels in this graphic got truncated, preventing us from seeing whether some of the tests used CPU, GPU, NPU, etc.

MaoqQkJhzQyVtoSGXfkSuZ.png

 

bit_user

Titan
Ambassador
I always figured the GPU would be the main horse in handling AI workloads, but that perception is changing. It paints a concerning picture for companies like Nvidia, who might start to lose relevance in the AI space as they did with crypto.
The iGPU in these Meteor Lake processors lacks the XMX cores (analogous to Nvidia's Tensor cores), seen in Intel's dGPUs. I think the jury is still out on whether GPUs can remain relevant, in the AI race.

Anyway, Nvidia, AMD, and Intel have all created separate architectures and product lines for their high end compute & AI products. They don't much resemble consumer GPUs, any more. Intel is also pursuing a parallel track of improving their Gaudi series of dedicated AI accelerators.

OpenVINO is finally maturing into something that's a bit of a blessing for all PC users. If OpenCL got traction instead of CUDA, maybe we'd be here already long ago.
From what I've seen, OpenVINO's GPU backend actually uses OpenCL. If AMD would beef up their OpenCL support, perhaps it would "just work" on their GPUs, as well.
 

bit_user

Titan
Ambassador
Intel's MTL "U" has a max TDP of 15w while AMD's "U" has a max TDP of 28w. Not as comparable as the same letter suggests. You would have to compare by TDP.
As usual, the picture is more complex than that.

  • Processor Base Power 15 W
  • Maximum Turbo Power 57 W
  • Minimum Assured Power 12 W
  • Maximum Assured Power 28 W

Source: https://ark.intel.com/content/www/u...-processor-165u-12m-cache-up-to-4-90-ghz.html

I think it will turn out to be implementation-specific; what matters is the sustained power a specific laptop (or mini-PC) actually runs at.
 
EDIT
On the other hand, the Intel Core Ultra 7 155H, with its new iGPU based on Arc Graphics, shines and beats AMD's RDNA 3 in a majority of benchmarks.
Ref: https://www.phoronix.com/review/meteor-lake-arc-graphics
One thing to note in that test is that the Intel laptop is running LPDDR5-6600 or 7500 whereas the Ryzen has DDR5-5600. The additional bandwidth will help a lot in the games. Interestingly, Intel usually leads by a decent margin, in both power and performance, in the more synthetic tests like 3DMark. In gaming, however, they are usually neck and neck in power and performance.
 

bit_user

Titan
Ambassador
One thing to note in that test is that the Intel laptop is running LPDDR5-6600 or 7500 whereas the Ryzen has DDR5-5600. The additional bandwidth will help a lot in the games.
The real performance difference is a lot bigger than what that would suggest. It's not just a matter of more memory bandwidth. You can most easily see this in the comparison against the Gen 12 part.
 
The real performance difference is a lot bigger than what that would suggest. It's not just a matter of more memory bandwidth. You can most easily see this in the comparison against the Gen 12 part.
The Gen 12 part has at best LPDDR5-5200 RAM but might be running LPDDR4X-4267 or even DDR4-3200. The problem is that Phoronix only gives us the amount of RAM in each system and not the bandwidth, so we are speculating based on minimal data. Also, the 155H has at least 128 execution units on the newer uArch, compared to 96 for the Alder Lake part. The combination of extra execution units and bandwidth makes a MASSIVE difference in performance. Do note that this doesn't take away from the massive gains Intel has made in the iGPU department. They finally decided that the iGPU shouldn't just be an afterthought like it was for so many years.
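
To put rough numbers on the memory speeds mentioned here: peak theoretical bandwidth is the transfer rate times the bus width in bytes. The sketch below assumes a 128-bit memory bus for every configuration, which is an assumption and not something Phoronix states, and real-world bandwidth will be lower than these peaks.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, bus_width_bits=128):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Transfer rates (MT/s) for the memory types named in this thread.
# The 128-bit bus width is an assumption, not a measured value.
memory_rates = {
    "LPDDR5-7500": 7500,
    "LPDDR5-6600": 6600,
    "DDR5-5600": 5600,
    "LPDDR5-5200": 5200,
    "LPDDR4X-4267": 4267,
    "DDR4-3200": 3200,
}

for name, rate in memory_rates.items():
    print(f"{name:13s} {peak_bandwidth_gbs(rate):6.1f} GB/s")

# Lower of the two speeds mentioned for the Intel laptop vs. LPDDR4X-4267.
print(f"{peak_bandwidth_gbs(6600) / peak_bandwidth_gbs(4267):.2f}x")
```

With equal bus widths the ratio depends only on the transfer rates, so the bus-width assumption matters for the absolute GB/s figures but not for the relative comparison.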
 

bit_user

Titan
Ambassador
The Gen 12 part has at best LPDDR5-5200 RAM but might be running LPDDR4X-4267 or even DDR4-3200.
You can find the exact laptop model he compared against, in prior articles. It's this one:
Even with LPDDR4X-4267, that alone still doesn't explain the whole performance discrepancy.

the 155H has at least 128 execution units on the newer uArch, compared to 96 for the Alder Lake part. The combination of extra execution units and bandwidth makes a MASSIVE difference in performance.
The way bottlenecks work means your performance is limited by whichever is the slower aspect. Having more bandwidth and more shaders doesn't give you a multiplicative improvement.
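
A rough way to picture this is a roofline-style bound, where attainable throughput is the smaller of the compute limit and the memory limit. All numbers below are invented units purely for illustration, not measurements of either iGPU.

```python
def attainable(compute_peak, bandwidth, intensity):
    """Roofline-style bound: limited by compute or by memory traffic.

    `intensity` is the work done per byte moved (hypothetical units).
    """
    return min(compute_peak, bandwidth * intensity)


# Invented baseline: a memory-bound workload (75 < 100).
old = attainable(compute_peak=100.0, bandwidth=50.0, intensity=1.5)

# Scale compute by 2.0x and bandwidth by 1.5x.
new = attainable(compute_peak=200.0, bandwidth=75.0, intensity=1.5)

print(old, new, new / old)  # prints 75.0 112.5 1.5
```

In this made-up case the speedup equals the 1.5x bandwidth gain, not 2.0 x 1.5 = 3.0x, because the workload stays memory-bound before and after.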
 

rluker5

Distinguished
Jun 23, 2014
901
574
19,760
I've got a laptop with a 7700HQ and a 1050 Ti. Nothing special. I set dGPU/iGPU use to auto, so the laptop uses the iGPU unless Windows selects the dGPU for heavier work like games.

I think this is how MTL works for the CPU and tGPU: the SoC tile can handle light loads, like those old Atom tablets did, at the same power draw, and wakes up the other components only if it needs more. Intel has a video where they claim 3 W for a Teams video call. The SoC may only be able to handle 1080p60 video before waking up other parts. It would be good to know the SoC's limits.

If most mobile device use over time is light, then the power draw of MTL vs. AMD is no longer an apples-to-apples comparison. And you can no longer extrapolate battery life from 100% use stress tests.
 
The way bottlenecks work means your performance is limited by whichever is the slower aspect. Having more bandwidth and more shaders doesn't give you a multiplicative improvement.
I also said that the 155H has the newer uArch. No matter what, the 155H has at least 33% more shaders AND 55% more bandwidth. Don't forget that the core clock on the 155H is 55% higher as well. Add all those things up and normally you get a huge increase in performance. I also said that Intel has made massive gains in their iGPUs. Overall the 155H's iGPU is a vast improvement over the iGPU from 2 generations ago. They were able to alleviate some of the bottlenecks that limited performance.
 
Since these are mobile parts, power consumption levels are even more important than they are in the desktop space. How do the candidates compare with regard to power use? I know that craptop CPUs tend to be difficult to measure in that regard because of their variable TDP and changes made to them by the craptops' OEMs. I was just wondering if you had a ballpark figure because nothing is mentioned at all in the article (which is odd for a mobile CPU comparison).
 

bit_user

Titan
Ambassador
I also said that the 155H has the newer uArch. No matter what the 155H has at least 33% more shaders AND 55% more bandwidth. Don't forget that the core clock on the 155H is 55% higher as well.
But the geomean shown on Phoronix is 61.8% higher on graphics and 107.2% higher on compute. That says the microarchitecture improvements × additional shaders × clock speed should've been at least that high. It's probably bottlenecked by memory a lot more now than before.

Add all those things up and normally you get a huge increase in performance.
Clock speed, shader count, and IPC changes are generally multiplicative. When it comes to memory bandwidth, you're either bottlenecked by it or you're not. When you're not, it doesn't make much difference. If you are, then you typically see only up to that amount of improvement. Nowhere in that formulation is there any addition.
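
Plugging the thread's own figures into that reasoning, as a sketch: the shader and clock gains are the ones quoted above (33% and 55%), and the observed values are the geomeans cited from Phoronix. None of these were re-measured here, and the helper name is hypothetical.

```python
def combined_speedup(*factors):
    """Multiply independent scaling factors (e.g. shader count x clock)."""
    result = 1.0
    for factor in factors:
        result *= factor
    return result


# 33% more shaders and 55% higher clock, as claimed in the thread.
shaders_x_clock = combined_speedup(1.33, 1.55)

# Geomean gains cited from Phoronix: +61.8% graphics, +107.2% compute.
observed = {"graphics": 1.618, "compute": 2.072}

print(f"shaders x clock alone: {shaders_x_clock:.2f}x")
for workload, speedup in observed.items():
    # A ratio below 1.0 means the result fell short of shaders x clock.
    print(f"{workload}: observed/expected = {speedup / shaders_x_clock:.2f}")
```

On these figures the graphics geomean lands below the shader-times-clock product, which is consistent with something else (such as memory) limiting it, while the compute geomean lands roughly at it.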
 

bit_user

Titan
Ambassador
Since these are mobile parts, power consumption levels are even more important than they are in the desktop space. How do the candidates compare with regard to power use?
Since we're pretty much derailed from talking about the NPU and basically just talking about graphics, at this point, you can see power figures on Meteor Lake's iGPU here:

Unlike what Phoronix' review found for Meteor Lake's CPU performance, the GPU is actually very efficient. The GPU tile also happens to be made on TSMC N5, whereas the CPU tile is made on Intel 4.

In the CPU tests, the Ryzen 7840U is typically using less power, while delivering better performance:
 

apiltch

Editor-in-Chief
Editor
Sep 15, 2014
245
139
18,870
@apiltch , the labels in this graphic got truncated, preventing us from seeing whether some of the tests used CPU, GPU, NPU, etc.
MaoqQkJhzQyVtoSGXfkSuZ.png

I apologize. I thought I fixed that problem on all the charts before publishing. I think it happened on one or two charts anyway. I have fixed it (I hope) as of 11:36 am ET on Thursday (give it like 15 minutes for the cache to clear for sure).

Also here are all the charts attached: 1703176651881.png, 1703176642926.png, 1703176630474.png, 1703176623957.png
 

Alpha_Lyrae

Reputable
Nov 13, 2021
28
26
4,560
But the geomean shown on Phoronix is 61.8% higher on graphics and 107.2% higher on compute. That says the microarchitecture improvements × additional shaders × clock speed should've been at least that high. It's probably bottlenecked by memory a lot more now than before.


Clock speed, shader count, and IPC changes are generally multiplicative. When it comes to memory bandwidth, you're either bottlenecked by it or you're not. When you're not, it doesn't make much difference. If you are, then you typically see only up to that amount of improvement. Nowhere in that formulation is there any addition.

It doesn't always come down to simply memory bandwidth. Overall package power, sustained CPU clocks when iGPU is in use, and even Intel's interconnect bandwidth/speed can all play a part here. I'm not sure how Intel has laid out the iGPU tile, but I'd assume the memory intensive pipelines (like ROPs) in iGPU will actually be in SoC tile and adjacent to memory controllers+PHYs. Display and media engines also seem like a good fit for SoC tile, so iGPU tile is mostly geometry+compute.

1080p is often draw call limited by CPU submits, moreso on Ryzen than Core Ultra esp. on legacy APIs like DX9/DX11/OGL, so there's that too. Core clocks, then, can become a limiting factor if iGPU is using a majority of the package power.

Intel, like AMD, is definitely using aggressive memory compression for iGPU pipelines. Everything stays compressed to reduce power and bandwidth use. Expensive decompresses are only done as a last resort when destination is unknown, but even these are becoming less common.
 

bit_user

Titan
Ambassador
I'm not sure how Intel has laid out the iGPU tile, but I'd assume the memory intensive pipelines (like ROPs) in iGPU will actually be in SoC tile and adjacent to memory controllers+PHYs.
No. Take a read through this:

Display and media engines also seem like a good fit for SoC tile, so iGPU tile is mostly geometry+compute.
They're there, but probably just so they can power down the GPU tile when playing videos.

1080p is often draw call limited by CPU submits, moreso on Ryzen than Core Ultra esp. on legacy APIs like DX9/DX11/OGL, so there's that too.
By and large, the benchmarks aren't using those legacy APIs.

Intel, like AMD, is definitely using aggressive memory compression for iGPU pipelines. Everything stays compressed to reduce power and bandwidth use. Expensive decompresses are only done as a last resort when destination is unknown, but even these are becoming less common.
That's nonsense. TMUs do on-the-fly decompression, always.
 
I think this is how MTL works for the CPU and tGPU: the SoC tile can handle light loads, like those old Atom tablets did, at the same power draw, and wakes up the other components only if it needs more. Intel has a video where they claim 3 W for a Teams video call. The SoC may only be able to handle 1080p60 video before waking up other parts. It would be good to know the SoC's limits.
The SoC includes the media engine, so the limit is whatever limit that has.
Quick Sync has handled 4K for years now, so it should do that easily.
It's all hardware accelerated, so the low-power cores don't even factor into it, other than being needed to keep Windows from freezing up.
 

usertests

Distinguished
Mar 8, 2013
928
839
19,760
Intel's MTL "U" has a max TDP of 15w while AMD's "U" has a max TDP of 28w. Not as comparable as the same letter suggests. You would have to compare by TDP.
Nice try. 165U, 155U, 135U, and 125U all have a max turbo TDP of 57W, and maximum assured power of 28W (what they used to call "configurable TDP-up"). The "U" chips from both companies can be directly compared. You have to check what specific devices are running out of the box, can be changed to, and are capable of cooling, but these are obviously the same class of chip, except for Intel kneecapping the graphics and potentially using more power.

Now Intel has two more SKUs: the Core Ultra 7 164U and Core Ultra 5 134U. These are probably the models that have LPDDR5/x memory integrated on the package. They have a 9W TDP and 15W cTDP-up, but still have a max turbo TDP of 30W.
 