News Apple's M2 Ultra Seemingly Can't Beat AMD and Intel Rivals

This is what I've been saying. The new Mac Pro isn't a proper replacement, for lack of cores and memory capacity.

These cores & SoCs were designed primarily for phones and laptops, respectively. Therefore, Apple optimized them for perf/W, rather than outright performance & scalability.

I've heard rumors that Apple is working on a server CPU. If true, it should feature in the next Mac Pro. If not, they could always source ARM CPUs from someone like Ampere, but I doubt they'll do it because those won't have the same ISA extensions and accelerators as their other products, and I doubt they'd do anything to hurt portability between their various machines.

Some may argue that Geekbench 5 is a synthetic benchmark that does not reflect performance in real-world applications
So, run SPEC CPU 2017. It comprises 22 real-world, multi-threaded apps and is designed to answer exactly these sorts of questions.
 
This is what I've been saying. The new Mac Pro isn't a proper replacement, for lack of cores and memory capacity.
100% agree. Apple, simply put, does not have a desktop-class, let alone workstation-class, CPU. Their solution of using laptop-class CPUs for desktops is hopefully just filler, but if they sell too well, Apple may just stick with the strategy.
 
Apple simply put does not have a desktop class, let alone workstation class CPU.
Hey, I didn't say that. I think the M2 Ultra is a very respectable desktop CPU. Just not a proper workstation powerhouse.

Their solution of using laptop class CPUs for desktop is hopefully just filler
You actually can't buy the Ultra-tier chips in a laptop. The strongest laptop CPU they offer is the Max, which is exactly half of an Ultra.

If you scroll the table over, you can see how it compares vs. the i9-13900K. Granted, Intel pulls out a decisive win, but generally not by much. The lone exception is multi-threaded float, where the M2 Ultra actually beats it!

Intel's performance margin also seems less impressive when you consider the machines' relative power budgets. I don't know if there's power data available for the M2 Ultra yet, but the M1 Ultra-powered Mac Studio consumed 215 W, measured at the wall.


That's PSU-limited, meaning it should cover a full CPU + GPU load.

"Max" is defined as the maximum possible power draw based on the computer's power supply rating.

It would be interesting to know what a CPU-only load consumes.
 
You actually can't buy the Ultra-tier chips in a laptop. The strongest laptop CPU they offer is the Max, which is exactly half of an Ultra.

True. However, the Ultra is largely just two Max chips stitched together, and the Max is (largely) a laptop-class part, which is why I don't consider them desktop/workstation class. It may give respectable desktop performance, but I would still classify it as a laptop-targeted component.
 
  • Like
Reactions: msroadkill612
the Ultra is largely just two stitched together Max chips
That makes it sound as if they weren't designed to be connected, which they very much were. They have a 2.5 TB/s cache-coherent interconnect, which is/was the fastest ever chip-to-chip link by a wide margin.

It wasn't a mere afterthought, but very much designed-in from the outset.

It may give respectable desktop performance, but I would still classify it as a laptop targeted component.
Yes, the Max was designed to be used (standalone) in laptops, and therefore features all the compromises you tend to find in laptop SoCs. That much is fair, and it's where I started in this thread.
 
That makes it sound as if they weren't designed to be connected, which they very much were. They have a 2.5 TB/s cache-coherent interconnect, which is/was the fastest ever chip-to-chip link by a wide margin.

It wasn't a mere afterthought, but very much designed-in from the outset.
Don't get me wrong, I'm not saying that wasn't intended, or wasn't designed to work like that, or that it was somehow a simple engineering feat. However, at its core, it's scaling two of the same parts into a single part. Similar to dual-socket-capable CPUs, which typically have design considerations for working in that configuration as well.
 
I've heard rumors that Apple is working on a server CPU.
Yes, they are working on it, but for the last few years there has been complete silence.

Kind of OT discussion:

Speaking of Apple's role in the server space, instead of making server solutions themselves, I think it is far more realistic that Apple would cooperate with Amazon, Ampere, Google, and others to establish ARM as a strong alternative to x86. They all have an interest in this.

With ARM all the big players can build their own custom solutions tailored to their needs in a way they never could with x86.

But there are two parts to this puzzle. You need the server hardware but you also need popular desktop and laptop computers running ARM. If these are not prevalent, then developers will not develop sufficient experience with ARM. Linus Torvalds has been quite clear that your home computer needs to run the same hardware as your server.

But can Apple really capture the server "market space" on their own?

Actually, technically speaking, Apple servers already exist. Apart from iCloud hosting and Apple's own data centers, Apple has offered macOS servers since at least 1996, when it sold complete server racks. Yet today, these are little more than repurposed Mac workstations.

Apple has an advantage here, in that they control the whole widget, but that only applies if BOTH their hardware and software are used. So if we want to get the full advantage of heterogeneous computing power from Apple, we actually need to run macOS in the cloud, not Linux, not FreeBSD and certainly not Windows.

If not, Apple would have to wait for industry standards supported by Linux, BSD, Windows and others to emerge and then tailor their hardware to those standards. This is unlikely to be something Apple agrees with.

Also I am skeptical that Apple would want to sell solutions not running their software.

This puts potential users in a bind. It helps that macOS is a Unix operating system. That means a lot of Linux and BSD software will run fine on it with minimal change. Yet macOS is not really optimized for server use.

Linux kernel developers are very focused on server use, and that drives their development efforts. macOS, for example, is highly tuned for things like low latency, to handle real-time audio and video. These are use cases which matter to professionals working on video and audio, and that is a deep part of Apple's DNA and heritage.

Sure, we can run Linux on macOS through virtualization, but then you also lose access to Apple-specific frameworks such as Core Audio, Core ML, etc., which utilize custom Apple co-processors.
 
What I want to know is the power consumption during these tests. That's what Apple tends to tout with their silicon.

On an individual scale, sure, it's not as exciting. But if you start buying these up and/or stick them in an enclosed space, that starts to add up. Like sure, an i9-12900K can maybe match or come out a little ahead of an M1 Ultra in a Handbrake run, but it'll consume at least 50% more power getting there.
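Just to put rough numbers on how that adds up, here's a back-of-the-envelope Python sketch. The 215 W figure is the Mac Studio wall-power number quoted elsewhere in this thread; the Intel wattage and the run times are purely hypothetical placeholders.

# Hypothetical figures, only to show how a power gap compounds across machines and runs.
def energy_wh(watts, minutes):
    return watts * minutes / 60

m1_ultra_wh  = energy_wh(watts=215, minutes=30)  # ~215 W at the wall (Mac Studio figure)
intel_box_wh = energy_wh(watts=330, minutes=28)  # hypothetical: ~50% more power, slightly faster

extra_per_run = intel_box_wh - m1_ultra_wh
print(f"M1 Ultra run: {m1_ultra_wh:.0f} Wh, Intel run: {intel_box_wh:.0f} Wh")
print(f"Extra energy per run: {extra_per_run:.0f} Wh")
print(f"20 machines x 10 runs/day: {extra_per_run * 20 * 10 / 1000:.1f} kWh/day extra")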
 
Speaking of Apple's role in the server space, instead of making server solutions themselves, I think it is far more realistic that Apple would cooperate with Amazon, Ampere, Google, and others to establish ARM as a strong alternative to x86. They all have an interest in this.

This actually isn't that big of a leap in theory. Amazon already offers its Graviton-based EC2 instances, which use ARM processors. They cost less to use than their Intel and AMD counterparts and have decent performance (not quite as good as the latest Intel/AMD parts, but they match one generation back). AWS also allows allocating ARM-based Macs, though these are currently largely targeted at mobile development.
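For anyone curious what that looks like in practice, here's a minimal boto3 sketch for launching a Graviton-based instance. It assumes your AWS credentials are already configured; the AMI ID is a placeholder, and m7g.large is just one example of a Graviton instance type.

import boto3

# Launch a single Graviton (arm64) EC2 instance.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: substitute an arm64 AMI for your region
    InstanceType="m7g.large",         # Graviton family (m6g/m7g/c7g, etc.)
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])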

I've not tried Google's ARM-based instances yet, but ARM is starting to get a foothold in cloud deployments because those instances cost less and have decent enough performance for the average application.
 
What I want to know is the power consumption during these tests. That's what Apple tends to tout with their silicon.

On an individual scale, sure, it's not as exciting. But if you start buying these up and/or stick them in an enclosed space, that starts to add up. Like sure, an i9-12900K can maybe match or come out a little ahead of an M1 Ultra in a Handbrake run, but it'll consume at least 50% more power getting there.
I swear I heard the Mac Pro had a maximum system power draw of 300 watts (with no add-in cards), but I can't seem to find verification of that.

However, the M1 Ultra Studio clocks in at 215 watts maximum total system power draw. I've seen claims of 60 watts for the CPU and 100 watts for the GPU, so a total of 160 watts for the M1 Ultra, but that probably needs verification.

An apples-to-apples comparison with Handbrake would be nice!

Mac Studio power consumption and thermal output (BTU) information

Learn about the power consumption and thermal output of Mac Studio computers.
support.apple.com
 
I'd think that for many workstation tasks, the performance of the GPU is more important than the number of CPU cores.

Apple has killed support for discrete GPUs in Macs and stopped supporting graphics APIs other than Metal.
I'd think that for many workstation applications, those are bigger factors keeping them away from the Mac than the fact that Apple doesn't hold the CPU crown.
 
  • Like
Reactions: drajitsh
Raptor Lake is a beast. It runs over everything on a per-core basis.
Not really sure what you mean by that. Yes, its single-threaded scores beat the M2 Ultra's, but it lost the multi-threaded float benchmark. And they both have 24 cores.

In multithreading it has 50%+ performance with only 25% of the threads.
Well, it has 32 threads and 8 P-cores. So, you're essentially saying that you get over 50% of peak performance by running just one thread on each of the P-cores. That sounds almost believable, assuming they'd be throttled when scaling up the workload to all threads. Also, running fewer threads gives each of them a larger share of L3 cache and memory bandwidth.
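To put made-up but plausible numbers on that (these are illustrative, not measurements, and they ignore the modest extra throughput SMT adds on the P-cores), here's the arithmetic:

# Hypothetical per-core throughputs, in arbitrary units, for an 8P+16E part.
P_CORES, E_CORES = 8, 16

p_light = 1.00  # P-core with the whole power/thermal/cache budget to itself
p_heavy = 0.80  # P-core throttled under an all-core load
e_heavy = 0.45  # E-core under an all-core load

eight_thread = P_CORES * p_light                      # 8.0
all_core     = P_CORES * p_heavy + E_CORES * e_heavy  # 6.4 + 7.2 = 13.6

print(f"8-thread share of all-core peak: {eight_thread / all_core:.0%}")  # ~59%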

BTW, you know that it's not usually regarded as a bad thing to have more linear scaling, right?
 
  • Like
Reactions: dipique
Some may argue that Geekbench 5 is a synthetic benchmark that does not reflect performance in real-world applications, which is a fair argument. But it gives a sense of what to expect from CPUs regarding their compute capabilities without any special-purpose accelerators. And this brings us to the fact that Apple's M2 SoCs have plenty of accelerators inside. Therefore, it may not need to have high clocks or extreme core count to offer great performance in many workstation-grade workloads.

That's the definition of the benefits of ARM-based chips, in a nutshell.
 
My last Apple was a ][, I guess because PCs could do everything and Macs were just for users.

But I find the modular Lego architecture of the M1 and M2 truly a very smart idea.

The ability to build 1x, 2x and 4x CPU, GPU and RAM aggregates from a single base design saves tons of money (unfortunately only for Apple) and fits a lot of use cases, while eliminating most of those super-wide external interfaces to RAM and xPUs saves tons of hard-to-shrink I/O die area.

The energy efficiency speaks for itself, too, but that mostly matters where it does.

But the Mx Legos obviously can't go everywhere and do everything x86 or other ARMs do, so criticising them for failing at HPC and pacemakers is largely missing their point.

All I'd want from the Mx chips is the ability to buy them from a diversity of PC vendors and run them with Linux, Windows or whatnot (macOS has never felt particularly attractive) at reasonable prices.

And with 192GB of RAM as an option, my main complaint about the last generation is laid to rest: this should be OK for a lot of what I do locally, and the really big machines I don't want in a room with me anyway.

And there is no reason my big CUDA rigs should go, especially if both run the same operating systems: I always have a broad range of machines for different workloads.

MacOS, its iSlave culture and Apple prices are the real obstacles for me, not somewhat lesser benchmark scores in one or another niche.
 
Shilov is smart and well-informed. So he knows that Primate replaced GB5 with GB6 because they determined GB5 was flawed for MC. And he knows that if we compare their GB6 MC scores, we get the following. So why did Shilov make his comparison based on GB5 only? This doesn't seem like good tech journalism to me, and I'm used to seeing better from Shilov.

GB6 MC
i9-13900KS: 21,661
M2 Ultra: 21,531
i9-13900K: 19,932
AMD Ryzen Threadripper PRO 5995WX: 18,413

[Geekbench's chart doesn't list the Xeon W9-3495X, but it does have the two other comparators in Shilov's article: the i9-13900K and the 5995WX. The M2 Ultra achieves higher GB6 MC scores than both of them. Indeed, the only one higher, which was not in Shilov's article, is the i9-13900KS. Granted, this is a single M2 Ultra score, not an average. We'll have more accurate info after the review embargo lifts.]
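For what it's worth, here's the quick arithmetic on those quoted GB6 MC numbers, relative to the M2 Ultra result:

# Scores as quoted above (single results, not averages).
scores = {
    "i9-13900KS": 21661,
    "M2 Ultra": 21531,
    "i9-13900K": 19932,
    "Threadripper PRO 5995WX": 18413,
}

baseline = scores["M2 Ultra"]
for chip, score in scores.items():
    print(f"{chip}: {score} ({(score / baseline - 1) * 100:+.1f}% vs. M2 Ultra)")

That puts the i9-13900K about 7% behind and the 5995WX about 15% behind, with the 13900KS less than 1% ahead.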


Here's a quote from John Poole about why they updated the MC test in GB6. Essentially, they determined the GB5 MC test was too easy, because it sent separate tasks to each core. I.e., the workloads it gave were "embarrassingly parallel". This wouldn't properly represent what the processor would need to do when faced with a true MC task, and such tasks are the more typical workload for workstations with high core counts.

"True-to-Life Scaling​

The multi-core benchmark tests in Geekbench 6 have also undergone a significant overhaul. Rather than assigning separate tasks to each core, the tests now measure how cores cooperate to complete a shared task. This approach improves the relevance of the multi-core tests and is better suited to measuring heterogeneous core performance. This approach follows the growing trend of incorporating 'performance' and “efficient” cores in desktops and laptops (not just smartphones and tablets)."
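To make that distinction concrete, here's a minimal Python sketch (this is not Geekbench's actual code) of the two approaches: one fully independent task per worker, versus all workers splitting and then merging a single shared task.

# Rough illustration of "embarrassingly parallel" vs. shared-task multi-core work.
from multiprocessing import Pool
import os

def independent_task(seed):
    # GB5-style: each worker gets its own task; no coordination needed.
    return sum(i * i for i in range(seed, seed + 100_000))

def shared_chunk(bounds):
    # GB6-style: every worker processes a slice of ONE task; results must be merged.
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n_workers = os.cpu_count() or 4
    with Pool(n_workers) as pool:
        # One whole task per core: scales almost linearly with core count.
        per_core_results = pool.map(independent_task, range(n_workers))

        # One big job split into chunks: the split/merge step and any shared data
        # are where real-world scaling losses show up.
        total_n = 10_000_000
        step = total_n // n_workers
        chunks = [(i * step, (i + 1) * step) for i in range(n_workers)]
        shared_total = sum(pool.map(shared_chunk, chunks))

    print(len(per_core_results), shared_total)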

 
  • Like
Reactions: bit_user
What I want to know is the power consumption during these tests. That's what Apple tends to tout with their silicon.

On an individual scale, sure, it's not as exciting. But if you start buying these up and/or stick them in an enclosed space, that starts to add up. Like sure, an i9-12900K can maybe match or come out a little ahead of an M1 Ultra in a Handbrake run, but it'll consume at least 50% more power getting there.

I pulled my numbers from https://arstechnica.com/gadgets/2022/03/mac-studio-review-a-nearly-perfect-workhorse-mac/3/#h1. They used system power consumption. The Intel system they used did have a Z690 board with an RTX 3070, but I doubt that matters too much in a CPU test.

I'm waiting for someone to run some tests with the CPUs locked to the same TDP, so we can compare e.g. the M2 Ultra at 60 W vs. Ryzen 7000 at 60 W.
 
>So, run SPEC CPU 2017.

Huh? The reason why Geekbench 5 is being used on Macs is due to the nearly complete absence of vectorized instruction sets (Intel's AVX2 and AVX-512). It took Intel 25 years (!) to develop them, and these vector instructions have been the primary new features of Intel CPUs for the last 20 years.

As soon as you introduce a benchmark which leverages AVX2 on Intel CPUs, the Apple M2 does not even enter the charts for general computing. According to this chart:


even a Xiaomi Android smartphone on ARM is faster than the 13900K, but that of course is nonsense.

The primary advantage of the M2 is its "accelerators". Instead of focusing on general computing, Apple is accelerating specific workloads for specific applications, which are known to take the most time.
 
  • Like
Reactions: drajitsh
Shilov is smart and well-informed. So he knows that Primate replaced GB5 with GB6 because they determined GB5 was flawed for MC. And he knows that if we compare their GB6 MC scores, we get the following. So why did Shilov make his comparison based on GB5 only?
Maybe because the Geekbench 5 score was the only one available when the article was written. Or because it's simply what was brought to his attention, and he forgot to look for a Geekbench 6 score?

GB6 MC
i9-13900KS: 21,661
M2 Ultra: 21,531
i9-13900K: 19,932
AMD Ryzen Threadripper PRO 5995WX: 18,413
Thanks for the info.
 
  • Like
Reactions: dipique