Apple's M4 Max is the single-core performance king in Geekbench 6: M4 Max beats the Core Ultra 9 285K and Ryzen 9 9950X

Honestly, I couldn't give an f about having the fastest machine. What I do care about is pulling out my laptop knowing the battery won't be flat. I consistently gravitate to my M2 MacBook Air for this reason; on every Dell, HP and Microsoft laptop I have ever owned, the power management is just hopeless.
 
I like that Geekbench is cross-platform and all, but I hate what a black box it is. Who the heck knows exactly what it tests, or how well-optimized it is for different ISAs? Or even how it's compiled?

I wish more people would use SPEC CPU. At least that's based on well-known, open-source software packages. Chips and Cheese recently started running it (perhaps partly because AnandTech is now defunct), but they have yet to benchmark any Apple hardware.

The article said:
The M4 Max handily keeps up even in multi-core at a fraction of the power
Geekbench multi-core is trash. I don't know what it does, but you can clearly see that it scales very poorly with core count, much worse than most of the multi-threaded software people usually benchmark.
 
Primate Labs publishes all these details:
https://www.geekbench.com/doc/geekbench6-benchmark-internals.pdf
This doc ^ describes all the test cases, which ISA extensions are used, how the scores are weighted, why multi-core scales the way it does etc.
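For context on what "how the scores are weighted" usually means: benchmark suites commonly fold per-workload results into one number with a weighted geometric mean, so that no workload's raw units dominate. The sketch below shows only that general technique; the function name and example weights are hypothetical, and GB6's exact scheme is whatever the linked PDF specifies.

```python
import math


def weighted_geomean(scores: list[float], weights: list[float]) -> float:
    """Combine per-workload scores with a weighted geometric mean (generic technique, not GB6's exact formula)."""
    if len(scores) != len(weights) or not scores:
        raise ValueError("need one weight per score, and at least one score")
    if any(s <= 0 for s in scores):
        raise ValueError("geometric mean requires positive scores")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # exp(sum(w_i * ln(s_i)) / sum(w_i)) is the weighted geometric mean.
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_sum / total_weight)


# Hypothetical example: two workloads, the first weighted 3x the second.
print(weighted_geomean([2000.0, 1000.0], [3.0, 1.0]))
```

With a geometric mean, doubling the result on any one workload raises the combined score by the same factor regardless of that workload's raw score scale.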

Geekbench is way overhated online. It’s a fairly reliable way to measure the relative performance of chips. It just offends fanboys who want to see bigger numbers for their brand(s) of choice.

Apple Silicon scores highly in Geekbench simply because it’s fast. There’s no sleight of hand - e.g. Apple’s designs are particularly strong for compiling so they score highly in the clang test 🙂
 
Primate Labs publishes all these details:
Thanks, I'll have a look.

It’s a fairly reliable way to measure the relative performance of chips.
Maybe, once I have a better understanding of what it's actually measuring.

IMO, what's really needed is to demonstrate its predictive power by showing how it correlates to the performance of other apps.

It just offends fanboys who want to see bigger numbers for their brand(s) of choice.
Not in my case. I've been accused on here of being an Apple fan, just because I respect their tech. I would never buy one, but that's a different matter.
 
You can't determine the relative performance of two CPUs if they are running vastly different operating systems and the software was compiled with different compilers. Compilers and OS kernels are hugely important for performance. There's often a huge difference in performance between Windows and Linux running on the same hardware. There are even significant performance differences between different Linux distros and different versions of Windows.
 
Sorry, didn’t mean to accuse you of being a fanboy - was supposed to be a more general comment (Geekbench always gets caught up in the whole PC vs. Mac nonsense)
 
The MT portion of GB6 is just terrible.
 
You can't determine the relative performance of two CPUs if they are running vastly different operating systems and the software was compiled with different compilers. Compilers and OS kernels are hugely important for performance. There's often a huge difference in performance between Windows and Linux running on the same hardware. There are even significant performance differences between different Linux distros and different versions of Windows.
OS is always a factor regardless of the benchmark. The difference between platforms was much greater in earlier versions of Geekbench that used different compilers for each OS. Now that clang is used for every build, there’s a much smaller difference between Windows, Linux and macOS (though there is still an advantage for Linux in general).
 
The Geekbench 6 internals document isn't especially rich in detail, and it certainly doesn't contain source code.

There is no reliable way of measuring relative performance any more, because these days each architecture has significant assumptions about how it is going to be used built into it.

Lunar Lake and Arrow Lake are prime examples of how differently Intel executed similar ingredients to match very distinct use cases. And if you try condensing 128/192 P/E Xeons or power/dense EPYCs into the same two numbers, that won't be meaningful either.

Do they deserve to be hated? Certainly not.
Should they be quoted less, and less often read as a meaningful differentiator? Certainly.

Is there sleight of hand? Absolutely, if only in favoring short, bursty workloads, which both exploit mobile thermal headroom and deliver quick results.

Does that mean Apple silicon is actually slow? Of course not. I just wish you could buy it at reasonable prices and without becoming an iSlave.
 
Geekbench multi-core is trash. I don't know what it does, but you can clearly see that it scales very poorly with core count, much worse than most of the multi-threaded software people usually benchmark.
I agree, the M4 Max does not seem to scale as well in Cinebench R23, where it scores 25,881 in multi-core. Still an amazing score for the power envelope, but not the beast that Geekbench makes it look like.

https://nanoreview.net/en/cpu/apple-m4-max-16-core
 
You can't determine the relative performance of two CPUs if they are running vastly different operating systems
It depends on just what the benchmark does. If it's a pure compute benchmark that does all its dynamic memory allocations up front, then single-threaded won't touch the OS at all. The only thing you're counting on the OS for is to schedule the thread on a fast core and give it enough juice to clock high.

On the other hand, if it's like a database benchmark, it'll be making tons of filesystem calls and probably doing a lot of thread synchronization.
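The "pure compute" pattern described above can be sketched like this: every buffer is allocated during setup, and the timed loop only does arithmetic on those buffers, with no file, network, locking, or explicit allocation calls. The function name and kernel are hypothetical, and in Python the interpreter may still allocate small objects internally, so treat this as an illustration of the structure rather than a real benchmark.

```python
import time
from array import array


def run_compute_kernel(n: int, iterations: int) -> tuple[float, float]:
    """Hypothetical compute-only kernel; returns (elapsed seconds, checksum)."""
    # Setup (untimed): allocate every buffer up front.
    data = array("d", (float(i % 97) for i in range(n)))
    out = array("d", [0.0]) * n

    start = time.perf_counter()
    for _ in range(iterations):
        # Timed region: arithmetic on preallocated buffers only.
        # No explicit allocation, file/network I/O, or locking happens here,
        # so the OS mainly needs to keep the thread on a fast core.
        for i in range(n):
            out[i] = out[i] * 0.5 + data[i]
    elapsed = time.perf_counter() - start

    # Checksum so the result of the work is observable.
    return elapsed, sum(out)
```

A database-style benchmark, by contrast, would put file I/O and thread synchronization inside the timed region, making the OS a large part of what gets measured.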

and the software was compiled with different compilers.
Apple uses Clang, which is open source. I don't know what Geekbench was compiled with, but they certainly could've used the same compiler and options, everywhere.

With all that being said, it's really not irrelevant to benchmark on different operating systems, if the point is to predict how well applications will perform. I know that's a slightly different question than about the hardware, itself.

BTW, the article also compared M4 vs. older Macs, in which case they're all running the same OS and compiler.
 
This also doesn't factor in cost. I'm willing to bet an M4 Max costs a lot more to produce than a 9950X does.
 
It does not matter whether GB6 is rubbish; you are never going to notice a 10-20% performance difference without a benchmark program anyway. What matters most is the MacBook's great battery life, unbeatable trackpad and properly designed cooling system.

I've tried several premium Windows laptops over the years, and 3/4 of them fail on the trackpad/battery hurdle, while all of them fail on cooling. I cannot fathom how you can misdesign a cooling system so badly. Most of the time it's noisy for no reason, and even when it's quiet, the fans make sure you notice them by ramping up and down all the time like demented chickens. I despair: why do the fans switch on with the CPU at 45 deg C? What's the point of that? A CPU can easily withstand 90 deg C, so wait until then. How hard can it be? That's what Apple does, and Apple engineers must be laughing their heads off when they see that Samsung, Microsoft, LG, Dell or Lenovo (I tried them all) have again screwed up the basics in their latest & greatest $4000 toy.
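The complaint about fans ramping up and down is essentially about a missing hysteresis band. A minimal sketch, assuming illustrative thresholds that are not Apple's or any other vendor's actual values: the fan stays off until the CPU reaches a high "on" temperature, then keeps running until it cools below a lower "off" temperature, so small fluctuations around one threshold don't toggle it.

```python
from dataclasses import dataclass


@dataclass
class FanController:
    """Hypothetical fan curve with hysteresis; all thresholds are illustrative."""

    on_temp_c: float = 85.0   # fan stays off until the CPU reaches this
    off_temp_c: float = 75.0  # once on, it stays on until the CPU cools below this
    max_temp_c: float = 95.0  # full speed at or above this
    min_duty: float = 0.3     # lowest speed while running, to avoid constant start/stop
    running: bool = False

    def duty(self, temp_c: float) -> float:
        """Return a fan duty cycle in [0.0, 1.0] for the current CPU temperature."""
        if self.running and temp_c < self.off_temp_c:
            self.running = False
        elif not self.running and temp_c >= self.on_temp_c:
            self.running = True
        if not self.running:
            return 0.0
        # Linear ramp between off_temp_c and max_temp_c, clamped to [min_duty, 1.0].
        frac = (temp_c - self.off_temp_c) / (self.max_temp_c - self.off_temp_c)
        return min(1.0, max(self.min_duty, frac))


fan = FanController()
print([fan.duty(t) for t in (60, 84, 85, 80, 74, 80)])  # [0.0, 0.0, 0.5, 0.3, 0.0, 0.0]
```

Note the last reading: at 80 °C the fan stays off on the way up but stayed on at 80 °C on the way down, which is exactly what stops the on/off flapping.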
 
You're saying it like it's a CPU anyone can just grab, lol. The price will never be right, Macs are just not for everyone, and it's simply not directly comparable. Apple has had higher benchmarks than Android phones forever, and it never mattered. Why the hell would it matter in the PC space?
 
Probably, but the M4s are made on TSMC N3E, which not only makes them more expensive but also helps make them quite a bit more efficient.
Two 70.6 mm² dies and a 122 mm² die on an even older process, vs. a much bigger monolithic slab of silicon with double the memory channels.
Yeah, Apple's chip will be more expensive to make.
Whether it's more efficient depends on the workload. If this year's releases have shown us anything, it's that software isn't ready to make full use of this batch of wide cores.
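For reference, summing the die areas in that comparison (assuming, as the post implies, that they are the 9950X's two 70.6 mm² dies plus its 122 mm² die):

$$A_{\text{9950X}} = 2 \times 70.6\,\text{mm}^2 + 122\,\text{mm}^2 = 263.2\,\text{mm}^2$$

That total is spread over three smaller dies, whereas the post describes the M4 Max as a single, much bigger die.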
 
You're saying it like it's a CPU anyone can just grab, lol. The price will never be right, Macs are just not for everyone, and it's simply not directly comparable. Apple has had higher benchmarks than Android phones forever, and it never mattered. Why the hell would it matter in the PC space?
It's relevant to non-Mac users because it shows what's possible. Consider it a glimpse into the future of what next year's ARM-based CPUs will deliver from Qualcomm and others.

It also creates some pressure for others to match. Most people wouldn't switch between Mac and PC, but the better Apple becomes, the more people switch over. As most software the average person uses is now in the cloud, the question of which CPU and OS you use is becoming increasingly irrelevant to many.
 
We should at least have the decency to report that the M4 is made on TSMC's N3E process, while Intel's offerings use N3B and AMD is still stuck on N4P, even without elaborating further.

Talking about Apple making a jump in efficiency is as wrong as it gets, just as wrong as talking about an "x86 wave" of efficiency in the latest Intel and AMD processors.
 
I don't expect the M4 Max to be competitive with high-end desktop chips from AMD and Intel in MT workloads, not even close. The Ultra, if there is one, will be. But then again, the Ultra is in an entirely different category of chips to begin with; it's more akin to a Threadripper / Xeon.
 
Geekbench multi-core is trash. I don't know what it does, but you can clearly see that it scales very poorly with core count, much worse than most of the multi-threaded software people usually benchmark.
GB6 is specifically designed to not scale linearly on the multi-core tests.

Many real world tasks don’t scale well across cores and GB has been criticised in the past for not testing things from the real world. In GB6 they have specific tests to mimic these poorly scaling real world use cases.

“The multi-core benchmark tests in Geekbench 6 have also undergone a significant overhaul. Rather than assigning separate tasks to each core, the tests now measure how cores cooperate to complete a shared task. This approach improves the relevance of the multi-core tests and is better suited to measuring heterogeneous core performance. This approach follows the growing trend of incorporating “performance” and “efficient” cores in desktops and laptops (not just smartphones and tablets).”
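Why a "shared task" test scales sub-linearly can be illustrated with Amdahl's law, a standard model and not Geekbench's actual scoring: if a fraction p of the work can run in parallel, n cores give a speedup of 1 / ((1 - p) + p / n). Independent per-core tasks, by contrast, scale close to n. The parallel fraction below is a hypothetical value.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup of one shared task on `cores` cores under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("parallel_fraction must be in [0, 1] and cores >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def independent_tasks_speedup(cores: int) -> float:
    """Ideal throughput when every core runs its own copy of the task (no shared work)."""
    return float(cores)


# Hypothetical task that is 90% parallelizable.
for n in (1, 8, 16, 32):
    print(n, round(amdahl_speedup(0.9, n), 2), independent_tasks_speedup(n))
```

With p = 0.9, going from 16 to 32 cores only lifts the shared-task speedup from about 6.4 to about 7.8, and it can never exceed 1 / (1 - p), i.e. 10x, however many cores are added.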
 
Geekbench means nothing. Apple has always had strong Geekbench results, only to get spanked by a much cheaper Windows machine in the end.
 
GB6 is specifically designed to not scale linearly on the multi-core tests.
That's what the doc says and I think it's a bad decision.

Many real world tasks don’t scale well across cores and GB has been criticised in the past for not testing things from the real world. In GB6 they have specific tests to mimic these poorly scaling real world use cases.
The solution to that is for them to have a 3-tier model: single-threaded, lightly-threaded, and highly-threaded.

Most reviewers who use MT benchmarks opt for ones that do scale, like Blender or Cinebench, and there is no shortage of real-world apps that do. By calling itself multi-threaded, Geekbench implies that the benchmark is highly scalable. If it's not, then that's a misleading name.

As supporting evidence, I could point to at least a dozen articles on this site in which the authors tout and try to compare Geekbench 6 MT numbers on Threadrippers, Xeons, and other CPUs with lots of cores. Clearly, the public isn't well aware that GB6 MT isn't designed to be very multi-threaded.

I think there's a reasonable expectation that when you run a MT benchmark on a CPU with more cores, the score should scale somewhat in proportion.
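That expectation can be checked per tier of the 3-tier model proposed above with a simple scaling-efficiency figure: the multi-core/single-core score ratio divided by the thread count. This sketch assumes the MT and ST scores are on the same scale (as in tests like Cinebench, where MT/ST is the speedup); the function name and example scores are hypothetical, and on hybrid P/E-core chips the "ideal" is fuzzier, since an E-core isn't worth a full P-core.

```python
def scaling_efficiency(st_score: float, mt_score: float, threads: int) -> float:
    """MT speedup over ST, divided by thread count: 1.0 means perfectly linear scaling."""
    if st_score <= 0 or threads < 1:
        raise ValueError("st_score must be positive and threads >= 1")
    return (mt_score / st_score) / threads


# Hypothetical scores: a lightly-threaded tier (4 threads) and a highly-threaded tier (32 threads).
print(scaling_efficiency(1000.0, 3600.0, 4))
print(scaling_efficiency(1000.0, 12000.0, 32))
```

A benchmark whose "multi-threaded" efficiency collapses at high thread counts is measuring the lightly-threaded tier, whatever its label says.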
 
But that just isn’t the audience that GB are aiming at, they aren’t claiming to be a benchmark for full-on multi-threaded workloads.

Right on the landing page for GB6 they say:

“Geekbench 6 measures your processor's single-core and multi-core power, for everything from checking your email to taking a picture to playing music, or all of it at once.”

So they are upfront about not targeting multi-threaded rendering and the like. If reviewers represent it otherwise that isn’t really GB’s fault.

I for one am glad there is an easily accessible benchmark that isn’t all about ultimate multi-threaded performance but more related to what most people’s workload is like.
 
But do you really need a benchmark for checking your email or listening to music, or both at the same time, when any somewhat modern CPU can do that adequately? The simple truth is, most people won't need anything above a CPU like the 12100, or AMD's equivalent chips, for the majority of workloads. Stephanie from project management won't need it for her PowerPoint presentation, and neither will Peter from accounting. Not everyone who uses a computer is a software engineer, etc., but those are the people who would actually need meaningful benchmarks. So, cute that GB6 is a benchmark for the small people, but at the same time, that makes it kinda superfluous.
 