News Intel and AMD forge x86 ecosystem advisory group that aims to ensure a unified ISA moving forward

Man, that's exactly (exactly exactly) what I said in my first post and you said I'm coping. I literally said M3 delivers good efficiency because of low power and a big core.
You just said it failed. It absolutely didn't fail on any metric that actually matters to anyone, and I simply explained why your characterization of failure is wrong.

it's absolutely useless for desktops and only useful for mobile.
Why? Intel is using Lion Cove and Skymont for desktops. Apple's M3 cores beat them on ST performance and equaled them on MT perf (but at lower power).

There's currently no M3 Ultra, as far as I can tell. However, the M3 Max delivers a CB R23 (MT) score of 24,020 at 56 W (429 points/W). If you compare that to the i9-14900K, the Intel CPU delivers a score of only 21,775 at 65 W (335 points/W). What's more is that it accomplishes this with only 16 cores/16 threads, compared to Raptor Lake's 24 cores/32 threads!
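For anyone who wants to check the arithmetic, here's a minimal Python sketch that recomputes the points-per-watt figures from the scores and power levels quoted above. The helper name `points_per_watt` is just something made up for illustration, and the inputs are only the numbers from this post, not fresh measurements.

```python
def points_per_watt(score: float, watts: float) -> float:
    """Return benchmark points per watt of power."""
    return score / watts


# Figures quoted in the post: CB R23 multi-thread score and power in watts.
m3_max = points_per_watt(24020, 56)
i9_14900k = points_per_watt(21775, 65)

print(f"M3 Max:    {m3_max:.0f} points/W")
print(f"i9-14900K: {i9_14900k:.0f} points/W")
```

Rounded to whole numbers, this reproduces the 429 and 335 points/W quoted above.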

So, for someone in the market for a 65 W CPU, it seems quite viable to me.

The bottom line is M3 isn't particularly impressive. It sacrifices a ton of MT performance and efficiency to achieve good ST performance and efficiency.
Again, you're just looking at the entry-level model. It's meant for the iPad and ultralight market. Once you step up to the Pro and Max, you get a lot more MT (and GPU) performance.
 
You are asking me why MT is more important for desktops? Cause that's the whole point of having a desktop chip. If you are just browsing the web you don't need a desktop and high end chips are irrelevant to you anyways.

Why does the number of cores matter? The M3 Max beating the 14900K while being 10 times as large is a failure. I don't get how you can pretend otherwise. If the M3 Max cores have the transistor count of a Threadripper 7995WX, then that's the one it should be beating for me to conclude that ARM has an advantage over x86. If it's just beating a much much smaller chip then what's the point? It's freaking obvious it should be beating it, just like the 14900K is beating the 12100. It has more transistors, therefore it's faster. Who would have thought...
 
You are fighting against an inconsistent argument built on shifting sands…
My argument is anything but inconsistent. I've been talking about performance per transistor from the get-go, comparing ARM with x86. Just because you don't agree doesn't make my argument inconsistent.

In fact, saying transistors don't matter is inconsistent. The whole world is in a node-shrink race to pack in as many transistors as possible, yet here we are pretending they don't matter. What the hell. Why is everyone using more and more expensive and advanced TSMC nodes then? Because they can use more transistors...duh.
 
Let's measure how good an engine is solely based on the number of pistons/cylinders it has and their cHP! Let's ignore absolutely everything else, because I like my biases and proving skewed irrelevant points. No need to account for displacement, injection method or even shape. Irrelevant.

Regards.
Exactly. We should ignore absolutely everything else; transistor count is the only thing that matters. Transistor count determines the price. A 150B-transistor chip is going to cost a lot more than a 10B-transistor chip. So it has to compete with other 150B-transistor chips. Whether or not it beats a 10B chip is useless information.
 
You are asking me why MT is more important for desktops? Cause that's the whole point of having a desktop chip.
I was addressing the question of whether the cores, themselves, were suitable for desktop usage. That's the point I thought you were making. So, now that's been settled.

As I've said before, if you'd try to take a bit more time and express yourself more clearly, there might be less confusion.

Why does the number of cores matter? The M3 Max beating the 14900K while being 10 times as large is a failure.
Since this thread is about CPU ISA, I'm only going to focus on the CPU cores.

If the M3 Max cores have the transistor count of a Threadripper 7995WX, then that's the one it should be beating for me to conclude that ARM has an advantage over x86.
Actually, Lunar Lake is the point of equal comparison, since they're on the exact same process node and everything else. You should be aware that performance and efficiency aren't simply a function of the number of transistors.

If the compute tile of Arrow Lake turns out to be made on TSMC N3B, as well, then we can also use it for further points of equal comparison.

If it's just beating a much much smaller chip then what's the point?
The point was that you seemed to be saying a CPU with those cores wouldn't make a good desktop processor. I was showing that it can, but we have to look at a more appropriate incarnation. The numbers I quoted from the M3 Max establish that it should be at least as good a desktop CPU as the i9-14900, and I'm sure you're not saying that's a bad desktop!

It's freaking obvious it should be beating it, just like the 14900K is beating the 12100. It has more transistors, therefore it's faster. Who would have thought...
So, now we get to the numbers. As I said, the M3 Max has 16 cores, comprised of 12P + 4E. In prior posts, I've stated they have an estimated 426M and 135M transistors, respectively. So, the total core transistor count is 5.65B.

The Raptor Cove P-cores are 8.52 mm^2 and each E-core cluster is 10.8 mm^2, which I extrapolated based on Locuza's analysis of Alder Lake. The total core area should be 111.36 mm^2. Using the density figure from before, the estimated transistor count in Raptor Lake's cores would be 3.59B.

So, focusing just on the CPU cores, it's not a 10:1 ratio but a far more modest 1.57:1 ratio. When you look at it like that, it's hard not to be impressed that the M3 Max achieved 10.3% better performance at 86% of the power (especially at such a core/thread disadvantage). Or, to put it in efficiency terms, 28.1% better perf/W.
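As a rough sanity check on the core-only estimate, the numbers above can be plugged into a short Python sketch. Everything here comes from figures quoted in this thread (the per-core transistor estimates, the core areas, the 3.59B result); the 8 P-core plus 4 E-core-cluster split is my reading of the 24-core/32-thread Raptor Lake part, and the density is only what those figures imply, not an independently sourced value.

```python
# Per-core transistor estimates quoted for the M3 Max, in millions.
M3_P_CORE_MTR = 426
M3_E_CORE_MTR = 135
m3_core_mtr = 12 * M3_P_CORE_MTR + 4 * M3_E_CORE_MTR  # 12P + 4E

# Raptor Lake core area: 8 P-cores plus 4 E-core clusters (4 E-cores each).
rpl_core_area_mm2 = 8 * 8.52 + 4 * 10.8

# The thread's estimate for Raptor Lake's cores, in millions of transistors.
RPL_CORE_MTR = 3590

# Density implied by the two figures above (MTr per mm^2), shown for reference.
implied_density = RPL_CORE_MTR / rpl_core_area_mm2

core_ratio = m3_core_mtr / RPL_CORE_MTR

print(f"M3 Max core transistors:  {m3_core_mtr / 1000:.2f}B")
print(f"Raptor Lake core area:    {rpl_core_area_mm2:.2f} mm^2")
print(f"Implied density:          {implied_density:.1f} MTr/mm^2")
print(f"Core transistor ratio:    {core_ratio:.2f}:1")
```

The point of writing it out is that anyone who disagrees with an input (say, the per-core estimates) can change one constant and see how far the ratio moves.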

The last thing I'll (hopefully) say about that match up is that the Notebook Check data I used was from the M3 Max in a laptop. Inside of something like a Mac Mini, I'm sure it could manage more than 56 W. So, for those wanting more performance, it probably hasn't even stretched its legs at only 56W.
 
Why is everyone using more and more expensive and advanced TSMC nodes then? Because they can use more transistors...duh.
No, smaller nodes aren't just about squeezing in more transistors, but also about power savings and timing benefits. Better timing means you could run at higher frequencies, have longer critical paths, or some combination. Longer critical paths enable higher IPC.

In other words, the benefits of smaller nodes are multi-faceted. That's why they're so coveted.

We should ignore absolutely everything else; transistor count is the only thing that matters. Transistor count determines the price.
No, it's not that simple. The cost of a transistor depends on the node and is also a time-dependent quantity.

Price also depends on yield, which is related to die size. One benefit of chiplet-based architectures is that you get better yield than if you fabbed the whole thing as a monolithic die. The amount of benefit obviously depends on the relative sizes and what the yields are like on the node.
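To illustrate the yield point, here's a small Python sketch using a simple Poisson defect model, where yield falls off exponentially with die area. This is a textbook simplification I'm assuming for illustration, not how any particular foundry prices wafers. The defect density and die sizes are made-up numbers, and the sketch assumes bad chiplets are screened out before packaging while ignoring packaging cost, test cost, and wafer-edge losses.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson defect model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product(die_area_mm2: float, dies_per_product: int,
                             defects_per_cm2: float) -> float:
    """Wafer area (mm^2) consumed per working product.

    Assumes each die is tested on its own, so a defect only scraps that die.
    """
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_product * die_area_mm2 / good_fraction


D0 = 0.2  # hypothetical defect density, defects per cm^2

monolithic = silicon_per_good_product(600, 1, D0)  # one 600 mm^2 die
chiplets = silicon_per_good_product(150, 4, D0)    # four 150 mm^2 dies

print(f"Monolithic: {monolithic:.0f} mm^2 of wafer per good product")
print(f"Chiplets:   {chiplets:.0f} mm^2 of wafer per good product")
```

With these invented inputs, the chiplet version needs well under half the wafer area per working product, even though both designs contain the same 600 mm^2 of silicon. How big the gap is in practice depends on the real defect density and on packaging overhead.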
 
I was addressing the question of whether the cores, themselves, were suitable for desktop usage. That's the point I thought you were making. So, now that's been settled.

As I've said before, if you'd try to take a bit more time and express yourself more clearly, there might be less confusion.
My bad, was on the phone so I couldn't make multiple quotes.

Actually, Lunar Lake is the point of equal comparison, since they're on the exact same process node and everything else. You should be aware that performance and efficiency aren't simply a function of the number of transistors.

If the compute tile of Arrow Lake turns out to be made on TSMC N3B, as well, then we can also use it for further points of equal comparison.
But we agreed that Lunar Lake doesn't represent the entirety of x86. Lunar Lake might be the biggest stinker ever in existence, and it still wouldn't make a difference as to whether x86 or ARM is superior.
So, focusing just on the CPU cores, it's not a 10:1 ratio but a far more modest 1.57:1 ratio. When you look at it like that, it's hard not to be impressed that the M3 Max achieved 10.3% better performance at 86% of the power (especially at such a core/thread disadvantage). Or, to put it in efficiency terms, 28.1% better perf/W.

The 10:1 ratio was just an example; I wasn't actually suggesting that it's 10 times as big.

If the numbers you are now using are indeed correct, then the numbers I used for Alder Lake should be correct as well. I ended up at 2.3B for Alder Lake, which seems very plausible, seeing how it has half the E-cores, a smaller cache and a smaller ring bus. But anyway, let's just focus on the new numbers you presented.

The M3 Max needs 57% more transistors to achieve ~20% more performance at the same power (and I'm being very generous here). How can that possibly be good? This was my question from the beginning: what am I missing? How many extra E or P cores can you fit in that 57% extra space? We are talking about a 12 P-core + 25 E-core chip. Such a chip would absolutely annihilate the M3 Max in both performance and efficiency, and it wouldn't even be close.

And that's while comparing it with Intel's P-cores, which are themselves not that good in performance per transistor. What the heck, man?
 
The M3 Max needs 57% more transistors to achieve ~20% more performance at the same power (and I'm being very generous here). How can that possibly be good?
As I said before, and I think demonstrated somewhat conclusively with Coffee Lake R -> Golden Cove, performance never scales linearly with transistor count! Again, in that case (detailed in post #97), we see that Golden Cove had about 2.55 times as many transistors as a Coffee Lake core, yet ST performance only increased by 36.8% (int) and 46.9% (fp)! If it were linear, it should've increased 155%!
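To put a number on how far from linear that is, here's a quick Python sketch. It takes the 2.55x transistor ratio and the two ST gains quoted above and works out what exponent k would be needed if performance scaled as transistors^k. That power-law framing is just my own illustrative assumption for this one data point, not something established by these numbers.

```python
import math

TRANSISTOR_RATIO = 2.55  # Golden Cove vs. Coffee Lake core, as quoted above
OBSERVED_GAIN = {"int": 0.368, "fp": 0.469}  # ST gains quoted above

# What the gain would be if performance scaled linearly with transistors.
linear_gain = TRANSISTOR_RATIO - 1.0


def scaling_exponent(perf_ratio: float, transistor_ratio: float) -> float:
    """Solve perf_ratio = transistor_ratio ** k for k."""
    return math.log(perf_ratio) / math.log(transistor_ratio)


exponents = {
    name: scaling_exponent(1.0 + gain, TRANSISTOR_RATIO)
    for name, gain in OBSERVED_GAIN.items()
}

print(f"Linear scaling would give: +{linear_gain:.0%}")
for name, k in exponents.items():
    print(f"{name}: observed +{OBSERVED_GAIN[name]:.1%}, implied exponent k = {k:.2f}")
```

Linear scaling would mean k = 1. The implied exponents here come out at roughly 0.33 (int) and 0.41 (fp), so well short of that.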

How many extra E or P cores can you fit in that 57% extra space?
As you'll know, not everything is multithreaded as well as CineBench or Blender. Many games and other apps use a smaller number of threads quite intensively, or might just not scale performance as well to a larger number of cores. Therefore, if you can get equivalent performance with fewer cores, that's providing more performance people are able to actually use!

In the past, there used to be a real tradeoff you faced, between going with a CPU that had higher core count vs. higher clock speeds. I always opted for higher clock speeds, because I felt that would be useful in virtually all situations, whereas more cores would be applicable to only a few. In this case, the argument is a bit similar.
 
Neither is superior, it all depends on your use case
Uh... do me a favor and remember you said that. It'll be worth noting, when AMD comes along with a version of Zen 5 with its front end swapped out for an AArch64 decoder, next year. Or maybe it'll be Zen 6. Anyway, I expect we'll have an even better apples-to-apples comparison than this Lunar Lake vs. M3 match up. Prepare to be surprised.

I suppose I should allow for the possibility there might be a handful of cases where x86 does better, but I think the vast majority are going to be either equal or swing in ARM's favor.

If you disagree, please tell us which use cases will favor which ISA, and why.
 
As I said before, and I think demonstrated somewhat conclusively with Coffee Lake R -> Golden Cove, performance never scales linearly with transistors! Again, in that case (detailed in post #97), we see that Golden Cove had about 2.55 times as many transistors as a Coffee Lake core, yet ST performance only increased by 36.8% (int) and 46.9% (fp)! If it were linear, it should've increased 155%!
You did no such thing, though. You compared different architectures. A proper comparison to show that transistor count and performance don't scale linearly would be 7700X vs 7950X. I can assure you, you'll get almost linear scaling, if not exactly linear. That's why I'm totally not impressed with M3. I see nothing impressive about it. It uses 57% more transistors for not even a 20% increase in performance. In CB R23, that is... that's just flat-out bad.

As you'll know, not everything is multithreaded as well as CineBench or Blender. Many games and other apps use a smaller number of threads quite intensively, or might just not scale performance as well to a larger number of cores. Therefore, if you can get equivalent performance with fewer cores, that's providing more performance people are able to actually use!
That's a software issue. Warp Stabilizer doesn't scale with cores, but a lot of content creators run many instances of it simultaneously because of that.

And surely you are not making the argument that m3 is good because some applications don't scale with cores, right? How is that relevant?

At least now we seem to agree on something. I'm saying that the MT performance and efficiency of ARM (or in this case M3) seems disgustingly bad based on the transistor count, and you are saying that's the case because they are trading off MT for ST?
 
A proper comparison to show that transistor count and performance don't scale linearly would be 7700X vs 7950X.
No, because you're changing the number of cores, as well as their topology. So, you have those additional variables, as well as the question about multi-thread scalability of the workload.

The best analysis of what you're talking about is to look at core vs. core. Coffee Lake vs. Golden Cove has the additional property of keeping the ISA the same. It also compares across nodes, which is what you were doing between Intel 7 and TSMC N3B.

That's a software issue.
It's just a reality that not everything is multi-threaded. Of those things that are, not everything scales well to lots of cores. If you don't care about that, then my advice would be to get yourself a server board and a Sierra Forest CPU with 144 E-cores.
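One common way to picture this is Amdahl's law, which caps the speedup from extra cores by the fraction of the work that can actually run in parallel. Here's a minimal Python sketch of it. The parallel fractions below are hypothetical, not measurements of any real application, and the model ignores clock speed, memory bandwidth, and scheduling overhead.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fractions, chosen only to show the trend.
for p in (0.50, 0.90, 0.99):
    for n in (8, 24, 144):
        print(f"parallel={p:.0%} cores={n:>3}: {amdahl_speedup(p, n):5.1f}x")
```

In this model, a workload that's only half parallel can never reach 2x, no matter how many cores you throw at it, which is why faster individual cores still matter for a lot of software.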
 
No, because you're changing the number of cores, as well as their topology. So, you have those additional variables, as well as the question about multi-thread scalability of the workload.

The best analysis of what you're talking about is to look at core vs. core. Coffee Lake vs. Golden Cove has the additional property of keeping the ISA the same. It also compares across nodes, which is what you were doing between Intel 7 and TSMC N3B
Yes, I'm changing the number of cores. I can do that because extra transistors allow me to add extra cores. That's what I've been saying the whole time.
It's just a reality that not everything is multi-threaded. Of those things that are, not everything scales well to lots of cores. If you don't care about that, then my advice would be to get yourself a server board and a Sierra Forest CPU with 144 E-cores.
It doesn't matter if everything is multithreaded or not. The 7995WX is faster than a 14900K, period. How well the software scales to take advantage of that is irrelevant. We are comparing the chips.

There are ARM server chips out in the wild, right? Do we have any info on those and how they compare to Threadrippers and Xeons?
 
Exactly. We should ignore absolutely everything else; transistor count is the only thing that matters. Transistor count determines the price. A 150B-transistor chip is going to cost a lot more than a 10B-transistor chip. So it has to compete with other 150B-transistor chips. Whether or not it beats a 10B chip is useless information.
You do realize what you just wrote there is factually wrong, right?

I'm not going to even bother following up after this, since any other reader will understand why.

Regards.
 
You do realize what you just wrote there is factually wrong, right?

I'm not going to even bother following up after this, since any other reader will understand why.
No, it is not.

But I'm not going to even bother following up after this, since any other reader will understand why.
 
There are ARM server chips out in the wild, right? Do we have any info on those and how they compare to Threadrippers and Xeons?
AMD EPYC 9965 "Turin Dense" Delivers Better Performance/Power Efficiency vs. AmpereOne 192-Core ARM CPU

There is one generic ARM server chip that is readily available to purchase.

But Phoronix already tested it and it got trounced by AMD.
On a geo mean basis across all the benchmarks, the 192-core EPYC 9965 was delivering 1.6x the performance of the AmpereOne A192-32X flagship processor in the benchmarks conducted. So while the average power use of the EPYC 9965 was around 1.2x that of the AmpereOne A192-32X, it more than makes up for it in power efficiency with 1.6x the performance.


As seen with this EPYC 9965 to AmpereOne A192-32X comparison, AMD EPYC Turin Dense CPUs can easily compete with and typically outperform AmpereOne AArch64 cores. And ultimately delivering better performance-per-Watt. The one area where Ampere Computing may have the advantage is on performance-per-dollar if the list pricing for both AMD EPYC Turin and AmpereOne are accurate and there becomes robust availability. That's for CPU pricing at least with not yet having any public AmpereOne server platform pricing for getting an idea of the overall TCO.
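For a rough sense of what those two quoted ratios mean together, here's a tiny Python sketch that divides the relative performance by the relative power. It only uses the 1.6x and 1.2x geo-mean figures from the quote, so treat the result as a back-of-the-envelope number rather than anything measured directly.

```python
def relative_perf_per_watt(perf_ratio: float, power_ratio: float) -> float:
    """Perf/W of chip A relative to chip B, given A/B performance and power."""
    return perf_ratio / power_ratio


# Ratios quoted from the review: EPYC 9965 relative to AmpereOne A192-32X.
epyc_vs_ampere = relative_perf_per_watt(1.6, 1.2)

print(f"EPYC 9965 perf/W relative to AmpereOne A192-32X: {epyc_vs_ampere:.2f}x")
```

That works out to about a one-third advantage in performance per watt for the EPYC part on those averages.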
 
LOL, AmpereOne is sort of a flop. It's the classic story of too little, too late. They tried to make a custom core that's more on the E-core end of the spectrum, so they could eventually scale up to some crazy number.

I think they really should've stuck with using ARM's Neoverse cores, like they did in Altra. That's what Nvidia's Grace did (see above).
 
AMD EPYC 9965 "Turin Dense" Delivers Better Performance/Power Efficiency vs. AmpereOne 192-Core ARM CPU

There is one generic ARM server chip that is readily available to purchase.

But Phoronix already tested it and it got trounced by AMD.
I got this from the review you linked. It's what I've been saying all along:

While some like to criticize x86_64 for power efficiency, with Intel Sierra Forest and AMD Bergamo / Turin Dense is increasing proof that the ISA isn't inherently power inefficient or the like. EPYC 9965 is shooting well ahead of the AmpereOne A192-32X in this 192-core showdown.