News: AMD’s beastly ‘Strix Halo’ Ryzen AI Max+ debuts with radical new memory tech to feed RDNA 3.5 graphics and Zen 5 CPU cores

I don't think it's a math problem but one of grammar or semantics.

What they call "2x faster" is in fact 2x as fast, while 200% faster is 3x as fast.

And it is quite intentionally misleading whenever anyone pairs a comparative like faster/better/less expensive with a factor instead of a percentage: shame on them!
No, this is the shame of AMD and its marketing department, where managers simply do not know how to correctly convert multiples into percentages and back. The AMD team's shame is right there on the slide, before your eyes.

For the slide to be mathematically consistent: if the percentages on it are correct, the average at the bottom should be about 4.6 times. If the 2.6-times average at the bottom is correct, the percentages at the tops of the columns should be around 140-160%, with the last column at 202%.
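To make that arithmetic concrete, here's a small Python sketch (the percentage values are illustrative, not read off AMD's slide) showing the conversion between "N% faster" and a speedup factor:

```python
# "N% faster" means a speedup factor of (1 + N/100), so
# "100% faster" = 2x as fast and "200% faster" = 3x as fast.
def pct_faster_to_factor(pct):
    return 1.0 + pct / 100.0


def factor_to_pct_faster(factor):
    return (factor - 1.0) * 100.0


# Illustrative per-benchmark gains (not the slide's actual numbers):
gains_pct = [140, 150, 160, 202]
factors = [pct_faster_to_factor(p) for p in gains_pct]
avg = sum(factors) / len(factors)

print(f"average factor: {avg:.1f}x")                             # -> 2.6x
print(f"as 'percent faster': {factor_to_pct_faster(avg):.0f}%")  # -> 163%
```

Percentages in the 140-160% range plus one at 202% do indeed average out near 2.6x, which is the consistency check being described above.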

It is simply incredible that not a single journalist at the show called them out on it. We now live in a world of lies, absurdity, and ignoramuses. This will definitely end badly for the entire civilization.
 
This is shared memory. Therefore, when the iGPU is not loaded, the entire bandwidth is available to the CPU cores.
That's not accurate. As thoroughly investigated and documented here, Apple didn't make the CPU cluster's interface to the interconnect fabric wide enough for the CPU cores to saturate the memory bandwidth:

and that is why the M4 Max, and especially the M3, has a memory controller with very low real-world efficiency.
It's not the memory controller(s). Using OpenCL, it was possible for the GPU to achieve 390 GB/s (out of a theoretical 400 GB/s) on the M1 Max:

If everything were as you claim, why do the M3 Pro/M3 Max cores show almost half the real bandwidth in actual tests, if the buses are the same width?
I don't follow. The M3 Pro has a 192-bit memory data width, which is all I really know about that SoC, specifically.
 
How do you explain, then, the real 120-130 GB/s for the 16-core version of the M3 Max, with its 400 GB/s of LPDDR5-6400 bandwidth, and the 220-230 GB/s for the M4 Max, with its 546 GB/s of LPDDR5X-8533?

In essence, marketing is lying to buyers that the memory bandwidth is fully accessible to the processor cores, because I have not seen any footnote anywhere in advertising or review materials saying that part of the bandwidth is reserved and inaccessible to the processor cores.
 
How do you explain, then, the real 120-130 GB/s for the 16-core version of the M3 Max, with its 400 GB/s of LPDDR5-6400 bandwidth, and the 220-230 GB/s for the M4 Max, with its 546 GB/s of LPDDR5X-8533?
If I understand your question correctly, you're saying the 16-core M3 Max can only access ~130 GB/s out of 400 GB/s, whereas the M4 Max can access ~230 GB/s out of 546 GB/s? I have no specific knowledge of either SoC, but I'd speculate that it comes down to how many CPU core clusters they each have, how those are connected to the data fabric of the SoC, and what frequency that interconnect runs at.
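As a point of reference, the theoretical peaks being discussed fall straight out of bus width times transfer rate. Here's a minimal sketch; the bus widths and rates are the commonly reported configurations (512-bit for the M3 Max/M4 Max, 192-bit for the M3 Pro), which I'm taking as assumptions rather than figures confirmed in this thread:

```python
# Theoretical peak DRAM bandwidth = (bus width in bytes) x (transfer rate).
def peak_gb_s(bus_bits, mt_s):
    """bus_bits: memory data width in bits; mt_s: transfer rate in MT/s."""
    return (bus_bits / 8) * mt_s * 1e6 / 1e9


# Widths/rates as commonly reported for these SoCs (assumptions):
configs = {
    "M3 Pro  (192-bit LPDDR5-6400)":  (192, 6400),
    "M3 Max  (512-bit LPDDR5-6400)":  (512, 6400),
    "M4 Max  (512-bit LPDDR5X-8533)": (512, 8533),
}
for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_gb_s(bits, rate):.0f} GB/s")
# -> ~154, ~410, ~546 GB/s. The CPU-visible ~130 GB/s and ~230 GB/s
#    figures quoted above are fractions of these theoretical peaks.
```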

I expect you can probably find more insights into the matter, if you do a bit of digging. One guy I'd follow is ex-Apple engineer Maynard Handley, who goes by the alias name99 and has done a lot of reverse-engineering of their M-series SoCs. Here's his GitHub repo, but he also posts on some social media and sometimes over on the RealWorldTech forums.

In essence, marketing is lying to buyers that the memory bandwidth is fully accessible to the processor cores, because I have not seen any footnote anywhere in advertising or review materials saying that part of the bandwidth is reserved and inaccessible to the processor cores.
I'm not here to defend Apple, but if the SoC is capable of using that much memory bandwidth, then they didn't actually lie. You just assumed it was all made available to the CPU cores, but I'm sure they never said so. Also, pretty much every datasheet or specs summary I've seen published by a manufacturer has a "get out of jail free" clause, where they say something like: "all specifications subject to change".

After yesterday's exchange, I had wanted to add that this sort of thing isn't uncommon. In the PS4 and PS5, the CPU cores were also restricted from eating the whole pie. In the PS4's case, the CPU cores could only use a total of about 20 GB/s out of the 176 GB/s max [1]. In the PS5, the CPU cores are limited to about 97 GB/s out of the 440 GB/s total [2] (a quick calculation of those shares follows the sources).

Sources:
  1. https://forum.beyond3d.com/threads/is-ps4-hampered-by-its-memory-system.54916/
  2. https://chipsandcheese.com/p/the-nerfed-fpu-in-ps5s-zen-2-cores
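
As a quick sanity check on those proportions:

```python
# CPU-accessible bandwidth as a share of the total, per the sources above.
consoles = {
    "PS4": (20, 176),   # [1]
    "PS5": (97, 440),   # [2]
}
for name, (cpu_gb_s, total_gb_s) in consoles.items():
    print(f"{name}: {cpu_gb_s}/{total_gb_s} GB/s = "
          f"{cpu_gb_s / total_gb_s:.0%} of total")
# -> PS4: ~11%, PS5: ~22%
```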
 
I'm not here to defend Apple, but if the SoC is capable of using that much memory bandwidth, then they didn't actually lie. You just assumed it was all made available to the CPU cores, but I'm sure they never said so.
Given the lack of a source for the testing, perhaps it's not even limited, but rather just what the CPU designs are capable of saturating. As you've already said, the bus width isn't there to feed the CPU, but rather the GPU.
 
Given the lack of a source for the testing, perhaps it's not even limited, but rather just what the CPU designs are capable of saturating.
In the first link I posted about this (the Anandtech article), Dr. Ian Cutress used multithreaded scaling analysis to show that the M1's bottleneck was somewhere between the CPU cluster and the memory controller. He showed you could reach the saturation point with only 4 active cores, all simultaneously hammering memory.
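
For anyone wanting to try that sort of scaling analysis themselves, here's a rough Python/NumPy sketch, a crude stand-in for a proper STREAM-style benchmark in C. NumPy releases the GIL during large copies, so the threads genuinely run in parallel, but treat the absolute numbers with skepticism:

```python
import threading
import time

import numpy as np


def copy_worker(src, dst, iters):
    # np.copyto releases the GIL for large arrays, so worker threads
    # genuinely stream memory in parallel.
    for _ in range(iters):
        np.copyto(dst, src)


def measure(n_threads, mib_per_buf=128, iters=20):
    n_elems = mib_per_buf * 2**20 // 8  # float64 elements per buffer
    bufs = [(np.ones(n_elems), np.empty(n_elems)) for _ in range(n_threads)]
    threads = [threading.Thread(target=copy_worker, args=(s, d, iters))
               for s, d in bufs]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    dt = time.perf_counter() - t0
    # Each copy reads the source and writes the destination once.
    bytes_moved = 2 * n_threads * iters * mib_per_buf * 2**20
    return bytes_moved / dt / 1e9  # GB/s


# If aggregate bandwidth plateaus after only a few threads, the
# bottleneck sits between the CPU cluster and the memory controller.
for n in (1, 2, 4, 8):
    print(f"{n} thread(s): {measure(n):.1f} GB/s")
```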
 
In the first link I posted about this (the Anandtech article), Dr. Ian Cutress used multithreaded scaling analysis to show that the M1's bottleneck was somewhere between the CPU cluster and the memory controller. He showed you could reach the saturation point with only 4 active cores, all simultaneously hammering memory.
Yeah, but those results could easily be attributed to inherent design limitations, given that it didn't scale much beyond the Pro. Now I'm bitter about nobody picking up the mantle on Apple SoC testing after Ian left 🤣
 
Yeah, but those results could easily be attributed to inherent design limitations, given that it didn't scale much beyond the Pro.
I thought you were saying the CPU cores just weren't fast enough to demand any more than that, which clearly isn't the case. Presumably, Apple figured that's all the CPU cluster would need, in most real-world scenarios.

Now I'm bitter about nobody picking up the mantle on Apple SoC testing after Ian left 🤣
I'm not aware of it, but that doesn't mean that nobody is doing it. Chips & Cheese has yet to touch Apple silicon, but their tools are open source and it's possible others might've already run some of their tests on newer M-series SoCs.
 
I thought you were saying the CPU cores just weren't fast enough to demand any more than that, which clearly isn't the case.
Ah, yeah, I didn't make that clear. Even x86 CPU cores have long been fast enough to saturate dual-channel bandwidth; it just didn't matter much until 16-core CPUs arrived. I also have a hard time believing that the M3 Max wouldn't be able to at least match the bandwidth of the M1 Max.
I'm not aware of it, but that doesn't mean that nobody is doing it. Chips & Cheese has yet to touch Apple silicon, but their tools are open source and it's possible others might've already run some of their tests on newer M-series SoCs.
Any time I've gone looking for technical insights into the later M-series, I've hit a brick wall. It doesn't mean that they're not out there (I don't check YouTube, for example), but if they are, they aren't easy to find.
 
Any time I've gone looking for technical insights into the later M-series, I've hit a brick wall. It doesn't mean that they're not out there (I don't check YouTube, for example), but if they are, they aren't easy to find.
Beyond the GitHub link I posted above, you can also find a smattering of interesting papers on Google Scholar. Because research takes a while, and then there are the delays of the publication pipeline, you're not going to find much on the latest and greatest products, but you'll do better if you focus on a couple of generations prior.


I know Handley (name99) gleaned many details by reading through quite a few of Apple's patents. He's also collected data others have gathered via microbenchmarking. Unlike research, patents have the advantage of being more forward-looking, sometimes getting filed years before a product implementing them reaches the market.

BTW, GitHub's online PDF viewer seems to choke on some of the larger PDFs, but they work just fine if I download and view the raw file locally.
 