News AMD’s beastly ‘Strix Halo’ Ryzen AI Max+ debuts with radical new memory tech to feed RDNA 3.5 graphics and Zen 5 CPU cores

Uh, so what memory does it actually use? I'm guessing LPDDR5X @ 256-bit data width. Unless my math is wrong, a nominal bandwidth of 256 GB/s suggests LPDDR5X-8000?
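A quick sketch of that arithmetic in Python (the LPDDR5X-8000 speed and 256-bit width here are the guesses from this post, not confirmed specs):

```python
# Nominal bandwidth = transfer rate (MT/s) x bus width (bytes).
def nominal_bandwidth_gbps(mt_per_s: int, bus_bits: int) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mt_per_s * 1e6 * (bus_bits // 8) / 1e9

print(nominal_bandwidth_gbps(8000, 256))  # 256.0 -> matches the quoted 256 GB/s
print(nominal_bandwidth_gbps(8533, 256))  # ~273, if it were LPDDR5X-8533 instead
```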

The total cache suggests they're using the same amount of L3 per core as their desktop/server CCDs. That makes me wonder if they're also using full 512-bit vector pipelines.

The last big question I have (for now, at least) is what process node?
 
The article said:
if you have 128GB of total system memory, up to 96GB can be allocated to the GPU alone, with the remaining 32GB dedicated to the CPU. However, the GPU can still read from the entire 128 GB memory, thus eliminating costly memory copies via its unified coherent memory architecture.
You'd think so, but this actually requires software to be written accordingly. For games, Microsoft only somewhat recently introduced a feature in Direct3D for the app/game to share memory with the GPU. Unless the programmer specifically uses this API feature, games will still be copying assets into the GPU's memory segment, in spite of the fact that they're merely logical partitions within the same physical memory.

@JarredWaltonGPU , perhaps reach out to AMD to see if they know which games do this, since it might make for some interesting benchmarks.
 
It's great to see that this will be in mini PCs from various OEMs, but I would love to see it on an ITX motherboard with 3 or 4 M.2 slots and a 2.5 GbE port. I can think of a few small sub-4L cases I would love to put this in and use as a portable computer.
 
CAMM2 support? 128GB(+?) of user-replaceable memory support is ideal.

Weird that they dropped from 192GB max support to 128GB on this, though. I guess it has to do with the new IMC.
 
CAMM2 support? 128GB(+?) of user-replaceable memory support is ideal.

Weird that they dropped from 192GB max support to 128GB on this, though. I guess it has to do with the new IMC.
It really depends on whether they support dual rank memory. If so, then 32 Gb chips should enable 256 GB. However, if that were the case, then they should already support 192 GB. So, maybe the 128 GB figure is already anticipating the 32 Gb chips.
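A rough sketch of that rank/die capacity math (the per-die I/O width is my assumption: LPDDR5X dies typically present a x16 interface, with an x8 byte mode also defined):

```python
# Max capacity = dies per rank x die density (GB) x number of ranks,
# where dies per rank = bus width / per-die I/O width.
def max_capacity_gb(bus_bits: int, die_io_bits: int, die_gbit: int, ranks: int) -> int:
    dies_per_rank = bus_bits // die_io_bits
    return dies_per_rank * (die_gbit // 8) * ranks

# 32 Gbit dies in x16 mode on a 256-bit bus:
print(max_capacity_gb(256, 16, 32, 1))  # 64 GB single-rank
print(max_capacity_gb(256, 16, 32, 2))  # 128 GB dual-rank
# The same dies in x8 (byte) mode would double that:
print(max_capacity_gb(256, 8, 32, 2))   # 256 GB
```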
 
I find it interesting AMD didn't compare apples to apples when it comes to Apple Silicon. It would have made more sense to compare the 16-core/40-GPU-core M4 Max to the AI Max+ 395, since they have the same basic config (16 compute/40 GPU). I'll be curious to see in independent reviews how close it comes to Apple's upper-end offering.
 
1.4x gaming performance compared to the C9U 288V seems like a conservative estimate?
It's >228% faster in the 3DMark benchmarks.

The 256GB/s memory bandwidth is better than expected for a 256-bit bus, and at least they didn't go with a measly 6400.
I hope OEMs decide to pack in LPDDR5X-10700, although 8533 would be fine too.
 
Unless my math is wrong, a nominal bandwidth of 256 GB/s suggests LPDDR5X-8000?
If so they pulled back (or never had) LPDDR5X-8533 (273 GB/s).
1.4x gaming performance compared to the C9U 288V seems like a conservative estimate?
It's >228% faster in the 3DMark benchmarks.

The 256GB/s memory bandwidth is better than expected for a 256-bit bus, and at least they didn't go with a measly 6400.
I hope OEMs decide to pack in LPDDR5X-10700, although 8533 would be fine too.
128% faster, i.e. 2.28x, not 228%. But yes, that is strange. They don't want to talk about Strix Halo gaming, maybe because it will be irrelevant compared to the AI use case.

Some leaks said LPDDR5X-8000, which appears to be correct; some said 8533. Could they even go higher than that?
The total cache suggests they're using the same amount of L3 per core as their desktop/server CCDs.

The last big question I have (for now, at least) is what process node?
Strix Halo should be using desktop Zen 5 chiplets. It could even get X3D in the future.
I'm curious if there's any cache dedicated to the GPU as they don't really say one way or the other.
Leaks said 32 MiB Infinity Cache. Now we wait for more details.
 
hmmm, if this were built into a mini-ITX motherboard with 128GB of memory and support for 2 NVMe M.2 drives and 2 HDMI or 2 DP connectors, this could very well be a system that would carry me into retirement 6 years 9 months from now. I thought I'd be waiting for a Ryzen 9 9950X3D and Radeon RX 9070 XTX (think 9950XTX) combo to replace my current long-in-the-tooth build... but this all-in-one solution will be fine, and it seems like it will play Flight Simulator 2024 just fine :)
 
hmmm, if this were built into a mini-ITX motherboard with 128GB of memory and support for 2 NVMe M.2 drives and 2 HDMI or 2 DP connectors, this could very well be a system that would carry me into retirement 6 years 9 months from now. I thought I'd be waiting for a Ryzen 9 9950X3D and Radeon RX 9070 XTX (think 9950XTX) combo to replace my current long-in-the-tooth build... but this all-in-one solution will be fine, and it seems like it will play Flight Simulator 2024 just fine :)
It would also make a great board for people who have travel gaming rigs. Right now mine is in a 4L chassis, and I would love to get it under 3L; I'd have some options to do so if they can get this on an ITX motherboard and let me spin that thing up to the TDP max. There are a few coolers that could handle the workload, and I could have a main drive plus a data drive for games if they shipped it with 2 M.2 slots.
 
You'd think so, but this actually requires software to be written accordingly. For games, Microsoft only somewhat recently introduced a feature in Direct3D for the app/game to share memory with the GPU. Unless the programmer specifically uses this API feature, games will still be copying assets into the GPU's memory segment, in spite of the fact that they're merely logical partitions within the same physical memory.

@JarredWaltonGPU , perhaps reach out to AMD to see if they know which games do this, since it might make for some interesting benchmarks.
That relates mostly to dGPUs, as it's an extension of ReBAR, which iGPUs don't support. AMD's iGPUs have used a unified memory architecture since around 2011, so there's no point in copying within the same memory. Instead, the CPU and GPU pass pointers to locations in memory (GPU driver-aware, hardware context-aware). There's no copy aside from the initial load of texture and asset data from the SSD. Consoles act in the same manner.

Gaming performance will be limited primarily by power limits, as CPU and (very large) iGPU share package power; no mention of SmartShift, but that should be used here. 8060S gives us an idea of where the GPU performs in the stack, and that is firmly midrange.

Allocating more than 1/2 of system RAM for iGPU is new. This seems to be a soft-partition, maybe by microcode switch (reboot required?). I don't know. We need a more technical overview.

I'd also like to know more packaging details. It certainly looks like the CCDs and IOD are connected via fanout (InFO) or sit on a passive interposer (CoWoS). This has obvious benefits for UMA, especially for CPU accesses, which are at least one hop away from the IOD, whereas the iGPU is basically adjacent to the memory controllers. A high-bandwidth, low-latency interconnect can bridge the disparate pieces together a bit more coherently.
 
AMD's marketing department has disgraced itself in front of the world! Its employees don't know elementary school math! They don't know how to convert percentages to multiples and back! The average of 2.6x on their slide means that all the percentages in the columns should be around 140-160%, with only the last one at 202%, not 402%.

340% = 4.4 times, which means either the percentages are overstated by 200%, or the multiples are higher than shown in the picture. Which figure is correct, the percentages or the multiples? Right now, we don't know!


Is it possible that not a single editorial board or author on the planet pointed out such a mathematical disgrace to AMD?

I noticed it immediately, within 2 seconds of viewing the charts!

Where is our world heading if the second x86 processor manufacturer in the world employs such poorly educated people in its marketing department, who probably failed their high school math exams...
 
AMD's marketing department has disgraced itself in front of the world! Its employees don't know elementary school math! They don't know how to convert percentages to multiples and back! The average of 2.6x on their slide means that all the percentages in the columns should be around 140-160%, with only the last one at 202%, not 402%.

340% = 4.4 times, which means either the percentages are overstated by 200%, or the multiples are higher than shown in the picture. Which figure is correct, the percentages or the multiples? Right now, we don't know!
The slide doesn't say "average 2.6 times". It says "average 2.6X faster", which means +260%. That fits all the other numbers.

The graphs have a "100%" bar, so they're depicting performance compared to that reference point - not uplift from a comparison value. So "340%" means 3.4 times the performance of the Core Ultra 9 288V.
 
The latest slides show LPDDR5X-8533 to match Lunar Lake, for 273GB/s, as it's expected to be 256-bit (and not 512-bit like Apple).

Likewise, the Strix Point refresh gets an upgrade to 128-bit soldered LPDDR5X-8000 (125GB/s, about like an RX 6400), or DDR5-5600 (PC5-44800).
Link?

Their specifications website quite literally says:
System Memory Type 256-bit LPDDR5x
Max. Memory 128 GB
Max Memory Speed LPDDR5x-8000
 
it's expected to be 256-bit (and not 512-bit like Apple).
Depends on which Apple M tier you're talking about. Only the Max has 512-bit.

The key difference between AMD and Apple is that AMD makes dGPUs, while Apple doesn't. So Apple had to scale up its iGPU as much as possible, to try to make it a substitute for top-end dGPUs (and it still falls a bit short), whereas AMD will just tell you to use one of their dGPUs if you need more horsepower than the iGPU can provide.
 
The slide doesn't say "average 2.6 times". It says "average 2.6X faster", which means +260%. That fits all the other numbers.

The graphs have a "100%" bar, so they're depicting performance compared to that reference point - not uplift from a comparison value. So "340%" means 3.4 times the performance of the Core Ultra 9 288V.
This is a complete disgrace! What did they teach you in elementary school?
2.6 times as fast = 160% faster. Learn elementary math.
AMD has disgraced itself in front of the entire planet: its marketing department, then all the managers, then all the sites and news writers who did not catch this mathematical EPIC FAIL from AMD.
The latest slides show LPDDR5X-8533 to match Lunar Lake, for 273GB/s, as it's expected to be 256-bit (and not 512-bit like Apple).
This is another failure of AMD's marketing department if the RAM really runs in LPDDR5X-8533 mode (the official slides say 256GB/s, i.e. 8000 mode). And this is sad: it means there will be no fast configurations with 2 slots, and you will have to pay many times more for soldered 32, 64, or 128 GB (and 256 GB), with weaker guarantees than retail modules, where warranties reach 10 years.

Apple M3/M4 Pro - 256-bit memory bus.
Apple M3/M4 Max - 512-bit.
But in reality, the efficiency of the M3 Pro memory controller is less than 45% (~120-125GB/s in real bandwidth tests).
The efficiency of the M3 Max is ~24% (~130GB/s).

The efficiency of the M4 Pro is 75% (~200-210GB/s).
The efficiency of the M4 Max is 40% (~210-220GB/s); in practice, there is no use for the 512-bit bus in the M3/M4 Max series.
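Those efficiency figures are just measured throughput over nominal peak; sketched out below (the 256-bit/512-bit widths and LPDDR5X-8533 speed for the M4 Pro/Max are my assumptions, and the measured numbers are the rough figures quoted above):

```python
# Nominal peak bandwidth in GB/s for a given transfer rate and bus width.
def nominal_gbps(mt_per_s: int, bus_bits: int) -> float:
    return mt_per_s * 1e6 * (bus_bits // 8) / 1e9

# Efficiency = measured bandwidth / nominal peak.
def efficiency(measured_gbps: float, mt_per_s: int, bus_bits: int) -> float:
    return measured_gbps / nominal_gbps(mt_per_s, bus_bits)

# M4 Pro: ~205 GB/s measured vs. ~273 GB/s nominal (256-bit LPDDR5X-8533)
print(round(efficiency(205, 8533, 256), 2))  # ~0.75
# M4 Max: ~215 GB/s measured vs. ~546 GB/s nominal (512-bit LPDDR5X-8533)
print(round(efficiency(215, 8533, 512), 2))  # ~0.39
```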
 
Apple M3/M4 Pro - 256 bit memory bus
The M1 Pro and M2 Pro had a 256-bit datapath, but the M3 Pro reduced it to 192-bit. The M4 Pro reverted to 256-bit.

The efficiency of the M4 Pro is 75% (~200-210GB/s).
The efficiency of the M4 Max is 40% (~210-220GB/s); in practice, there is no use for the 512-bit bus in the M3/M4 Max series.
The memory bandwidth in these SoCs is primarily for the benefit of their large iGPU and perhaps NPU. They were not even designed for the CPU cores to use more than a certain amount of it, as you've observed.
 
The memory bandwidth in these SoCs is primarily for the benefit of their large iGPU and perhaps NPU. They were not even designed for the CPU cores to use more than a certain amount of it, as you've observed.
This is shared memory. Therefore, when the iGPU is not loaded, the entire bandwidth is available to the CPU cores, which is why the M4 Max, and especially the M3, has a memory controller with very low real-world efficiency. It is simply a disgrace compared to the M4 Pro: the extra 256 bits of bus are useless.

If everything were as you claim, why do the M3 Pro/M3 Max cores see almost half the real bandwidth in real tests, if the buses are the same width? This speaks to the poor design of Apple's M3-series chips and, again, the poor design of the M4 Max. Real tests have proven everything.
 
Uh, so what memory does it actually use? I'm guessing LPDDR5X @ 256-bit data width. Unless my math is wrong, a nominal bandwidth of 256 GB/s suggests LPDDR5X-8000?
That's really the only question I've long wanted the answer to, but nobody is spelling it out...

256 bits would mean 8 channels of 32-bit LPDDR5, or the equivalent of 4-channel 64-bit DDR5, which is either a ton of beachfront for a mobile chip or requires something like an Apple Mx packaging design to keep things in check.

For now I am hoping/guessing at two LPCAMM2 modules that match the speed and the width, which would allow for some RAM config variability over soldered LPDDR5. But I guess even with CAMM2, power and signal transistor sizes would increase significantly over an Apple Mx/Lunar Lake die-carrier approach. And it makes for larger or thicker mainboards.

I really dislike how those juicy bits are held back!
 
The slide doesn't say "average 2.6 times". It says "average 2.6X faster", which means +260%. That fits all the other numbers.

The graphs have a "100%" bar, so they're depicting performance compared to that reference point - not uplift from a comparison value. So "340%" means 3.4 times the performance of the Core Ultra 9 288V.
I don't think it's math but grammar, or semantics.

What they call "2x faster" is in fact 2x as fast, while 200% faster is 3x as fast.

And it is quite intentionally misleading whenever anyone uses faster/better/less expensive etc. comparatives with a factor instead of a percentage: shame on them!
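To make the two conventions concrete, a toy sketch (the helper functions are mine, just for illustration):

```python
def faster_to_multiple(pct_faster: float) -> float:
    """'X% faster' means a multiple of 1 + X/100 of the baseline."""
    return 1 + pct_faster / 100

def multiple_to_faster(multiple: float) -> float:
    """'Nx as fast' corresponds to (N - 1) * 100 percent faster."""
    return (multiple - 1) * 100

print(faster_to_multiple(200))   # 3.0 -> "200% faster" is 3x as fast
print(multiple_to_faster(2.28))  # ~128 -> 2.28x as fast is 128% faster, not "228% faster"
```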
 