Alleged Zen 5 'Strix Halo' Mobile APU has more GPU cores than RX 7600 XT or PS5 — features monster RDNA 3.5 GPU with 40 compute units

It seems like it has everything it needs to work quite well. Unfortunately, it also looks like it will draw a lot of power at idle, what with the chiplets and LPDDR5X and all.
I'm guessing over 20 W, which is fine for desktop. It also looks like it will work best with a ~200 W TDP, which is also desktop range. You could probably run it at not-too-reduced clocks at 100 W.

AMD is really at its best on efficiency with its monolithic mobile line.
 
"Memory bandwidth equates to 500 GB/s, which basically matches the RTX 4070 Super."

I am not familiar with LPDDR5X.
I know that the M3 Max uses a 512-bit bus width and LPDDR5-6400 to achieve 400 GB/s.
How do you arrive at 500 GB/s when using 256-bit and LPDDR5X-8533?
I ran those numbers through a calculator and I get either 273 GB/s or 546 GB/s, depending on whether the clock multiplier is 1 or 2.

(RTX 4070 with 192-bit, 21,000 MT/s, and clock multiplier 1 checks out at 504 GB/s.)
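For anyone checking the math: peak bandwidth is just the bus width in bytes times the transfer rate, so a quick sketch (assuming no clock-multiplier shenanigans) reproduces those figures:

```python
# Peak DRAM bandwidth in GB/s: (bus width in bits / 8) bytes per transfer * MT/s.
def peak_bw_gbs(bus_bits: int, mts: int) -> float:
    return bus_bits / 8 * mts / 1000

print(peak_bw_gbs(512, 6400))   # ~409.6 GB/s -- M3 Max (512-bit LPDDR5-6400)
print(peak_bw_gbs(256, 8533))   # ~273 GB/s  -- rumored 256-bit LPDDR5X-8533
print(peak_bw_gbs(192, 21000))  # 504 GB/s   -- RTX 4070 (192-bit at 21,000 MT/s)
```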
 
Those are all FAKE, made-up, speculative specs. I wouldn't believe or count on any of that data. Waste of time.

Strix Halo will have the biggest impact on the budget/mid-range gaming laptop market, where its GPU should be sufficient to outperform even Nvidia's entry-level laptop GPUs like the RTX 4050 and perhaps even the RTX 4060.

Not correct. The Strix Halo series falls under the "Ultimate" APU lineup, targeting only the ultra-high-end and "premium" laptop segment, gaming PCs, and mini-PCs. For the budget/mid-range market, AMD has other APUs in the pipeline.

These "Halo" chips are even expected to be faster than the upcoming "Fire Range" lineup, which in itself are high-end desktop-replacement mobile chips.

Gfx1151 IDs correspond to high-end premium chips; ROCm patch entries don't lie. Also, per AMD's internal docs, these chips go by the "Ultimate APU" moniker.

[Attached screenshots: ROCm patch entries and AMD "Ultimate APU" documentation]

At some point, we could even see Strix Halo in a handheld gaming PC.

Nonsense. These are higher-TDP parts, so they are highly unlikely to ever go into any handheld console.
 
Even the concept of these chips is baffling to me, because they're designed around two CCDs, which puts them in high-cost territory, yet they're limited to ~273 GB/s of shared bandwidth, which is 4060/7600 territory. There's also the new memory controller itself, which is only going to be used with these products. They also won't be usable for desktop, as that would break AM5 compatibility, or they'd just run dual-channel, completely negating the point.

I'd like to see better iGPU performance across the board, but this seems like an extremely questionable way to go about it. If they targeted a more midrange area, it would make a lot more sense, as it could replace discrete GPUs there and be used in handhelds.
 
Wow, imagine that... Nvidia has been re-releasing the same GPU so many times now as the RTX 2060, RTX 3060, and RTX 4060 (if you don't think those are the same in performance, you're just lying to yourself) that even APUs have caught up to it. AMD has gone along with the greed and will definitely charge a lot for these, but in the end it is what it is: an APU that has caught up to the current entry-level desktop GPUs. Bravo.
 
"Memory bandwidth equates to 500 GB/s, which basically matches the RTX 4070 Super."

I am not familiar with LPDDR5X.
I know that the M3 Max uses a 512-bit bus width and LPDDR5-6400 to achieve 400 GB/s.
How do you arrive at 500 GB/s when using 256-bit and LPDDR5X-8533?
I ran those numbers through a calculator and I get either 273 GB/s or 546 GB/s, depending on whether the clock multiplier is 1 or 2.

(RTX 4070 with 192-bit, 21,000 MT/s, and clock multiplier 1 checks out at 504 GB/s.)
I've edited the text to correct this. Clearly, the "leaker" ("faker") who provided this information didn't do their math homework properly. As you note, a 256-bit LPDDR5X-8533 configuration would only yield 273 GB/s. It seems as though whoever created this wanted it to equal the Apple M2 Ultra or something, and so just put "500 GB/s" in there without figuring out exactly what that would mean.

Personally? I bet this is just some fanboy dreaming about what AMD might do, and that the actual product ends up far less potent than suggested. Even a 256-bit LPDDR5x interface would be rather surprising, and I wouldn't be surprised to see a standard 128-bit interface with maybe 20 CUs — half what this pipe dream suggests.
 
Nonsense. These are higher-TDP parts, so they are highly unlikely to ever go into any handheld console.
There's been talk of lower-TDP versions. If a version with fewer cores/CUs gets down to the ~28-45 W range, then you're around where the MSI Claw, Lenovo Legion Go, ROG Ally, etc. are. And maybe these handhelds are all way too power hungry, but there is a market for it.

Only one 8-core chiplet is needed. The 256-bit memory bus is the star of the show.

They also won't be usable for desktop, as that would break AM5 compatibility, or they'd just run dual-channel, completely negating the point.
It will definitely end up in mini PCs from vendors like Minisforum. That's a growing segment and buyers can accept some of the disadvantages (soldered CPU on board, likely soldered memory, relatively high price).

I think it's too physically large to even fit on AM5.

No way this is real.

It's not even a convincing lie.
Strix Halo information leaked a long time ago, and it has started to appear in Linux code. It's coming, hopefully. The problem is that it can't be all things to all people. It's primarily intended for high-end laptops. In pre-built mini-PCs it might be good, but CPU + GPU combos will easily defeat it if it's too expensive. It's not viable for handhelds unless a cut-down version with only 6-8 cores but the full 256-bit memory bus can get enough battery life.

In laptops it has the potential to achieve better efficiency in a smaller chassis than the laptop APU + dGPU combos it would be competing with. It might be able to command a higher price, but it would look great if it's cheaper.
 
Of course Strix Halo is real. These made up specs are definitely not.

40 GPU cores is never going to happen when APUs currently max out at 12. 16 would at least be believable.
 
40 GPU cores is never going to happen when APUs currently max out at 12. 16 would at least be believable.
Honestly, 16 isn't even all that believable without a new memory controller. The 780M is quite bandwidth-constrained as it is, and I'm not sure even dual-channel 8533 would be enough. The RX 6400 shares a CU configuration with the 780M but has the equivalent of dual-channel 8000 bandwidth. Keep in mind that the RX 6400 is an older architecture and also clocks lower, which may very well mean it doesn't need as much memory bandwidth in the first place.
 
Of course Strix Halo is real. These made up specs are definitely not.

40 GPU cores is never going to happen when APUs currently max out at 12. 16 would at least be believable.
Strix Point = 16 CUs, Strix Halo = 40 CUs.

The whole point is to go a couple of steps above the mainstream monolithic APUs like Phoenix/Strix Point.

The doubled memory bus helps make 40 CUs viable. The 32 MB of what looks like Infinity Cache (if it's real) would also help greatly.
 
Yeah, the 40 CU, 256-bit memory bus has been speculated about for a while now.
MLID has had the beans on that since last year.
A pipe dream would be having a wider memory bus and memory-on-package, like Lunar Lake.

I have mixed feelings about Strix Halo. While the memory bandwidth doubling is a massive improvement over current laptops, it's also not enough at the same time. 256-bit feels like a compromise when they should have gone all-in with 384-bit or 512-bit (at 8533 MT/s, 384-bit would be ~410 GB/s and 512-bit ~546 GB/s). Like, please break the 400 GB/s barrier if you're going to pair it with a 40 CU GPU.
 
This is technically achievable, but the design is backwards, IMO. CCDs don't need fan-out bandwidth, but a GPU sure does, and it needs to be directly connected to memory/cache. It'd make more sense if the CCDs were LPDDR5X MCDs in this scenario. There are 10.7 Gbps LPDDR5X modules from Samsung, which would offer 342.4 GB/s at 256-bit, plus any MALL bandwidth amplification. These should be ready by the time Strix Halo launches in H1 2025.

I do think Strix Halo will ship with on-package memory in a compact two-package format supporting 256-bit (128-bit × 2, i.e., 32-bit × 4 × 2) through vertically stacked memory dies. This is where fan-out links are needed to maximize bandwidth, and if a MALL cache is used, it will most likely be wired directly to the GPU ROPs without being shared with the CPU, as the CPU cores already benefit from large L3s.

So each memory stack needs a logic die underneath it, and that can be where a MALL cache resides, or it can be etched into an active interposer that the SoC sits atop (or even into the memory modules).

N4X is a possibility, as long as TSMC met its targeted leakage characteristics, since most CCDs will be used in EPYC, where efficiency is needed.

Also, the 780M in Phoenix/Hawk Point doesn't gain much performance past DDR5-6400 because it's primarily CPU-limited (if power limits are removed); there are huge gains at 5600, 6000, and 6400, and then the gains level off. One of the things people fail to realize about APUs is that CPU clocks completely crater when the iGPU is in use, because power is biased toward the iGPU. However, 720-1080p are CPU-limited resolutions, so in a perfect scenario you want both high CPU and GPU clocks, and that's achievable with discrete components, where the CPU and GPU don't share the same package power. It's just not efficient.

A unified memory architecture (UMA) will also help ray tracing BVH accesses, as the BVH is stored in system RAM and built by the CPU cores. iGPUs share the same memory pool, so accesses should be faster.

Strix Halo looks to be targeting 1440p at this rate.
 
I have mixed feelings about Strix Halo. While the memory bandwidth doubling is a massive improvement over current laptops, it's also not enough at the same time. 256-bit feels like a compromise when they should have gone all-in with 384-bit or 512-bit. Like, please break the 400 GB/s barrier if you're going to pair it with a 40 CU GPU.
Let them cook, and we'll see where the chips land (bench).

If Strix Halo is also packing 32 MB of Infinity Cache, that would make the memory bandwidth more viable. Previous rumors were not consistent about the presence of Infinity Cache. I don't think the CPU and iGPU will both use the same cache, as the speculator says.

Also, if Strix Halo is using up to two Zen 5 chiplets, it seems likely that AMD could make an X3D version if it wanted to, which would further lower power consumption and reduce CPU memory accesses during gaming.

If Strix Halo becomes a successful product, then we could watch AMD iterate on the design. Zen 6 in particular may be changing how the chiplets and interconnect work for the better.
 
Also, the 780M in Phoenix/Hawk Point doesn't gain much performance past DDR5-6400 because it's primarily CPU-limited (if power limits are removed); there are huge gains at 5600, 6000, and 6400, and then the gains level off. One of the things people fail to realize about APUs is that CPU clocks completely crater when the iGPU is in use, because power is biased toward the iGPU. However, 720-1080p are CPU-limited resolutions, so in a perfect scenario you want both high CPU and GPU clocks, and that's achievable with discrete components, where the CPU and GPU don't share the same package power. It's just not efficient.
This is basically all incorrect, which makes the validity of the rest of your post questionable.

The folks who have replaced the 6400 in the Ally with 7200 have seen linear performance increases in anything that wasn't completely CPU-bound (not to mention the DDR5 testing on the socketed APUs). 720p/1080p aren't arbitrarily CPU-bound, just like no resolution is, because it depends on your hardware. 12 CUs is absolutely not enough graphics performance to cause a CPU-bound situation in very many titles. The power bias is also not toward the GPU in AMD's APUs unless you're using third-party software to force such behavior.
 
I'm pretty sure the "bandwidth equivalent to 500 GB/s" is referring to the L4 cache, not the DRAM interface. I highly doubt these products will ever hit handhelds, but high-end thin-and-lights could certainly make use of them. It wouldn't surprise me if an updated Razer Blade or similar were to use one of these Halo products.
 
Honestly, 16 isn't even all that believable without a new memory controller. The 780M is quite bandwidth-constrained as it is, and I'm not sure even dual-channel 8533 would be enough.
Technically, a 256-bit interface would represent quad-channel memory and thus double the bandwidth of current dual-channel solutions. But that would also increase costs quite a bit, for the motherboards and chips, which is why we haven't seen this in the past.

I'd be curious to see modeling of how a 128-bit LPDDR5X interface with 64MB of cache would fare in effective bandwidth, compared with current solutions as well as with 256-bit LPDDR5X without such a large cache. My bet is that the larger cache would result in similar effective bandwidth at much lower cost, with a similar die size as well, so I'd expect that to happen rather than 256-bit.

But we shall see. Both are viable options, and a 64MB cache plus 256-bit would have even higher effective bandwidth. It's really just a question of whether companies and consumers would be willing to pay the necessary premiums to get there.
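As a rough illustration of the kind of modeling I mean, here's a toy blend where cache hits are served at cache speed and misses fall through to DRAM. The hit rates and the ~1 TB/s cache figure are pure assumptions, not leaked specs:

```python
# Toy effective-bandwidth model: hits served by the cache, misses by DRAM.
# Cache bandwidth and hit rates below are assumptions for illustration only.

def effective_bw(dram_gbs: float, cache_gbs: float, hit_rate: float) -> float:
    return hit_rate * cache_gbs + (1 - hit_rate) * dram_gbs

dram_128 = 128 / 8 * 8533 / 1000  # ~136.5 GB/s raw (128-bit LPDDR5X-8533)
dram_256 = 256 / 8 * 8533 / 1000  # ~273 GB/s raw (256-bit LPDDR5X-8533)
cache_bw = 1000.0                 # assumed ~1 TB/s for a large on-die cache

for hit in (0.3, 0.4, 0.5):
    blended = effective_bw(dram_128, cache_bw, hit)
    print(f"hit rate {hit:.0%}: 128-bit + cache ~{blended:.0f} GB/s "
          f"vs. plain 256-bit {dram_256:.0f} GB/s")
```

In this simplistic model, even modest hit rates let the 128-bit-plus-cache option land in the same effective range as a plain 256-bit bus, which is the crux of the cost argument.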
 
Technically, a 256-bit interface would represent quad-channel memory and thus double the bandwidth of current dual-channel solutions. But that would also increase costs quite a bit, for the motherboards and chips, which is why we haven't seen this in the past.
That's why I mentioned needing a new memory controller first, as AMD doesn't have a quad-channel memory controller that supports LPDDR/X.
I'd be curious to see modeling of how a 128-bit LPDDR5X interface with 64MB of cache would fare in effective bandwidth, compared with current solutions as well as with 256-bit LPDDR5X without such a large cache. My bet is that the larger cache would result in similar effective bandwidth at much lower cost, with a similar die size as well, so I'd expect that to happen rather than 256-bit.

But we shall see. Both are viable options, and a 64MB cache plus 256-bit would have even higher effective bandwidth. It's really just a question of whether companies and consumers would be willing to pay the necessary premiums to get there.
I agree it seems like the cache route would be the way to go currently, given what a quad-channel memory controller would mean for the existing market. I bet a quad-channel memory controller would be cheaper for AMD silicon-wise, but it limits usability and costs more in board design, which I think would outweigh any other cost benefit.
 
Personally? I bet this is just some fanboy dreaming about what AMD might do, and that the actual product ends up far less potent than suggested. Even a 256-bit LPDDR5x interface would be rather surprising, and I wouldn't be surprised to see a standard 128-bit interface with maybe 20 CUs — half what this pipe dream suggests.
Exactly my first thought. When I first got into PCs, AMD was promising premium gaming performance, way before they bought ATI. Every year a new APU comes out, and we wish for a discrete GPU replacement. What we seem to get is a chip that can run current-gen popular games at 1080p on decent settings. And thanks for all the work you put into Tom's Hardware, Jarred!
 
If these specs are correct... IF... there is no way something with a TDP like that would go into the likes of the Steam Deck or the ROG Ally. I'd love something like this in a 17" workstation laptop (no dGPU) with 2x M.2 SSDs and 64GB of RAM for VMs. Before anyone suggests Intel: be aware that the big/little configs are rubbish for my VMs. That way I don't have to buy heavy gaming machines to get a damn 17" screen AND a higher-end CPU.
 
I've edited the text to correct this. Clearly, the "leaker" ("faker") who provided this information didn't do their math homework properly. As you note, a 256-bit LPDDR5X-8533 configuration would only yield 273 GB/s. It seems as though whoever created this wanted it to equal the Apple M2 Ultra or something, and so just put "500 GB/s" in there without figuring out exactly what that would mean.

Personally? I bet this is just some fanboy dreaming about what AMD might do, and that the actual product ends up far less potent than suggested. Even a 256-bit LPDDR5x interface would be rather surprising, and I wouldn't be surprised to see a standard 128-bit interface with maybe 20 CUs — half what this pipe dream suggests.
It could be (don't hold me responsible for this, just trying to rationalize the leaker's 500 GB/s) that this is the "effective" bandwidth of the LPDDR5X and the L4 working together, similar to how AMD showed slides on the "effective" bandwidth of the RX 6000 series with Infinity Cache included.
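For what it's worth, the arithmetic can be made to land in that ballpark. With an assumed ~1 TB/s L4 and a ~30% hit rate (both made-up numbers, purely to test the hypothesis):

```python
# Back-of-envelope: does DRAM plus a hypothetical L4 "blend" to ~500 GB/s?
# The L4 bandwidth and hit rate here are assumptions, not leaked figures.
dram = 273.0    # GB/s, 256-bit LPDDR5X-8533
l4_bw = 1000.0  # GB/s, assumed L4 bandwidth
hit = 0.30      # assumed L4 hit rate

print(f"{hit * l4_bw + (1 - hit) * dram:.1f} GB/s")  # ~491, near the leaker's 500
```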
 
Man, I wish/hope this is real. To match the fps/graphics of a $500 PS5 you need an RTX 4080 laptop, and it's hard to find one under $2K, and the total system power draw is over 200 watts. I just want a reasonably thin 17" laptop with that PS5/Xbox-class chip in it, and an NPU attached.