News AMD's game-changing Strix Halo APU, formerly Ryzen AI Max, poses for new die shots

>This makes me moist... Happily snap one up in a laptop if it's the right price.

It will never be at the "right" price. Per the name Strix HALO, it's a halo product. Halo products and value are oxymorons. As the saying goes, if you have to ask (about the price), it's not for you.

Yes, everybody is going gaga over the cool factor of the mega-iGPU. I was, too, last year--at least over the idea of it. But as the idea hits reality--from the hefty premium to the mismatch between capability and use case--the bubble bursts, and I'm left with a case of lukewarm meh.

Strix Halo, at least in its instantiation in the Asus ROG Flow Z13, is a lousy product. Gaming perf is mediocre--comparable to a 4060, which is an entry-level dGPU--but it's more expensive than last year's Z13 with a 4070 dGPU. A 13" screen is too small for a primary gaming laptop. It's heavy, 3.57lb + 1.26lb for the power brick (per NotebookCheck). Its fans are loud. So, where are the advantages over a laptop w/ dGPU?

Halo will never be cost-competitive as a gaming product, because it will never get the economy of scale to be a cost-effective replacement for (entry-level) dGPUs. With a 120W max TDP, it will never reach its full potential in a laptop form factor (mobile workstations excepted).

Nix the desktop PC as well: Halo doesn't make sense against desktop CPUs + dGPUs, where space isn't a constraint. So that leaves the mini-PC as the only option.

The mini-PC market itself is niche, with mostly boutique vendors from Shenzhen. Asus is the only large vendor, and as with the ROG Flow, any Asus mini-PC w/ Halo will be at a steep premium--all so you can get the equivalent of an entry-level 4060. FYI, Asus already sells a NUC w/ a 4060 for $1269 on Amazon.

The only USP (unique selling proposition) I see for Halo is AI, for its ability to share integrated system memory. For this to make sense, the config would need to be 64GB or 128GB; 32GB would not cut it. But then the setup would be compute-constrained for any serious work, as Halo is just the equivalent of a 4060.

To sum up, Strix Halo is another cool idea without a purpose. It's basically a tech demo, like laptops with foldable screens.
 
I'm not 100% sure what the non-functional SVE subset you refer to is - could you clarify? The M4 does have SSVE and SME using the AMX units, which is Apple's currently preferred 512-bit solution.
I can't find a lot of info about Streaming SVE, but what I've heard is that it's pretty much only useful for enabling SME. That leaves out a lot of core SVE/SVE2 functionality.

Also ... yes the Halo has more cores than the M4 Max, but not the performance of one.
I think it depends on what. Zen 5 has a really strong SMT implementation and Halo has 16 P-cores with 32 threads. I'm sure that's going to win some benchmarks vs. the M4 Max's 12P + 4E cores.

As you point out in a later post yourself, Apple may have changed the PC game first*** and created the permission for AMD to try this themselves in a PC chip, but I agree that AMD doing it is still a huge step, a game-changer for non-Macs certainly, and based on reviews it seems to be successful technically****. Hopefully it will be commercially as well.
When Apple came out with the M1 series, especially the bigger ones, I was as thrilled as anyone. I actually predicted Apple would be first to implement on-package memory and I was right. We need someone else to follow, though. Not just the on-package part (which Intel did in Lunar Lake), but really the part about going wider. That's one of the reasons I'm excited about Halo - not so much for what it is, but for the door it opens.

Thanks for the link. For some reason, Halo does much better on efficiency in CB23 Multi.

**Obviously gaming, especially the huge library of non-native Mac games, is going to be a massive advantage for the Halo over an M4 even beyond what the benchmarks say.
Where the above article does have GPU comparisons vs. Apple, Halo hangs pretty well with the M4 Pro.

But I think it is using the N4X node rather than the N4P for the CPUs, yes?
Source? I had thought its CCDs were the exact same as desktop & server Zen 5, although I don't have that on particularly good authority. Do you know anything about what its GPU/IO die is using?

I can't imagine AMD would've made N4X CCDs for just this product. If they did, then they must be planning on doing a Zen 5+ later this year.

People keep saying they want to see it run at higher power settings, but I'd like to see more tests and wall-power measurements at lower power settings.
But, to see its CPU performance benefit from the wider memory interface, you probably do have to explore the top end of its envelope.

Also, for the GPU, it's a pity the timing couldn't have worked out to make this RDNA 4. Obviously that wasn't possible, but, like with the M3/M4, the extra ray tracing capabilities would've been very cool to have with all the potential VRAM.
Yeah, but it was never going to be a screamer. I think the main benefits of RDNA4 are going to be in AI and RT. A GPU that size probably still won't have enough RT performance to be very compelling and it's already got AI hardware.

Anyway, thanks for the detailed reply. I appreciate the info.
 
I can see why other manufacturers skipped out: it costs way too freaking much. I bought a Lenovo Legion 4070 laptop for $1300 last year when it was on sale. A high-end 13" iPad or Android tablet is about $700 on sale right now. Both combined are less than this thing. Plus, I can use the tablet as a second monitor when I travel.

I think they could sell a normal-size 15-16" laptop with the AI Max 395 and 128GB of RAM for $2500 to developers working on AI. 96GB of VRAM is no joke. It would sell like hotcakes, because those big companies have the pockets to pay for it. I'm not sure why they feel the need to make a 14" ultra-thin-and-light laptop or a 13" tablet with this chip.
 
It will never be at the "right" price. Per the name Strix HALO, it's a halo product. Halo products and value are oxymorons. As the saying goes, if you have to ask (about the price), it's not for you.
I hope that will change, especially if this replaces their mobile workstation / desktop replacement tier.

It's heavy, 3.57lb
For a gaming laptop, that's sure not heavy. My 15" work laptop has a dGPU weaker than Halo's, and that thing (a Dell Precision) weighs 2-3 pounds more!

So, where are the advantages over a laptop w/ dGPU?
Theoretical advantages are:
  • size & weight - due to not needing a separate package + cooling solution for a dGPU.
  • power savings - due to GPU integration into the CPU and not having dedicated GDDR memory.
  • cost - single package, single cooler, single memory. Okay, so you need something like dual LPCAMMs, if not on-package memory. Is that so different from laptops that had dual SO-DIMM slots not long ago?
  • AI - can certainly benefit from memory bandwidth & capacity of the wider interface.
  • CPU performance - should see a modest uptick from higher bandwidth. Maybe only a few %, but performance is performance.

IMO, the long term prospects for this class of product look good. The key question for me is mainly whether AMD will get enough uptake on this iteration for there to be others.

To sum up, Strix Halo is another cool idea without a purpose. It's basically a tech demo, like laptops with foldable screens.
Yes it's niche, but nothing like foldable screens.
 
With what CPU, though? The 395 has 16c/32t of full Zen 5 goodness. If you don't need that much CPU performance, then it is actually overkill.

I bought mine with the 7745HX (8 cores/16 threads), but you can get it with the 7945HX (16c/32t) for a little more. This was last year. Lenovo Legion AMD laptops are on hiatus right now while they transition to the new 9xxx-series CPUs. I bought it for games, so 8 cores was better: https://www.lenovo.com/us/en/p/lapt...RSCkdcWJaV6v3XPic2CrjmpvCEqcuOJycY#tech_specs
 
I can't find a lot of info about Streaming SVE, but what I've heard is that it's pretty much only useful for enabling SME. That leaves out a lot of core SVE/SVE2 functionality.
I am far from an expert so take what I say with a huge grain of salt, but from what I gather SSVE is an SME mode that allows for processing vectors rather than matrices in the SME/AMX unit. I can't remember if it is missing any capabilities of the standard SVE2 or not - it might - but I think a lot of the core functionality is there. But obviously there are performance implications (especially latency) of going to an accelerator rather than staying in-core, and adding SVE, even to 128-bit units, has advantages over NEON.

It's been unclear exactly why ARM vendors, including ARM themselves, have been so slow to adopt SVE in their core designs. Fujitsu did (and will do so for SVE2 in their upcoming core), and there are rumors SVE2 is coming to ARM/Qualcomm/Apple designs soon. If so, then it's possible there were performance/capability needs that had to be addressed for normal CPUs that aren't common knowledge. It's also possible that NEON has simply been good enough. I don't know.

I think it depends on what. Zen 5 has a really strong SMT implementation and Halo has 16 P-cores with 32 threads. I'm sure that's going to win some benchmarks vs. the M4 Max's 12P + 4E cores.

If they were, Cinebench R24 would've been a prime candidate to do so (and it does win in R23; more on that below) - as a CPU renderer, it is basically embarrassingly parallel. It is possible that with enough power it could beat an M4 Max, just as top-end desktop Ryzens do, but the Halo's power curve isn't clear to me. I think they are slightly modified desktop CCDs? (also more on that below)

When Apple came out with the M1 series, especially the bigger ones, I was as thrilled as anyone. I actually predicted Apple would be first to implement on-package memory and I was right. We need someone else to follow, though. Not just the on-package part (which Intel did in Lunar Lake), but really the part about going wider. That's one of the reasons I'm excited about Halo - not so much for what it is, but for the door it opens.

Absolutely. I agree. Personally, as someone who does CUDA work, I'm most looking forward to the rumored Nvidia-Mediatek devices later this year (the already announced DIGITS might not be what I'm interested in, but that's another topic). But yes, I hope AMD's foray here is also successful.

Thanks for the link. For some reason, Halo does much better on efficiency in CB23 Multi.

Cinebench is an ... interesting benchmark. It's based on the Redshift rendering engine. The NotebookCheck article is slightly wrong in saying that R23 isn't ARM native. It's true that it isn't native on Windows on ARM, so Qualcomm chips have to emulate it, but R23 was technically macOS/AS native.

However, the initial port of Redshift to macOS/AS must not have gone smoothly, as initial CB R23 testing by Andrei on the M1 Max revealed some weird power behavior in his initial Anandtech review, and it didn't jibe with the performance figures from other renderers that had been ported at the time. He contacted the developers, who said they understood the problem and were working on a fix. They told him what the problem was but asked him not to publicly disclose it (a lot of people surmise it was making poor use of the vector processors, but again, not known for sure). Unfortunately I can't link to this because it was a Twitter thread that has long since been nuked. Regardless of what it was, they certainly fixed it and then some, as CB went from Apple Silicon's worst benchmark in R23 to one of its best in R24.

So, pedantically, it would be more correct for NBC to have said that R23 is non-native for ARM-Windows and non-optimized (though native) for ARM-Mac. That being the case - and rendering being so amenable to hyperthreading in general, AMD doing hyperthreading better than Intel, and Intel not doing it at all recently - AMD, and the Halo in particular, gets a great mix of performance and efficiency in R23 relative to its competitors, beating Apple in performance and everyone else in efficiency (and often performance too).

Where the above article does have GPU comparisons vs. Apple, Halo hangs pretty well with the M4 Pro.
More than hangs well - the Halo GPU should beat the Pro's in most cases (there are a couple of exceptions). The Halo's CPU is good, but AMD emphasized the GPU more than Apple does. So from what I can tell, it's got an M4 Pro-like CPU (though one whose power envelope may be able to be pushed much higher) and a GPU that lies somewhere between the Pro and the Max, which, especially for the PC market, is definitely the right call. Honestly, I think Apple could probably push their GPUs a little bigger - though I know some developers who are CPU-heavy and love the M4 Pro precisely because it has an incredibly capable CPU without having to pay for a massive GPU they don't want. That's obviously one of the disadvantages of an SoC approach - it's more difficult to tailor to different use cases (there is some possibility that might change as packaging tech evolves - I have very faint hopes even for the M5 Pro/Max based on rumors - but it'll never be as flexible as a fully disaggregated system over PCIe).

Source? I had thought its CCDs were the exact same as desktop & server Zen 5, although I don't have that on particularly good authority. Do you know anything about what its GPU/IO die is using?

I can't imagine AMD would've made N4X CCDs for just this product. If they did it, then they must be planning on doing a Zen 5+, later this year.
I thought the standard Zen 5 desktop CCDs were N4X?
According to Wikipedia (which jibes with what I read elsewhere):

  • Zen 5c server: N3E
  • Zen 5 desktop CCD (also Halo CCD?): N4X
  • Zen 5 mobile (Strix Point, also Halo CCD?): N4P
  • Zen 5 IOD: N6
Not listed is the Halo's IOD (the die with the GPU), which TechPowerUp said was one of the 5nm-family nodes, but it didn't specify which.

No one seems to say exactly which node the Halo's CCDs were manufactured on, so I'm assuming it's the desktop one, but Tom's, in the article whose comment section we are in, says the die shot of the Halo CCD is extremely similar to, but slightly different from, the desktop one. So I'm not sure. I first assumed they would go the same route as Strix Point and use N4P, but with more/bigger cores. Then, when it became clearer that it was similar to the desktop structure, I assumed they would reuse the same dies as desktop Zen 5 (which I assume they will for Fire Range?). Now, for Halo, I'm not 100% certain what's going on, but I'm going to assume it's basically a tweaked desktop die on N4X, though it could be N4P (N4X and N4P are, I think, design-rule-compatible, so it might not be hard to port a design from one to the other).

But, to see its CPU performance benefit from the wider memory interface, you probably do have to explore the top end of its envelope.
I think it depends on what the power curve looks like. Strix Point, for instance, tops out after a certain point. To use CB R24 as an example: going from 1022 pts to 1166 pts costs 58% more power, and going from 1166 to 1213 costs another 48% power increase. In contrast, we know desktop Ryzen can push beyond the Max's performance with the same 16-core setup as the Halo (although at very high power levels). Unfortunately, NBC didn't do a similar power curve in their Halo analysis article as they did in the Point's (they might add more data later), so it isn't clear how it behaves. So, more points on both sides? :)
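
To make the diminishing returns concrete, here's the arithmetic on just those quoted numbers (a quick sketch of mine; power is relative to the lowest setting, since the absolute wattages aren't quoted here):

```c
#include <stdio.h>

int main(void) {
    // Strix Point CB R24 multi scores at three power settings (quoted above);
    // power is normalized to the lowest setting since absolute watts are unknown.
    double score[] = {1022, 1166, 1213};
    double power[] = {1.00, 1.58, 1.58 * 1.48};

    for (int i = 1; i < 3; i++) {
        double gain = 100.0 * (score[i] / score[i - 1] - 1.0);
        double eff  = 100.0 * (score[i] / power[i]) / (score[0] / power[0]);
        printf("step %d: +%.1f%% perf, perf/W falls to %.0f%% of baseline\n",
               i, gain, eff);
    }
    return 0;
}
```

The first step buys ~14% more performance; the second buys only ~4% for another 48% power, with perf/W dropping to roughly half of baseline - classic top-of-the-curve behavior.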

Yeah, but it was never going to be a screamer. I think the main benefits of RDNA4 are going to be in AI and RT. A GPU that size probably still won't have enough RT performance to be very compelling and it's already got AI hardware.

I dunno, actually. You might be right. I think people who buy Apple SoCs for home rendering projects probably do indeed go for the 40-GPU-core Max, which is not what the Halo chip is competing against, but even at the Halo's level I could see a market there. That is one of the big use cases for high-VRAM personal systems.

Anyway, thanks for the detailed reply. I appreciate the info.

My pleasure!

EDIT: This review says $2300 for the ASUS model, which appears to be the same as the one NBC is quoting 2500 Euros for:


Is the detachable keyboard included in the price? I dunno - someone is either wrong, or maybe Europe is just getting the short end of the stick?
 
I can't imagine AMD would've made N4X CCDs for just this product. If they did, then they must be planning on doing a Zen 5+ later this year.
There is so much conflicting information regarding AMD's node usage on Zen 5 that I still haven't gotten to the bottom of it. I was trying to figure out what Strix Halo could have used last night, and all I can still say for sure is that it's one of TSMC's N4 nodes. Various places claim either N4X or N4P being used in both desktop and mobile. I don't know whether or not TSMC's silicon interconnect can easily be used with different nodes, but I'm guessing Strix Halo is likely all using the same node either way. I agree that AMD is very unlikely to be using a different node just for these CCDs, so it's most likely going to be all N4P (just based on what I consider the most reliable sources for the Zen 5 node).

Not that there's really any reason for them to do so, but AMD could just update their specs to say the actual node rather than family!
 
There is so much conflicting information regarding AMD's node usage on Zen 5 that I still haven't gotten to the bottom of it. I was trying to figure out what Strix Halo could have used last night, and all I can still say for sure is that it's one of TSMC's N4 nodes. Various places claim either N4X or N4P being used in both desktop and mobile. I don't know whether or not TSMC's silicon interconnect can easily be used with different nodes, but I'm guessing Strix Halo is likely all using the same node either way. I agree that AMD is very unlikely to be using a different node just for these CCDs, so it's most likely going to be all N4P (just based on what I consider the most reliable sources for the Zen 5 node).

Not that there's really any reason for them to do so, but AMD could just update their specs to say the actual node rather than family!
I'm going with the other one - that Halo and desktop are N4X, while Point is N4P - but agreed, it's very confusing, with a lot of conflicting/vague/missing data. So if it came out that it's all N4P, I wouldn't exactly be surprised. Agreed, it would be nice if AMD just said the node name, just for clarity for the small number of us who care. :)
 
I am far from an expert so take what I say with a huge grain of salt, but from what I gather SSVE is an SME mode that allows for processing vectors rather than matrices in the SME/AMX unit. I can't remember if it is missing any capabilities of the standard SVE2 or not - it might - but I think a lot of the core functionality is there. But obviously there are performance implications (especially latency) of going to an accelerator rather than staying in-core, and adding SVE, even to 128-bit units, has advantages over NEON.
I'm going to refrain from making any further statements about SSVE, SME, or AMX until I actually know more about it, which is not a priority for me at the moment.

It's been unclear exactly why ARM vendors, including ARM themselves, have been so slow to adopt SVE in their core designs.
They did use it in their Neoverse V1 cores, found in the previous generation of Amazon's Graviton CPUs. SVE2 is now included in all of Arm's own ARMv9-A cores. I thought it was now a mandatory part of v9, until I read about Apple's M4 not supporting it.

there are rumors SVE2 is coming to ARM/Qualcomm/Apple designs soon.
The lack of it is one reason I'm less likely to buy a Snapdragon X laptop. For me, one of the draws of getting an Arm machine would be to play with SVE2.

It's also possible that NEON has simply been good enough. I don't know.
ARM has claimed that just porting the same code from NEON (128-bit) to a 128-bit implementation of SVE2 is good for about a 20% performance boost.
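
For anyone curious what that port actually looks like, here's a minimal sketch of mine (not Arm's benchmark code) of the same float-add loop in NEON versus vector-length-agnostic SVE intrinsics - the predicate replaces both the fixed 4-lane stride and the scalar tail loop:

```c
#include <arm_neon.h>
#include <arm_sve.h>   // build with something like -march=armv8-a+sve2
#include <stddef.h>

// NEON: fixed 128-bit vectors, explicit 4-lane stride, scalar cleanup loop.
void add_neon(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        vst1q_f32(dst + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
    for (; i < n; i++)
        dst[i] = a[i] + b[i];   // leftover tail elements, one at a time
}

// SVE: same loop, but the vector length is queried at run time (svcntw)
// and the predicate masks off the tail, so there is no cleanup loop and
// the same binary runs on 128-bit or wider implementations.
void add_sve(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64(i, n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, dst + i, svadd_f32_x(pg, va, vb));
    }
}
```

On a 128-bit implementation both versions process four floats per iteration, so gains like Arm's quoted ~20% would have to come from predication and the other ISA improvements rather than from extra width.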

I think I read somewhere that Apple and Nuvia/Qualcomm were dragging their feet on implementing SVE, because they felt Arm botched it in some way. I don't know if Streaming SVE addressed their concerns, or if it was simply unavoidable for Apple to implement SME without it.

If they were, Cinebench R24 would've been a prime candidate to do so (and it does win in R23; more on that below) - as a CPU renderer, it is basically embarrassingly parallel.
I understand why you think that, but the issue with Cinebench is that it's a floating-point workload, and those don't generally benefit from SMT, because one thread can already do quite a good job of saturating a core's backend. If you wanted to see the benefits of Ryzen's SMT in its best light, a much better benchmark would probably be something like compilation.

Absolutely. I agree. Personally, as someone who does CUDA work, I'm most looking forward to the rumored Nvidia-Mediatek devices later this year (the already announced DIGITS might not be what I'm interested in, but that's another topic). But yes, I hope AMD's foray here is also successful.
Oh, Digits is super interesting, IMO. I wonder exactly how that came about. Was it intended as their first gen Windows-on-ARM platform, but then something about the Windows aspect was taking too long, so they just decided to release the hardware before it became obsolete?

IMO, it's too expensive for what it does, unless you need that amount of memory to be directly accessible from a GPU/NPU at those speeds.

That's obviously one of the disadvantages of an SoC approach - it's more difficult to tailor to different use cases
That's where chiplets (or "tiles" as Intel calls them) come in!

Thanks for more great info!
 
I can see why other manufacturers skipped out: it costs way too freaking much. I bought a Lenovo Legion 4070 laptop for $1300 last year when it was on sale. A high-end 13" iPad or Android tablet is about $700 on sale right now. Both combined are less than this thing. Plus, I can use the tablet as a second monitor when I travel.

I think they could sell a normal-size 15-16" laptop with the AI Max 395 and 128GB of RAM for $2500 to developers working on AI. 96GB of VRAM is no joke. It would sell like hotcakes, because those big companies have the pockets to pay for it. I'm not sure why they feel the need to make a 14" ultra-thin-and-light laptop or a 13" tablet with this chip.
Why not a Strix Point with a 256-bit bus and integrated memory up to 64GB of LPDDR5X-8533? This would allow the 890M to shine and still cost a lot less than any Halo.

I see Halo as a great workstation laptop option if your company is paying for it.
 
I've only seen these rumors today - is Intel spreading them?

It's a 307mm^2 die, basically the same size as the Navi 31 GCD, and a lot smaller than what Nvidia has made on these nodes.
Isn't that the die size of the GPU only? With the CPU, the total size is around 450mm^2. Also, the GPU (around 300mm^2) is being compared to the laptop 4070 (similar performance to the desktop 4060). I haven't checked which exact GPU the laptop 4070 uses, but the desktop 4060 is based on AD107, and its die size is around 150mm^2.
 
Isn't that the die size of the GPU only? With the CPU, the total size is around 450mm^2. Also, the GPU (around 300mm^2) is being compared to the laptop 4070 (similar performance to the desktop 4060). I haven't checked which exact GPU the laptop 4070 uses, but the desktop 4060 is based on AD107, and its die size is around 150mm^2.
The CPU chiplets have very high yields and do not affect the yields of the largest chiplet, which is the I/O one holding the iGPU. This is one of the big advantages of using chiplets.
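
A toy Poisson yield model (fraction of good dies ≈ e^(-D0 x area)) makes the point; the defect density below is my assumption, not TSMC's number, and the ~70.6mm^2 Zen 5 CCD size is the commonly cited figure:

```c
#include <math.h>
#include <stdio.h>

// Poisson yield model: fraction of defect-free dies = exp(-D0 * area).
static double yield(double d0_per_cm2, double area_cm2) {
    return exp(-d0_per_cm2 * area_cm2);
}

int main(void) {
    double d0 = 0.10; // assumed defects/cm^2; TSMC's real N4 figure isn't public
    printf("307mm^2 GPU/IO die:              ~%.0f%%\n", 100 * yield(d0, 3.07));
    printf("hypothetical 450mm^2 monolithic: ~%.0f%%\n", 100 * yield(d0, 4.50));
    printf("70.6mm^2 Zen 5 CCD:              ~%.0f%%\n", 100 * yield(d0, 0.706));
    return 0;
}
```

The big die yields what it yields either way; what chiplets buy you is that the ~93%-yielding CCDs never drag it down, and a defective CCD scraps ~71mm^2 of silicon instead of ~450mm^2.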
 
The only USP (unique selling proposition) I see for Halo is AI, for its ability to share integrated system memory. For this to make sense, the config would need to be 64GB or 128GB; 32GB would not cut it. But then the setup would be compute-constrained for any serious work, as Halo is just the equivalent of a 4060.
That's not really accurate. Compute itself isn't as big a deal for LLMs as being able to get the entire model into high-bandwidth memory that the compute resource can get to. It's why you can run the full unquantized DeepSeek R1 on slow-but-wide CPU compute on a local dual-socket Epyc 7000 system and still get decent it/s from it - it's the 8-channel RAM that matters; the compute is secondary. You can get a 70b model running on multiple Tesla P40 cards that only put out around 9 TOPS and get enough iterations to be usable. A 128GB 395 will be several times faster than that with its NPU alone, not even considering the GPU compute.
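
To put rough numbers on the bandwidth-over-compute point: in the memory-bound regime every weight is read once per generated token, so token rate is capped at bandwidth divided by model size. A back-of-envelope sketch with assumed figures (a 4-bit-quantized 70b model, and ~256 GB/s for the 395's 256-bit LPDDR5X-8000):

```c
#include <stdio.h>

int main(void) {
    // Assumptions, not benchmarks:
    double params   = 70e9;   // 70b-parameter model, as in the P40 example
    double bytes_pp = 0.5;    // ~4-bit quantization
    double bw       = 256e9;  // bytes/s: 256-bit LPDDR5X-8000

    double model = params * bytes_pp;   // ~35 GB of weights
    printf("model size: ~%.0f GB\n", model / 1e9);
    printf("ceiling:    ~%.1f tokens/s\n", bw / model);
    return 0;
}
```

That's a ceiling of roughly 7 tokens/s, and no amount of extra compute raises it - which is exactly why the wide memory interface matters more than the 4060-class shader throughput here.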
 
Isn't that the die size of the GPU only? With the CPU, the total size is around 450mm^2.
Yes, that's what the article says.

Also, the GPU (around 300mm^2) is being compared to the laptop 4070 (similar performance to the desktop 4060). I haven't checked which exact GPU the laptop 4070 uses, but the desktop 4060 is based on AD107, and its die size is around 150mm^2.
According to Wikipedia, the mobile RTX 4070 uses AD106, which is 190 mm^2.
Not sure why you went with AD107, as we're talking about a 70-tier GPU, and AD107 is literally Ada's weakest die. To be honest, I'm quite surprised it's not the same AD104 die that the PCIe-card RTX 4070 uses.
 
Yes, that's what the article says.


According to Wikipedia, the mobile RTX 4070 uses AD106, which is 190 mm^2.
Not sure why you went with AD107, as we're talking about a 70-tier GPU, and AD107 is literally Ada's weakest die. To be honest, I'm quite surprised it's not the same AD104 die that the PCIe-card RTX 4070 uses.
That's just how Nvidia has been selling their mobile chips for a few generations: they use the die one tier below the desktop part of the same name, so they can have name parity with the desktop parts at a given price tier. It's not like they can stick a real 450W 4090 die in a mobile part, but they still want their mobile SKU to have the same name, so they basically "promote" the 4080 die when using it for mobile. This works down the chain until you get to the bottom, where the 4050m is basically only as fast as current mobile iGPUs from Intel and AMD.
 
It's not like they can stick a real 450W 4090 die in a mobile part, but they still want their mobile SKU to have the same name, so they basically "promote" the 4080 die when using it for mobile.
I get that mobile can't run at the same power levels, but a lot of that can be controlled through lower clock speeds. I guess even then, the AD102 is a huge die and probably a bit too much for mobile at even ~1 GHz.