I can't find a lot of info about Streaming SVE, but what I've heard is that it's pretty much only useful for enabling SME. That leaves out a lot of core SVE/SVE2 functionality.
I am far from an expert, so take what I say with a huge grain of salt, but from what I gather SSVE is an SME mode that allows processing vectors rather than matrices in the SME/AMX unit. I can't remember whether it's missing any capabilities of standard SVE2 (it might be), but I think a lot of the core functionality is there. There are obvious performance implications though, especially latency, in going out to an accelerator rather than staying in-core, and adding SVE, even at 128-bit vector width, has advantages over NEON. It's been unclear exactly why ARM vendors, including ARM themselves, have been so slow to adopt SVE in their core designs. Fujitsu did (and will do so for SVE2 in their upcoming core), and there are rumors SVE2 is coming to ARM/Qualcomm/Apple designs soon. If so, it's possible there were performance/capability needs that had to be addressed for normal CPUs that aren't common knowledge. It's also possible that NEON has simply been good enough. I don't know.
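For anyone curious what Streaming SVE looks like from the software side, here's a minimal sketch using the Arm C Language Extensions for SME. This assumes a recent toolchain with SME support (e.g. clang with -march=armv9.2-a+sme); the function name and loop are mine, just to illustrate that ordinary SVE intrinsics can run in streaming mode:

```c
#include <arm_sve.h>

// Hypothetical kernel: predicated float add using SVE intrinsics.
// __arm_streaming marks the body as running in streaming mode, i.e.
// as Streaming SVE. A non-streaming caller gets the mode switch
// (PSTATE.SM toggled) inserted at the call boundary by the compiler,
// which is part of where the latency overhead comes from.
void add_f32(float *dst, const float *a, const float *b, int n)
    __arm_streaming
{
    // svcntw() reports the vector length in 32-bit lanes; in streaming
    // mode this is the streaming vector length (SVL).
    for (int i = 0; i < n; i += (int)svcntw()) {
        svbool_t pg = svwhilelt_b32(i, n);      // predicate covers the tail
        svfloat32_t va = svld1_f32(pg, a + i);  // predicated loads
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, dst + i, svadd_f32_x(pg, va, vb));
    }
}
```

The nice part of this model is that the same vector-length-agnostic source works whether the hardware runs it in-core or in a shared SME unit.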
I think it depends on the workload. Zen 5 has a really strong SMT implementation, and the Halo has 16 P-cores with 32 threads. I'm sure that's going to win some benchmarks vs. the M4 Max's 12P + 4E cores.
If they were, Cinebench R24 would've been a prime candidate to do so (and it does in R23; more on that below): as a CPU renderer it is basically embarrassingly parallel. It's possible that with enough power it could beat an M4 Max, just as top-end desktop Ryzens do, but the Halo's power curve isn't clear to me. I think they are slightly modified desktop CCDs? (Also more on that below.)
When Apple came out with the M1 series, especially the bigger ones, I was as thrilled as anyone. I actually predicted Apple would be first to implement on-package memory and I was right. We need someone else to follow, though. Not just the on-package part (which Intel did in Lunar Lake), but really the part about going wider. That's one of the reasons I'm excited about Halo - not so much for what it is, but for the door it opens.
Absolutely, I agree. Personally, as someone who does CUDA work, I'm most looking forward to the rumored Nvidia-MediaTek devices later this year (the already-announced DIGITS might not be what I'm interested in, but that's another topic). But yes, I hope AMD's foray here is also successful.
Thanks for the link. For some reason, Halo does much better on efficiency in CB23 Multi.
Cinebench is an ... interesting benchmark. It's based on the Redshift rendering engine. The NotebookCheck article is slightly wrong in saying that R23 isn't ARM native. It's true that it isn't native on Windows on ARM, so Qualcomm chips have to emulate it, but R23 was technically macOS/AS native. However, the initial port of Redshift to macOS/AS must not have gone smoothly, as initial CB R23 testing by Andrei on the M1 Max revealed some weird power behavior in his initial Anandtech review, and it didn't jibe with the performance figures from other renderers that had been ported at the time. He contacted the developers, who said they understood the problem and were working on a fix. They told him what the problem was but asked him not to disclose it publicly (a lot of people surmise it was making poor use of the vector processors, but again, that's not known for sure). Unfortunately I can't link to this because it was a Twitter thread that has long since been nuked. Regardless of what it was, they certainly fixed it and then some, as CB went from Apple Silicon's worst benchmark in R23 to one of its best in R24.
So, pedantically, it would be more correct for NBC to have said that R23 is non-native for ARM-Windows and non-optimized (though native) for ARM-Mac. That being the case, with rendering being so amenable to hyperthreading in general, AMD doing hyperthreading better than Intel (and Intel not doing it at all recently), AMD and the Halo in particular get a great mix of performance and efficiency in R23 relative to the competition, beating Apple in performance and everyone else in efficiency (and often performance too).
Where the above article does have GPU comparisons vs. Apple, Halo hangs pretty well with the M4 Pro.
More than hangs well - the Halo GPU should beat the Pro's in most cases (there are a couple of exceptions). The Halo's CPU is good, but they emphasized the GPU more than Apple did. So from what I can tell, it's got an M4 Pro-like CPU (though one whose power envelope can likely be pushed much higher) and a GPU that lies somewhere between the Pro and the Max, which, especially for the PC market, is definitely the right call. Honestly, I think Apple could probably push their GPUs a little bigger - though I know some developers who are CPU-heavy and love the M4 Pro precisely because it has an incredibly capable CPU without making them pay for a massive GPU they don't want. That's obviously one of the disadvantages of an SOC approach: it's more difficult to tailor to different use cases (there is some possibility that might change as packaging tech evolves - I have very faint hopes even for the M5 Pro/Max based on rumors - but it'll never be as flexible as a fully disaggregated system over PCIe).
Source? I had thought its CCDs were the exact same as desktop & server Zen 5, although I don't have that on particularly good authority. Do you know anything about what its GPU/IO die is using?
I can't imagine AMD would've made N4X CCDs for just this product. If they did, then they must be planning a Zen 5+ later this year.
I thought the standard Zen 5 desktop CCDs were N4X?
According to Wikipedia (which jibes with what I read elsewhere):
Zen 5c server: N3E
Zen 5 desktop CCD (also Halo CCD?): N4X
Zen 5 mobile (Strix Point, also Halo CCD?): N4P
Zen 5 IOD: N6
Not listed is the Halo's IOD with the GPU, which TechPowerUp said was one of the 5nm nodes, though it didn't specify which.
No one seems to say exactly which node the Halo's CCDs were manufactured on, so I'm assuming it's the desktop one, but Tom's, in the article whose comment section we're in, says the die shot of the Halo CCD is extremely similar to but slightly different from the desktop one. So I'm not sure. I first assumed they would go the same route as Strix Point and use N4P with more/bigger cores. Then, when it became clearer it was similar to the desktop structure, I assumed they would reuse the same dies as desktop Zen 5 (which I assume they will for Fire Range?). Now, for Halo, I'm not 100% certain what's going on, but I'm going to assume it's basically a tweaked desktop die on N4X, though it could be N4P (N4X and N4P are, I think, design-rule-compatible, so it might not be hard to port a design from one to the other).
But, to see its CPU performance benefit from the wider memory interface, you probably do have to explore the top end of its envelope.
I think it depends on what the power curve looks like.
Strix Point, for instance, tops out after a certain point. To use CB R24 as an example: going from 1022 pts to 1166 pts costs 58% more power, and going from 1166 to 1213 costs another 48% on top of that. In contrast, we know desktop Ryzen can push beyond the Max's performance with the same 16-core setup as the Halo (albeit at very high power levels). Unfortunately, NBC didn't do a similar power-curve analysis in their Halo article as they did for Strix Point (they might add more data later), so it isn't clear how it behaves. So more points on both sides?
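To make those Strix Point numbers concrete, here's a quick back-of-the-envelope check (the scores and power deltas are the NBC figures quoted above; the rest is just arithmetic):

```c
#include <stdio.h>

int main(void) {
    double pts[] = {1022, 1166, 1213};             // CB R24 scores from the article
    double pwr[] = {1.00, 1.58, 1.58 * 1.48};      // relative power: +58%, then +48%
    for (int i = 1; i < 3; i++) {
        double perf_gain  = pts[i] / pts[i - 1] - 1.0;
        double power_gain = pwr[i] / pwr[i - 1] - 1.0;
        printf("step %d: +%.1f%% perf for +%.0f%% power\n",
               i, perf_gain * 100, power_gain * 100);
    }
    return 0;
}
```

That works out to roughly +14% performance for +58% power, then +4% for another +48%, which is why the top of that curve looks so unattractive.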
Yeah, but it was never going to be a screamer. I think the main benefits of RDNA4 are going to be in AI and RT. A GPU that size probably still won't have enough RT performance to be very compelling and it's already got AI hardware.
I dunno, actually. You might be right. I think people who buy Apple SOCs for home rendering projects probably do indeed go for the 40-core GPU Max, which is not what the Halo chip is competing against, but even at the Halo's level I could see a market there. That is one of the big use cases for high-VRAM personal systems.
Anyway, thanks for the detailed reply. I appreciate the info.
My pleasure!
EDIT: This review says $2300 for the ASUS model, which appears to be the same one NBC is quoting €2500 for:
"Performance like an RTX 4060, but it’s packed into a slightly heavy Surface-like." (arstechnica.com)
Is the detachable keyboard included in the price? I dunno; either someone is wrong, or maybe Europe is just getting the short end of the stick?