Review Framework Desktop review: AMD's Strix Halo in a petite, powerful, pricey PC

Page 2 - Tom's Hardware community discussion.
Nope. Let's see an itemized spec sheet from you. Please include prices from authorized sellers with full warranty. Include links. And don't forget case and power supply, because that's what Framework includes.

If you're not going to do that much, then you're not serious and you're just blowing smoke.
Sorry, but I don't think you'd want to pay for the effort nor is it my obligation to make you happy (quite a few here would probably agree that's hard to do).

I used Geizhals to come up with the products and prices; it lists € prices for Germany, including 19% VAT and all applicable tariffs.

Here is the CPU: https://geizhals.de/amd-epyc-9015-100-000001553-a3333155.html

And the mainboard: https://geizhals.de/supermicro-h13ssl-nt-retail-mbd-h13ssl-nt-o-a2952750.html

For DRAM quotes you can start here: https://geizhals.de/?cat=ramddr3&xf=7500_DDR5~7501_DIMM~7761_RDIMM&sort=p

And I am quite happy to see millions disprove me; that might drive the price down via scale, and I don't want AMD losing money.

But I also want people to think about their choices and opportunities.
 
That is actually the entire point of Strix Halo: to provide a solution for specific workloads that easily tops the usual standard PC/x86 builds at those price points, but without going deep into proper high-end workstation territory.
Truth be told - while it certainly does lack some flexibility, it comes at an unbeatable price for what it provides.
You can look at Strix Halo as a poor man's "do everything"-machine that will allow for some LLM tinkering, blazing fast multimedia/music production, and even some modest gaming. And all of that without breaking the bank.
And, sorry to repeat myself, even in my eyes all that would be good and fine, if a) that's where your use case is and b) they didn't charge an HBM-level premium on commodity LPDDR5.

Currently I see them working very hard on hype to overcome common sense.
 
Another BS CPU choice. Only 8 cores. Again, to have equivalent CPU power, you'd need an EPYC model with at least 24 cores, since they operate at substantially reduced clocks.

There is exactly 0% overlap between the market for this Framework compact desktop and that EPYC CPU (or systems built around it). Your false alternative of EPYC is fooling no one and the fact that you're still pushing this, in spite of obviously being smart enough to know better, is quite telling.

But I also want people to think about their choices and opportunities.
Not really, if what you're putting out into the world is suggestions of false alternatives.

I think you're just bitter that Strix Halo wasn't cheaper and/or more powerful. However, it was never going to be exactly cheap, since AMD won't cannibalize their mainstream (i.e. Strix Point) market. I'm not surprised they didn't make it more powerful, since it's their first foray into the "large iGPU" domain. This was clearly designed to compete with Mac Pro. That's all it is.
 
>That is actually the entire point of Strix Halo: to provide a solution for specific workloads that easily tops the usual standard PC/x86 builds at those price points, but without going deep into proper high-end workstation territory.

I agree, with caveats. Framework Desktop (FWDT hereafter) is a good first step, but it has substantial limitations, and software support (Windows and Linux) is still very much WIP. For LLM use, it's very bleeding edge. $2K is well within the enthusiast budget, but I suspect its useful lifespan will be short, before more functional and finished options become available. Probably a year.

[Aside: I would love to see Intel jump into this edge AI space, as it said it would. Hoping to see Nova Lake AX becoming a reality.]

LLM speed (tokens/sec) has two components: prompt processing (commonly measured as pp512) and token generation (tg128). Strix Halo has decent memory bandwidth and memory size for token generation, but prompt processing requires the high compute that comes with a dGPU. That's why you see Wendell of Level1Techs using an eGPU via USB4 for the KV cache. PP time is typically much smaller than TG time, so objectively it's a small bottleneck, but the latency, or time to first token (TTFT), matters for perceived speed.
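The two-component latency model above can be sketched in a few lines. The rates in the example are illustrative placeholders, not measured Strix Halo numbers:

```python
# Back-of-the-envelope LLM latency model: prompt processing (pp) is
# compute-bound, token generation (tg) is memory-bandwidth-bound.

def llm_latency(prompt_tokens, output_tokens, pp_rate, tg_rate):
    """Return (time_to_first_token, total_time) in seconds.

    pp_rate: prompt-processing speed in tokens/s
    tg_rate: token-generation speed in tokens/s
    """
    ttft = prompt_tokens / pp_rate          # perceived responsiveness
    total = ttft + output_tokens / tg_rate  # end-to-end latency
    return ttft, total

# Hypothetical example: 2000-token prompt, 500-token reply,
# 300 t/s prompt processing, 20 t/s generation.
ttft, total = llm_latency(2000, 500, 300, 20)
print(f"TTFT: {ttft:.1f}s, total: {total:.1f}s")
```

This makes the point concrete: even though prompt processing is a small fraction of total time here, the user stares at a blank screen for the whole TTFT, which is why weak prompt-processing compute is felt more than the numbers suggest.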

Similarly, Strix Halo isn't suited for imagegen like Stable Diffusion, which requires high compute that comes with a fast dGPU. Because SD is targeted toward consumer dGPUs, 16-32GB VRAM is sufficient for most operations including training.

This segues into FWDT's lack of expansion: it can't accommodate a dGPU. L1Tech's use of an eGPU via USB4 (20Gbps max) would be an expensive, slow, and kludgy workaround. You can also use an M.2->OcuLink adapter, which is faster, but then you'd have a Franken-setup that defeats the purpose of an SFF box.

FWDT lacks a high-speed interconnect for those who want to try a multinode cluster. Its 5GbE pales in comparison with GB10's ConnectX-7 and its 400Gbps throughput (dual node only). 10GbE would be a practical minimum.

The 112GB max memory (with latest Linux drivers) is enough to run dense 70B LLMs, and barely enough to run Llama 4 Scout at Q4, the smallest Llama 4 model. It's not enough for Qwen3-235B-A22B which requires 143GB at Q4. The trend of open LLMs is toward larger sizes and using MoE (mixture of experts). The 112GB limit is at an awkward "almost, but not quite" level to run the most interesting open LLMs. I understand the necessity of using soldered RAM, and 128GB is a cost trade-off made for the small edge AI market (GB10 also has 128GB limit). But 256GB would've been a more "comfortable" limit. I hope to see that for the next iteration.
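The memory figures above follow from simple arithmetic: roughly parameters x bits-per-weight / 8, plus runtime overhead. A rough sketch, where the ~4.5 bits/weight (typical of common Q4 quant formats) and the 10% overhead factor are my rule-of-thumb assumptions, not vendor specs:

```python
# Rough memory footprint of a quantized LLM: params * bits / 8,
# times a fudge factor for KV cache and runtime overhead.
# 4.5 bits/weight approximates common "Q4" quantization schemes.

def model_gb(params_b, bits=4.5, overhead=1.1):
    """Approximate memory in GB for params_b billion parameters."""
    return params_b * bits / 8 * overhead

for name, params in [("70B dense", 70),
                     ("Llama 4 Scout (109B)", 109),
                     ("Qwen3-235B-A22B", 235)]:
    print(f"{name}: ~{model_gb(params):.0f} GB at ~Q4")
```

Under these assumptions a 70B dense model lands around 43 GB, Scout around 67 GB (fits under 112 GB with little room for context), and the 235B Qwen around 145 GB, which is why it's out of reach.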

WIP software: Strix Halo's NPU isn't currently usable except within AMD's Lemonade Server (Windows only). From an AMD dev on /r/locallama, it can't be run in conjunction with iGPU, but only on a separate task. Many other aspects are WIP.

I'm waiting for the release and review of the GB10 lineup, viz. the Asus Ascent GX10. At $3K, it's still within enthusiast/small-dev reach. It has the same RAM size and similar memory bandwidth. Compute should be higher. Software support is better, and CUDA is in play. But it doesn't do Windows and can't serve as a general-purpose system. N1X next year (similar to GB10) will do WoA, but may not be targeted at the same market as GB10.
 
I'm waiting for the release and review of the GB10 lineup
Yeah, Strix Halo was quite simply never intended to go after the same market as GB10.

You make a lot of good observations why it's a poor substitute, particularly in the form of Framework's Desktop. IMO, it's obvious that Framework's machine was conceived simply as a step up in horsepower from the typical mini desktop, without a specific application in mind.

Interestingly, I did see some benchmarks of Ryzen AI Max laptops that included Cinebench and Blender, also compared with "Fire Range" laptops. Fire Range is basically a desktop Zen 5 CPU in laptop packaging, probably with an accompanying boost in TDP. The margin of victory was about 20% in favor of Fire Range on Cinebench, but went almost as far in the other direction on Blender. Overall, Fire Range had a net advantage of about 9%, but no care was taken to run both laptops at the same TDP.
 
Better yet, Phoronix just dropped a 4-way comparison that included the 9950X. The GeoMeans came out with the 9950X 5.1% faster, but at an average of 69.6% more power consumption!

In particular, it's thrilling to see the Ryzen AI Max+ 395 hitting even lower idle power than Intel (via the Core Ultra 9 285K: Arrow Lake's flagship)!

In terms of places where absolute performance is higher, I'm only seeing it in a small number of HPC & server benchmarks, as well as a slightly larger proportion of CPU-based AI inferencing. Not surprisingly, really. You also have to consider that it's the only CPU in the test set with LPDDR5X, which is higher-latency. So, something needs to be fairly bandwidth-starved to overcome the latency penalty.
 
Framework's announcement for the desktop in Feb showcased LLMs as a use-case. They even had a 4-unit cluster shown. I think the company knew full well of Strix Halo's potential as the first AI-in-a-box PC.

https://community.frame.work/t/introducing-the-framework-desktop/65008
There's a photo + one paragraph that talks about LLMs. That's very different from saying the product is fundamentally about AI, in the same way as something like Nvidia's GB10/Spark.

I think of Strix Halo as mostly for AI,
Just because you think of it that way doesn't mean that's what it was designed for! The design would date back too far for this product to really be about client-side LLMs.
 
This alternative seems to go in the right direction in terms of where Strix Halo pricing should go, with €1500 for the 128GB model.

Knock off another €500 and color me moderately interested, simply because it also fits my typical µ-server use cases and won't be a total waste if AI use is too constrained.

So far Dragon Range (€450 for a Minisforum BD790i) and Fire Range boards only offer the more traditional dual channel desktop RAM bandwidth, but with 24 lanes of PCIe 5.0 they are a better and cheaper fit for that niche, while they should be very similar to Strix Halo in terms of AMD production cost--unless you factor in the cost of bespoke IOD/CCD chip designs at perhaps too low a scale.

I tried talking the reviewer into running a core-to-core latency benchmark to make progress on the Sea-of-Wires vs. Infinity Fabric topic, where Strix Halo may suffer much less from CCD-to-CCD cache latency overhead.

I was a bit concerned that AMD might have done an exclusivity deal, e.g. with HP, for Strix Halo, but either that's over or it never existed, and perhaps they aren't artificially working against further price drops, apart from asking a premium price for an "upper-middle-class" commodity APU.
 
>That is actually the entire point of Strix Halo: to provide a solution for specific workloads that easily tops the usual standard PC/x86 builds at those price points, but without going deep into proper high-end workstation territory.

I agree, with caveats. Framework Desktop (FWDT hereafter) is a good first step, but it has substantial limitations, and software support (Windows and Linux) is still very much WIP. For LLM use, it's very bleeding edge. $2K is well within the enthusiast budget, but I suspect its useful lifespan will be short, before more functional and finished options become available. Probably a year.
Strix Halo isn't so much a design dictated by a well-defined use case; it's simply the one point where an intersection of generally diverging technological trends happens to meet, once.

It really just trades traces used for PCIe lanes to double the data lanes for RAM. It seems a workable compromise at exactly this point, because it means you don't have to change any of the other commodity components from which you build systems.

But you can't just keep trading traces, up or down, to adjust to model demands growing gradually in different directions - e.g. going from the current 256-bit-wide RAM bus, bit by bit or even byte by byte, to 512 bits by sacrificing what remains of the PCIe or USB signals.

Where a truly flexible architecture would require a switch, which unfortunately costs about as many transistors and as much power as the APU itself, Strix Halo hard-wires that single other workable permutation to save that cost - at the expense of a future, really.

EPYCs give you that flexibility via their giant IOD switch, but at (mostly) higher cost, while the analog to the fruity cult's "Ultra" variant, which might be another attractive performance point, has no Strix Halo equivalent.

That's why I judge it a solution looking for a problem to match its capabilities, with no workable evolutionary path ahead, as genius as AMD might be.
[Aside: I would love to see Intel jump into this edge AI space, as it said it would. Hoping to see Nova Lake AX becoming a reality.]
With Nova Lake, Intel is already betting its entire innovative technology potential on duplicating a feature that AMD has well established in the market. And while it's been great for establishing and holding a flagpole position, it's actually not a volume niche, not even for AMD. AMD did V-Cache for servers, the consumer parts are great collateral, but if they didn't sell them in servers, perhaps AMD couldn't afford to make them just for gamers.

I don't see Intel selling those big-cache CPU CCDs in servers, nor do I see mixed P/E designs succeeding in servers, so I believe whoever is left at Intel is still chasing the ghosts of ancient glory.
LLM speed (tokens/sec) has two components: prompt processing (commonly measured as pp512) and token generation (tg128). Strix Halo has decent memory bandwidth and memory size for token generation, but prompt processing requires the high compute that comes with a dGPU. That's why you see Wendell of Level1Techs using an eGPU via USB4 for the KV cache. PP time is typically much smaller than TG time, so objectively it's a small bottleneck, but the latency, or time to first token (TTFT), matters for perceived speed.

Similarly, Strix Halo isn't suited for imagegen like Stable Diffusion, which requires high compute that comes with a fast dGPU. Because SD is targeted toward consumer dGPUs, 16-32GB VRAM is sufficient for most operations including training.
Exactly: unless your workload matches ideally to that single optimum Strix Halo has to offer, its value burns very quickly.
This segues into FWDT's lack of expansion: it can't accommodate a dGPU. L1Tech's use of an eGPU via USB4 (20Gbps max) would be an expensive, slow, and kludgy workaround. You can also use an M.2->OcuLink adapter, which is faster, but then you'd have a Franken-setup that defeats the purpose of an SFF box.
Since they traded those PCIe lanes and their PCIe 5.0 power budgets for the iGPU and the wider RAM bus, they are locked into that single spot, which neither EPYCs nor desktops can duplicate exactly, while those sweep a much broader range of use cases.
FWDT lacks a high-speed interconnect for those who want to try a multinode cluster. Its 5GbE pales in comparison with GB10's ConnectX-7 and its 400Gbps throughput (dual node only). 10GbE would be a practical minimum.
Even with the likes of NVLink, any significant scale-out typically requires a bespoke redesign of the base model, unless we find a way to do quantum chip-to-chip links. The smaller you start in terms of GPUs, the more effort you have to invest there. It's no accident Nvidia makes so much money on NVLink ASICs; anyone can do GPUs these days.

And no, 10GbE for GPU scale-out is much like trying to run a RAID over modems on a serial line (actually a bit better than a SATA line). And bandwidth alone isn't the issue; it's the latency overhead you need to eliminate on Ethernet, which is what Mellanox is good at, by actually talking InfiniBand.
The 112GB max memory (with latest Linux drivers) is enough to run dense 70B LLMs, and barely enough to run Llama 4 Scout at Q4, the smallest Llama 4 model. It's not enough for Qwen3-235B-A22B which requires 143GB at Q4. The trend of open LLMs is toward larger sizes and using MoE (mixture of experts). The 112GB limit is at an awkward "almost, but not quite" level to run the most interesting open LLMs. I understand the necessity of using soldered RAM, and 128GB is a cost trade-off made for the small edge AI market (GB10 also has 128GB limit). But 256GB would've been a more "comfortable" limit. I hope to see that for the next iteration.
I am less optimistic, and it's not just about the horrific hallucinations. The RTX Pro GDDRx professional series cards, which offer 4x the VRAM capacity of their gamer counterparts, typically match those gaming GPUs' bandwidths, since the hardware is very much the same. But with "only" 1.5TB/s of bandwidth available on the RTX 5090 and RTX Pro vs. the 8TB/s of their HBM brethren, token rates already fall below "human speed", or what most people would tolerate for inference responses, once you actually use the full 96GB capacity.

And at that point Strix Halo would deliver only 1/8th of 1/4th, or 1/32nd, of what is the accepted performance base of HBM datacenter GPUs. At that point the sweatshop conversions from China that double the VRAM of gamer GPUs seem more attractive, because the relationship of capacity vs. bandwidth is a little better than on the Pro cards.
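The bandwidth argument can be made concrete: for a dense model, every generated token must stream the full weight set from memory, so tokens/s is capped at bandwidth divided by model size (MoE models only stream the active experts, which softens this). A sketch using the thread's rough bandwidth figures (~256 GB/s for Strix Halo, ~1500 GB/s for RTX 5090/Pro-class GDDR cards, ~8000 GB/s for HBM parts), which are nominal, not measured, numbers:

```python
# Ceiling on token generation for a bandwidth-bound dense model:
# tg_max = memory_bandwidth / bytes_streamed_per_token.

def max_tg(bandwidth_gbs, model_gb):
    """Upper bound on tokens/s; real rates are lower."""
    return bandwidth_gbs / model_gb

MODEL_GB = 40  # hypothetical ~70B model at Q4
for name, bw in [("Strix Halo (~256 GB/s)", 256),
                 ("GDDR Pro card (~1500 GB/s)", 1500),
                 ("HBM datacenter GPU (~8000 GB/s)", 8000)]:
    print(f"{name}: <= {max_tg(bw, MODEL_GB):.1f} t/s")
```

The ratio 256/8000 is indeed roughly 1/32, which is where the "1/8th of 1/4th" figure above comes from.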
WIP software: Strix Halo's NPU isn't currently usable except within AMD's Lemonade Server (Windows only). From an AMD dev on /r/locallama, it can't be run in conjunction with iGPU, but only on a separate task. Many other aspects are WIP.
IMHO it's disinformation bordering on fraud: AMD is shooting themselves in the foot there.
I'm waiting for the release and review of the GB10 lineup, viz. the Asus Ascent GX10. At $3K, it's still within enthusiast/small-dev reach. It has the same RAM size and similar memory bandwidth. Compute should be higher. Software support is better, and CUDA is in play. But it doesn't do Windows and can't serve as a general-purpose system. N1X next year (similar to GB10) will do WoA, but may not be targeted at the same market as GB10.
I'm very glad I got paid rather well for researching what you can do on more pedestrian architectures. So the horrible quality of the LLM results was no issue, and I could recycle all the hardware I bought for other work and gaming.

Otherwise I can only recommend you spend your money elsewhere, as the baselines are going up faster than your ability to run even entry level public models.

Even with the support and budgets Wendell gets, his actual results are more optimism than anything one could sell at a profit.

I'd rather have Wendell and others in a similar position dig in a little further than have people who need a new job spend their last pennies on a future that won't run on these machines.
 
Knock off another €500 and color me moderately interested, simply because it also fits my typical µ-server use cases and won't be a total waste if AI use is too constrained.
So, you expect the equivalent of a 9950X, with 128 GB of RAM, an RX 7600-level GPU, case, and power supply for €1000? Not to mention that such a build would lack an NPU and the benefit of Strix Halo's added memory bandwidth. You're being unreasonable.

The operative question isn't what you might wish for. It's what we can realistically expect. You're not being realistic.

So far Dragon Range (€450 for a Minisforum BD790i) and Fire Range boards only offer the more traditional dual channel desktop RAM bandwidth,
That features a previous-generation Zen 4 CPU, no RAM, no case, no PSU, and no GPU.

Here, again, you simply cannot seem to resist the urge to make flawed comparisons.
 
It really just trades traces used for PCIe lanes to double the data lanes for RAM. It seems a workable compromise at exactly this point,
It's a laptop chip. The reason it doesn't have more PCIe lanes is because they're not useful for its primary market.

Just look at the paltry offering of PCIe lanes on Apple's Pro-tier of M-series SoCs. They're even worse.

AMD did V-Cache for servers, the consumer parts are great collateral, but if they didn't sell them in servers, perhaps AMD couldn't afford to make them just for gamers.
Have they offered 3D cache in any Turin (i.e. Zen 5) models? If so, then I certainly missed it!

The RTX Pro GDDRx professional series cards, which offer 4x the VRAM capacity of their gamer counterparts,
More bad maths.

It's 3x. They get a 50% increase by using 24 Gb dies, instead of 16 Gb. Then, they double this 1.5x multiplier by having 2 dies per channel.
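The capacity arithmetic above, written out (the die sizes and clamshell layout are as stated in the post):

```python
# 3x capacity on the same bus width: 24 Gb dies instead of 16 Gb
# gives a 1.5x step, and clamshell mounting (2 dies per channel)
# doubles that.
die_gamer_gb = 16 / 8              # 16 Gbit die = 2 GB
die_pro_gb = 24 / 8                # 24 Gbit die = 3 GB
multiplier = (die_pro_gb / die_gamer_gb) * 2   # 1.5 * 2
print(multiplier)  # 3.0
```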
 