News Intel launches $299 Arc Pro B50 with 16GB of memory, 'Project Battlematrix' workstations with 24GB Arc Pro B60 GPUs

Admin

Administrator
Staff member
IF it ends up being a good AI GPU, you could fit four in a typical PC chassis as long as your motherboard has sufficient slots. Four PCIe slots means 96GB of VRAM, which is a lot for handling AI on a home server, and it would only be about $2,000 in GPUs. Sounds like fun!

... Calling Wendell at Level1Techs? :)
 
IF it ends up being a good AI GPU, you could fit four in a typical PC chassis as long as your motherboard has sufficient slots. Four PCIe slots means 96GB of VRAM, which is a lot for handling AI on a home server, and it would only be about $2,000 in GPUs. Sounds like fun!

... Calling Wendell at Level1Techs? :)
Unfortunately GPUs don't scale like CPUs, and even with CPUs scaling was never easy or free (Amdahl's law).

So just taking four entry-level GPUs to replace one high-end variant doesn't work out of the box, not even with a model carefully tailor-made for that specific hardware: the bandwidth cliff between those GPUs drags performance down to CPU-only levels, which are basically PCIe bandwidth levels.

There is a reason Nvidia can sell those NVLink switching ASICs for top dollar: they provide the kind of bandwidth it takes to somewhat moderate the impact of going beyond a single device.
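To put rough numbers on that bandwidth cliff, here is a quick back-of-envelope comparison (rounded spec-sheet figures; the 2 GB payload is purely illustrative, not a measurement):

```python
# Rounded, publicly listed bandwidth figures (GB/s, per direction where applicable).
# The point: anything that has to cross PCIe moves an order of magnitude slower
# than data staying in local VRAM or crossing an NVLink fabric.
BW_GBPS = {
    "GDDR6 on one B580-class card": 456,  # local VRAM bandwidth
    "NVLink 4 (H100, aggregate)":   900,  # GPU-to-GPU in Nvidia's pro stack
    "PCIe 5.0 x16":                  64,
    "PCIe 5.0 x8":                   32,  # what each GPU on a dual-B60 card gets
}

payload_gb = 2.0  # illustrative chunk of activations/KV cache shuffled between devices

for name, bw in BW_GBPS.items():
    ms = payload_gb / bw * 1000
    print(f"{name:32s} {bw:4d} GB/s -> {ms:6.1f} ms to move {payload_gb:.0f} GB")
```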
 
I predict a future as bright and long as Xeon Phi, only much accelerated.

This tastes much more like a desperate attempt to moderate the fall of stock prices than honest delusions, of which Intel had aplenty.

Not even the best and greatest single-shot product really has much of a chance in that market. Unless the vast majority of potential buyers truly believe that you have a convincing multi-generation roadmap which you'll be able to execute, nobody is going to invest much thought, let alone commit budgets and people in that direction; just look at how hard it already is for AMD.

And they have a little more to offer than one entry-level GPU chip that fails to sell even against higher-priced competition.
 
Good to see Intel taking the lead on dedicated AI GPUs for the consumer market. Pricing is pretty amazing. Hope to see more coverage of these once they come to market.
With all due respect, I’m not sure how this is consumer, AI-dedicated, or leading?
  • B50/B60 are explicitly part of the workstation lineup, and run professional drivers
  • The standard display pipelines are enabled, and they're equipped with the normal number of outputs
  • 2025Q3, with “full enablement” in Q4 is hardly first to the party when it comes to using workstation GPUs for AI
Two B60s on one PCB is a neat twist that was previously only done in rack-mount servers, not tower workstations, but if one B60 costs $500 I'd expect the dual version to cost 2x plus a premium for complexity and density.
 
>There is a reason Nvidia can sell those NVLink switching ASICs for top dollar,

No one would argue that Intel's B-series would be competitive with Nvidia's AI products in the pro market. It's a product for a different market, priced at thousands of dollars versus tens or hundreds of thousands.

There's a groundswell of enthusiast and startup interest in client-side LLMs/AIs. As of now, there's really no product that serves that segment, as vendors (Nvidia mostly) are busy catering to the higher-end segment. So it's not a question of high vs. low, but more a question of something vs. nothing.

I put your concern to a Perplexity query, which follows (excuse the verbiage). I don't pretend to be familiar with the minutiae of the processes mentioned, but it's fairly evident that speed concerns can be optimized to some degree, and what's offered is better than what's available now.
=====
GPUs do not scale as easily as CPUs due to bandwidth and communication bottlenecks, especially when using PCIe instead of high-bandwidth interconnects like NVLink. Simply combining multiple entry-level GPUs rarely matches the performance of a single high-end GPU, because the inter-GPU communication can become a major bottleneck, negating much of the potential speedup.

How vLLM Addresses This Challenge

vLLM implements several strategies that can help mitigate these scaling limitations:

1. Optimized Parallelism Strategies
2. Memory and Batching Optimizations
3. Super-Linear Scaling Effects
4. Practical Recommendations

  • For best scaling, use high-bandwidth GPU interconnects (NVLink, InfiniBand) if available.
  • On systems limited to PCIe, prefer pipeline parallelism across nodes and tensor parallelism within nodes to reduce communication overhead.
I have been very much into evaluating the potential of AI for home use as part of my day job. And for a long time I've been excited by the fact that a home assistant can afford to be rather less intelligent than an AI that's supposed to replace lawyers, doctors, scientists or just programmers: most servants came from a rather modest background and were only expected to perform a very limited range of activities, so even very small LLMs that fit into a gaming GPU might be reasonable and able to follow orders over the limited domain of your home.

But the hallucinations remain a constant no matter the model size, and very basic facts of life and the planet are ignored to the point where I wouldn't trust AIs to control my light switches.

The idea that newer and bigger models would heal those basic flaws has been proven wrong for several generations already, and across the range of 1-70B parameters, which hints at underlying systematic issues.

Perhaps there could be a change at 500B or 2000B, but I have no use for the "smarts" a model like that would provide; the expense certainly wouldn't offset the value gained, and that's only if hallucinations could indeed be managed. Without a change in the approach there is no solution in sight: reasoning and mixture-of-experts models aren't really doing better, and they walk off a hallucination cliff with invented assumptions.

Note that all of the approaches described by Perplexity above address model design and model training, which no end user can afford to do: that would be like doing genetic engineering to create perfect kids for doing chores.

The best you can realistically do for your private AI servant is to get open source models and then provide them with all the context they need to serve you, via RAG or whatever.

And those models won't come tailor-made for Intel's Battlematrix; at best Intel might publish one or two demo variants as "proof". What would pay for the effort of just keeping a well-known open source model compatible with this niche and complex hardware base, let alone something new at the level of a Mistral, Phi, Llama or DeepSeek?
  • Tune batch sizes and memory allocation parameters to maximize utilization without overloading communication channels.

  • Accept that some inefficiency is unavoidable with entry-level hardware and slow interconnects, but vLLM's optimizations will help you get closer to optimal performance than naive multi-GPU setups.
A 1% improvement per GPU is already "closer", just not enough return on the investment: at this point the AIs are resorting to tautologies.
=====
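I won't pretend to have run this myself, but the "tensor parallelism within a node" recommendation above boils down to a couple of constructor arguments in vLLM. A minimal sketch, assuming vLLM's backend supports these cards at all; the model name and GPU count are placeholders:

```python
from vllm import LLM, SamplingParams

# Shard each layer's weights across the GPUs in one box (tensor parallelism).
# Model name, GPU count and memory fraction are placeholders for illustration.
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    tensor_parallel_size=2,       # split weights across 2 local GPUs
    gpu_memory_utilization=0.90,  # leave a little headroom for the KV cache
)

outputs = llm.generate(
    ["Turn off the kitchen lights at 23:00."],
    SamplingParams(max_tokens=64, temperature=0.2),
)
print(outputs[0].outputs[0].text)
```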

>This tastes much more like a desperate attempt to moderate the fall of stock prices than honest delusions, of which Intel had aplenty.

Wow. Now you are veering into conspiracy theory and juvenile fanboy territory. So much for hopes of a productive talk.
With 45 years as an IT professional, 20 of those in technical architecture, and the last 10 years in technical architecture for AI work in a corporate research lab, I'm not quite juvenile any more, nor am I much of a fan, or boy, or fanboy: don't mistake my harshness here for ignorance.

I've followed Intel's iAPX 432, the first graphics processor they licensed from NEC in 80286 times, the "Cray on a chip" i860, Itanium, and Xeon Phi, working in publicly funded HPC research institutes during my thesis and later as part of a company that manufactured HPC computers. I've met and known the people who designed them for 15 years.

I've had the privilege of being able to put technology which I was enthusiastic about to the test and getting paid for that.

Surprises do happen, but with regard to the potential of AMD's Zen revival, for example, my prediction (of success) actually proved some of those people wrong.

Most importantly, I've run and benchmarked AI models for performance and scalability myself and supported a much larger team of AI researchers and model designers to do that across many AI domains, not just LLMs.

But I've also done significant LLM testing over the last two years, again with a focus on scalability, but also on the quality impact of different numerical formats for weight representation, quantizations and model sizes.

That experience has made me very much a sceptic, and that's not the result I was hoping for: I really want my AI servants! But I want them to be loyal, valuable, and not to kill me before I tell them to.

So I feel rather comfortable in my prediction, ...or rather quite a lot of discomfort at how far off any plausible path to success Intel is straying here: desperation becomes the more likely explanation than sound engineering.

But let's just revisit this "product" in 1/2/5 years and see who projected its success better, ok?
 
I predict a future as bright and long as Xeon Phi, only much accelerated.

This tastes much more like a desperate attempt to moderate the fall of stock prices than honest delusions, of which Intel had aplenty.

Not even the best and greatest single-shot product really has much of a chance in that market. Unless the vast majority of potential buyers truly believe that you have a convincing multi-generation roadmap which you'll be able to execute, nobody is going to invest much thought, let alone commit budgets and people in that direction; just look at how hard it already is for AMD.

And they have a little more to offer than one entry-level GPU chip that fails to sell even against higher-priced competition.
Disagree here; there are certain workloads (LLMs) which are memory hungry (higher-precision models), and the Battlemage architecture supplies sufficient throughput to make this a viable product, especially in the price range these are marketed at ($300 for a B50). The B580 is capable of 30 tokens per second; compare that to ChatGPT, which charges ~$1 per million tokens: a B580 would produce 30*3600*24 ≈ 2.6 million tokens a day, the equivalent of ~$2.60. It would roughly pay for itself in ~115 days, slower if electricity cost is factored in. This has a lot of parallels to crypto workloads.

These professional cards come with more VRAM by default; the B50 should be capable of ~24 tokens/s, the B60 the same as the B580, and the dual version double that (~60 tokens/s), since it scales quite well as long as the bandwidth between the units is high.

No one else is offering this much memory bandwidth at this price; Intel is serving an unfulfilled market, and these should be as hard to find as a B580 due to the lack of competition. They won't game as well as a B570, or especially a B580, but they will perform exceptionally well for professional workloads as long as you don't need CUDA. As these are a quarter or less of the price of the competing RTX Pro cards, they should do quite well.
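Spelling out that arithmetic (30 tok/s around the clock, ~$1 per million tokens as the reference price, ~$300 for the card, electricity ignored; these are assumptions, not benchmarks):

```python
# Payback estimate from the assumptions above; tweak the inputs to taste.
tok_per_s = 30       # assumed sustained generation rate
usd_per_mtok = 1.00  # assumed API reference price per million tokens
card_usd = 300       # assumed card price

tok_per_day = tok_per_s * 3600 * 24           # ≈ 2.6 million tokens/day
usd_per_day = tok_per_day / 1e6 * usd_per_mtok
payback_days = card_usd / usd_per_day

print(f"{tok_per_day / 1e6:.2f} M tokens/day ≈ ${usd_per_day:.2f}/day "
      f"-> card pays for itself in ~{payback_days:.0f} days")
```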
 
Disagree here; there are certain workloads (LLMs) which are memory hungry (higher-precision models), and the Battlemage architecture supplies sufficient throughput to make this a viable product, especially in the price range these are marketed at ($300 for a B50). The B580 is capable of 30 tokens per second; compare that to ChatGPT, which charges ~$1 per million tokens: a B580 would produce 30*3600*24 ≈ 2.6 million tokens a day, the equivalent of ~$2.60. It would roughly pay for itself in ~115 days, slower if electricity cost is factored in. This has a lot of parallels to crypto workloads.

These professional cards come with more VRAM by default; the B50 should be capable of ~24 tokens/s, the B60 the same as the B580, and the dual version double that (~60 tokens/s), since it scales quite well as long as the bandwidth between the units is high.

No one else is offering this much memory bandwidth at this price; Intel is serving an unfulfilled market, and these should be as hard to find as a B580 due to the lack of competition. They won't game as well as a B570, or especially a B580, but they will perform exceptionally well for professional workloads as long as you don't need CUDA. As these are a quarter or less of the price of the competing RTX Pro cards, they should do quite well.
I agree here
 
Two B60s on one PCB is a neat twist that was previously only done in rack-mount servers, not tower workstations, but if one B60 costs $500 I'd expect the dual version to cost 2x plus a premium for complexity and density.
Given that it's a PCIe 5.0 x8 GPU which doesn't need a ton of power, I'm guessing the only things they share are power delivery, display output routing and cooling. That shouldn't add much of anything to the cost of manufacture, so I'd bet any premium would be based on market demand.
 
Given that it's a PCIe 5.0 x8 GPU which doesn't need a ton of power, I'm guessing the only things they share are power delivery, display output routing and cooling. That shouldn't add much of anything to the cost of manufacture, so I'd bet any premium would be based on market demand.
They do not share power delivery.
Edit: the Gamers Nexus video shows that they're essentially two full graphics cards (GPU, memory, power delivery and such) put onto the same PCB.
 
I predict a future as bright and long as Xeon Phi, only much accelerated.
LOL

I was going to ask what an inference workstation is... I mean in practice. In theory, sure, I expect there will be such a thing, and I also expect it to fit into the physical packaging and power profile of a typical smartphone, but this isn't that.
 
It seems they have been impressed enough with the market reaction to Battlemage in gaming, and interestingly in the non-gaming segment, to move on a push like this. It has been pretty hard to get Battlemage at MSRP since January, so it will be interesting to see how these are received. I saw the Gamers Nexus video, and the multi-GPU platform definitely brings back some happy memories.
 
Disagree here; there are certain workloads (LLMs) which are memory hungry (higher-precision models), and the Battlemage architecture supplies sufficient throughput to make this a viable product, especially in the price range these are marketed at ($300 for a B50). The B580 is capable of 30 tokens per second; compare that to ChatGPT, which charges ~$1 per million tokens: a B580 would produce 30*3600*24 ≈ 2.6 million tokens a day, the equivalent of ~$2.60. It would roughly pay for itself in ~115 days, slower if electricity cost is factored in. This has a lot of parallels to crypto workloads.

These professional cards come with more VRAM by default; the B50 should be capable of ~24 tokens/s, the B60 the same as the B580, and the dual version double that (~60 tokens/s), since it scales quite well as long as the bandwidth between the units is high.

No one else is offering this much memory bandwidth at this price; Intel is serving an unfulfilled market, and these should be as hard to find as a B580 due to the lack of competition. They won't game as well as a B570, or especially a B580, but they will perform exceptionally well for professional workloads as long as you don't need CUDA. As these are a quarter or less of the price of the competing RTX Pro cards, they should do quite well.
Tokens/s isn't a fixed measure per GPU. A B580 doesn't just do 30 tok/s; it depends very much on the model size and the weight representation. INT4 will generally give you 2x the INT8, 4x the BF16 and 8x the FP32 token numbers, because you use correspondingly less VRAM to hold the weights, provided your GPU is modern enough (quality may be another issue). But if you compensate by using bigger models, or if you fill 24GB of VRAM instead of 12GB using the same chip and bus, that means halving the effective token rate. In terms of performance you can think of LLMs as doing a sequential pass through all weights per token, and all current GPU designs are mostly constrained by RAM bandwidth, not compute performance. You might not even see 50% GPU core load on LLMs, because the cores are just waiting for data, even with on-the-fly data type conversions or mixed-precision calculations, which older GPUs couldn't handle and CPUs would struggle with.
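A rough bandwidth-bound estimate makes the point. This is a sketch, not a benchmark: it assumes a dense model whose resident weights are streamed once per generated token, uses the published 456 GB/s figure for the B580-class memory subsystem, and ignores KV cache, batching and compute limits:

```python
# Upper-bound tokens/s if generation is purely limited by streaming the weights
# from VRAM once per token: tok/s ≈ VRAM bandwidth / bytes of resident weights.
vram_bw_gbps = 456  # published B580-class GDDR6 bandwidth, GB/s

for weights_gb in (6, 12, 24, 48):
    ceiling = vram_bw_gbps / weights_gb
    print(f"{weights_gb:2d} GB of weights -> ~{ceiling:5.1f} tok/s upper bound")
```

Note how going from 12GB to 24GB of resident weights halves the ceiling, which is exactly the effect described above.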

A dual-GPU setup with 48GB is actually very likely to be constrained by the PCIe interface to the point where it delivers single-digit token performance, and splitting the model in such a way as to lessen the PCIe bottleneck is very hard, even if you can redesign (and retrain) it to match.

And no, the interface between the two Intel GPUs can't be any better than what the individual chips offer, which is PCIe 5.0 x8: they don't have the equivalent of an NVLink port. And there is a good chance that GPU-to-GPU transfers would just use the two halves of a PCIe 5.0 x16 slot and be exactly the same speed as two single cards in different slots.

If you want to go and play with that type of setup, I can only recommend trying it with two RTX 50-series cards first; it's much easier to sell one on if the results disappoint.

It's easy to claim that AMD and Nvidia are all about ripping off potential GPU customers who want double- or quad-sized VRAM GPUs, and I don't think that's entirely wrong.

On the other hand, GPU vendors also know that there are diminishing returns, and that below token rates of around 20/s interaction just becomes too painfully slow for most humans to tolerate. So they know how boards like that would be review-bombed and sell very poorly, because only very few could derive value from that niche. Remember the RTX 4060 Ti with 16GB getting thrashed?

But don't believe me, just go ahead and buy one yourself. I tried this with a V100 years ago, and with an RTX 4090 and an RTX 4070 last year using llama.cpp, with all kinds of models and all kinds of layer distributions. As soon as you exhaust the VRAM capacity of a single GPU, you might as well just run the entire model on the CPU and DRAM.
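If anyone wants to reproduce that kind of experiment, here is a minimal sketch using llama.cpp's Python bindings (pip install llama-cpp-python). The model path and layer counts are placeholders; the interesting part is watching tok/s collapse once the offloaded layers no longer all fit in VRAM:

```python
import time
from llama_cpp import Llama

MODEL = "some-model.Q4_K_M.gguf"  # placeholder path to any GGUF model

# Sweep how many transformer layers get offloaded to the GPU.
# 0 = CPU only; a value above the model's layer count means "offload everything".
for n_gpu_layers in (0, 16, 32, 99):
    llm = Llama(model_path=MODEL, n_gpu_layers=n_gpu_layers, n_ctx=2048, verbose=False)
    t0 = time.time()
    out = llm("Write one sentence about GPUs.", max_tokens=64)
    n_tok = out["usage"]["completion_tokens"]
    print(f"n_gpu_layers={n_gpu_layers:3d}: {n_tok / (time.time() - t0):5.1f} tok/s")
    del llm  # free VRAM before the next run
```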