News: Nvidia announces Blackwell Ultra B300 — 1.5X faster than B200 with 288GB HBM3e and 15 PFLOPS dense FP4

Gotta have those ultra halo products to firmly seat oneself as "#1".

I just have to say no. It's more marketing, more "WOW!" numbers, but it's really a distortion of the compute market: synthesizing Moore's Law into a reality that only works if you ignore cost and efficiency scaling. As far as "AI" has come, I'm just not impressed when a quarter of the earth's population still doesn't have access to fresh drinking water, and I don't see that needle moving rapidly.

The technologists and industrialists have hoarded wealth and IP to unimaginable levels. It's kind of a strange post for me, but I stand my ground: the more successful you are, individually and as an organization, the more you should contribute to society. This just screams of excess to me... and it's only getting worse. Look at the gaming dGPU situation today, with both AMD and Nvidia claiming they have decent supply and are selling in record numbers.

If you don't have deep pockets, you're out. You - are - out. That's our "bright" future.
 
Reactions: George³
The article said:
By leveraging FP4 instructions, using B300 alongside its new Dynamo software library to help with serving reasoning models like DeepSeek, Nvidia says an NV72L rack can deliver 30X more inference performance than a similar Hopper configuration. That figure naturally derives from improvements to multiple areas of the product stack, so the faster NVLink, increased memory, added compute, and FP4 all factor into the equation.
It's not multiplicative. The only way you get to multiply improvements together is if you manage to improve the overlap between compute and communication, so that computation isn't blocked on it as much. That multiple could then be applied to the gains from going to FP4, and the product would be the final speedup. Algorithmic improvements should also factor in, and I suspect a good chunk of the claimed improvement comes from those. And the 30X number is almost certainly an outlier rather than typical; Blackwell just isn't that much faster than Hopper.
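
The point above can be sketched with a toy timing model. All numbers below are made up for illustration (nothing here is a measured Hopper or Blackwell figure): when compute and communication serialize, speeding each up 2X yields 2X overall, not 4X, and a compute-side gain like FP4 only stacks with overlap until communication becomes the bottleneck:

```python
# Toy timing model for one inference step. All numbers are made up
# for illustration -- nothing here is a measured Hopper/Blackwell figure.

def step_time(compute_ms, comm_ms, overlap=False):
    """With no overlap, compute and communication serialize;
    with perfect overlap, the slower of the two hides the other."""
    return max(compute_ms, comm_ms) if overlap else compute_ms + comm_ms

base = step_time(10.0, 10.0)                      # 20 ms, fully serialized

# "2x faster compute" AND "2x faster interconnect", still serialized:
faster = step_time(5.0, 5.0)
print(base / faster)                              # 2.0 -- not 4x

# Overlap alone, on the same hardware, hides communication behind compute:
overlapped = step_time(10.0, 10.0, overlap=True)
print(base / overlapped)                          # 2.0

# FP4 halving compute only stacks with overlap until communication
# becomes the bottleneck:
fp4 = step_time(5.0, 10.0, overlap=True)
print(base / fp4)                                 # still 2.0 -- comm-bound
```

The multiplicative case only appears when both legs shrink together and the overlap is maintained, which is why a single headline multiplier hides a lot of assumptions.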

Improvements in things like HBM bandwidth should only be enough to (hopefully) keep pace with the compute improvements. Same with NVLink. I doubt it will be any less bottlenecked on HBM or NVLink than Hopper was.

It could also be something like the increased HBM capacity enabling local storage of weights that previously had to be fetched from an attached GPU or Grace node.
 
Reactions: JarredWaltonGPU
It's not multiplicative. The only way you get to multiply improvements together is if you manage to improve the overlap between compute and communication, so that computation isn't blocked on it as much. That multiple could then be applied to the gains from going to FP4, and the product would be the final speedup. Algorithmic improvements should also factor in, and I suspect a good chunk of the claimed improvement comes from those. And the 30X number is almost certainly an outlier rather than typical; Blackwell just isn't that much faster than Hopper.

Improvements in things like HBM bandwidth should only be enough to (hopefully) keep pace with the compute improvements. Same with NVLink. I doubt it will be any less bottlenecked on HBM or NVLink than Hopper was.

It could also be something like the increased HBM capacity enabling local storage of weights that previously had to be fetched from an attached GPU or Grace node.
Yes, which is precisely what I'm saying. We haven't tested this. Nvidia isn't giving exact details. It's just saying it's 30X faster. Now, I personally think "increased memory" is a big part of the increased performance. It could also be the faster NVLink, but I don't know that it's as pertinent. And there's MIG and other stuff in play where perhaps the software is better able to distribute work across the GPUs. Someone else will have to test that; this is just what Nvidia has claimed.
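
A back-of-envelope sketch of why the extra memory could matter. The bandwidths and layer size below are assumed round numbers, not official Nvidia specifications: streaming a layer's weights from local HBM versus pulling them over NVLink from a neighboring GPU differs by roughly the ratio of the two bandwidths:

```python
# Back-of-envelope: local HBM vs. fetching weights from a neighbor GPU.
# Bandwidths and layer size are assumed round numbers for illustration,
# not official Nvidia specifications.

HBM_BW_TBS = 8.0      # assumed local HBM3e read bandwidth, TB/s
NVLINK_BW_TBS = 1.8   # assumed per-GPU NVLink bandwidth, TB/s

def read_ms(size_gb, bw_tbs):
    """Milliseconds to stream `size_gb` of weights at `bw_tbs` TB/s."""
    return size_gb / (bw_tbs * 1000.0) * 1000.0

LAYER_GB = 4.0        # assumed weights for one large transformer layer

local = read_ms(LAYER_GB, HBM_BW_TBS)       # ~0.5 ms
remote = read_ms(LAYER_GB, NVLINK_BW_TBS)   # ~2.2 ms

print(f"local HBM:   {local:.2f} ms")
print(f"over NVLink: {remote:.2f} ms ({remote / local:.1f}x slower)")
```

If the larger capacity lets every layer's weights live locally, that per-layer gap disappears entirely, which is consistent with capacity being a big contributor to the claimed speedup.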
 
Reactions: bit_user
30X only if you can squint hard enough to make an orange look like an apple... in the same vein as the 5070 being "faster" than a 4090: different compute process, different precision, etc. That's the reason Hopper shows up in the comparison; the H100 doesn't support the same data types as B200 and B300.

As much as I hate that from a technical standpoint, it's perfect for marketing, because their target market for upgrades is H100 owners, not B200.
 
Reactions: bit_user