News AI GPU clusters with one million GPUs are planned for 2027 — Broadcom says three AI supercomputers are in the works

Can this possibly be true?
If so it is madness at an unprecedented level.
Let's just do some math: if we're talking $30,000 per B200 GPU, a million of them is thirty billion dollars. And that's probably about half the cost of the completed, installed system. Not to mention ongoing facility costs and the electric bill. Depreciate it over, what, ten years, finance it at even 5% opportunity cost, ... add even minimal staffing, and you're talking $5B-$10B per year or more to own and operate such a thing.
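To make that arithmetic concrete, here's a rough back-of-envelope sketch; the $30k price, the 2x installed-cost multiplier, the ten-year depreciation, and the 5% rate are just the assumptions from the paragraph above, not real figures:

```python
# Rough back-of-envelope sketch of the ownership math above. The $30k
# per-GPU price, the "chips are ~half the installed cost" multiplier,
# the ten-year depreciation, and the 5% rate are assumptions, not quotes.

gpu_count = 1_000_000
price_per_gpu = 30_000                    # assumed B200 price, USD

gpu_capex = gpu_count * price_per_gpu     # $30B for the chips alone
total_capex = gpu_capex * 2               # chips assumed to be ~half of installed cost

years = 10                                # assumed depreciation period
opportunity_rate = 0.05                   # assumed 5% cost of capital

annual_depreciation = total_capex / years
annual_financing = total_capex * opportunity_rate

print(f"GPU capex:           ${gpu_capex / 1e9:.0f}B")
print(f"Installed system:    ${total_capex / 1e9:.0f}B")
print(f"Yearly depreciation: ${annual_depreciation / 1e9:.0f}B")
print(f"Yearly financing:    ${annual_financing / 1e9:.0f}B")
# Depreciation plus financing alone is ~$9B/year, before power,
# facilities, and staff -- consistent with the $5B-$10B+ guess.
```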

Now, perhaps by using a much cheaper chip, slower but with not too different price/performance, you could get down to a tenth of that. Still not cheap, but you get the "million GPU" bragging rights.

Still, it's my perception that the technology has already moved on; a cluster of even 1,000 B200s should be enough for any team, though you might have ten or twenty teams going. So if these things are overbuilt because of some technoid FOMO, they won't ever operate 90% of it on any regular basis.
 
How many cores will be in each GPU? 1,000,000 (GPUs) x 20,000 (cores) is a lot of cores. Just to run a crappy computer version of a 4-year-old child with instant access to all the world's Twitter statements.
 
The article said:
The company believes that in 2027, the serviceable addressable market (SAM) for AI XPU and networking will be between $60 and $90 billion, and the firm is positioned to command a leading share of this market.
Meanwhile, AMD and Intel are squabbling over table scraps.

I wonder about software support for Broadcom's AI accelerators. Do popular machine learning frameworks already have backends for them? I think they must, but it hasn't come up on my radar. This seems to be one of the issues AMD has long struggled with, and Intel recently said Gaudi 3 will miss its sales projections because its software support is running behind schedule. That makes me really curious what Broadcom has been doing!

Also, what are the chances they include a small version of one of the AI accelerator blocks in the next Raspberry Pi SoC?
 
Can this possibly be true?
If so it is madness at an unprecedented level.
Let's just do some math: if we're talking $30,000 per B200 GPU, a million of them is thirty billion dollars. And that's probably about half the cost of the completed, installed system. Not to mention ongoing facility costs and the electric bill. Depreciate it over, what, ten years, finance it at even 5% opportunity cost, ... add even minimal staffing, and you're talking $5B-$10B per year or more to own and operate such a thing.
I think your numbers are a bit high, but you're in the right order of magnitude. I doubt customers buying 1M units will pay that $30k list price, even in spite of the extremely high demand. I'd peg the final build cost of the facility at not much more than $30B, although that's a pretty insane amount of money. It makes me wonder what other sorts of construction projects have similar price tags!

Now, perhaps by using a much cheaper chip, slower but with not too different price/performance, you could get down to a tenth of that. Still not cheap, but you get the "million GPU" bragging rights.
I doubt Nvidia's price/perf is that far off what's possible. I could believe better price/perf by maybe a factor of 2, but not 10. Not if we're talking about training, that is.

Still, it's my perception that the technology has already moved on; a cluster of even 1,000 B200s should be enough for any team,
How do you figure that?

though you might have ten or twenty teams going. So if these things are overbuilt because of some technoid FOMO, they won't ever operate 90% of it on any regular basis.
A lot of the article focuses on hyperscalers, which means they definitely will have many customers using subsets of those pools. It's definitely not going to be 1M GPUs all training a single model, or anything silly like that.
 
How many cores will be in each GPU? 1,000,000 (GPUs) x 20,000 (cores) is a lot of cores.
They're not cores in the CPU sense of the word. They're just ALU pipelines, bundled into SIMD-32 blocks (SIMD-32 is 1024 bits wide, if you'd prefer to look at it that way). The H100 has 528 (enabled) SIMD-32 engines, arranged into 132 SMs (Streaming Multiprocessors). So, depending on how you look at it, it's either 528 or 132 cores per GPU. The 528 figure treats each SIMD-32 engine as roughly on par with an Intel server core, since such a core has 2x AVX-512 FMA ports, which is also 1024 bits of FMA width.
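If anyone wants to check the arithmetic, here's a quick sketch; the per-SM breakdown (4 SIMD-32 sub-partitions of 32 FP32 lanes each) is my reading of Nvidia's published H100 specs, so treat it as an assumption:

```python
# Quick sanity check of the H100 numbers above. The per-SM breakdown
# is an assumption based on Nvidia's published specs for the SXM part.

sms = 132                    # enabled Streaming Multiprocessors
simd_engines_per_sm = 4      # each SM is split into 4 SIMD-32 sub-partitions
lanes_per_engine = 32        # 32 FP32 lanes per sub-partition

simd_engines = sms * simd_engines_per_sm        # 528 "SIMD-32 engines"
fp32_lanes = simd_engines * lanes_per_engine    # 16,896 -- Nvidia's "CUDA core" count
simd_width_bits = lanes_per_engine * 32         # 1024-bit SIMD, as described above

print(simd_engines, fp32_lanes, simd_width_bits)  # 528 16896 1024
```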
 
I doubt Nvidia's price/perf is that far off what's possible. I could believe better price/perf by maybe a factor of 2, but not 10. Not if we're talking about training, that is.
Same price/perf but someone might use 10 slower chips at 1/10 the price, just for variety.
How do you figure that?
Well, even Nvidia bills the new chips as 20x faster (by using FP4, though that only applies to maybe half the training). Say they're right. Then 1,000 B200s is like 20,000 H100s or whatever.

But more than that ... see next item.
A lot of the article focuses on hyperscalers, which means they definitely will have many customers using subsets of those pools. It's definitely not going to be 1M GPUs all training a single model, or anything silly like that.
The hunger for huge numbers of GPUs came from Altman's "scale is everything!" mantra five years ago, but even the training for GPT-4o was done in four pieces, using rather less than 100k GPUs of slower vintage.

Now, they are moving more work out of training and into inference time, but that's probably the right move, too. But it means all work is done in much smaller chunks, giving huge economies of scale.

Plus the search is on for more continuous, human-like learning. Nobody has to wipe your brain in order to accommodate reading one more book. And, some other stuff, too.

No doubt someone still wants to try their hand at mega-machine monolithic models, but the "scale, scale, scale" idea never made actual sense when computational cost rises exponentially with scale, scale, scale. That's not how the history of computation works: algorithms generally improve as fast as or faster than hardware, so things get exponentially easier.
 
Same price/perf but someone might use 10 slower chips at 1/10 the price, just for variety.
Training is difficult to scale like that. There's a lot of communication, which is why NVLink is such a beast. Communication doesn't scale linearly, so by having lots more nodes, your communication overhead is going to increase by an even greater amount, possibly even to the point where you spend more energy on communication than computation. That's why I think a chip that's half or a third as fast might be viable, but 1/10th probably isn't.
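To show the shape of that argument, here's a toy step-time model; every number in it (FLOPs per step, gradient size, link bandwidth, latency) is invented purely for illustration, not a measurement of any real chip:

```python
# Toy model of the compute-vs-communication split described above.
# All parameters are invented illustrations, not real measurements.

def step_time(n_chips, chip_tflops, link_gb_s=450.0, latency_s=10e-6,
              work_tflop=100_000.0, grad_gb=10.0):
    """Return (compute_s, comm_s) for one data-parallel training step
    that ends with a ring all-reduce over the gradients."""
    compute = work_tflop / (n_chips * chip_tflops)            # splits across chips
    allreduce = (2 * (n_chips - 1) / n_chips) * grad_gb / link_gb_s \
                + 2 * (n_chips - 1) * latency_s               # does not split
    return compute, allreduce

# Same aggregate TFLOPS: 1,000 fast chips vs. 10,000 chips at 1/10 the speed.
for n, tflops in [(1_000, 1_000.0), (10_000, 100.0)]:
    compute, comm = step_time(n, tflops)
    total = compute + comm
    print(f"{n:>6} chips: step {total * 1e3:4.0f} ms, comm share {comm / total:.0%}")
# In this toy model the 10x-bigger cluster spends ~70% of each step on
# communication and ends up roughly twice as slow per step, despite
# having the same total raw compute.
```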

Well even NVidia bills the new chips at 20x faster (by using FP4, though that only applies to maybe half the training).
No, they said it's only 4x as fast at training as Hopper.

Now, they are moving more work out of training and into inference time, but that's probably the right move, too. But it means all work is done in much smaller chunks, giving huge economies of scale.
Link?

Plus the search is on for more continuous, human-like learning. Nobody has to wipe your brain in order to accommodate reading one more book. And, some other stuff, too.
There's a long-practiced concept called "transfer learning", which is exactly what you're talking about. It's been standard practice for at least 6 years.
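In case the term is unfamiliar, here's a minimal sketch of the idea (PyTorch, with torchvision's resnet18 standing in as an arbitrary pretrained backbone): keep the pretrained weights, freeze them, and train only a small new head for the new task.

```python
# Minimal transfer-learning sketch. torchvision's resnet18 is used
# purely as an example of a pretrained backbone.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone so its learned features are preserved
# ("nobody wipes the brain").
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with a fresh head for the new task (say, 10 classes).
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```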
 
Ever notice how all the climate doomsday types are okay with AI centers sucking up electricity as if it were free? Not a peep from the compliant media. I personally don't care as long as I am not required to help pay for more powerplants for them. I merely point out the hypocrisy.
 
Ever notice how all the climate doomsday types are okay with AI centers sucking up electricity as if it were free? Not a peep from the compliant media.
No, I did not. I've heard lots of people complaining about how much power AI is using. It's literally one of the top complaints I hear about it, even in non-techie contexts and media.

Even on this site, it's been covered quite heavily, from multiple different angles. Here's just a small sampling of recent articles discussing the subject:

I personally don't care as long as I am not required to help pay for more powerplants for them.
It always has to get paid for, somehow. As powerless consumers, we're sure to foot some of that bill whether we like it or not. I think pretty much the only thing you can do is just try to boycott AI-based services and features as much as you can, in order to make it less profitable for the companies using it.