News: Nvidia shows off Rubin Ultra with 600,000-watt Kyber racks and infrastructure, coming in 2027

Admin

The article said:
whatever comes after could very well push beyond 1MW per rack
So, what's the energy-density of say... an aluminum smelter?
: D

P.S. I'd love to see the home gaming PCs of whoever designed these water-cooling setups:

[attached image]


So, how does it work, when you want to pull one of those blades out of a rack? Is there like a valve you hit that first blows pressurized air into the tubing, so that water doesn't go everywhere when you yank it out?
 
  • Like
Reactions: JarredWaltonGPU
QD (quick-disconnect) fittings, same as we've been using since watercooling was fully custom, with DIY blocks and car radiators (that had to sit outside the chassis).
 
Lots of power for no viable return.
The word on the street is that companies are getting paid $1 per million tokens generated. I don't know if that's accurate, but a traditional LLM might generate 500~1000 tokens for a response. A reasoning model (DeepSeek R1-671B is given as the example) can require 15~20 times more tokens per response. So, not only does it cost more to run, but it needs more hardware to run faster, and presumably people (companies, mostly) would pay more to access its features.

How many tokens per day can an NVL72 rack generate? Seems like Nvidia is saying 5.8 million TPS per MW. With a single NVL72 using about 125kW, that would mean roughly 725,000 TPS. Or 62,640 million tokens per day, which at $1 per million would be nearly $63K. Minus power costs and other stuff, let's just call it $60K per day potential income?

That would be a potential ROI of maybe 75 days, give or take, if you're able to run that flat out 24/7. It's unlikely to be sustained, but I do think there's a lot more investment here than any of the gamers think. Hundreds of billions are going into AI per year, and that could scale to trillions per year.
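
If anyone wants to poke at those numbers, here's the back-of-envelope math as a quick Python sketch. The tokens-per-MW rate, rack power, and $1-per-million price are the rough figures quoted above; the rack cost is just the hypothetical value implied by a ~75-day payback, not a real quote:

Code:
# Quick back-of-envelope for the NVL72 token math above; tweak the inputs as you like.
tokens_per_sec_per_mw = 5.8e6    # Nvidia's ~5.8M tokens/sec per MW figure
rack_power_mw = 0.125            # a single NVL72 at ~125 kW
price_per_million_tokens = 1.00  # the "$1 per million tokens" street rate

tokens_per_sec = tokens_per_sec_per_mw * rack_power_mw             # ~725,000 tokens/sec
tokens_per_day = tokens_per_sec * 86_400                           # ~62.6 billion tokens/day
revenue_per_day = tokens_per_day / 1e6 * price_per_million_tokens  # ~$62,640/day

# The ~75-day payback assumes ~$60K/day net and a rack cost somewhere around
# $4.5M -- a hypothetical number implied by those two figures, not a quoted price.
net_per_day = 60_000
rack_cost = 4.5e6
payback_days = rack_cost / net_per_day                             # 75 days

print(f"{tokens_per_sec:,.0f} tok/s | {tokens_per_day / 1e9:.1f}B tok/day | "
      f"${revenue_per_day:,.0f}/day | payback ~{payback_days:.0f} days")

Obviously utilization, actual token pricing, power, and cooling would all move those numbers around quite a bit.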

And tokens aren't just about ChatGPT. Robotics and image generation and video generation and more can all distill down to tokens per second. Financial analysts will use AI to help model things and decide when to buy/sell. Doctors and the medical field will use AI. Cars will use AI. Our phones already use AI. The list of where AI is being used is pretty much endless.

Jensen has said, repeatedly, that data centers are becoming power constrained. So now it's really a question of tokens per watt, or tokens per MW. And most process node improvements are only giving maybe 20% more efficiency by that metric, while software and architectural changes can provide much greater jumps.

As with so many things, we'll need to wait and see. Does AI feel like a bubble? Sure. But how far will that bubble go? And will it ever fully pop? Bitcoin has felt like far more of a bubble for 15 years now, but it keeps going. I would not bet against Nvidia and Jensen, personally. Gelsinger saying Nvidia got "lucky" is such BS, frankly... sour grapes and lack of an ability to push through important projects. Larrabee could have gone somewhere, if Gelsinger and Intel had been willing to take the risk. But it went against the CPU-first ethos and so it didn't happen.
 
  • Like
Reactions: bit_user
Yeah, the only minor quibble I have is that Gelsinger needed to do more than just "push through". I mean, they did actually just forge ahead with Xeon Phi using x86 cores. What they needed to do was design the best architecture for the task, not slap together a Frankensteinian card with spare parts from the bin (which they literally did, to some extent - first using Pentium P54C cores, and then using Silvermont Atom cores).

Instead, they assumed the HPC market would prioritize ease of programmability & legacy compatibility above all else, in spite of many decades of history in which the HPC market embraced some fairly odd and esoteric computing architectures. The big difference between the HPC market and traditional computing is that the upside of deviating from a mainstream CPU was much bigger than for something like your desktop PC. Enough to justify porting or rewriting some of the software. Again, there's a history of that, so it's not like they couldn't have predicted such an outcome.

At the time, I was following Xeon Phi, and the closest it ever came to Nvidia, on paper, was about a factor of 2 behind (i.e. half the fp64 TFLOPS of Nvidia's datacenter GPUs). However, what I've read from people who actually tried to program Xeon Phi is that you couldn't get anywhere near its theoretical performance. So, the reality might've been that they were closer to being behind by a factor of 5 or 10. And that was with a bit of effort.

I'll bet Intel thought, at the time, that "if you build it, they will come" has an extremely poor track record in computing. Thus, simply making a better architecture is no guarantee of its success. However, Nvidia didn't simply build GPUs for HPC - they aggressively pushed CUDA and put free & discounted hardware into the hands of lots of universities and grad students. It's no accident that AI researchers embraced CUDA. Nvidia didn't wait for them to come knocking; Jensen went out and found them.
 
  • Like
Reactions: JarredWaltonGPU
Yeah, absolutely. Every time Gelsinger trots out Larrabee, I tend to think, "Yeah, but..." and all of the stuff you've listed above. Larrabee was a pet project that in truth wasn't designed well. It was a proof of concept that just didn't interest the Intel executives and board at the time. Which is why I say the "luck" stuff is a bunch of BS.

Nvidia has worked very hard to get where it is. AI didn't just fall into Nvidia's lap haphazardly. As you say, Nvidia went out and found customers. More than that, it saw the potential for AI right when AlexNet first came out, and it hired all those researchers and more to get ahead of the game. If AlexNet could leapfrog the competition for image classification back then, where might a similar approach gain traction? That was the key question, and one Nvidia invested heavily in answering, with both hardware and software solutions.

On a related note: Itanium. That was in theory supposed to be a "from the ground up" design for 64-bit computing. The problem was that Intel again didn't focus on creating the best architecture and design. Or maybe it did try to do that and failed? But too often Intel has gotten caught with its feet in two contradictory worlds. You can't do x86 on one hand and a completely new "ideal" architecture on the other hand and have them work together perfectly. Larrabee was the same story for GPUs: x86 on the one hand for "familiarity," with GPU-like aspects on the other hand for compute.

What's interesting is to hear about where CUDA started. I think Ian Buck was the main person, and he had a team of like four people maybe. He said something about that in a GTC session yesterday. It started so small, but it was done effectively and ended up becoming the dominant force it is today.
 
  • Like
Reactions: bit_user
That future includes GPU servers that are so powerful that they consume up to 600kW per rack.

That future includes GPU servers that are so power hungry that they consume up to 600kW per rack.

Small typo in that sentence.

We are seriously in the 1950s automotive era of performance in computing right now. Bigger and more power hungry by any means necessary, efficiency be damned. Throwing more power at something is not the same as making it better. It goes 1 mph faster than last year's model; what does it matter if it gets half the fuel efficiency?
 
No, we're really not. We've hit the end of Dennard scaling and easy lithography upgrades. A 20% improvement in performance per watt is all we can expect from the hardware, on its own, so now we need better architectures and software algorithms.

The reason anyone is even willing to consider 600kW racks is that companies need the compute. We have entered a new era of needing way more compute than we currently have available. And if you look at performance per watt, all the new Blackwell Ultra B300, Rubin, and Rubin Ultra aren't just adding more compute and scaling to higher power; they're more efficient.

Rubin Ultra is basically 4X the number of GPUs per rack. So yes, that's 4X the power (and then some), but 6X the performance. If you increase power by 20% and increase performance by 50%, that's better efficiency, which is not the same as being power hungry.
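
To make the arithmetic explicit, the perf-per-watt change is just the performance scaling divided by the power scaling; here's a trivial Python sketch using the ratios above:

Code:
# Relative perf/W of a new part vs. its predecessor, given scaling ratios.
def efficiency_gain(perf_scale: float, power_scale: float) -> float:
    return perf_scale / power_scale

print(efficiency_gain(6.0, 4.0))  # Rubin Ultra rack: 6X perf at 4X power -> 1.5, i.e. ~50% better perf/W
print(efficiency_gain(1.5, 1.2))  # the generic case: +50% perf at +20% power -> 1.25, i.e. ~25% better perf/W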

Could you cut power use? Sure. Maybe with lower clocks and voltages you cut performance 30% and reduce power 40% or something. But a ton of the power is going to memory and interchip communications, not just to the GPU cores. So I'm not sure you really can get the same efficiency gains by running these data center chips at lower clocks and voltages as you'd get if we were talking about single GPUs.
 
  • Like
Reactions: bit_user
And tokens aren't just about ChatGPT. Robotics and image generation and video generation and more can all distill down to tokens per second. Financial analysts will use AI to help model things and decide when to buy/sell. Doctors and the medical field will use AI. Cars will use AI. Our phones already use AI. The list of where AI is being used is pretty much endless.
When we unlearn how to think because of LLMs, we will become incapable of even asking the right questions of them. We will ruin our civilization with these LLMs.
 
On a related note: Itanium. That was in theory supposed to be a "from the ground up" design for 64-bit computing. The problem was that Intel again didn't focus on creating the best architecture and design. Or maybe it did try to do that and failed?
I think Itanium was the product of Intel trying to get its head around how they could approach GPU-scale computing with a CPU-like architecture. They could see that silicon was reaching a point where we needed to go increasingly parallel and they correctly identified that it was inefficient to do that with an out-of-order architecture. The problem is that they wanted one architecture for all forms of computing, not a bifurcation into serial-optimized P-cores and vector-optimized SIMD-cores (i.e. GPUs). At least, that's one way of trying to understand what happened with IA64.

I think IBM's Cell also was an example of trying to create something GPU-like, from the perspective of a CPU architect. IMO, it actually has more in common with modern NPUs.

What's interesting is to hear about where CUDA started. I think Ian Buck was the main person, and he had a team of like four people maybe. He said something about that in a GTC session yesterday. It started so small, but it was done effectively and ended up becoming the dominant force it is today.
Yeah, I followed GPU computing (or so-called GPGPU), back in the early days. It started out with a few scattered researchers developing tools and even custom programming languages, but they were incredibly limited. The early GPUs with programmable shaders had stringent limitations on how big your shaders could be, what kind of control flow they could have, etc.

What happened next is that GPUs became more and more flexible and general-purpose. It's a little weird to see them basically going backwards, with more fixed-function hardware creeping in for AI and ray tracing.
 
  • Like
Reactions: JarredWaltonGPU
We are seriously in the 1950s automotive era of performance in computing right now. Bigger and more power hungry by any means necessary, efficiency be damned. Throwing more power at something is not the same as making it better. It goes 1 mph faster than last year's model; what does it matter if it gets half the fuel efficiency?
If you run the numbers, you'll see that newer GPUs actually are more efficient at AI. That's obscured by the fact that demand is so out of control (and production capacity so limited) that there's an incentive to squeeze as much performance per mm^2 of silicon as possible. That's what's pushing clock speeds and power budgets seemingly without bound. The market isn't content simply to gain more performance at the same rate as efficiency is improving, and the hardware is so expensive that it still dwarfs the cost of power & cooling.

Again, performance is increasing faster than power consumption. Therefore, they are more efficient.
 
  • Like
Reactions: JarredWaltonGPU