I doubt Nvidia's price/perf is that far off what's possible. I could believe better price/perf by maybe a factor of 2, but not 10. Not if we're talking about training, that is.
At the same price/perf, though, someone might use 10 slower chips at 1/10 the price each, just for variety.
Well, even Nvidia bills the new chips as 20x faster (by using FP4, though that only applies to maybe half the training). Say they're right. Then 1,000 B200s is like 20,000 H100s or whatever.
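Rough sketch of that arithmetic, for anyone who wants to poke at it. The 20x is Nvidia's headline number, "half the training" is my guess from above, and treating the non-FP4 half as running at plain H100 speed is my own simplification (it understates the B200):

```python
def effective_speedup(claimed_speedup, fp4_fraction, baseline_speedup=1.0):
    # Amdahl-style blend: only the FP4-friendly fraction of the work sees
    # the headline speedup; the rest runs at baseline_speedup.
    return 1.0 / (fp4_fraction / claimed_speedup
                  + (1.0 - fp4_fraction) / baseline_speedup)

# Taking the 20x at face value across the whole job:
print(1_000 * 20)                           # 20000 H100-equivalents

# With FP4 covering only ~half the training, and the rest assumed to run
# at H100 speed, the blended speedup is nowhere near 20x:
print(round(effective_speedup(20, 0.5), 2)) # ~1.9
```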
But more than that ... see next item.
A lot of the article focuses on hyperscalers, which means they will certainly have many customers using subsets of those pools. It's definitely not going to be 1M GPUs all training a single model, or anything silly like that.
The hunger for huge numbers of GPUs came from Altman's "scale is everything!" mantra five years ago, but even the training for GPT-4o was done in four pieces, using rather fewer than 100k GPUs of an older, slower vintage.
Now they're moving more work out of training and into inference time, which is probably the right move, too. But it means the work is done in much smaller chunks, which gives huge economies of scale.
Plus, the search is on for more continuous, human-like learning: nobody has to wipe your brain to accommodate reading one more book. And some other stuff, too.
No doubt someone still wants to try their hand at mega-machine monolithic models, but the "scale, scale, scale" idea never made actual sense when computational cost rises exponentially with scale, scale, scale. That's not how the history of computation has gone: algorithms generally improve as fast as or faster than hardware, and things get exponentially easier.