
News AMD-powered El Capitan is now the world's fastest supercomputer with 1.7 exaflops of performance — fastest Intel machine falls to third place on To...

All this compute power and all they can do is weapons testing and design. I wonder how quickly these machines could figure out, for example, the COVID jab, or protein folding for cancers and other things that really plague us as a species. They should be held accountable for wasting money. A nuclear bomb destroys stuff; if there is a nuclear war, the earth is pretty screwed. There is no upside. What more do you need to know?
 
Would they beat these systems on the high-precision benchmarks as well, or only the lower-precision AI-specific benchmarks? (If I understand the metrics correctly.)
Yes. Musk's Colossus is just using Nvidia Hoppers. Those have some serious FP64 (33.5 TFLOPS, compared to the MI300A's 61.3). He's got 2.24x as many of them, so the net TFLOPS works out to be higher. Plus, he's talking about adding a couple hundred thousand more.

The system also has ridonculous interconnect bandwidth, thanks to NVLink, NVSwitch, and the latest iteration of InfiniBand for the largest-scale communication. It's a fully fledged supercomputer, in every way, @adamXpeter.
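The aggregate-FP64 comparison above can be checked with a quick sketch. The per-GPU FP64 numbers are the ones quoted in this thread; the GPU counts are assumptions based on public reporting, not official figures for either machine:

```python
# Rough aggregate FP64 comparison: Colossus (H100) vs. El Capitan (MI300A).
# Per-GPU FP64 figures are as quoted above; the GPU counts are assumptions
# based on public reporting, not confirmed numbers.

H100_FP64_TFLOPS = 33.5       # H100, peak FP64
MI300A_FP64_TFLOPS = 61.3     # MI300A, peak FP64

el_capitan_gpus = 44_544      # assumed MI300A count for El Capitan
colossus_gpus = 100_000       # assumed H100 count for Colossus

el_capitan_total = el_capitan_gpus * MI300A_FP64_TFLOPS / 1000  # PFLOPS
colossus_total = colossus_gpus * H100_FP64_TFLOPS / 1000        # PFLOPS

print(f"El Capitan: ~{el_capitan_total:,.0f} PFLOPS peak FP64")
print(f"Colossus:   ~{colossus_total:,.0f} PFLOPS peak FP64")
print(f"GPU-count ratio: {colossus_gpus / el_capitan_gpus:.2f}x")
```

With these assumed counts, the lower per-GPU FP64 of the H100 is more than offset by having roughly 2.24x as many of them, which is the point being made above. Peak numbers, of course, say nothing about sustained HPL efficiency.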
 
all this compute power and all they can do is weapons test and design,
Also safe storage, which is something we really ought to care about, as well.

i wonder how quick these machines could figure out for example the covid jab or protein folding for cancers and such things that really plague us as a species.
A lot of them were used for these things. See my link in post #19.
 
Yes. Musk's Colossus is just using Nvidia Hoppers. Those have some serious FP64 (33.5 TFLOPS, compared to the MI300A's 61.3). He's got 2.24x as many of them, so the net TFLOPS works out to be higher. Plus, he's talking about adding a couple hundred thousand more.

The system also has ridonculous interconnect bandwidth, thanks to NVLink, NVSwitch, and the latest iteration of InfiniBand for the largest-scale communication. It's a fully fledged supercomputer, in every way, @adamXpeter.

It is specialized in training AIs.
Musk lies about as often as he tells the truth, but Nvidia says Colossus is not using switched-fabric InfiniBand but Spectrum-X, which is based on Ethernet and was developed for AI.
 
It is specialized in training AIs.
That's the purpose for which he constructed the installation, but the building blocks he used are all dual-use for both AI and HPC. You can find plenty of people doing traditional HPC workloads on the exact same types of Hopper GPUs.

Musk lies about as often as he tells the truth, but Nvidia says Colossus is not using switched-fabric InfiniBand but Spectrum-X, which is based on Ethernet and was developed for AI.
The InfiniBand part was my mistake. I mentioned it since I know that's what Nvidia/Mellanox has pushed previously.

So, please educate me: what about Spectrum-X would make it less suitable for HPC?
 
That's the purpose for which he constructed the installation, but the building blocks he used are all dual-use for both AI and HPC. You can find plenty of people doing traditional HPC workloads on the exact same types of Hopper GPUs.


The InfiniBand part was my mistake. I mentioned it since I know that's what Nvidia/Mellanox has pushed previously.

So, please educate me: what about Spectrum-X would make it less suitable for HPC?
Simply put, it was designed for a single purpose, so it's probably the most powerful AI trainer. I haven't seen it mentioned anywhere that it's planned for use in simulating next-gen batteries, spacecraft, or car structures.
Also, its maker says Spectrum-X is for AI: https://www.nvidia.com/en-us/networking/spectrumx/
I know AI is the buzzword these days, but I really just want to see the benchmarks that support the claims listed above.
 
I'm not familiar with the design, but does it actually have an advanced storage system, or is it just some form of NAND?
To serve up the AI training data, it's got to have some substantial storage infrastructure. Maybe it's not as write-optimized as what you'd find in a typical supercomputer, although they're definitely going to be checkpointing the models in case of hardware failures.
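To get a feel for why checkpointing alone forces serious write bandwidth, here's a back-of-envelope sketch. Every number in it (model size, optimizer-state multiplier, checkpoint interval) is an illustrative assumption, not anything known about Colossus:

```python
# Back-of-envelope: sustained write bandwidth needed to checkpoint a large
# model. All numbers are illustrative assumptions, not Colossus specs.

params = 1_000_000_000_000   # assume a 1-trillion-parameter model
bytes_per_param = 2          # FP16/BF16 weights
state_multiplier = 3         # weights + optimizer state (rough Adam-style factor)

checkpoint_bytes = params * bytes_per_param * state_multiplier
checkpoint_tb = checkpoint_bytes / 1e12

interval_s = 30 * 60         # assume a checkpoint every 30 minutes
required_gbps = checkpoint_bytes / 1e9 / interval_s

print(f"Checkpoint size: ~{checkpoint_tb:.0f} TB")
print(f"Sustained write rate needed: ~{required_gbps:.1f} GB/s")
```

Even under these modest assumptions, the storage system has to absorb multi-terabyte bursts every few minutes on top of streaming the training data out, so "just some form of NAND" still implies a substantial parallel storage layer in front of it.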