Google Launches AI Supercomputer Powered by Nvidia H100 GPUs

Ravestein NL

Great
Jan 26, 2023
Google's A3 GPU supercomputer is the company's latest platform for powering LLMs.

I have my reservations about all the AIs growing in the wild.
Who is checking what data the AI has access to?
Who is checking whether the answers the AI gives to certain questions are factually correct?
In other words, I'm afraid this tech can easily be used to feed people only the data that's "allowed", while true facts can easily stay hidden.
There is always someone pushing the buttons, and is that someone to be trusted?
Everybody knows that self-regulation in big tech companies doesn't exist!
 
Strider379

May 11, 2023
I call BS. Frontier is now the fastest computer in the world and just hit 1.7 exaflops. Nvidia says that this computer is 15 times faster?
 

samopa

Distinguished
Feb 12, 2015
I call BS. Frontier is now the fastest computer in the world and just hit 1.7 exaflops. Nvidia says that this computer is 15 times faster?

They say that A3 is "delivering 26 exaFlops of AI performance", so most likely these are not the usual (normal) floating-point operations.

Normal floating-point operations usually use 80-bit data (or more) per operation, but in AI such precision is not needed, so they (usually) use the TF32 or even FP8 data format for floating-point operations. So, assuming they used FP8 to produce this result, each operation only needs about 1/10 of the compute (hence is roughly 10 times faster) compared to normal floating-point operations.

They also did not disclose how many A3s were used to produce this result.
 

bit_user

Titan
Ambassador
@Strider379,

They say that A3 is "delivering 26 exaFlops of AI performance", so most likely these are not the usual (normal) floating-point operations.

Normal floating-point operations usually use 80-bit data (or more) per operation, but in AI such precision is not needed, so they (usually) use the TF32 or even FP8 data format for floating-point operations. So, assuming they used FP8 to produce this result, each operation only needs about 1/10 of the compute (hence is roughly 10 times faster) compared to normal floating-point operations.
The standard format for HPC is fp64. I think 80-bit is kind of a weird x87 thing.

A quick search for the H100 specs shows a single chip delivers "up to 34 TFLOPS" @ fp64. Using fp64 tensors, you can achieve 67 TFLOPS. To reach the EFLOPS range, you have to drop all the way to fp8 tensors, at which point it delivers 3958 TFLOPS.
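To put that precision gap in numbers: a back-of-the-envelope sketch of how many H100s it would take to reach Google's headline 26 EFLOPS at each of the per-chip peaks quoted above (these are the figures from this thread, not anything Google has broken down):

```python
# Back-of-the-envelope: GPUs needed to reach 26 EFLOPS at each precision.
# Per-chip H100 peak figures as quoted above, in TFLOPS.
H100_TFLOPS = {
    "fp64": 34,           # plain fp64
    "fp64 tensor": 67,    # fp64 via tensor cores
    "fp8 tensor": 3958,   # fp8 tensor cores
}

TARGET_EFLOPS = 26
TARGET_TFLOPS = TARGET_EFLOPS * 1_000_000  # 1 EFLOPS = 10^6 TFLOPS

for precision, tflops in H100_TFLOPS.items():
    gpus_needed = TARGET_TFLOPS / tflops
    print(f"{precision:12s}: ~{gpus_needed:,.0f} GPUs")
```

At the fp8 tensor peak, 26 EFLOPS works out to roughly 6,600 GPUs; at plain fp64 it would take over 750,000, which is why the marketing number only makes sense at the lowest precision.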

They also did not disclose how many A3s were used to produce this result.
Actually, they do. They said it incorporates an undisclosed number of 4th Gen (Sapphire Rapids) Xeons (I'd guess 2x) and 8x H100 accelerators. That actually puts it rather below the theoretical peak throughput of 31.7 EFLOPS, although I'd guess the peak presumes boost clocks and what Google is actually reporting is the sustained performance.
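For what it's worth, the 31.7 figure can be reproduced from the per-chip number quoted earlier, multiplied by the eight H100s per A3 machine (a quick sanity check, using only numbers already in this thread):

```python
# Theoretical fp8 tensor peak for one A3 machine: 8x H100.
H100_FP8_TFLOPS = 3958   # per-chip fp8 tensor peak, as quoted above
GPUS_PER_A3 = 8

peak_tflops = GPUS_PER_A3 * H100_FP8_TFLOPS
print(f"Peak: {peak_tflops:,} TFLOPS (~{peak_tflops / 1000:.1f} PFLOPS)")
```

Eight chips at 3958 TFLOPS each give 31,664 TFLOPS, i.e. about 31.7 PFLOPS per 8-GPU machine.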
 

samopa

Actually, they do. They said it incorporates an undisclosed number of 4th Gen (Sapphire Rapids) Xeons (I'd guess 2x) and 8x H100 accelerators. That actually puts it rather below the theoretical peak throughput of 31.7 EFLOPS, although I'd guess the peak presumes boost clocks and what Google is actually reporting is the sustained performance.
They say the number of H100s used per A3, but they do not say how many A3s they use ;)