Nvidia unveils TensorRT-LLM software that promises a major boost for LLM inference.
Nvidia Claims Doubled Inference Performance with H100: Read more
Inference in many cases can go much lower than eight bits. Large language models retain upwards of 98% of full-precision accuracy with just five bits, and even two-bit inference is usable. FP8 will in most cases be indistinguishable from full precision.

I do have to wonder how much of this is simply from using FP8 computations instead of FP16/BF16: half the bandwidth, double the compute, double the performance. But I would seriously doubt that all AI algorithms could use FP8 without running into problems from the loss of precision.
More likely, this is simply a case of the base models and algorithms not being especially well tuned. Getting a 2X speedup by focusing on optimizations, especially when it's done by Nvidia engineers with deep knowledge of the hardware, is definitely possible.
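A rough way to see why low-bit inference can hold up is to quantize a weight tensor and measure the error directly. The sketch below is plain NumPy, not TensorRT-LLM; the uniform round-to-nearest scheme is an assumption for illustration only (real LLM quantizers use per-channel scales, FP8 formats, outlier handling, and so on), but it shows how quickly quantization error shrinks as bit width grows:

```python
# Minimal sketch (assumed uniform round-to-nearest quantization, not Nvidia's code):
# quantize a toy weight tensor at several bit widths and compare against full precision.
import numpy as np

def quantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """Uniformly quantize weights to 2**bits levels over their range, then dequantize."""
    levels = 2 ** bits - 1
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / levels
    codes = np.round((weights - w_min) / scale)   # integer codes in [0, levels]
    return codes * scale + w_min                  # map back to floats for comparison

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=100_000)           # toy weight tensor

for bits in (8, 5, 2):
    err = np.abs(quantize(w, bits) - w)
    print(f"{bits}-bit: mean abs error = {err.mean():.6f}, max = {err.max():.6f}")
```

Running it, the mean error at 8 bits is tiny relative to the weight scale, still small at 5 bits, and only at 2 bits does it become a meaningful fraction of the weights themselves, which matches the pattern the accuracy numbers above suggest.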