News: Nvidia H100 AI Performance Improves by Up to 54 Percent With Software Optimizations

This doesn't come as much of a surprise, though: to achieve this level of performance on the H100, NVIDIA is using FP8 through the Transformer Engine embedded in the Hopper architecture, which Ampere lacks.

The Transformer Engine works on a per-layer basis: it analyzes the data flowing through each layer and determines whether it can be processed in FP8 without compromising accuracy. If it can, FP8 is used; if not, the engine falls back to FP16 math ops with FP32 accumulation.
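As a rough illustration, here's a toy Python sketch of that per-layer decision. It is not NVIDIA's actual Transformer Engine code; the quantization simulation and the error threshold are assumptions made purely for illustration:

```python
import numpy as np

FP8_E4M3_MAX = 448.0   # largest finite value in FP8 E4M3
FP8_MANTISSA_BITS = 3  # E4M3 keeps 3 mantissa bits


def quantize_fp8_e4m3(x: np.ndarray) -> np.ndarray:
    """Crudely simulate FP8 E4M3: scale the tensor into the format's
    range, then round each value's mantissa to 3 bits."""
    scale = FP8_E4M3_MAX / max(np.abs(x).max(), 1e-12)
    scaled = x * scale
    exp = np.floor(np.log2(np.maximum(np.abs(scaled), 1e-12)))
    step = 2.0 ** (exp - FP8_MANTISSA_BITS)  # spacing of representable values
    return np.round(scaled / step) * step / scale


def layer_matmul(a: np.ndarray, b: np.ndarray, max_rel_err: float = 0.05):
    """Run one layer's matmul in FP8 if the simulated quantization
    error is acceptable, otherwise fall back to FP16 math with
    FP32 accumulation (the behavior described above)."""
    a8, b8 = quantize_fp8_e4m3(a), quantize_fp8_e4m3(b)
    err = max(np.abs(a8 - a).max() / (np.abs(a).max() + 1e-12),
              np.abs(b8 - b).max() / (np.abs(b).max() + 1e-12))
    if err <= max_rel_err:
        return a8 @ b8  # FP8 path
    return (a.astype(np.float16).astype(np.float32)
            @ b.astype(np.float16).astype(np.float32))  # FP16 math, FP32 accumulate


rng = np.random.default_rng(0)
a, b = rng.normal(size=(4, 8)), rng.normal(size=(8, 4))
print(layer_matmul(a, b))
```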

Since the Ampere lineup has no Transformer Engine, its submissions ran on FP16 math with FP32 accumulation rather than FP8.

Also, while on the topic of MLPerf, it is worth mentioning that Deci has announced record-breaking inference speed on NVIDIA GPUs at MLPerf. The chart below shows throughput per TFLOPS as achieved by Deci and the other submitters in the same category.

It appears that with Deci, teams can replace 8 NVIDIA A100 cards with just one NVIDIA H100 card, while getting higher throughput and better accuracy (+0.47 F1).
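To make the chart's metric concrete, here's a quick back-of-the-envelope sketch. The throughput numbers are made up; only the peak-TFLOPS figures correspond to NVIDIA's published dense FP16 Tensor Core specs:

```python
A100_PEAK_TFLOPS = 312.0  # A100 SXM, dense FP16 Tensor Core peak
H100_PEAK_TFLOPS = 989.0  # H100 SXM, dense FP16 Tensor Core peak (approx.)

a100_throughput = 1_000.0  # hypothetical samples/sec on one A100
h100_throughput = 9_500.0  # hypothetical samples/sec on one H100

# Deci's chart normalizes measured throughput by peak compute.
print(f"A100: {a100_throughput / A100_PEAK_TFLOPS:.2f} samples/sec per TFLOPS")
print(f"H100: {h100_throughput / H100_PEAK_TFLOPS:.2f} samples/sec per TFLOPS")

# The consolidation claim boils down to a throughput ratio: one H100
# replaces eight A100s only if it delivers >= 8x the per-card throughput.
print(f"One H100 ~= {h100_throughput / a100_throughput:.1f} A100s here")
```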

[Chart: throughput per TFLOPS, Deci vs. other MLPerf submitters]
 
BTW, it appears that NVIDIA has also showcased a performance boost for the Jetson AGX Orin SoC/platform, presumably achieved with the JetPack 5.1 SDK. In raw performance the Orin SoC shows up to an 81% boost, while in power efficiency it shows up to a 63% gain, which is dramatic.
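For what it's worth, here's how uplift percentages like those are typically computed; the baseline and post-update numbers below are hypothetical, chosen only so the arithmetic lands near the reported 81% and 63% figures:

```python
def pct_gain(new: float, old: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0

# Hypothetical Jetson AGX Orin numbers, before and after a JetPack update.
old_throughput, old_watts = 1_000.0, 50.0
new_throughput, new_watts = 1_810.0, 55.5

print(f"performance: +{pct_gain(new_throughput, old_throughput):.0f}%")  # ~81%
print(f"perf/W:      +{pct_gain(new_throughput / new_watts, old_throughput / old_watts):.0f}%")  # ~63%
```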

NVIDIA seems to be more committed to GPU and silicon longevity in the server space than AMD, no? Just curious about the future of the Orin SoC.


[Image: NVIDIA MLPerf performance benchmarks for Hopper H100 and L4 Ada GPUs]
 
