Ugh. Okay, here we go...
This immediately struck me as weird, because Nvidia has added hardware acceleration for JPEG decoding. It's described specifically in reference to the A100, here:
According to surveys, the average person produces 1.2 trillion images that are captured by either a phone or a digital camera. The storage of such images, especially in high-resolution raw format… (developer.nvidia.com)
There, they compare it against CPU decoding and software-only (CUDA) GPU decoding:
As that implies, there are two options for GPU-accelerated decoding (a rough sketch of a pipeline that can use both follows this list):
- Use generic CUDA code/cores to accelerate the parallel portions of decoding (e.g. dequantization, IDCT, resampling, and colorspace transform).
- Use the NVJPEG engine, newly added to the A100.
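To make that concrete, here's a minimal sketch of what such a decode pipeline might look like, written with NVIDIA's DALI library. This is my own illustration, not anything from the Hugging Face post; the dataset path, batch size, and tuning numbers are placeholders. DALI's `hw_decoder_load` parameter splits the work between the hardware NVJPEG engine and CUDA-kernel decoding:

```python
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def(batch_size=64, num_threads=4, device_id=0)
def jpeg_decode_pipeline(data_dir):
    # Read the raw encoded JPEG bytes (and labels) on the host.
    encoded, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
    # device="mixed": encoded bytes come from host memory, decoded images land in GPU memory.
    # hw_decoder_load steers roughly this fraction of images to the A100's hardware NVJPEG
    # engine; the rest are decoded with CUDA kernels (the first option above).
    images = fn.decoders.image(
        encoded,
        device="mixed",
        output_type=types.RGB,
        hw_decoder_load=0.75,
    )
    return images, labels

pipe = jpeg_decode_pipeline("/path/to/imagenet/train")  # placeholder path
pipe.build()
images, labels = pipe.run()  # images is a TensorListGPU, already resident on the device
```

The point is that on an A100-class system, getting decoded images into GPU memory this way is an exposed, off-the-shelf capability rather than something exotic.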
According to this diagram, the NVJPEG Engine even handles Huffman decoding:
That diagram assumes you want the final image back on the CPU, but elsewhere on the page they state that the library also supports the following mode:
* Input to the library is in the host memory, and the output is in the GPU memory.
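As a hedged illustration of that mode (not something taken from either blog post; the filename is a placeholder, and this assumes a CUDA-enabled torchvision build with nvJPEG support), torchvision's `decode_jpeg` will take the encoded bytes as a CPU tensor and, when asked, decode them through nvJPEG directly into GPU memory:

```python
from torchvision.io import read_file, decode_jpeg

# read_file returns the raw encoded JPEG bytes as a uint8 tensor in host memory.
data = read_file("sample.jpg")  # placeholder filename

# device="cuda" routes the decode through nvJPEG: the input stays in host memory,
# and the decoded uint8 tensor ([3, H, W] for a typical RGB JPEG) lands in GPU memory.
img_gpu = decode_jpeg(data, device="cuda")
print(img_gpu.device, img_gpu.dtype, img_gpu.shape)
```

If that call fails with a runtime error, the installed torchvision simply wasn't built against nvJPEG, which also makes it a handy smoke test for whether a given training environment can use it at all.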
So, what are we to make of Hugging Face's data? I checked the blog post cited by the article, and it makes absolutely no mention of nvJPEG. I went one step further and searched the linked git repo, again finding no reference to nvJPEG. I wouldn't say that's conclusive, because I don't know enough about how all of its dependencies are provided, or exactly where you'd expect nvJPEG to show up if it is in fact capable of being used. But I think I've done enough digging that questions should be raised and answered.
If their blog post were instead an academic paper, you'd absolutely expect them to mention nvJPEG and either demonstrate that it's being used or explain why not. And if they were comparing against nvJPEG, you'd expect them to point out how much better Habana's solution is than even Nvidia's purpose-built hardware engine. As it stands, this smells fishy: either the study's authors are not truly disinterested in the outcome, or they are surprisingly ignorant of, and incurious about, Nvidia's answer to this problem. Given that they correctly identified JPEG decoding as a bottleneck, it would be awfully surprising if Nvidia had never taken notice or done anything effective to alleviate it.
Another thought I had: I don't know how heavyweight the Hugging Face model is. If I were looking to accentuate a bottleneck in JPEG decoding, I'd use a relatively lightweight model that also plays to the other strengths of Habana's hardware. In other words, even if their experiment was properly conducted, the findings might not carry over to many of the models people are actually using, let alone to newer chips like the H100.