OpenAI Whisper Audio Transcription Benchmarked on 18 GPUs: Up to 3,000 WPM

Dr3ams

What?! Here we go again. This time it's the AI stooges that will buy up truckloads of GPUs and again jack the prices through the roof.
 
I'm loving this series from you, Jarred, so thanks a lot for doing it and helping people get their feet wet in the AI scene.

Interesting to see how much better Nvidia is than AMD in compute for these workloads, although it's not a shocker in "popular" disciplines given how much market share Nvidia has. AMD has a lot of catching up to do, for sure. Plus, RDNA is not compute-heavy; CDNA is where AMD is putting all its chips, or at least that's what I remember AMD saying, or excusing itself with.

Could you get a CDNA card at all to compare? Do they even exist in the Pro market? xD

Regards.
 
I do not have any data center cards, from either AMD or Nvidia (or Intel). I'm not sure I even want to try to tackle that topic, as it would take a lot of time to determine how to test the various GPUs, and the payoff in traffic probably wouldn't be there. But hey, if AMD, Nvidia, or Intel want to send me a PCIe card that would work in a standard Windows PC, have at it! 🙃

As for AMD, RDNA cards do not have any true equivalent to the CDNA tensor / matrix cores. With RDNA 2/3, there's an "AI accelerator" that basically uses the FP16 units in a slightly more optimized fashion, but it's still about a tenth of what Nvidia is doing with its tensor cores (give or take). Frankly, I think the only way AMD would ever do true tensor core hardware on its GPUs is if it becomes part of a DirectX specification.

I'd love to see something like that, where code could be written once and work on Nvidia tensor cores, Intel XMX, and AMD AI accelerator hardware. But I'm not sure there's enough industry support for yet another standard for that to ever happen. So instead, all of the stuff that uses tensor hardware ends up being basically proprietary.
 
As I understand it, one of the "cool" things about ROCm is that it can run some CUDA code, since it comes with a small translation (not emulation) layer built in. It's similar to the DXVK situation, but better than nothing, I guess.

Have you looked into that, by any chance?

Regards.
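For what it's worth, that translation layer (HIP) keeps the CUDA-style API surface, so a lot of GPU code runs on AMD without changes. A minimal sketch of how that plays out in practice, assuming a ROCm build of PyTorch rather than raw CUDA C++ (not something from the article itself):

```python
import torch

# On a ROCm build of PyTorch, the familiar "cuda" device name is kept for
# compatibility; HIP maps the CUDA-style calls onto AMD's runtime.
print(torch.version.hip)  # a version string on ROCm builds, None on CUDA builds

device = "cuda" if torch.cuda.is_available() else "cpu"  # True on ROCm too
x = torch.randn(1024, 1024, device=device, dtype=torch.float16)
y = x @ x  # executes on the AMD GPU with no source changes
print(y.device)
```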
 

leroj9

This is a great article. I'm an NVIDIA user and wasn't so happy with the accuracy of Whisper Desktop (Const-me). I moved on to "Faster Whisper," which lets large-v2 work on my 8GB RTX 2080, and I'm very happy with the performance.

You have to install it with Python, and there are annoying NVIDIA cuDNN and CUDA libraries to install, but get past that and it works very well. I also use it through SubtitleEdit: https://github.com/Softcatala/whisper-ctranslate2
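For reference, a minimal sketch of driving Faster Whisper straight from Python, assuming the faster-whisper package (the audio filename is a placeholder). On an 8GB card, compute_type="int8_float16" can cut VRAM use further at a small accuracy cost:

```python
from faster_whisper import WhisperModel

# large-v2 in FP16 fits on an 8GB card; use "int8_float16" if VRAM is tight.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus audio metadata.
segments, info = model.transcribe("audio.mp3", beam_size=5)
print(f"Detected language: {info.language} ({info.language_probability:.0%})")
for seg in segments:
    print(f"[{seg.start:7.2f} -> {seg.end:7.2f}] {seg.text}")
```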
 
Hey there @JarredWaltonGPU! Just found that article and love it, thx!

Just tried WhisperDesktop on my R9 5900X, 32GB, RTX 3080 (10GB) with Whisper large-v3 and it's waaay below realtime here. Any ideas on why that is? VRAM usage is at 4GB 🤷‍♂️ and GPU at 95%.

Help much appreciated!
Manuel
 
I don't know if driver versions might make a difference, and I haven't tested this in a while. The 3080 was doing 1,366 WPM when I tested, but that was for transcription of a recording. I wouldn't expect the CPU to matter much, but I also only tested with ggml-large.bin; I'm not sure what version I used.
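For anyone trying to reproduce numbers like that 1,366 WPM figure: the metric is presumably the transcript's word count divided by wall-clock minutes. A rough sketch of measuring it, assuming faster-whisper as the backend and a placeholder filename (not the article's actual test harness):

```python
import time
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")

start = time.perf_counter()
segments, _ = model.transcribe("recording.mp3")  # placeholder input file
# The generator is lazy, so the transcription work happens inside this sum.
words = sum(len(seg.text.split()) for seg in segments)
minutes = (time.perf_counter() - start) / 60

print(f"{words} words in {minutes:.2f} min -> {words / minutes:.0f} WPM")
```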