News: 'GPUs still rule' asserts graphics guru Raja Koduri in response to custom AI silicon advocate

The big deficiency GPUs have on AI workloads is data locality. Data movement is energy-intensive, and the dataflow architectures that have come onto the AI scene all address this by distributing a lot more local SRAM among their processing elements. Nvidia has massively boosted the amount of SRAM in its GPUs over the past couple of generations, but it's still as little as a tenth of what some purpose-built AI chips carry.
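To make the locality point concrete, here's a rough CUDA sketch (my own illustration, nothing from the article) of the basic trick: staging tiles of the matrices in on-chip shared memory (the GPU's SRAM), so each value fetched from DRAM gets reused TILE times instead of once. The dataflow/AI chips essentially push this same idea much further by putting far more SRAM next to each processing element.

```cuda
// Illustrative sketch only - tile size and launch geometry are assumptions.
// Tiled matrix multiply C = A * B (all N x N, N assumed a multiple of TILE):
// each 16x16 tile of A and B is staged in __shared__ memory (on-chip SRAM),
// so every element read from DRAM is reused TILE times, cutting off-chip
// data movement - and the energy it burns - by that factor.
#define TILE 16

__global__ void matmul_tiled(const float* A, const float* B, float* C, int N)
{
    __shared__ float As[TILE][TILE];   // tile of A held on-chip
    __shared__ float Bs[TILE][TILE];   // tile of B held on-chip

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk the shared (K) dimension one tile at a time.
    for (int t = 0; t < N / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();               // wait until both tiles are loaded

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();               // done with these tiles before reloading
    }
    C[row * N + col] = acc;
}
```

Launched with 16x16 thread blocks over an N/16 x N/16 grid, that one __shared__ staging step is the difference between being DRAM-bandwidth-bound and being compute-bound. The purpose-built parts just have far more of that on-chip memory to play with.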

This gets at the heart of the GPU-vs-AI-accelerator architecture question, because realtime rendering famously has very little data locality and depends heavily on random-access performance. That's precisely what made GPUs the bandwidth monsters that cryptocurrencies like Ethereum were able to exploit. It's also what has stymied multi-GPU rendering and kept multi-die (i.e. multi-compute-die) GPUs from breaking into the mainstream.

The main reasons I think Nvidia's GPUs haven't already been displaced by purpose-built AI accelerators are:
  • Nvidia has such momentum, scale, and market dominance that its integration and software support are second to none. People trying to innovate on AI techniques & applications don't want to waste time fussing with broken or incomplete software stacks or integration, which has been the bane of AMD's efforts, and I'm sure most custom AI hardware is in even worse shape.
  • Nvidia has so many resources that it can afford to optimize even a sub-optimal architecture to the point where it can compete with anything out there.
  • Right now, most of the big AI users care much more about innovation and are willing to live with high hardware prices and operating costs (i.e. costs mostly due to poor efficiency). If the AI race ever settles down, we could see cost and efficiency bubble up as higher priorities. With large-scale deployment of LLMs, efficiency already seems to be getting a lot of mindshare, though LLMs push the technology so far that I'm not sure anyone (other than possibly Cerebras) has a much more efficient alternative.

"the 'purpose' of purpose-built silicon is not stable. AI is not as static as some people imagined and trivialize [like] 'it is just a bunch of matrix multiplies'."
I've been saying this for ages. People (usually hardware designers) are quick to trivialize the requirements of AI hardware. In fact, you need quite a bit of programmability, not just a bunch of SRAM and fast matrix multiplies.
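To give one concrete example of what I mean (my own sketch, not anything from Koduri's post): even a bog-standard softmax - which shows up in every transformer attention layer - is all reductions, exponentials, and synchronization rather than matrix math, so a chip that's just a giant MAC array still needs genuinely programmable cores wrapped around it. Roughly the shape of the kernel, in CUDA:

```cuda
// Rough sketch: softmax over one row of length n, one thread block per row.
// Assumes blockDim.x is a power of two and shared memory of blockDim.x floats
// is provided at launch. None of this maps onto a fixed matrix-multiply unit.
__global__ void softmax_row(const float* x, float* y, int n)
{
    extern __shared__ float scratch[];
    int tid = threadIdx.x;

    // 1) Max-reduction for numerical stability.
    float m = -3.402823466e38f;   // effectively -FLT_MAX
    for (int i = tid; i < n; i += blockDim.x) m = fmaxf(m, x[i]);
    scratch[tid] = m;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) scratch[tid] = fmaxf(scratch[tid], scratch[tid + s]);
        __syncthreads();
    }
    float row_max = scratch[0];
    __syncthreads();

    // 2) Sum of exponentials.
    float sum = 0.0f;
    for (int i = tid; i < n; i += blockDim.x) sum += expf(x[i] - row_max);
    scratch[tid] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) scratch[tid] += scratch[tid + s];
        __syncthreads();
    }
    float row_sum = scratch[0];
    __syncthreads();

    // 3) Normalize.
    for (int i = tid; i < n; i += blockDim.x) y[i] = expf(x[i] - row_max) / row_sum;
}
```

And that's before you get to masking, rotary embeddings, sampling, or whatever next year's architecture needs - which is exactly why fixed-function "matmul-only" silicon keeps needing to be redesigned.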

Nvidia's GPUs have become so versatile in adding support for new data formats that it has become inherently harder - even for custom silicon - to compete against them.
I think custom silicon has actually been leading the charge on new data formats. For instance, Google Brain pioneered BF16 back when GPUs still only supported IEEE-754 FP16, and there are plenty of other data-format innovations from companies like Nervana, Tenstorrent, and others I'm forgetting.
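In case it's not obvious why BF16 mattered: it keeps FP32's 8-bit exponent and just throws away mantissa bits, so it covers the same dynamic range as FP32, while IEEE FP16 has only a 5-bit exponent and overflows above ~65504. A quick toy sketch (my own truncating helpers, no rounding - real conversions round-to-nearest):

```cuda
// Toy host-side illustration (helper names are my own, truncation only):
// a bfloat16 is just the top 16 bits of an FP32 value - same sign bit and
// 8-bit exponent, 7 mantissa bits - so it keeps FP32's range at reduced
// precision. IEEE FP16 (5-bit exponent) saturates to infinity above ~65504.
#include <cstdint>
#include <cstdio>
#include <cstring>

static uint16_t float_to_bf16_trunc(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));       // reinterpret the FP32 bit pattern
    return static_cast<uint16_t>(bits >> 16);   // keep sign + exponent + top 7 mantissa bits
}

static float bf16_to_float(uint16_t h)
{
    uint32_t bits = static_cast<uint32_t>(h) << 16;  // pad the dropped mantissa bits with zeros
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

int main()
{
    float big = 3.0e38f;    // near the top of FP32's range
    float back = bf16_to_float(float_to_bf16_trunc(big));
    // BF16 round-trips this magnitude (with only ~2-3 significant decimal digits);
    // FP16 would have overflowed to infinity long before this.
    std::printf("FP32 %.4e -> BF16 -> FP32 %.4e\n", big, back);
    return 0;
}
```

For training, that range-over-precision trade turned out to be the right one, which is why BF16 went from a Google TPU thing to something every GPU vendor now supports.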
 