With the frenzy that ChatGPT has generated, companies (and presumably individuals) are scrambling to get up to speed on AI development. I'm wondering if that would cause a surge in demand for Nvidia GPUs, especially the 4000 series. Are we going to see a jump in Nvidia GPU pricing as we did with crypto?
I've long figured AI-fueled demand for GPUs has been a fairly consistent undercurrent that hasn't gotten quite the press of the crypto boom. You have a point that ChatGPT might indeed supercharge the sector; however, there are probably enough purpose-built AI accelerators on the market that not all of that demand will go toward GPUs.
Luckily for the gaming community, ChatGPT seems far too big for either training or inference on consumer GPUs.
According to someone in that thread, the model is about 500 GB. So you'd need at least an 8-GPU HGX H100 system to train it (or even just to run inference at any decent speed).
The Most Powerful End-to-End AI and HPC Data Center Platform: www.nvidia.com
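As a rough sanity check, here's the memory math (a sketch only: the 175B parameter count is the public GPT-3 figure, used as an assumption since OpenAI hasn't disclosed ChatGPT's actual size, and the 500 GB figure presumably covers more than raw weights):

```python
# Back-of-the-envelope memory math. 175B parameters is the GPT-3 figure,
# assumed here for illustration; ChatGPT's real size isn't public.
params = 175e9
bytes_per_param = 2                      # fp16/bf16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: {weights_gb:.0f} GB")                # ~350 GB

h100_mem_gb = 80                         # H100 SXM has 80 GB of HBM
print(f"H100s just to hold weights: {weights_gb / h100_mem_gb:.1f}")  # ~4.4

# Activations and the KV cache (for inference) or optimizer states (for
# training) add a lot on top, which is why an 8-GPU HGX box is a realistic
# minimum.
```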
Of course, Nvidia is currently making both the H100 and their RTX 4000-series GPUs on the same process node (TSMC 4N). That means they can shift wafer allocation between the two. Late last year, they announced they would be doing just that (i.e. shifting wafers over to H100, or maybe Grace CPU, production).
Nvidia made products dedicated to crypto (the CMP mining cards). Will there be a market for AI-dedicated PC add-in cards?
Tensor cores came from the desire to accelerate AI workloads. Nvidia's gaming GPUs have quite a bit of AI horsepower, but it's their x100-series datacenter products (A100, H100) that are really suited to working with models this large: not only because of compute horsepower and memory bandwidth, but also because of their multiple NVLink ports for scaling across GPUs.
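To make the "AI horsepower" point concrete, here's a minimal PyTorch sketch (my own example, not from the thread). On any GPU with tensor cores, cuBLAS routes fp16 matmuls like this to them automatically:

```python
# Minimal tensor-core demo (assumes a CUDA build of PyTorch and a GPU with
# tensor cores, i.e. Volta or newer). fp16 matmuls are dispatched to tensor
# cores by cuBLAS; fp32 matmuls use the ordinary shader cores unless TF32
# is enabled.
import torch

a = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)
b = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)

c = a @ b                   # runs on the tensor cores
torch.cuda.synchronize()    # wait for the async GPU kernel to finish
```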
The only change they could really make is cutting down on the HPC-oriented compute (i.e. the fp64 hardware) to make room for even more tensor cores. I'm not sure exactly how much die space that stuff occupies, but I once compared the transistor counts of Vega 20 against its predecessor and found less than 6% additional transistors were needed to add half-rate fp64 support. That number is small enough that we'll probably continue to see HPC and AI served by a single product line.
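For the curious, here's the arithmetic behind that figure, using the public transistor counts (note Vega 20 also added PCIe 4.0 and a wider HBM2 interface, so this is an upper bound on the fp64 cost):

```python
# Public transistor counts: Vega 10 ~12.5B, Vega 20 ~13.2B. Vega 20 added
# half-rate fp64 plus other features, so fp64's true share is even smaller.
vega10, vega20 = 12.5e9, 13.2e9
print(f"{(vega20 - vega10) / vega10:.1%} more transistors")   # ~5.6%
```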
For reference, here's a table I recently made comparing the AI compute of high-end consumer vs. datacenter parts (the two Xeon entries are CPUs, included for reference; the rest are GPUs).
| Make | Model | Tensor TFLOPS (fp16) | Memory Bandwidth (TB/s) |
|---|---|---|---|
| Intel | Xeon 8480+ | 115 | 0.31 |
| Intel | Xeon Max 9480 | 109 | 1.14 |
| Nvidia | H100 | 1979* | 3.35 |
| Nvidia | RTX 4090 | 661* | 1.01 |
| Nvidia | RTX 3090 | 142 | 0.94 |
| Intel | Data Center GPU Max 1550 | 839 | 3.20 |
| Intel | A770 | 138 | 0.56 |
| AMD | Instinct MI250X | 383 | 3.28 |
| AMD | RX 7900 XTX | 123 | 0.96 |
| AMD | RX 6950 XT | 47 | 0.58 |
* Nvidia now rates their tensor performance assuming 2:4 structured sparsity, which is up to 2x their throughput on dense matrices. To estimate dense-matrix performance, simply divide these figures by 2. That said, sparsity is quite common in neural networks, which is why they bothered to accelerate it, so quoting the sparse number isn't entirely unfair.
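For example, applying that divide-by-two conversion to the two starred entries:

```python
# Dense-matrix estimates for the sparse-rated (starred) entries above.
sparse_tflops = {"H100": 1979, "RTX 4090": 661}
dense_tflops = {gpu: t / 2 for gpu, t in sparse_tflops.items()}
print(dense_tflops)   # {'H100': 989.5, 'RTX 4090': 330.5}
```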