News: Elon Musk Buys Thousands of GPUs for Twitter's Generative AI Project

The stat was 1000 A100 GPUs, not one.

I'm talking about a cluster, not a single instance. For example, Amazon's "UltraClusters" can have over 4,000 A100s allocated. Amazon charges a boatload for its preconfigured versions, but you can allocate the EC2 instances yourself and cluster them with various tools to build an A100 farm as small as 8 GPUs or as big as Amazon has available. You could cluster together 1,000 A100s by allocating the right number of p4d instances.
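Quick back-of-the-envelope for the instance count (assumes the standard 8 A100s per p4d.24xlarge; pricing not included):

```python
import math

# Each EC2 p4d.24xlarge instance carries 8 NVIDIA A100 GPUs, so a
# 1,000-GPU A100 farm needs ceil(1000 / 8) = 125 instances.
GPUS_PER_P4D = 8
target_gpus = 1000

instances = math.ceil(target_gpus / GPUS_PER_P4D)
print(f"{instances} p4d.24xlarge instances -> {instances * GPUS_PER_P4D} A100s")
# 125 p4d.24xlarge instances -> 1000 A100s
```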
 
Okay, thanks for explaining.

I thought you were talking about instances with Nvidia Tesla P4 cards.

 
Hmm... Musk has business relationships that pose some interesting questions.
  1. As one of the OpenAI founders, can he no longer gain access to their technology, or does Microsoft now control too much of the company for him to have such pull?
  2. Considering Tesla's AI hardware sounds pretty impressive, why not arrange to buy some of theirs? Are there technological differences that significantly disadvantage it on running transformer networks?

Yes... but Nvidia is basically the sole supplier of the hardware everyone wants to use. That gives them quite a bit of leverage in any price negotiations.

10k GPUs is a lot of money for a company that (I think) is still losing money. If we assume about $20k each (including the servers to host them), that's a cool $200M. That's only about half a percent of what he paid for Twitter, but probably a multiple of Twitter's annual hardware spend.
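Quick sanity check on that math (the $20k/GPU figure is my round number; $44B is the reported Twitter purchase price):

```python
# 10,000 GPUs at ~$20k each, installed, vs. the ~$44B Twitter deal.
gpu_count = 10_000
cost_per_gpu = 20_000          # USD, including host servers (assumed)
total = gpu_count * cost_per_gpu
print(f"Total: ${total / 1e9:.1f}B")               # Total: $0.2B
print(f"Share of $44B deal: {total / 44e9:.1%}")   # Share of $44B deal: 0.5%
```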


Somehow, I had a figure of $18k in mind. Not sure if I'm misremembering that or the street price has shot way up since then. shopping.google.com shows prices anywhere from $28.5k up to eBay listings of $43k or more.

Then, I thought I'd see what Dell's list price is, so I popped over to dell.com and looked at the price of adding one to a PowerEdge R750xa. They want an absolutely astounding $86,250 per H100 PCIe card, and they make you add a minimum of 2 GPUs to the chassis!!! Having a decent amount of experience with Dell servers at my job, I know they like big markups for add-ons, but I'm still pretty stunned by that one.

If you know anything about these, you're probably aware that the PCIe cards aren't even the best type of H100. What you really want is the SXM version. A further irony is that a pair of current H100s can't even hold GPT-3 in memory, which is why Nvidia recently announced a refresh of the H100 with more memory, due out in Q3.
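To put numbers on that (rough sketch; assumes FP16 weights at 2 bytes per parameter and ignores activations and KV cache):

```python
# Why two 80 GB H100s can't hold GPT-3: the weights alone exceed the combined HBM.
params = 175e9            # GPT-3 parameter count
bytes_per_param = 2       # FP16/BF16 weights (assumed)
weights_gb = params * bytes_per_param / 1e9
print(f"Weights: {weights_gb:.0f} GB vs 2 x 80 GB = 160 GB")
# Weights: 350 GB vs 2 x 80 GB = 160 GB
```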
^ I got my first 2 PCIe H100s for testing a couple of months ago and paid $15k USD each. They were a bit cheaper because they were for... well, testing lol. Now an 8-card SXM5 dual-Genoa server with 8 H100s is around $250k USD, and the PCIe version is about $230k USD. Mind you, my deployment is pretty big, so I'm saving some money. The 700W TDP on the SXM5 cards is outrageous; the new NVL 96GB (dual, so 192GB per "pair") is going to be where it's at. For LLM models, the A6000 is still the best bang for your buck. Scumbags at Nvidia took NVLink off the 4090 and the Ada 6000, so for larger models you have to split them up and do some funky stuff to make them work.
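For the curious, the "funky stuff" without NVLink mostly means splitting the model across cards over plain PCIe. A toy sketch of naive layer-wise model parallelism in PyTorch (my own example with made-up sizes, not anyone's production setup; needs two CUDA devices):

```python
import torch
import torch.nn as nn

# Naive model parallelism: first half of the layers on cuda:0, second
# half on cuda:1, with activations hopping between cards over PCIe.
class SplitModel(nn.Module):
    def __init__(self, d_model=4096, n_layers=8):
        super().__init__()
        half = n_layers // 2
        self.front = nn.Sequential(
            *[nn.Linear(d_model, d_model) for _ in range(half)]
        ).to("cuda:0")
        self.back = nn.Sequential(
            *[nn.Linear(d_model, d_model) for _ in range(n_layers - half)]
        ).to("cuda:1")

    def forward(self, x):
        x = self.front(x.to("cuda:0"))
        x = self.back(x.to("cuda:1"))  # activation transfer over PCIe, no NVLink needed
        return x

model = SplitModel()
out = model(torch.randn(4, 4096))
print(out.shape, out.device)  # torch.Size([4, 4096]) cuda:1
```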
 