News Elon Musk and Larry Ellison begged Nvidia CEO Jensen Huang for AI GPUs at dinner

Admin · Sep 14, 2024

Larry Ellison talked about how he asked Jensen Huang over dinner to give him 131,072 Blackwell GPUs for the planned Zettascale Oracle Supercluster.

Elon Musk and Larry Ellison begged Nvidia CEO Jensen Huang for AI GPUs at dinner : Read more

Elusive Ruse · Sep 14, 2024

Jensen be eating good! More power to him, he has earned it.

hotaru251 · Sep 14, 2024

if two rich guys are begging another guy to take their $ they should be paying more in taxes.

Tonet666 · Sep 14, 2024

Why this sound like a scene from the move The Godfather 1 🤣🤣

JRStern · Sep 14, 2024

The customer is always right, ROFLMAO.

Ellison says that frontier AI models coming in the next three years will cost $100 billion to train, echoing Anthropic CEO Dario Amodei’s thoughts on the matter.

Yeah, and Sam Altman who wants to raise $1t so he can be a player (that means he takes 10% off the top).
But they're wrong.
OpenAI's talk about their new o1 model that "thinks before it answers", does more work at inference time rather than putting all the load on pre-training. D'yall see what that means?
First it means needing more GPUs for inference time, OK, but maybe not that many. Actually GPU may not even be the right architecture to optimize this kind of inference, but that's complicated.
But it means a LOT less GPUs for training.
It even suggests that trillion parameter models will be decomposed saving exponential amounts of training work.

IOW I think it's the right path.
And also that it will cut even the big boys' demand for mass quantities of GPUs by a lot, by far more than are incrementally needed for inference.
Have a nice day.

bit_user · Sep 14, 2024

Elusive Ruse said:
Jensen be eating good! More power to him, ...

He should enjoy it, while it lasts. Such a power imbalance never lasts forever.

hotaru251 said:
if two rich guys are begging another guy to take their $ they should be paying more in taxes.

This is funny, but it's actually their businesses that want the GPUs from his business.

Now, say what you will about real corporate taxes rates...
(if it weren't off-topic, that is. So, actually please don't.)

Tonet666 said:
Why this sound like a scene from the move The Godfather 1 🤣🤣

It definitely leaves a bad taste, when backroom deals can gain someone such an important advantage in what's supposed to be an open market. I'll bet Jensen would rather Larry not have posted about it, but at least it shed a glimpse of transparency.

jp7189 · Sep 14, 2024

JRStern said:
The customer is always right, ROFLMAO.

Yeah, and Sam Altman who wants to raise $1t so he can be a player (that means he takes 10% off the top).
But they're wrong.
OpenAI's talk about their new o1 model that "thinks before it answers", does more work at inference time rather than putting all the load on pre-training. D'yall see what that means?
First it means needing more GPUs for inference time, OK, but maybe not that many. Actually GPU may not even be the right architecture to optimize this kind of inference, but that's complicated.
But it means a LOT less GPUs for training.
It even suggests that trillion parameter models will be decomposed saving exponential amounts of training work.

IOW I think it's the right path.
And also that it will cut even the big boys' demand for mass quantities of GPUs by a lot, by far more than are incrementally needed for inference.
Have a nice day.

I'd actually say the future is inferencing moving outwards towards low power end user devices. A few large training datacenters and disturbed inferencing on very efficient models

bit_user · Sep 14, 2024

jp7189 said:
I'd actually say the future is inferencing moving outwards towards low power end user devices. A few large training datacenters and disturbed inferencing on very efficient models

I think it depends a lot on how big the models are. You're not going to be using something like GPT 4 on a cell phone any time soon, just due to its sheer size. Not only does it chew up lots of storage, but also download bandwidth. Then, there's the issue of battery power, if you're inferencing huge models very much, like with some kind of Alexa/Siri assistant.

There would also be IP concerns about letting models run on edge devices, for those which aren't already open source. All someone needs to do is find one device with a known exploit to bypass memory encryption and now your model leaks out into the world.

jp7189 · Sep 14, 2024

bit_user said:
I think it depends a lot on how big the models are. You're not going to be using something like GPT 4 on a cell phone any time soon, just due to its sheer size. Not only does it chew up lots of storage, but also download bandwidth. Then, there's the issue of battery power, if you're inferencing huge models very much, like with some kind of Alexa/Siri assistant.

There would also be IP concerns about letting models run on edge devices, for those which aren't already open source. All someone needs to do is find one device with a known exploit to bypass memory encryption and now your model leaks out into the world.

I'm thinking of the pruning and distillation work e.g. mistral et al are doing. The focus is on maintaining accuracy while greatly reducing size and processing power. The minitron 8b runs on fairly low power.

JRStern · Sep 14, 2024

bit_user said:
I think it depends a lot on how big the models are. You're not going to be using something like GPT 4 on a cell phone any time soon, just due to its sheer size. Not only does it chew up lots of storage, but also download bandwidth.

If there were value in it I suppose a terabyte model could be run on a phone, you just download it once a year or so, perhaps from some local genius bar where you pay $29 for the privilege (or purchase it on ROM and just buy an upgrade and plug it in as available). I think 1tb would cover GPT4, and if not I'll bet it could be compressed some just for edge distribution, with little or no impact on performance.

Now the problem is that this new form of inference may need more horsepower than one typically finds in a phone. The old form probably is OK, the new form I'm going to guess not so much. But again, going to the other side, *your* phone may very well learn *your* patterns of interference and be able to cache the most important parts specifically to perform YOUR inferences. That's the real promise of edge computing, that it can be as individual as you are, assuming that's a good thing, LOL.

bit_user · Sep 14, 2024

jp7189 said:
I'm thinking of the pruning and distillation work e.g. mistral et al are doing. The focus is on maintaining accuracy while greatly reducing size and processing power.

It's not like LLMs are renown for their accuracy. Yeah, a pruned version will still be useful for certain things, but current LLMs have more issues than just their resource requirements.

bit_user · Sep 14, 2024

JRStern said:
If there were value in it I suppose a terabyte model could be run on a phone, you just download it once a year or so, perhaps from some local genius bar where you pay $29 for the privilege (or purchase it on ROM and just buy an upgrade and plug it in as available). I think 1tb would cover GPT4, and if not I'll bet it could be compressed some just for edge distribution, with little or no impact on performance.

If the model is 1 TB, then inferencing speeds are going to be limited by how long it takes to read 1 TB from your phone's storage, which typically isn't the fastest NAND out there. It's also going to burn yet more power, continually having to read it in.

Pierce2623 · Sep 15, 2024

JRStern said:
The customer is always right, ROFLMAO.

Yeah, and Sam Altman who wants to raise $1t so he can be a player (that means he takes 10% off the top).
But they're wrong.
OpenAI's talk about their new o1 model that "thinks before it answers", does more work at inference time rather than putting all the load on pre-training. D'yall see what that means?
First it means needing more GPUs for inference time, OK, but maybe not that many. Actually GPU may not even be the right architecture to optimize this kind of inference, but that's complicated.
But it means a LOT less GPUs for training.
It even suggests that trillion parameter models will be decomposed saving exponential amounts of training work.

IOW I think it's the right path.
And also that it will cut even the big boys' demand for mass quantities of GPUs by a lot, by far more than are incrementally needed for inference.
Have a nice day.

Buy the hype. Have fun with a model where they take shortcuts in training it.

JRStern · Sep 15, 2024

bit_user said:
If the model is 1 TB, then inferencing speeds are going to be limited by how long it takes to read 1 TB from your phone's storage, which typically isn't the fastest NAND out there. It's also going to burn yet more power, continually having to read it in.

No query touches more than a tiny fraction of that.
It would be interesting to have actual numbers.
It might be down as low as 1mb, let's call it 10mb, that would be 0.00001%, about the same as one medium-res picture.

JRStern · Sep 15, 2024

Pierce2623 said:
Buy the hype. Have fun with a model where they take shortcuts in training it.

LOL. If you looked into how they "train" it now you wouldn't touch the current versions.
"Shortcuts" are more like going straight to the destination instead of knocking on every door until you happen to hit the right one.
Warp speed.

USAFRet · Sep 15, 2024

JRStern said:
No query touches more than a tiny fraction of that.
It would be interesting to have actual numbers.
It might be down as low as 1mb, let's call it 10mb, that would be 0.00001%, about the same as one medium-res picture.

The problem is which fraction?

JRStern · Sep 15, 2024

USAFRet said:
The problem is which fraction?

Sure, well, that's why it's not exactly like fetching a single picture or even a video 1000x larger.
So it's a little more like complex SQL query of a database.
But it's 1tb because it holds data for 1000 topics, and your query is typically just one facet of one topic.

bit_user · Sep 15, 2024

JRStern said:
No query touches more than a tiny fraction of that.
It would be interesting to have actual numbers.
It might be down as low as 1mb, let's call it 10mb, that would be 0.00001%, about the same as one medium-res picture.

I don't doubt that a minority of the weights are needed, but your figure seems extremely low, to me. What's your source?

Also, as @USAFRet pointed out, the weights you need aren't all going to be compactly organized. NAND organizes storage into blocks and you pretty much have to read an entire block just to access one small part of it. So, the overheads and amplification will make reading quite a bit more expensive than if you just loaded in a file of the same length as the specific weights would occupy.

JRStern said:
it's 1tb because it holds data for 1000 topics, and your query is typically just one facet of one topic.

It's not neatly taxonomized. I expect there will be some coherency and ad hoc structure to it, in order to achieve the necessary representational efficiency, but it won't be partitioned, per se.

ex_bubblehead · Sep 16, 2024

Once again those who think our rules are something of a suggestion have ruined yet another thread. Sanctions are now on the table.

News Elon Musk and Larry Ellison begged Nvidia CEO Jensen Huang for AI GPUs at dinner

Administrator

Estimable

Splendid

Distinguished

Distinguished

Titan

Distinguished

Titan

Distinguished

Distinguished

Titan

Titan

Commendable

Distinguished

Distinguished

Titan

Distinguished

Titan

Titan

Share this page