News DeepSeek might not be as disruptive as claimed, firm reportedly has 50,000 Nvidia GPUs and spent $1.6 billion on buildouts

Nobody lied.

DeepSeek was extremely transparent that the $5.576 million figure is only the variable training cost, not including the capital costs associated with prior research, hardware acquisition, or data. It's the people who misquote them, without knowing AI, who are misrepresenting things.


https://arxiv.org/html/2412.19437v1

DeepSeek-V3 Technical Report​



DeepSeek was very open that this is purely the training compute cost, not the total fixed-plus-variable cost. It's the media who misquote them and misrepresent it as an all-in-one cost that should be criticized.
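For reference, the widely quoted figure is just simple arithmetic from the technical report: GPU-hours for the final training run multiplied by an assumed rental rate. A quick sketch of that calculation (the ~2.788M H800 GPU-hours and the $2/GPU-hour rate are the report's own stated assumptions, not all-in costs):

```python
# Back-of-envelope check of DeepSeek-V3's stated training cost:
# GPU-hours for the final run times an assumed per-hour rental rate.
gpu_hours = 2.788e6       # H800 GPU-hours reported for the final training run
rate_per_hour = 2.0       # report's assumed rental price, USD per GPU-hour
training_cost = gpu_hours * rate_per_hour
print(f"${training_cost / 1e6:.3f}M")  # → $5.576M
```

Nothing in that arithmetic covers hardware purchases, prior research runs, or data acquisition, which is exactly the distinction the report itself draws.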
Sure. However, they did seem to mislead to make bigger headlines. If proven true, they didn't reveal they distilled their model from OpenAI. They left that out, it would seem, and that is awfully conspicuous. Who knows how much that helped them achieve what they did in the timeline they cited. Should they be celebrated for this? Or are we saying distillation is fair game, just as LLMs sucking up copyrighted content from across the web to train is fair game? No easy answers there unless you start with some prejudice toward the country of origin.

Aside from that, DeepSeek showed some innovative steps. Turning a portion of their GPUs into L3 memory cache controllers was pretty brilliant. My understanding is that cache memory access is a big limiter in training performance, so that's huge from a performance tuning perspective. I wonder whether just revealing this technique will cause a major jump in GPU usage efficiency for everyone, or whether companies like OpenAI have already been doing this without saying so. Releasing the model open source could be disruptive, but I'm not sure. Better models are coming. Will they pull this off again to update/improve the model? I doubt it. How far will this open source model carry a company that attempts to build off it before it becomes obsolete? Seems risky, given it is still not cheap to buy even low-end GPUs to run this model.

As such, I think at this point we're all wildly speculating on the net impact of DeepSeek's release on AI LLM development. Anyone saying big tech LLM development is dead is reaching. Anyone claiming this is all a big Chinese-orchestrated scam is reaching. Let's wait and see.

I am thoroughly enraptured by the headlines in this space almost daily. It is fascinating both learning about the tools as they improve and using them more and more every day. I can't imagine this is just some hype train that will never arrive at the station from an investment perspective. All the while, I look forward to reading the news in a couple of months to see what people have done attempting to replicate DeepSeek's approach and/or building upon their open source release. Some more disruption and competition is good, even if it shakes my fellow US investors. We need this to become more efficient for smaller companies to build and run. I don't want 2-3 giant tech companies controlling access to this emerging technology because it needs a nuclear reactor's worth of energy output to build/run. Never mind the supply constraint for the chips.
 
If I collect all the "if", "suppose", "believe" and synonyms of them and phrases with that meaning from the comments and from the semianalysis "report", I will have enough for a whole year. I will eat them raw, fried, baked, boiled in soup.
 
It's not necessarily a conspiracy by the state. DeepSeek is run by a hedge fund, which might've profited off the fall in Nvidia shares, when they made their announcement of how much it cost to train.
I would agree if they were public, but being private and going open source made it seem much more like a state-to-state swing at the markets. I think it was just a way to weaken the US markets before the possibility of tariffs.
 
I wonder how long it'll take for this thread to be locked due to politics. Besides that, I don't trust the 1.6 billion number if they lied about the 6 million number. Who knows, maybe it cost 1 trillion but the power cost was only 1.6 billion.
will have enough for a whole year. I will eat them raw, fried, baked, boiled in soup.
Boil em mash em stick em in a stew
 
The question I have based on this article is: how was this Chinese company, which hires almost exclusively from mainland China, able to buy 50,000 NVIDIA datacenter GPUs given there is a complete ban on them to China? And not just the older, discontinued A100s as previously reported, or even consumer ones, but the latest Hopper GPUs?
From what I've read, they already know where to look... Singapore. The sudden spike in exports to Singapore is clearly visible in NVIDIA's financial reports. Do you think someone will do anything about this? Investigate more? No, they're already flooding the internet with articles to save NVIDIA's stock price...

We're screwed. We have a massive giant with an inflated market capitalization that looms like a specter over the stock market, and if it crashes, it risks causing damage throughout the entire market. This is what happens when there are such concentrations. The problem is NVIDIA. Too many people, including analysts who didn't understand anything, not even the difference between training and inference, have inflated it senselessly, and now with every little issue it faces, the entire market trembles (because everyone knows that it's inflated... they want to make money with it, but they know that sooner or later...). Anything even remotely related to them, in the same sector or not, is now at risk, even the companies that supply screws for their servers. 😆
 
Sure. However they did seem to mislead to make bigger headlines. If proven true, they didn't reveal they distilled their model from OpenAI. ...
I feel that if their "compute power used" claims prove true, then even though their model was made by using their "distillation" process on open source models, they've still told the truth. Regardless of the methods they used, if they trained an equivalently performing model on a fraction of the resources, they've still essentially told the truth. My only concern is whether the amount of compute power they claimed to have used turns out to be BS.
 
Sure. However they did seem to mislead to make bigger headlines. If proven true, they didn't reveal they distilled their model from OpenAI. ...
Actually, the real reason they left out the distillation part isn't to lead the headlines or earn money; it's to please the leadership. If they stated they distilled from OpenAI, they couldn't claim to have overtaken the world independently of any US sanctions, which is what the leadership wanted to promote. If they don't deliver that, they will have much more to worry about than money.
 
One thing to note: it's 50,000 Hoppers (older H20s and H800s) to make DeepSeek, whereas xAI needed 100,000 H100s to make Grok, and Meta 100,000 H100s to make Llama 3. So even if you compare fixed costs, DeepSeek needed 50% of the fixed costs (on less efficient GPUs) for 10-20% better performance in their models, which is a hugely impressive feat.
Why do you assume they're using all of their GPUs to train a single model? This seems unlikely, to me. They have many researchers investigating different approaches, different tunings, and even developing other models.

One thing that people don't understand is that, no matter what model OpenAI publishes, DeepSeek will distill the output and make it free/publicly available
Unless the legality of doing so is successfully challenged. I'm not saying whether it will be, just that OpenAI is almost certain to try.

Also, OpenAI can implement measures to try and detect people doing this and cut them off.
 
One thing that people don't understand is that, no matter what model OpenAI publishes, DeepSeek will distill the output and make it free/publicly available (v3 is distilled 4o, r1 is distilled o1, and they are going to clone o3, etc...). So 90% of the AI LLM market will be "commoditized", with the remainder occupied by the very top-end models, which inevitably will be distilled as well. OpenAI's only "hail mary" to justify its enormous spend is trying to reach "AGI", but can that be an enduring moat if DeepSeek can also reach AGI and make it open source?
I am pretty sure they will, but that's exactly when innovation will start to be killed off. It's essentially piracy: once it makes everything commoditized, you kill off those who invested in hundreds of thousands of H100s or similar, plus all the experts who created the model, and once you've outcompeted them, good luck having anything progress forward.
 
A company outside of China and with no bad history buys H100s, sells them to China, profits. If you won’t need the H100s for yourself in the future there’s no real risk to doing it once for a company.
That's why a new set of restrictions was about to go into effect, last month. I haven't heard what happened with these:

The aspect you're missing is financial. The tier-1 countries on that list are probably all members of the financial transaction monitoring network used to track organized crime, terrorism finance, etc.

As soon as you start doing import/export of thousands of these GPUs, you're moving tens of millions of $, or more. That's not trivial to hide and it's also illegal if you don't go through all the proper formalities that would reveal what you're doing. Obviously, the systems in place for spotting such smuggling aren't perfect, but it's another set of tools and trained individuals whose job it is to watch out for these sorts of illegal movements.
 
Sure. However they did seem to mislead to make bigger headlines. If proven true, they didn't reveal they distilled their model from OpenAI. They left that out it would seem and that is awfully conspicuous. Who knows how much that helped them achieve what they did in the timeline they cited. Should they be celebrated for this? Or are we saying distillation is a fair game just as LLMs sucking up copyrighted content from across the web to train is fair game? No easy answers there ...
IMO, distillation is like just memorizing the problems and answers likely to be on a test, rather than really learning the material. As long as you stay within the areas covered by the benchmarks used to evaluate it, the performance is probably pretty good. However, I'd expect it to be much more uneven and to have some hard limits on what it knows or can do.
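For readers unfamiliar with the term: distillation generally means training a "student" model to match a "teacher" model's output distribution (its soft labels) rather than the original training data. A minimal sketch of the core loss term, with made-up logits over a tiny three-token vocabulary purely for illustration:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution at a given temperature."""
    z = logits / temperature
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical teacher/student logits, just to show the mechanics:
# the student is pushed to reproduce the teacher's answers.
teacher_logits = np.array([2.0, 0.5, -1.0])
student_logits = np.array([1.0, 1.0, 0.0])

p = softmax(teacher_logits, temperature=2.0)  # teacher's soft targets
q = softmax(student_logits, temperature=2.0)  # student's current distribution

# KL(p || q) is the standard distillation loss term; gradient descent on it
# pulls the student's distribution toward the teacher's.
kl = float(np.sum(p * (np.log(p) - np.log(q))))
print(f"KL divergence: {kl:.4f}")
```

This is also why the "memorizing the test" analogy fits: the student only ever sees the teacher's outputs, so on inputs far from what it was distilled on, there's nothing forcing it to generalize the way the teacher did.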
 
That’s an impressive investment! $1.6 billion in AI hardware really shows how serious the push is in the AI space right now. It’ll be interesting to see how these resources are utilized and how they impact Nvidia’s market position
I guess you didn't hear about Musk buying 100k H100 GPUs? At the street price of $30k each, that's a cool $3B. He's planning on expanding it to 2-3 times the size:

Even bigger is the recently announced Stargate project, which aims to start with an initial $100B investment and has plans to eventually ramp up to $500B.

Of course, that was before all this DeepSeek commotion. So, we'll have to see if they get cold feet and scale back those plans quite substantially.
 
DeepSeek’s scale is impressive, but raw GPU power and investment don’t guarantee disruption. Execution, model quality, and real-world adoption matter just as much.
 
The most relevant thing isn't whether DeepSeek has damaged or will damage NVIDIA in the future, but rather the fact that as soon as the news spread (and by the way, we already had information about DeepSeek in December, so well before), a small breeze was enough to make NVIDIA's stock collapse.

It was like a test. And this is what's really interesting, because it's tangible proof of how solid NVIDIA's position in the AI sector is perceived to be: practically not at all. All it takes is a group of skilled programmers in China showing off an inference model (and who knows what could emerge in the coming months) and everyone runs away from NVIDIA. Solid foundations, indeed.