News DeepSeek might not be as disruptive as claimed, firm reportedly has 50,000 Nvidia GPUs and spent $1.6 billion on buildouts

I'm not shocked but didn't have enough confidence to buy more NVIDIA stock when I should have. Now Monday morning will be a race to sell airline stocks and buy some big green before everyone else does.
 
I think any big moves now are just impossible to get right. I'm in a holding pattern for new investments, and will just put them into something interest-bearing for probably a few months, and let the rest ride. No way to guess right on this roller coaster.

I do think the reactions really show that people are worried it's a bubble, whether it turns out to be one or not.
 
$1.6 billion is still significantly cheaper than the entirety of OpenAI's budget to produce 4o and o1.

The exact dollar amount doesn't really matter; either way it's significantly cheaper, which makes the overall spend on the $500 billion Stargate project or Meta's $65 billion mega-cluster look way overblown.

Plus, the key part is that it's open source, and that future fancy models will simply be cloned/distilled by DeepSeek and made public. That "commoditization" of AI LLMs beyond the very top-end models really degrades the justification for the super mega farm builds.
 
Ehh, this is kinda mixing up two different sets of numbers. Those GPUs don't explode once the model is built; they still exist and can be used to build another model. The $6 million number is how much compute/power it took to train just that one model. Training another one would be another $6 million, and so forth; the capital hardware has already been purchased, and you are now just paying for the compute/power. Most models at places like Google/Amazon/OpenAI cost tens of millions of dollars' worth of compute to build, and that isn't counting the billions in hardware costs.
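The fixed-vs-variable distinction above can be sketched numerically. The $6 million per-run figure is from the thread; the GPU count and per-GPU price below are hypothetical placeholders, not actual purchase data:

```python
# Rough illustration of fixed (capex) vs. variable (per-training-run) cost.
# The $6M per-run figure comes from the thread; the per-GPU price is a
# hypothetical placeholder, not DeepSeek's actual hardware pricing.

gpu_count = 50_000       # GPUs owned (a fixed asset; they don't "explode")
price_per_gpu = 30_000   # hypothetical $ per GPU
capex = gpu_count * price_per_gpu    # paid once: $1.5B
cost_per_run = 6_000_000             # paid per model trained

def total_spend(runs: int) -> int:
    """Capex is sunk; each extra model only adds the variable cost."""
    return capex + runs * cost_per_run

print(total_spend(1))  # 1506000000 ($1.506B for the first model)
print(total_spend(5))  # 1530000000 (five models on the same hardware)
```

The point of the sketch: once the hardware exists, each additional model costs only the variable amount, which is why quoting the $6M alone understates total investment but is still the right number for marginal cost.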
 
The fact that the hardware requirements to actually run the model are so much lower than for current Western models was always the most impressive aspect from my perspective, and likely the most important one for China as well, given the restrictions on acquiring GPUs they have to work with.

Being that much more efficient opens up the option for them to license their model directly to companies to use on their own hardware, rather than selling usage time on their own servers, which has the potential to be quite attractive, particularly for those keen on keeping their data and the specifics of their AI model usage as private as possible. And once they invest in running their own hardware, they are likely to be reluctant to waste that investment by going back to a third-party access seller.

I guess it mostly depends on whether they can demonstrate that they can continue to churn out more advanced models that keep pace with Western companies, especially given the difficulties in acquiring newer-generation hardware to build them with. Their current model is certainly impressive, but it feels more like it was intended as a way to plant their flag and make themselves known - a demonstration of what can be expected of them in the future - rather than a core product.

So, I guess we'll see whether they can repeat the success they've demonstrated - that would be the point where Western AI developers should start soiling their trousers.

Either way, ever-growing GPU power will continue to be necessary to actually build/train models, so Nvidia should keep rolling without too much issue (and maybe finally start seeing a proper jump in valuation again), and hopefully the market will once again recognize AMD's importance as well. Ideally, AMD's AI systems will finally be able to offer Nvidia some proper competition, since Nvidia has really let itself go in the absence of a proper competitor - but with the advent of lighter-weight, more efficient models, and the status quo of many corporations just automatically going Intel for their servers finally slowly breaking down, AMD really needs to see a more fitting valuation.
 
Well said.

The $6 million is the "variable" cost, whereas the $1.6 billion is the "fixed" cost.

One thing to note: it took 50,000 Hopper GPUs (older H20s and H800s) to make DeepSeek, whereas xAI needed 100,000 H100s to make Grok, and Meta 100,000 H100s to make Llama 3. So even if you compare fixed costs, DeepSeek needed 50% of the fixed cost (on less efficient GPUs) for 10-20% better performance in their models, which is a hugely impressive feat.

So even if you account for the higher fixed cost, DeepSeek is still cheaper in overall direct costs (variable AND fixed).
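As a rough sketch of the ratio being claimed (GPU counts are from the post; the per-unit-performance math takes the poster's "10-20% better" figure at its midpoint and assumes fixed cost scales with GPU count):

```python
# Rough cost-efficiency comparison implied above. GPU counts are from the
# thread; the "10-20% better" performance figure is the poster's claim,
# used here only to show how the ratio works out.
deepseek_gpus = 50_000
h100_gpus = 100_000

fixed_cost_ratio = deepseek_gpus / h100_gpus   # 0.5 -> "50% of fixed costs"
performance_ratio = 1.15                       # midpoint of "10-20% better"
cost_per_unit_perf = fixed_cost_ratio / performance_ratio

print(f"{cost_per_unit_perf:.2f}")  # 0.43: under half the cost per unit of performance
```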

One thing that people don't understand is that no matter what model OpenAI publishes, DeepSeek will distill the output and make it free/publicly available (V3 is distilled 4o, R1 is distilled o1, and they are going to clone o3, etc.). So 90% of the AI LLM market will be "commoditized," with the remainder occupied by the very top-end models, which will inevitably be distilled as well. OpenAI's only "hail mary" to justify the enormous spend is trying to reach "AGI", but can that be an enduring moat if DeepSeek can also reach AGI and make it open source?
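Since "distillation" does the heavy lifting in this argument, here is a toy sketch of what the term means, assuming the standard soft-target formulation (all numbers are made up; real LLM distillation operates on token logits at massive scale):

```python
import numpy as np

# Toy sketch of knowledge distillation in the sense used above: a "student"
# is trained to match a "teacher" model's output distribution rather than
# raw labels. The logits here are arbitrary illustration values.

def softmax(z, T=1.0):
    """Temperature-softened probabilities from logits."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.5]
print(distill_loss([4.0, 1.0, 0.5], teacher))  # 0.0: student matches teacher
print(distill_loss([0.5, 1.0, 4.0], teacher))  # > 0: distributions differ
```

Minimizing this loss over many prompts is what lets a cheaper model inherit a frontier model's behavior, which is the mechanism behind the "commoditization" claim.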
 
Look, I'm no genius, nor do I understand all the implications.. but when I saw these facts - 1) claims of a hilariously paltry budget + 2) AI performance conveniently similar to that of ChatGPT's o1 + 3) from a rando Chinese financial company turned AI company - the LAST thing I thought was woowww, major breakthrough. Are there innovations? Yes. More like innovations in how to copy and build off others' work, potentially illegally. Oh, and this just so happens to be what the Chinese are historically good at.

I saw the reactions of ppl losing their sht, though.. damn, ppl are really not as smart/informed as I assume them to be. Then you notice the CCP bots in droves all over.. so obvious. Also a red flag.

I'm Chinese, raised in North America. My mom LOVES China (and the CCP, lol), but damn guys, you gotta see things clearly through non-Western eyes. Get it through your heads - how do you know when China's lying? When they're saying gddamn anything. It's just the facts and how they operate.
 
It doesn't matter what the number is.

The $1.6 billion fixed cost is still significantly cheaper than the tens of billions in fixed costs OpenAI has used to train its models.

Also, DeepSeek's published paper is very clear that the $6 million does not include capital costs, only variable costs. When OpenAI discusses its training cost, it uses variable cost too, not fixed cost.

So no, this is still a highly impactful innovation; the exact number doesn't matter, it's still significantly cheaper either way.
 
Does it really matter whose version is right, in the end?

Point is, they did it with less than their competitors, w/o access to the newest hardware, & the end result is a much more efficient product.


$1.6B is still a lot better, even in the worst case.
Lol, now we're just glossing over details.. $1.6 billion vs. $6 million - you think there's no meaningful difference there? It's not just 'less' than competitors. It's a MASSIVE difference.

The inference efficiency is a step forward. But at what cost? Stealing IP (allegedly)? Is that what we want to support now?
 
I couldn't find anywhere in the SemiAnalysis "report" where they show actual data. Everything they said was preceded by "We believe". That being said, of course High-Flyer spent a fortune on the tech supporting DeepSeek. What is important is that they are probably using the tech to support a dozen other Chinese entities about to explode with further disruptions.
 
DeepSeek is a market disruptor, and this news doesn't materially change that fact. $6 million is, in my mind, not possible, since the cost of buying that Nvidia hardware probably exceeds that amount. Regardless, $1.6 billion is nothing compared to the money OpenAI (which ironically is not open), Gemini, etc., have "invested". Just OpenAI's quarterly loss, even today, is greater than the total amount invested. The other consideration is that the lower hardware requirement also uses less power, which means no need to build nuclear power plants to keep it running.
 
China tech is never what it seems, and the playing field isn't fair either, since the CCP calls the shots for companies & helps them get around laws and regulations, on top of funding Chinese 'products' to unfairly compete on price vs. Western companies. For example: electric cars, solar panels, and their attempt with this model.. last year's 'Kirin' CPU was just an Intel rebadge, etc. etc..

Land of facades & lies, sadly. Hopefully someday that changes; the people there deserve it to change.
 
It was my first assumption that they simply reused results from other AI engines for their own, and my second assumption that they had more resources than they said.

Evidence shows that both assumptions are very likely correct.
 
First rule of tech when dealing with Chinese companies: they are part of the state, and the state has a vested interest in making the USA and Europe look bad. Triple-check their numbers. Do the same for Elon.
It's not necessarily a conspiracy by the state. DeepSeek is run by a hedge fund, which might've profited off the fall in Nvidia shares when they made their announcement of how much it cost to train.
 
When there’s movement, there’s profit to be made.

NVDA at $117.10 turned out to be a good time to buy more at discount. We’ll see how good of a decision that was in the months and years to come.
 
The real disruptive part is releasing the source and weights for their models.
Eh, their models aren't perfect. Have you ever heard the old saying: "the first one is free, then they jack up the price"?

The release of these models could be to get some attention, but perhaps they don't plan to keep doing it, especially as their models continue to improve. They already have a cloud service you can use. In the future, that could be the only way they allow people to access their models.
 
Nobody lied.

DeepSeek was extremely transparent that the $5.576 million is only the variable training cost, not including the capital costs associated with prior research, hardware acquisition, and fixed data costs. It's the people who misquote them, without knowing AI, who are misrepresenting things.


https://arxiv.org/html/2412.19437v1

DeepSeek-V3 Technical Report

Lastly, we emphasize again the economical training costs of DeepSeek-V3, summarized in Table 1. [...] DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.

DeepSeek was very open that this is purely the training compute cost, not the total fixed+variable cost. It's the media, who misquote them and misrepresent it as an "all-in-one" cost, that should be criticized.
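The arithmetic in the quoted passage is simple to reproduce (the $2/GPU-hour H800 rental rate is the paper's own stated assumption):

```python
# Reproducing the training-cost arithmetic quoted from the DeepSeek-V3 report:
# total cost = GPU-hours x assumed rental rate for an H800.
gpu_hours = 2_788_000   # 2.788M GPU-hours (report, Table 1)
rate = 2.00             # assumed $ per GPU-hour (report's assumption)
total = gpu_hours * rate

print(f"${total / 1e6:.3f}M")  # $5.576M -- matches the quoted figure
```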
 
No one trusts any number that comes out of China. Can you name one invention by China in the modern era?
 