parkerthon
Sure. However, they did seem to mislead in order to make bigger headlines. If proven true, they didn't reveal that they distilled their model from OpenAI's. Leaving that out seems awfully conspicuous, and who knows how much it helped them achieve what they did in the timeline they cited. Should they be celebrated for this? Or are we saying distillation is fair game, just as LLMs sucking up copyrighted content from across the web to train on is fair game? No easy answers there unless you start with some prejudice toward the country of origin.

Nobody lied.
DeepSeek was extremely transparent that $5.576 million is only the variable training cost, not including the capital costs of prior research, hardware acquisition, or fixed data costs. It's the people who misquote them, who don't know AI, who are misrepresenting things.
https://arxiv.org/html/2412.19437v1
DeepSeek-V3 Technical Report
DeepSeek was very open that this is purely the training compute cost, not the total "fixed plus variable" cost. It's the media who misquote them and misrepresent it as an all-in cost who should be criticized.
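For reference, the headline figure is simple arithmetic from the numbers the technical report itself gives (roughly 2.788M H800 GPU-hours at an assumed $2 per GPU-hour rental rate; both figures are the paper's, not independent estimates):

```python
# Sanity check of DeepSeek-V3's quoted training cost, using the
# figures stated in the technical report (arXiv 2412.19437):
# ~2.788M H800 GPU-hours at an assumed rental rate of $2/GPU-hour.
gpu_hours = 2.788e6        # total H800 GPU-hours for the training run
rate_per_gpu_hour = 2.00   # paper's assumed $/GPU-hour rental price

total_cost = gpu_hours * rate_per_gpu_hour
print(f"${total_cost:,.0f}")  # → $5,576,000
```

Note that this covers rented compute for the final run only, which is exactly why it excludes research, hardware purchase, and data costs.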
Aside from that, DeepSeek showed some innovative steps. Turning a portion of the GPUs they had into L3 memory cache controllers was pretty brilliant. My understanding is that cache memory access is a big limiter on training performance, so that's huge from a performance-tuning perspective. I'm wondering whether just revealing this technique will cause a major jump in GPU-usage efficiency for everyone, or whether companies like OpenAI have already been doing this without saying so.

Releasing the model open source could be disruptive, but I'm not sure. Better models are coming. Will they pull this off again to update and improve the model? I doubt it. How far will this open-source model carry a company that attempts to build on it before it becomes obsolete? Seems risky, given that it still isn't cheap to buy even low-end GPUs to run this model.
As such, I think at this point we're all wildly speculating on the net impact of DeepSeek's release on LLM development. Anyone saying big-tech LLM development is dead is reaching. Anyone claiming this is all a big Chinese-orchestrated scam is reaching. Let's wait and see.
I am thoroughly enraptured by the headlines in this space almost daily. It is fascinating both learning about the tools as they improve and using them more and more every day. I can't imagine this is just some hype train that, from an investment perspective, will never arrive at the station. All the while, I look forward to reading the news in a couple of months to see what people have done attempting to replicate DeepSeek's approach and/or building on their open-source release. Some more disruption and competition is good, even if it shakes my fellow US investors. We need this to become efficient enough for smaller companies to build and run. I don't want 2-3 giant tech companies controlling access to this emerging technology because it takes a nuclear reactor's worth of energy output to build and run, never mind the supply constraint on the chips.