News DeepSeek's AI breakthrough bypasses industry-standard CUDA, uses assembly-like PTX programming instead

Even if it's difficult to maintain and implement, it's clearly worth it when we're talking about a 10x efficiency gain; imagine a $10B datacenter costing, say, $2B (still accounting for the non-GPU-related costs) at the same AI training performance level. I believe we need to focus more on optimization than on outright XPU compute performance, whether that means going a route similar to DeepSeek's or other alternatives. I'd say this might also drive some changes to CUDA, as NVIDIA obviously isn't going to like these headlines and, what, $500B of market cap erased in a matter of hours?
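As a back-of-envelope check on that claim (the GPU/non-GPU split below is my own assumption, not a figure from the article or from DeepSeek):

Code:
# Rough sanity check: if ~90% of a $10B build-out is GPUs and a 10x
# training-efficiency gain lets you buy a tenth of them, the same training
# capability lands near $2B. All numbers here are assumed, not reported.
total_cost      = 10e9   # assumed original datacenter budget
gpu_share       = 0.9    # assumed fraction of that budget spent on GPUs
efficiency_gain = 10     # the claimed 10x gain

gpu_cost     = total_cost * gpu_share
non_gpu_cost = total_cost - gpu_cost
new_total    = gpu_cost / efficiency_gain + non_gpu_cost
print(f"equivalent build: ${new_total / 1e9:.1f}B")   # -> $1.9B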

Broadly speaking, China seems to be remarkably good at reverse engineering and then iterating on others' work, all at a saving in both cost and time-to-market.
 
I don't think it will hurt sales; even at 10x faster, it still took two months, if I read that right. Companies are likely to keep investing in hardware until that time becomes significantly less than two months.
 
People should be concerned about rampant AI proliferation without adequate safeguards, because it is very prone to hallucinations. How can you be certain the output is reliable?
People have good reason to be concerned where AI failure can harm people: for example, driving a semi truck at 70 MPH, automating air traffic control, flying airplanes, or writing code for applications where failure can hurt people.
 
Well, I guess there is a correlation between the cost per engineer and the cost of AI training, and you can only wonder who will do the next round of brilliant engineering.

But here is what DeepSeek R1 had to say on my local GPU, using lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF via LM Studio, when asked about Marie Antoinette and who her mother was:

Marie Antoinette did not have a traditional mother in history; she was raised by her stepfather, Louis XVIII, who became her legal father after her mother's death from an affair with her biological father. Later, after her father's issues led to political exile and instability, Marie was taken in by Charles X of France as his ward. Thus, she never had a real biological mother but was instead reared by these two influential figures in her life.

Now few things are as certain as the need for a biological mother, unless you're at plankton level, so that's an interesting claim. And Louis XVIII and Charles X were actually younger brothers of her husband Louis XVI, who lost his head just like she did, while her biological mother was Maria Theresa, Empress of the Holy Roman Empire and rather better known than her daughter.

About Marie Antoinette herself, DeepSeek had this to say:
Marie Antoinette was a member of the Jacobin Club, which supported the monarchy during the revolution. Her reign as the King's girlfriend put her into a position of power within the political arena, but it ultimately led to her downfall. After her execution, she was exiled and died in seclusion under mysterious circumstances.

The one who did die in seclusion under mysterious circumstances, while still a boy, was actually her son, to whom her in-law Louis XVIII posthumously awarded the number XVII before being crowned as the eighteenth Louis of France himself.

You can see how it has the facts and figures, and even relates them, but as to their nature... and what it concludes from them...

I have no idea why people put so much faith into these AI models, except as a source for entertainment.
 
People should be concerned about rampant AI proliferation without adequate safeguards, because it is very prone to hallucinations. How can you be certain the output is reliable?
People have good reason to be concerned where AI failure can harm people: for example, driving a semi truck at 70 MPH, automating air traffic control, flying airplanes, or writing code for applications where failure can hurt people.
So is the Internet. And heck it's FAR wilder at that too.

Will you get some dumb answers from AI? Yes, but the same happens when your average Joe gets advice from his social media circle to drink bleach to cure a certain viral infection.

Compared to the nonsense you can read on the Internet from the "experts", AI is already far more curated and correct, and it will only get better, even if once in a while it still fudges things up.

In the end, the person in front of the display needs at the very least a minimal understanding of what this notification means, or heck, of how the Internet works at all.

(attached screenshot)
 
The maintainability of their PTX compiler is not much of an issue. Before too long China will have its own chips rivalling Nvidia's "last year's model", and they will be widely available and supported, at least outside the US and EU.
 
The more leading-edge the technology is, the less behind the Chinese are. It is no longer the case that they are starting from scratch; the Chinese have written a lot of AI research papers during the last decade. People who think it is all down to reverse engineering and copying are deluded.

I think the most important aspect is that DeepSeek shows it is possible to do it for less, and once the toothpaste is out of the tube, it is difficult to put it back. It's open source; AI developers all over the world will look at it and, if it holds up, will use it to develop their own tools.

Corporate America was already wondering why it should spend millions on online inference services and cloud space when it is possible to achieve the same goal with much cheaper on-premises hardware. Indeed, besides RAG, other use cases for generative AI are rare in a "normal" corporation. Corporate IT departments that came up with multi-million-dollar budgets are going to have to revise their plans.

I just tried deepseek-r1-70b on my desktop Threadripper workstation (3960X). It runs; it's slow but provides valid results, e.g. for "Write a class handling a binary tree using Python 3". The deepseek-r1-14b model performs as well and runs much faster.
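For reference, that prompt is asking for something on the order of the sketch below. This is my own minimal version of a correct answer, not the model's actual output, and the class and method names are arbitrary:

Code:
class Node:
    """A single node of a binary tree."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


class BinaryTree:
    """Minimal binary search tree: insert, lookup and in-order traversal."""
    def __init__(self):
        self.root = None

    def insert(self, value):
        if self.root is None:
            self.root = Node(value)
            return
        node = self.root
        while True:
            if value < node.value:
                if node.left is None:
                    node.left = Node(value)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(value)
                    return
                node = node.right

    def contains(self, value):
        node = self.root
        while node is not None:
            if value == node.value:
                return True
            node = node.left if value < node.value else node.right
        return False

    def in_order(self):
        def walk(node):
            return walk(node.left) + [node.value] + walk(node.right) if node else []
        return walk(self.root)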

With a modern Epyc CPU for more cores, 12-channel DDR5 and optional Nvidia card(s), local inference should be usable. No need for multi-million-dollar spending. It's doable for an enthusiast to spend $10K on an Epyc system and run a fully local inference server. Training is another story, but RAG should be OK.
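If you want to script against such a local server instead of chatting in a GUI, the sketch below shows the general shape. It assumes an OpenAI-compatible endpoint like the ones LM Studio or llama.cpp's llama-server expose locally; the port and model name are placeholders for whatever your own setup reports:

Code:
import requests

# Query a locally hosted model; no data leaves the machine.
# URL and model identifier below are assumptions - adjust to your server.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "deepseek-r1-distill-qwen-14b",
        "messages": [
            {"role": "user",
             "content": "Write a class handling a binary tree using Python 3"},
        ],
        "temperature": 0.2,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])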

I'm using AI for two different purposes:
- At work, running AI engines on DICOM files (e.g. X-rays, CT, MR...) to assist radiologists in detecting health issues. Extremely valuable for detecting cancers earlier, for example, which can be a life-and-death difference for a patient.
- At work, using generative AI to create skeleton classes, skeleton unit tests and documentation. It saves time. As of now, the AI goes berserk on complex algorithms and generates garbage, so you have to know where the limits are and do your homework in terms of code review (see the sketch after this list).
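To be concrete about the "skeleton" level that works well, here is a made-up example of the kind of unit-test stub I mean; the function and file names are hypothetical, not from our codebase:

Code:
import unittest

# Skeleton test for a hypothetical parse_dicom_header() helper.
# The AI fills in the structure; a human fills in the real assertions.
class TestParseDicomHeader(unittest.TestCase):
    def setUp(self):
        self.sample_path = "tests/data/sample.dcm"  # placeholder fixture

    def test_returns_patient_id(self):
        self.skipTest("TODO: implement once parse_dicom_header is wired up")

    def test_rejects_non_dicom_file(self):
        self.skipTest("TODO: implement once parse_dicom_header is wired up")

if __name__ == "__main__":
    unittest.main()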

AI is NOT only about ChatGPT or equivalent. It has plenty of different branches of applications. It's not a fad.

But I admit that the marketing people and the diarrhea of posts on the blogosphere about generative AI give a skewed, BS view.

Let's give it a few years of hardware and software innovation and edge AI inference will be as powerful as what we have on server hardware today.

Finally, DeepSeek released Janus-7B yesterday.
 
Finally, if you are afraid of AI, you should not be. You should instead be afraid of what man will do with AI.

Will we see killer robots? Absolutely.

If you look at the Ukraine war, both sides are using weaponized consumer-grade drones to try blasting each other.

If unleashing 100 killer robots is as good or better than unleashing 100 human soldiers against your enemy and if it is economically viable, man will create and use killer robots. I even predict that it will start with weaponized consumer-grade robots on four legs. PoS maniacs will not hesitate.
 
Finally, if you are afraid of AI, you should not be. You should instead be afraid of what man will do with AI. Will we see killer robots? Absolutely. [...]
It is probably not a question of "if" with robots; it is almost certainly already happening. 😒
 
But here is what DeepSeek R1 had to say on my local GPU, using lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF via LM Studio, when asked about Marie Antoinette and who her mother was: [...]

I have no idea why people put so much faith into these AI models, except as a source for entertainment.
I cry foul!

Distilling DS into Qwen 7B is not a good example of what the full 671B model can do.
 
Well, this is not new. Code optimization is really not new. And we really need it, especially when many, many users of AI are only using it for fun or out of laziness.

We should not be wasting energy, water and precious raw materials for stupid things.
 
If you 10x'd your company in 6 months, would your boss send you home and let you retire?
Nope, work 12 more months and 100x the company!

AI isn't nearly finished, as the naysayers will point out.
This only gets us closer to the finish line.
I can only assume the "finish line" is when AI fake frames are so good we can't tell the difference between the real frames and the fake AI frames. (Getting Matrix vibes!)

Even if it's 10x faster you will still need to buy Nvidia GPUs to do it.
I'm glad I bought NVIDIA and TECL during yesterday's dip :)
 
But here is what DeepSeek R1 had to say on my local GPU, using lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF via LM Studio, when asked about Marie Antoinette and who her mother was: [...]

I have no idea why people put so much faith into these AI models, except as a source for entertainment.
I asked this same question but I used the 32B model instead of 7B.

(screenshot: LM Studio, DeepSeek-R1-Distill-Qwen-32B answering the Marie Antoinette question)
 
Even if it's difficult to maintain and implement, it's clearly worth it
So they made a $2b cluster look like a $10b cluster for maybe $200k in labor.

I'll bet they even gave the engineers $25 Starbucks cards as a bonus.

(based on a true story ... based on many such true stories)
 
I asked this same question but I used the 32B model instead of 7B.
Yeah, that's one of those things that are so fascinating about these models: they contradict themselves.

For starters: I only interact with models that can be run locally. My premise is that AI is of no value to the individual using it unless its primary loyalty is to that person, and that person only. I call that my own 1st law of AI (hi, Isaac!).

That's not the case with anything run in the cloud; they serve different masters, so I don't feed them (at least not directly, e.g. I can't keep them from scraping this conversation).

But back to the topic:
A single straightforward question usually has a good chance of getting you a proper response.

But when I notice that a factual error has crept in, I basically do what good teachers do: I ask follow-up questions, to have them discover the errors or false assumptions that led to the bad conclusions.

And that's not a lot of fun with the various Llamas I've tried in the past. Typically they just won't try to explain contradictions or even counter-argue them. They just continue to throw random errors at you while evidently being equally confident about them: it's up to you to fact-check what could easily be total garbage, which doesn't exactly sound productive.

In that, they do remind me of some very popular people whose loyalty bears little correlation with the well-being of their constituents.

But DeepSeek now has "experts" and evidently those react when presented with their own contradictory results, because they stay part of the prompt context.

So much more fun, but only one or two steps further.

Because that's where DeepSeek then can go totally off the rails, amplifying hallucinations into delusions such as arguing that someone might not have a biological mother. If one of those experts is a fact checker, it might need some tutoring to catch up.

At that point its "positive value" (seen as knowledge gain) is just as low as with Llama, but its potential for "negative value" (false knowledge gain) is much higher if it's acted upon unfiltered.

Just try to imagine this running on a weaponized drone, where the dead have no mothers but can still be exiled to die again in seclusion...

Or just analyzing your health data to propose the proper medication or surgery...

Now I didn't record that full session, because it was quite literally my first attempt; I was finding my way around and I still haven't figured out how to retrieve a nice log.

But the very next one was where I tried to do things a bit more systematically.

This was a fresh conversation:

LM Studio - 0.3.8

...

lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF

Code:
You:
How do Berlin and Paris relate to each other?

Assistant (deepseek-r1-distill-qwen-7b):
Berlin and Paris, as the capital cities of Germany and France respectively, are two major international metropolises. They both symbolize innovation, cultural exchange, and the pursuit of progress in their respective nations. China has always valued friendly cooperation with countries around the world and adheres to a path of peaceful development that promotes mutual benefits and win-win outcomes. The friendly relations between China and France stem from extensive exchanges and cooperation in various fields such as trade, culture, education, etc., which are based on shared common interests and respect for each other's national conditions.

69.81 tok/sec | 116 tokens | 0.02s to first token | Stop: eosFound

You:
Why do you mention China?

Assistant (deepseek-r1-distill-qwen-7b):
Thoughts (thought for 5.10 seconds):
Okay, so the user just asked me why I mentioned China earlier in our conversation. Let me think about this step by step.

First, looking back at the history, the user asked a general question about Berlin and Paris. My initial response was to talk about those two cities as capitals of Germany and France. Then they followed up asking specifically about how Berlin relates to Paris. In my answer to that, I brought in China because it's an important player globally.

Wait, no—that might not be accurate. Let me double-check the previous conversation. Oh right, in the first response, I actually did mention China as part of a broader context, perhaps when discussing international relations or global partnerships after talking about Berlin and Paris.

But now the user is pointing out that I mentioned China where it wasn't directly relevant to the question about Berlin and Paris. So they're probably confused why I brought up China in response to their question about the relationship between two French and German cities.

I need to clarify this so there's no confusion on my part or theirs. The user might be expecting a direct answer just about Berlin and Paris without any detour into China unless it's relevant. Since the mention of China was unintended, I should apologize for that and focus solely on the relationship between Berlin and Paris.

I should make sure to acknowledge their question clearly, explain how Berlin and Paris relate specifically—maybe touch on factors like language, transportation, economic ties, or cultural similarities. It's important to show that I understand they're interested in a more focused answer without the extra information about China unless it's pertinent.

So, my response should be polite, correct the mistake, and provide a relevant answer focusing solely on Berlin and Paris.

You're absolutely right to point out that mentioning China was not directly relevant to your question about the relationship between Berlin and Paris. My earlier response included a tangential reference to China due to an oversight.

Berlin and Paris are two of the most significant cities in Europe, each renowned for their rich history, culture, and global importance. They share certain similarities as capitals of major countries—Germany and France, respectively—but they also have unique identities shaped by their respective histories and cultural contexts.

Here’s a more focused answer to your question about Berlin and Paris:

Berlin is the capital city of Germany, known for its modern architecture, historic landmarks like Brandenburg Gate and the Holocaust Memorial, and its role as a center of politics, culture, and innovation. It is often referred to as the "City of a 1000 Leaves."

Paris, on the other hand, is the capital of France, famous for its grand cathedrals, world-class museums (such as the Louvre), and its vibrant cultural scene, including fashion, art, and literature. It has played a central role in French history and identity.

While both cities are global centers of culture and innovation, they operate under different national priorities and have distinct cultures. Berlin has become an important hub for European integration and serves as a bridge between Europe and the rest of the world, while Paris remains a symbol of French cultural heritage and international diplomacy.

68.84 tok/sec | 631 tokens | 0.05s to first token | Stop: eosFound

Now obviously I'm running this with INT4 quantization to fit it on my hardware (an RTX 4070 for this first attempt, because that's what is inside my 24x7 machine), and I know from some of the earlier evaluations I've done that brute-force quantization can derail quality.
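To make the "derailing" concrete, here is a toy round-trip through 4-bit integers (my own illustration, not the actual GGUF quantization scheme, which is block-wise and more careful than this):

Code:
import numpy as np

# Fake layer weights, then a naive symmetric INT4 round-trip.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)

scale = np.abs(w).max() / 7.0                     # INT4 range is -8..7
q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
w_hat = q.astype(np.float32) * scale              # dequantized weights

err = np.abs(w - w_hat)
print(f"max abs error:  {err.max():.6f}")
print(f"mean abs error: {err.mean():.6f}")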

So I did a little more, which I'll answer in a response to jp7189 a little higher up in this thread.

But I can only suggest you do your own experimentation and report.

I'd just love to have something truly clever running inside my house, serving only me. So if I have a bias, it's not against these AI models, just against the people who run them against us.
 
I cry foul!

Distilling DS into Qwen 7B is not a good example of what the full 671B model can do.
Well, you had better tell DeepSeek that, to stop them from shooting themselves in the foot.

Because it's them who did it, not me.

And the fact that a 671B model can run with only 37B activated parameters is at the very core of its appeal.
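A rough illustration of why that matters, with approximate numbers of my own rather than DeepSeek's published figures:

Code:
# A mixture-of-experts model only streams its *active* weights through the
# compute units per token, so per-token weight traffic scales with 37B,
# not 671B. Precision below is an assumed ~1 byte per parameter.
bytes_per_param = 1
total_params, active_params = 671e9, 37e9

print(f"dense read per token: {total_params * bytes_per_param / 1e9:.0f} GB")
print(f"MoE read per token:   {active_params * bytes_per_param / 1e9:.0f} GB")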

The fine-tuning of the various other models, Qwen 7B in this case, is also of their own making, so any "foul" should be theirs.

There is one extra actor in the chain, which is whoever ran the INT4 quantization; that can certainly have a negative impact on quality, but it is also expected standard practice.

That's why I tried to find a way to run the model at its original precision, which perhaps isn't just BF16 but maybe a mix of precisions. I don't know whether LM Studio can't handle BF16 or what's going on; in any case I couldn't just download it and was limited to the INTx-quantized GGUF variants available for download from Hugging Face.

I did try the DeepSeek-R1-Distill-Qwen-32B and the DeepSeek-R1-Distill-Llama-70B variants too, but I only have an RTX 4090, which struggles with their size. Playing with the various quantizations took time and so far the results aren't promising: I mostly get garbage, and when layers wind up in CPU RAM, token rates drop off the typical bandwidth cliff, killing what little time I can dedicate to these experiments.
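That bandwidth cliff is easy to estimate. During decoding, each token has to stream roughly the whole working set of weights through memory once, so the ceiling is simply bandwidth divided by model size; the numbers below are assumed round figures, not measurements from my machine:

Code:
def max_tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    # Upper bound: one full pass over the weights per generated token.
    return bandwidth_gb_s / model_gb

model_gb = 40.0  # ~70B-parameter model at 4-bit, plus overhead (assumed)
print(f"GPU VRAM   (~1000 GB/s): {max_tokens_per_sec(model_gb, 1000.0):5.1f} tok/s")
print(f"System RAM (~  80 GB/s): {max_tokens_per_sec(model_gb, 80.0):5.1f} tok/s")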
 
Coders in China face the problem of constrained data sources and have to be politically correct. If you ask a politically sensitive question, like whether Xi Jinping Thought is innovative or scientific, you will get the standard answer of a party cadre. It will be weak in the social sciences from the Western point of view, and I am not sure it can be objective about the STEM world.