News China's newest homegrown AI chip matches industry standard at 45 TOPS — 6nm Arm-based 12-core Cixin P1 starting mass production

This doesn't tell me that China is ahead in the AI race. It tells me that 45 TOPS is relatively easy to achieve and the industry is slow-walking progress to maximize sales. Per Nvidia's site, the 3060 has 102 AI TOPS. If you have a 4080 then you have 780 AI TOPS, and a 4090 has 1321 AI TOPS. So why the fuss over 45 TOPS?
 
40 TOPS is the threshold Microsoft chose for Copilot+. It may be arbitrary, but it is a level at which performance is starting to get good for a number of applications.

As for NPU vs dGPU, if the tinier NPU is delivering higher TOPS/Watt than those GPUs, then it has value. For example, the RTX 3060 has a TDP of 170W, although maybe consumption during an AI workload is less than that, IDK.

I don't know how much power the Cixin P1 NPU uses to get to 30 TOPS, or XDNA2 to get to 50 TOPS, etc. I wish that info was easy to find. If it's about 5 Watts or less, it seems superior in efficiency. Hopefully all TOPS numbers are measuring the same thing (usually INT8).
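Back-of-the-envelope, and to be clear the NPU power numbers below are pure guesses on my part, not published specs:

```python
# Rough TOPS-per-Watt comparison using the figures discussed above.
# The NPU power draws are assumptions for illustration, not published specs.
devices = {
    # name: (INT8 TOPS, power in Watts)
    "RTX 3060 (at TDP)":    (102, 170),  # Nvidia's advertised AI TOPS vs. board TDP
    "Cixin P1 NPU (guess)": (30, 5),     # assuming ~5 W, which is NOT confirmed
    "XDNA2 NPU (guess)":    (50, 5),     # same unconfirmed ~5 W assumption
}

for name, (tops, watts) in devices.items():
    print(f"{name}: {tops / watts:.1f} TOPS/W")
```

Even if the 3060 only draws half its TDP during an AI workload, the NPUs would still come out around 5x to 8x ahead on this metric, if the 5 W guess holds.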
 
You're absolutely right that efficiency can matter in laptops, but you would think raw performance, especially for generative AI models, would get more attention. Especially when most mainstream gamers already have this raw power in their PCs.
 
Nvidia's marketers certainly tried to give it more attention:

https://www.tomshardware.com/tech-i...rement-is-only-good-enough-for-basic-ai-tasks
(image: Nvidia marketing slide arguing 45 TOPS is only good enough for basic AI tasks)

But the fact is, Microsoft is pushing for NPUs to go everywhere, Apple already had them, etc. Rapidly creating a minimum baseline means developers can target it. NPUs aren't just going into laptops, but also millions of office desktops without discrete GPUs (starting with Arrow Lake, mostly).

It's also an additional resource you can use while using 100% of your GPU for something else, like a game. However, the NPU is taking up die space that could have been omitted or used for more cache, cores, iGPU, etc. Microsoft has forced everyone to pay the price.
 
This doesn't tell me that China is ahead in the AI race. It tells me that 45 TOPS is relatively easy to achieve
They didn't provide cost or power figures, did they? Furthermore, I'm sure the spec is theoretical. I'd love to know their sustained, real world performance.

the industry is slow-walking progress to maximize sales.
Lots of companies have tried to build AI chips and most of them are defunct. That tells me it's not as easy as it seems. I think there's a long tail, where even though the main thing you need is a powerful tensor product engine, a lot more flexibility and functionality is needed to have a viable product.

If you have a 4080 then you have 780 AI TOPS, and a 4090 has 1321 AI TOPS. So why the fuss over 45 TOPS?
Those are expensive products, made on a TSMC 4nm-class node. This is purportedly made on a 6nm-class node and also contains 12 ARM cores, PCIe interface, and the rest of the standard SoC stuff.

If 45 TOPS were such an inexpensive proposition, then AMD's Phoenix and Intel's Meteor Lake would've easily cleared this bar.
 
They didn't provide cost or power figures, did they? Furthermore, I'm sure the spec is theoretical. I'd love to know their sustained, real world performance.
None that I can find.

Lots of companies have tried to build AI chips and most of them are defunct. That tells me it's not as easy as it seems. I think there's a long tail, where even though the main thing you need is a powerful tensor product engine, a lot more flexibility and functionality is needed to have a viable product.
OK, I can agree there. When I see generative AI can be done on a Commodore, which was not built with generative AI in mind, it doesn't seem as specialized of a task as recent marketing suggests.

Those are expensive products, made on a TSMC 4nm-class node. This is purportedly made on a 6nm-class node and also contains 12 ARM cores, PCIe interface, and the rest of the standard SoC stuff.

If 45 TOPS were such an inexpensive proposition, then AMD's Phoenix and Intel's Meteor Lake would've easily cleared this bar.
What is the point of the 45 TOPS standard in the first place? I can understand program specs that require 16GB of RAM or a certain GPU shader version, but is there any test or product that needs 45 TOPS over 37 TOPS? Most of the AI is done on the cloud anyway. It seems as if 45 is an arbitrary number picked because they knew they were going to reach it soon to get people to buy new laptops (just a theory).

I've recently been dabbling in offline AI models. I finally found one that works with my pre-W11 specs, and I'm surprised that it does. It's slow because the CPU is doing most of the heavy lifting instead of the GPU, but if my aging specs can work, then I'm fairly confident that modern desktops can do it much better, especially if the GPU is leveraged properly. But that requires far more than 45 TOPS.
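For reference, the kind of CPU-only setup I mean looks roughly like this (a minimal sketch; the model name is just a stand-in for whichever small model fits your hardware):

```python
# Minimal CPU-only text generation with Hugging Face transformers.
# The model here is illustrative; any small causal LM works the same way.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="distilgpt2",  # small enough to run tolerably on an aging CPU
    device=-1,           # -1 = CPU only; no GPU or NPU involved
)

result = generator("Offline AI on old hardware is", max_new_tokens=40)
print(result[0]["generated_text"])
```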
 
OK, I can agree there. When I see generative AI can be done on a Commodore, which was not built with generative AI in mind, it doesn't seem as specialized of a task as recent marketing suggests.
Did you read the article?? It takes twenty minutes to generate 8x8 pixel images! That proves nothing!

What is the point of the 45 TOPS standard in the first place?
Fair question. I assume it's what they deemed necessary to generate tokens at a reasonable speed, for an LLM of reasonable complexity. Rather than speculate further, it'd probably make sense to see if they ever provided a justification.
 
  • Like
Reactions: ThomasKinsley
Did you read the article?? It takes twenty minutes to generate 8x8 pixel images! That proves nothing!
What it demonstrates is that there are no hardware barriers to creating generative AI. It doesn't require special GPU shaders or codecs or even an NPU. A humble chip from the '80s can do it given enough time. So what is it that these new 45 TOPS chips give us that current chips and graphics cards do not?

Assuming there is an AI renaissance and I wanted to get ahead of it, I wouldn't want to buy one of these 45 TOPS laptops that are Copilot+ certified. I'd purchase 192GB of RAM (4x48GB) and the best AI GPU out there to crunch AI models. And if I was a developer trying to make the latest software/AI model, I'd be skipping the laptops and tuning my model for commonly used GPUs, including the RTX 3060 and up.

Somehow this entire segment is being ignored as marketing is going for 45 TOPS instead. What are the benefits? If the AI is on the cloud then your system's TOPS don't matter because the servers are doing the work for you. If your system is running the load, then why not get better hardware and leave the laptops in the dust?

Fair question. I assume it's what they deemed necessary to generate tokens at a reasonable speed, for an LLM of reasonable complexity. Rather than speculate further, it'd probably make sense to see if they ever provided a justification.
Absolutely, but then the counterargument could be that if 45 TOPS is sufficient then 105 TOPS is even better. Not trying to be argumentative. I'm trying to think of a good argument to justify their position as well.
 
What it demonstrates is that there are no hardware barriers to creating generative AI. It doesn't require special GPU shaders or codecs or even an NPU. A humble chip from the '80s can do it given enough time.
I don't know if you're familiar with the concept of a Turing machine, but anything that can be reduced to digital computation is computable by one. All that added hardware complexity buys you is speed.
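To make that concrete: the core operation behind all of this is multiply-accumulate, which any general-purpose processor can execute. Dedicated silicon only changes how fast it happens. A toy sketch:

```python
# The core op behind neural-network inference is just multiply-accumulate.
# Nothing here needs an NPU, tensor cores, or even floating point;
# specialized silicon just does the same arithmetic much faster.
def int8_matvec(matrix, vector):
    """Integer matrix-vector product, the building block of inference."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

weights = [[1, -2, 3], [0, 4, -1]]   # a toy quantized "layer"
inputs = [5, 6, 7]
print(int8_matvec(weights, inputs))  # [14, 17]
```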

Assuming there is an AI renaissance and I wanted to get ahead of it, I wouldn't want to buy one of these 45 TOPS laptops that are Copilot+ certified. I'd purchase 192GB of RAM (4x48GB) and the best AI GPU out there to crunch AI models.
In a laptop? Maybe for you, but most corporate users (including me) simply will not lug around such a huge gaming laptop. They want something thin & light, with decent battery life - even if they're using AI features like MS Copilot.

Somehow this entire segment is being ignored
Huh? You can buy high-end business laptops that have a dGPU. They're just big, loud, and often have poor battery life (at least, in my experience).

If your system is running the load, then why not get better hardware and leave the laptops in the dust?
Yes, if you don't need a laptop, then don't buy one! Geez...

Absolutely, but then the counterargument could be that if 45 TOPS is sufficient then 105 TOPS is even better.
Well, if 45 TOPS is sufficient, then inflating the spec would be bad due to making the minimum hardware more expensive, hot, loud, bulky, etc. That means fewer people will adopt it, which is contrary to MS' interests, being a software company that's trying to make $$$ off of their AI software.
 
What is the point of the 45 TOPS standard in the first place? I can understand program specs that require 16GB of RAM or a certain GPU shader version, but is there any test or product that needs 45 TOPS over 37 TOPS? Most of the AI is done on the cloud anyway. It seems as if 45 is an arbitrary number picked because they knew they were going to reach it soon to get people to buy new laptops (just a theory).

I've recently been dabbling in offline AI models. I finally found one that works with my pre-W11 specs, and I'm surprised that it does. It's slow because the CPU is doing most of the heavy lifting instead of the GPU, but if my aging specs can work, then I'm fairly confident that modern desktops can do it much better, especially if the GPU is leveraged properly. But that requires far more than 45 TOPS.
The primary purpose of an NPU on an end-user computer is not to train LLMs but to run inference with pre-trained models, customized with your local information. It can be used to process voice, text, or image inputs and generate outputs. By including data from your emails, web searches, and documents, it can (theoretically) provide you with more appropriate answers. 45 TOPS is an estimate of what's needed for near-real-time responses.
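As a rough sanity check on the near-real-time part (the utilization figure below is a guess, and this ignores memory bandwidth, which is usually the real bottleneck for on-device LLMs):

```python
# Back-of-envelope: token rate of a 45 TOPS NPU on a dense transformer.
# A dense model needs roughly 2 * n_params operations per generated token.
# Utilization is assumed, and memory bandwidth (ignored here) typically
# binds first, so treat this as an optimistic compute-only upper bound.
n_params = 7e9             # e.g., a 7B-parameter model, INT8-quantized
ops_per_token = 2 * n_params
npu_ops_per_sec = 45e12    # 45 INT8 TOPS
utilization = 0.3          # assumed fraction of peak actually sustained

print(f"~{npu_ops_per_sec * utilization / ops_per_token:.0f} tokens/s")
```

On compute alone that works out to hundreds of tokens per second, far beyond interactive needs, so in practice memory bandwidth, not raw TOPS, is likely what limits responsiveness.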
 
This makes sense. Can a GPU do the same task?
Not as efficiently, or else Intel and AMD wouldn't bother with NPUs and would just make their iGPUs bigger.

Somewhere in their Meteor Lake product introduction slides, Intel has a comparison of the relative power & performance of the CPU, GPU, and NPU on AI workloads. The GPU is way more efficient than the CPU, but the NPU is more efficient still. I'll see if I can find it.
 
I think I know which slide you're referring to. I meant a dGPU.
 
This one:
(image: Intel Meteor Lake slide comparing CPU, GPU, and NPU power and performance on AI workloads)

Yes, GPUs can do the same task. A dGPU isn't really that different from an iGPU, other than being bigger and having relatively greater memory bandwidth.
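One concrete illustration: runtimes like ONNX Runtime treat CPU, GPU, and (on supported platforms) NPU as interchangeable execution providers for the same model graph. A sketch; "model.onnx" is a placeholder, and which providers are available depends on your build and drivers:

```python
# One model graph, different silicon: ONNX Runtime loads the first
# available execution provider from the list below.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path to any exported model
    providers=[
        "CUDAExecutionProvider",  # dGPU, if present
        "DmlExecutionProvider",   # DirectML: iGPU/NPU paths on Windows builds
        "CPUExecutionProvider",   # always-available fallback
    ],
)
print(session.get_providers())  # shows which providers actually loaded
```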
Is anyone making NPUs as add-in cards for desktops? It seems like there's room for something that's higher-performing than a dGPU and less expensive, for inference-only tasks.
 
Tenstorrent is one, but their current generation of cards is based on a fairly old ASIC made on a 12 nm node. It's arguably competitive with RTX 3000 GPUs, but not RTX 4000.

A few other companies make accelerators like this for servers or embedded devices.
 