News China's newest homegrown AI chip matches industry standard at 45 TOPS — 6nm Arm-based 12-core Cixin P1 starting mass production

This doesn't tell me that China is ahead in the AI race. It tells me that 45 TOPS is relatively easy to achieve and the industry is slow-walking progress to maximize sales. Per Nvidia's site, the 3060 has 102 AI TOPS. If you have a 4080 then you have 780 AI TOPS, and a 4090 has 1321 AI TOPS. So why the fuss over 45 TOPS?
 
40 TOPS is the threshold Microsoft chose for Copilot+. It may be arbitrary, but it is a level at which performance is starting to get good for a number of applications.

As for NPU vs dGPU, if the tinier NPU is delivering higher TOPS/Watt than those GPUs, then it has value. For example, the RTX 3060 has a TDP of 170W, although maybe consumption during an AI workload is less than that, IDK.

I don't know how much power the Cixin P1 NPU uses to get to 30 TOPS, or XDNA2 to get to 50 TOPS, etc. I wish that info was easy to find. If it's about 5 Watts or less, it seems superior in efficiency. Hopefully all TOPS numbers are measuring the same thing (usually INT8).
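Back-of-the-envelope, and to be clear the NPU power numbers below are pure guesses on my part, not published specs:

```python
# Rough TOPS-per-Watt comparison using the figures discussed above.
# The NPU power draws are assumptions for illustration, not published specs.
devices = {
    # name: (INT8 TOPS, power in Watts)
    "RTX 3060 (at TDP)":    (102, 170),  # Nvidia's advertised AI TOPS vs. board TDP
    "Cixin P1 NPU (guess)": (30, 5),     # assuming ~5 W, which is NOT confirmed
    "XDNA2 NPU (guess)":    (50, 5),     # same unconfirmed ~5 W assumption
}

for name, (tops, watts) in devices.items():
    print(f"{name}: {tops / watts:.1f} TOPS/W")
```

Even if the 3060 only draws half its TDP during an AI workload, the NPUs would still come out around 5x to 8x ahead on this metric, if the 5 W guess holds.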
 
You're absolutely right that efficiency can matter in laptops, but you would think raw performance, especially for generative AI models, would get more attention. Especially when most mainstream gamers already have this raw power in their PCs.
 
Nvidia's marketers certainly tried to give it more attention:

https://www.tomshardware.com/tech-i...rement-is-only-good-enough-for-basic-ai-tasks
(image: Nvidia marketing slide arguing 45 TOPS is only good enough for basic AI tasks)

But the fact is, Microsoft is pushing for NPUs to go everywhere, Apple already had them, etc. Rapidly creating a minimum baseline means developers can target it. NPUs aren't just going into laptops, but also millions of office desktops without discrete GPUs (starting with Arrow Lake, mostly).

It's also an additional resource you can use while using 100% of your GPU for something else, like a game. However, the NPU is taking up die space that could have been omitted or used for more cache, cores, iGPU, etc. Microsoft has forced everyone to pay the price.
 
This doesn't tell me that China is ahead in the AI race. It tells me that 45 TOPS is relatively easy to achieve
They didn't provide cost or power figures, did they? Furthermore, I'm sure the spec is theoretical. I'd love to know their sustained, real world performance.

the industry is slow-walking progress to maximize sales.
Lots of companies have tried to build AI chips and most of them are defunct. That tells me it's not as easy as it seems. I think there's a long tail, where even though the main thing you need is a powerful tensor product engine, a lot more flexibility and functionality is needed to have a viable product.

If you have a 4080 then you have 780 AI TOPS, and a 4090 has 1321 AI TOPS. So why the fuss over 45 TOPS?
Those are expensive products, made on a TSMC 4nm-class node. This is purportedly made on a 6nm-class node and also contains 12 ARM cores, PCIe interface, and the rest of the standard SoC stuff.

If 45 TOPS were such an inexpensive proposition, then AMD's Phoenix and Intel's Meteor Lake would've easily cleared this bar.
 
They didn't provide cost or power figures, did they? Furthermore, I'm sure the spec is theoretical. I'd love to know their sustained, real world performance.
None that I can find.

Lots of companies have tried to build AI chips and most of them are defunct. That tells me it's not as easy as it seems. I think there's a long tail, where even though the main thing you need is a powerful tensor product engine, a lot more flexibility and functionality is needed to have a viable product.
OK, I can agree there. When I see generative AI can be done on a Commodore, which was not built with generative AI in mind, it doesn't seem as specialized of a task as recent marketing suggests.

Those are expensive products, made on a TSMC 4nm-class node. This is purportedly made on a 6nm-class node and also contains 12 ARM cores, PCIe interface, and the rest of the standard SoC stuff.

If 45 TOPS were such an inexpensive proposition, then AMD's Phoenix and Intel's Meteor Lake would've easily cleared this bar.
What is the point of the 45 TOPS standard in the first place? I can understand program specs that require 16GB of RAM or a certain GPU shader version, but is there any test or product that needs 45 TOPS over 37 TOPS? Most of the AI is done on the cloud anyway. It seems as if 45 is an arbitrary number picked because they knew they were going to reach it soon to get people to buy new laptops (just a theory).

I've recently been dabbling in offline AI models. I finally found one that works with my pre-W11 specs, and I'm surprised that it does. It's slow because the CPU is doing most of the heavy lifting instead of the GPU, but if my aging specs can work, then I'm fairly confident that modern desktops can do it much better, especially if the GPU is leveraged properly. But that requires far more than 45 TOPS.
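For reference, the kind of CPU-only setup I mean looks roughly like this (a minimal sketch; the model name is just a stand-in for whichever small model fits your hardware):

```python
# Minimal CPU-only text generation with Hugging Face transformers.
# The model here is illustrative; any small causal LM works the same way.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="distilgpt2",  # small enough to run tolerably on an aging CPU
    device=-1,           # -1 = CPU only; no GPU or NPU involved
)

result = generator("Offline AI on old hardware is", max_new_tokens=40)
print(result[0]["generated_text"])
```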
 
OK, I can agree there. When I see generative AI can be done on a Commodore, which was not built with generative AI in mind, it doesn't seem as specialized of a task as recent marketing suggests.
Did you read the article?? It takes twenty minutes to generate 8x8 pixel images! That proves nothing!

What is the point of the 45 TOPS standard in the first place?
Fair question. I assume it's what they deemed necessary to generate tokens at a reasonable speed, for an LLM of reasonable complexity. Rather than speculate further, it'd probably make sense to see if they ever provided a justification.
 
  • Like
Reactions: ThomasKinsley
Did you read the article?? It takes twenty minutes to generate 8x8 pixel images! That proves nothing!
What it demonstrates is that there are no hardware barriers to creating generative AI. It doesn't require special GPU shaders or codecs or even an NPU. A humble chip from the '80s can do it given enough time. So what is it that these new 45 TOPS chips give us that current chips and graphics cards do not?

Assuming there is an AI renaissance and I wanted to get ahead of it, I wouldn't want to buy one of these 45 TOPS laptops that are Copilot+ certified. I'd purchase 192GB of RAM (4x48GB) and the best AI GPU out there to crunch AI models. And if I was a developer trying to make the latest software/AI model, I'd be skipping the laptops and tuning my model for commonly used GPUs, including the RTX 3060 and up.

Somehow this entire segment is being ignored as marketing is going for 45 TOPS instead. What are the benefits? If the AI is on the cloud then your system's TOPS don't matter because the servers are doing the work for you. If your system is running the load, then why not get better hardware and leave the laptops in the dust?

Fair question. I assume it's what they deemed necessary to generate tokens at a reasonable speed, for an LLM of reasonable complexity. Rather than speculate further, it'd probably make sense to see if they ever provided a justification.
Absolutely, but then the counterargument could be that if 45 TOPS is sufficient then 105 TOPS is even better. Not trying to be argumentative. I'm trying to think of a good argument to justify their position as well.
 
What it demonstrates is that there are no hardware barriers to creating generative AI. It doesn't require special GPU shaders or codecs or even an NPU. A humble chip from the '80s can do it given enough time.
I don't know if you're familiar with the concept of a Turing machine, but anything that can be reduced to digital computation is computable by one. All that added hardware complexity buys you is speed.
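To make that concrete: the core operation behind all of this is multiply-accumulate, which any general-purpose processor can execute. Dedicated silicon only changes how fast it happens. A toy sketch:

```python
# The core op behind neural-network inference is just multiply-accumulate.
# Nothing here needs an NPU, tensor cores, or even floating point;
# specialized silicon just does the same arithmetic much faster.
def int8_matvec(matrix, vector):
    """Integer matrix-vector product, the building block of inference."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

weights = [[1, -2, 3], [0, 4, -1]]   # a toy quantized "layer"
inputs = [5, 6, 7]
print(int8_matvec(weights, inputs))  # [14, 17]
```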

Assuming there is an AI renaissance and I wanted to get ahead of it, I wouldn't want to buy one of these 45 TOPS laptops that are Copilot+ certified. I'd purchase 192GB of RAM (4x48GB) and the best AI GPU out there to crunch AI models.
In a laptop? Maybe for you, but most corporate users (including me) simply will not lug around such a huge gaming laptop. They want something thin & light, with decent battery life - even if they're using AI features like MS Copilot.

Somehow this entire segment is being ignored
Huh? You can buy high-end business laptops that have a dGPU. They're just big, loud, and often have poor battery life (at least, in my experience).

If your system is running the load, then why not get better hardware and leave the laptops in the dust?
Yes, if you don't need a laptop, then don't buy one! Geez...

Absolutely, but then the counterargument could be that if 45 TOPS is sufficient then 105 TOPS is even better.
Well, if 45 TOPS is sufficient, then inflating the spec would be bad due to making the minimum hardware more expensive, hot, loud, bulky, etc. That means fewer people will adopt it, which is contrary to MS' interests, being a software company that's trying to make $$$ off of their AI software.
 
What is the point of the 45 TOPS standard in the first place? I can understand program specs that require 16GB of RAM or a certain GPU shader version, but is there any test or product that needs 45 TOPS over 37 TOPS? Most of the AI is done on the cloud anyway. It seems as if 45 is an arbitrary number picked because they knew they were going to reach it soon to get people to buy new laptops (just a theory).

I've recently been dabbling in offline AI models. I finally found one that works with my pre-W11 specs, and I'm surprised that it does. It's slow because the CPU is doing most of the heavy lifting instead of the GPU, but if my aging specs can work, then I'm fairly confident that modern desktops can do it much better, especially if the GPU is leveraged properly. But that requires far more than 45 TOPS.
The primary purpose of an NPU on an end-user computer is not to train LLMs but to run inference with pre-trained models, customized with your local information. It can be used to process voice, text, or image inputs and generate outputs. By including data from your emails, web searches, and documents, it can (theoretically) provide you with more appropriate answers. 45 TOPS is an estimate of what's needed for near-real-time responses.
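As a rough sanity check on the near-real-time part (the utilization figure below is a guess, and this ignores memory bandwidth, which is usually the real bottleneck for on-device LLMs):

```python
# Back-of-envelope: token rate of a 45 TOPS NPU on a dense transformer.
# A dense model needs roughly 2 * n_params operations per generated token.
# Utilization is assumed, and memory bandwidth (ignored here) typically
# binds first, so treat this as an optimistic compute-only upper bound.
n_params = 7e9             # e.g., a 7B-parameter model, INT8-quantized
ops_per_token = 2 * n_params
npu_ops_per_sec = 45e12    # 45 INT8 TOPS
utilization = 0.3          # assumed fraction of peak actually sustained

print(f"~{npu_ops_per_sec * utilization / ops_per_token:.0f} tokens/s")
```

On compute alone that works out to hundreds of tokens per second, far beyond interactive needs, so in practice memory bandwidth, not raw TOPS, is likely what limits responsiveness.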
 
This makes sense. Can a GPU do the same task?
Not as efficiently, or else Intel and AMD wouldn't bother with NPUs and would just make their iGPUs bigger.

Somewhere in their Meteor Lake product introduction slides, Intel has a comparison of the relative power & performance of the CPU, GPU, and NPU on AI workloads. The GPU is way more efficient than the CPU, but the NPU is more efficient still. I'll see if I can find it.
 
I think I know which slide you're referring to. I meant a dGPU.
 
This one:
(image: Intel Meteor Lake slide comparing CPU, GPU, and NPU power and performance on AI workloads)

Yes, GPUs can do the same task. A dGPU isn't really that different from an iGPU, other than being bigger and having relatively greater memory bandwidth.
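One concrete illustration: runtimes like ONNX Runtime treat CPU, GPU, and (on supported platforms) NPU as interchangeable execution providers for the same model graph. A sketch; "model.onnx" is a placeholder, and which providers are available depends on your build and drivers:

```python
# One model graph, different silicon: ONNX Runtime loads the first
# available execution provider from the list below.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path to any exported model
    providers=[
        "CUDAExecutionProvider",  # dGPU, if present
        "DmlExecutionProvider",   # DirectML: iGPU/NPU paths on Windows builds
        "CPUExecutionProvider",   # always-available fallback
    ],
)
print(session.get_providers())  # shows which providers actually loaded
```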
Is anyone making NPUs as add-in cards for desktops? It seems like there's room for something that's higher-performing than a dGPU and less expensive, for inference-only tasks.
 
Tenstorrent is one, but their current generation of cards is based on a fairly old ASIC made on a 12 nm node. It's arguably competitive with RTX 3000 GPUs, but not RTX 4000.

A few other companies make accelerators like this for servers or embedded devices.
 