News New Chinese GPGPU arrives to challenge Nvidia's AI dominance but falls woefully short - Loongson unveils AI and HPC GPU, up to 1 Tflops of performa...

Admin · Dec 1, 2023

Loongson goes after scientific and AI computing with the LG200 'GPGPU' that promises up to 1 TFLOPS performance per node.


bit_user · Dec 1, 2023

I just tried uploading that slide image to Google Translate, which seems to work surprisingly well. It revealed basically the same information included in the article, so maybe the author already did exactly that. Something funny: it seems they refer to their multi-node link as "Dragon Chain technology".

The article does contain a glaring error:

Nvidia's H100 delivers 67 FP64 TFLOPS.

No, that's the fp32 rate. Its fp64 throughput is exactly half of that, at 33.5 TFLOPS. That's still a lot, and a huge leap over even the previous-generation A100 (which managed only 9.7 TFLOPS). I'm guessing they were embarrassed by AMD leapfrogging them on this metric in the previous generation.
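A quick sanity check of that arithmetic, as a minimal Python sketch. It only uses the figures from the post (67 TFLOPS fp32 for the H100, 9.7 TFLOPS fp64 for the A100) and assumes the post's 2:1 fp32:fp64 ratio for the H100; the constant and function names are illustrative.

```python
# Figures quoted in the post, in TFLOPS.
H100_FP32_TFLOPS = 67.0
A100_FP64_TFLOPS = 9.7

def fp64_from_fp32(fp32_tflops: float, ratio: float = 2.0) -> float:
    """Derive an fp64 rate from an fp32 rate, given the fp32:fp64 throughput ratio."""
    return fp32_tflops / ratio

h100_fp64 = fp64_from_fp32(H100_FP32_TFLOPS)   # assumes the 2:1 ratio from the post
gen_on_gen = h100_fp64 / A100_FP64_TFLOPS      # fp64 speedup over the A100

print(f"H100 fp64: {h100_fp64:.1f} TFLOPS")       # H100 fp64: 33.5 TFLOPS
print(f"H100 vs. A100 fp64: {gen_on_gen:.1f}x")   # H100 vs. A100 fp64: 3.5x
```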

List of Nvidia graphics processing units - Wikipedia

en.wikipedia.org

Also:

Loongson calls its LG200 accelerator a "GPGPU," which certainly implies that the part not only supports AI, HPC, and graphics workloads but can also perform general-purpose computing. But we can only wonder what the company means by general-purpose computing.

No, "GPGPU" just means it's more general than graphics. That extends as far as HPC and AI, but usually not much beyond. Anything GPU-like is still practically limited to workloads that are number-crunching, highly parallel, and SIMD-friendly.

GPGPU is an old term, going back at least 20 years. It just referred to using a GPU for computations other than graphics.

Sorry to bother you, but I'm sure @JarredWaltonGPU can set Anton straight on these points.

collin3000 · Dec 1, 2023

It should be noted that teraflops aren't everything. But assuming the 1 TFLOPS at the top of the 256 GFLOPS to 1 TFLOPS range is the fp32 rate (although I'd guess it's int8), that puts this card's compute power on par with the Nvidia Quadro 6000, which was released back in 2010.

I love competition in markets with few suppliers. But they've got a long, long way to go to be competitive, especially when the RTX 4080, which can still be shipped into China, has an FP32 rate of 48.74 TFLOPS, and even a GTX 1060 can pull 4.38 TFLOPS. So even if all GPU shipments into China were banned, the government would be better off commandeering every gamer's graphics card, however old, than using this chip.
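To put that comparison in numbers, here is a minimal Python sketch. It assumes, as the post does, that the LG200's 1 TFLOPS per-node figure is its fp32 rate, and it ignores any losses from multi-node scaling; `nodes_to_match` is a hypothetical helper, not anything Loongson has published.

```python
import math

# Assumption (from the post): the LG200's top figure of 1 TFLOPS per node is fp32.
LG200_FP32_TFLOPS = 1.0

# FP32 figures cited in the post, in TFLOPS.
CARDS = {
    "RTX 4080": 48.74,
    "GTX 1060": 4.38,
}

def nodes_to_match(card_tflops: float, node_tflops: float = LG200_FP32_TFLOPS) -> int:
    """Whole LG200-class nodes needed to match a card's raw fp32 throughput,
    assuming perfect (lossless) multi-node scaling."""
    return math.ceil(card_tflops / node_tflops)

for name, tflops in CARDS.items():
    ratio = tflops / LG200_FP32_TFLOPS
    print(f"{name}: {ratio:.2f}x, ~{nodes_to_match(tflops)} LG200 nodes")
# RTX 4080: 48.74x, ~49 LG200 nodes
# GTX 1060: 4.38x, ~5 LG200 nodes
```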

ivan_vy · Dec 3, 2023

Think of this as a proof of concept: it shows it can be done, and boy, they will do it. Loongson CPUs are improving at a surprising rate, and GPUs from MooreThreads are great on the HW side (awful on SW), which means the low and mid-range segments will be served. High-end users can still buy and import from third parties, but soon they'll be fed too. How? Huge R&D budgets paid for by huge low- and mid-range consumer markets that are captive (thanks to the Chinese government and US sanctions).

bit_user · Dec 4, 2023

collin3000 said:
assuming the 1 TFLOPS at the top of the 256 GFLOPS to 1 TFLOPS range is the fp32 rate (although I'd guess it's int8), that puts this card's compute power on par with the Nvidia Quadro 6000, which was released back in 2010.

Given that int8 performance is called out separately, I think they mean floating-point when they say FLOPS. That is what it stands for, after all!

Typically, there's a 2:1 ratio between fp32 and fp64 throughput. Likewise, there's usually a 2:1 ratio between fp16 and fp32. So, that would neatly cover the spread of 256 GFLOPS to 1 TFLOPS, where the low end is fp64 and the high end is fp16.
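The precision ladder described above can be sketched in a few lines of Python. This is only an illustration under the post's assumptions: a 2:1 ratio at each step, and the 256 GFLOPS low end taken as the fp64 rate. Loongson has not confirmed which precision each number refers to.

```python
# Assumption (from the post): 256 GFLOPS is the fp64 rate, and throughput
# doubles at each step down in precision (fp64 -> fp32 -> fp16).
FP64_GFLOPS = 256

def precision_ladder(fp64_gflops: float) -> dict[str, float]:
    """Scale an fp64 rate up through fp32 and fp16, doubling at each step."""
    fp32 = fp64_gflops * 2
    fp16 = fp32 * 2
    return {"fp64": fp64_gflops, "fp32": fp32, "fp16": fp16}

for precision, gflops in precision_ladder(FP64_GFLOPS).items():
    print(f"{precision}: {gflops:g} GFLOPS")
# fp64: 256 GFLOPS
# fp32: 512 GFLOPS
# fp16: 1024 GFLOPS  (roughly the quoted 1 TFLOPS)
```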

collin3000 said:
the RTX 4080, which can still be shipped into China, has an FP32 rate of 48.74 TFLOPS, and even a GTX 1060 can pull 4.38 TFLOPS. So even if all GPU shipments into China were banned, the government would be better off commandeering every gamer's graphics card, however old, than using this chip.

They do talk about multi-node scalability, although that's going to be more for AI than most other workloads.

Anyway, this is a start. It's probably a good sign to see them put forth some plausible-sounding claims, rather than way over-promising and under-delivering like MooreThreads did. Even if they delivered far more competitive hardware today, it would take a while for the software to catch up. If they deliver something modest now, it can at least be used as a development vehicle to build up the software stack & ecosystem.

bit_user · Dec 4, 2023

ivan_vy said:
GPUs from MooreThreads are great on the HW side (awful on SW)

Show me evidence the hardware is actually good. I've seen their claims about its specs, but not a single benchmark even remotely close to proving them.

I suspect the MTT S80 hardware is full of bugs and bottlenecks that the software has to work around, and that's one reason it's taken them so long to get the drivers into half-decent shape. Even with the best drivers, those GPUs probably won't perform anything like their specs suggest.

ivan_vy said:
Huge R&D budgets paid for by huge low- and mid-range consumer markets that are captive (thanks to the Chinese government and US sanctions).

Didn't they just have a round of layoffs? I'm having some trouble finding the article, but it was within the last month.

As for consumers propping up the low-end, that only works if the cards are cost-competitive with options from Intel, AMD, and Nvidia, which the S80 isn't (even after all the markdowns).

ivan_vy · Dec 4, 2023

bit_user said:
Show me evidence the hardware is actually good.

"The GPU's clock speed is set at 1.8 GHz, and maximum compute performance has been measured at 14.2 TFLOPS. A 256-bit memory bus grants a bandwidth transfer rate of 448 GB/s. PC Watch notes that the card's support for PCIe Gen 5 x 16 (offering up to 128 GB/s bandwidth) is quite surprising, given the early nature of this connection standard."

Moore Threads MTT S80 GPU Benchmarked by PC Watch Japan

The Moore Threads MTT S80 gaming-oriented graphics card has been tested mostly by Chinese hardware publications, but Japan's PC Watch has managed to get hold of a sample unit configured with 16 GB GDDR6 (14 Gbps) for evaluation purposes and soon published their findings in a "HotHot REVIEW!" The...

www.techpowerup.com
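The on-paper bandwidth figures in that quote can be reproduced from first principles. A minimal Python sketch: the 448 GB/s follows from the 256-bit bus and the 14 Gbps GDDR6 mentioned in the PC Watch sample, and the "up to 128 GB/s" PCIe number is the raw bidirectional rate. The PCIe 5.0 per-lane rate of 32 GT/s and its 128b/130b line coding are standard spec values assumed here, not figures from the quoted article.

```python
def gddr_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps

def pcie_bandwidth_gbs(lanes: int, gt_per_s: float, encoded: bool = True,
                       bidirectional: bool = False) -> float:
    """PCIe link bandwidth in GB/s.

    gt_per_s is the per-lane transfer rate (32 GT/s for PCIe 5.0, assumed here).
    With encoded=True, the 128b/130b line-coding overhead is subtracted.
    """
    raw = lanes * gt_per_s / 8          # one bit per transfer per lane, 8 bits per byte
    effective = raw * 128 / 130 if encoded else raw
    return effective * 2 if bidirectional else effective

print(gddr_bandwidth_gbs(256, 14))                                     # 448.0
print(pcie_bandwidth_gbs(16, 32, encoded=False, bidirectional=True))   # 128.0
print(round(pcie_bandwidth_gbs(16, 32), 1))                            # 63.0 (per direction)
```

The usable per-direction figure (about 63 GB/s) is what a single transfer stream can actually see, which is why "up to 128 GB/s" reads as a best-case headline number.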

Quite a beast on paper... not so good in real-world scenarios. Drivers are improving.

China's fastest domestic gaming GPU gets massive performance boost from new drivers, up to 80% jump in some games, as the country grapples with RTX 4090 ban

Gains ranging from 30% to 80%.

www.tomshardware.com

Remember Arc by Intel? A 750% boost. Why is that relevant? It was their first GPU in decades.

Intel claims up to 750% gaming boost with latest Arc graphics drivers

Halo gets the biggest improvement, along with older DirectX11 games, but a few new titles get big gains too

www.pcworld.com

MTT was founded by Zhang Jianzhong, the former global vice president of Nvidia and general manager of Nvidia China. The team is not a bunch of guys in a garage, or the likes of Ouya.

About the layoffs: a single-digit percentage doesn't seem so dramatic.
"Moore Threads Intelligent Technology Beijing Co. plans to cut a single-digit percentage of its roughly 1,000 employees,"

China AI Chipmaker Moore Threads Cuts Jobs After US Blacklisting

(Bloomberg) -- Moore Threads, a Chinese developer of graphics processors and AI accelerators, is cutting jobs after the US added the three-year-old firm to a trade blacklist last month.

finance.yahoo.com

And that's for a company valued at $3.4 billion.

China's Moore Threads valued at $3.4 billion in funding before U.S. curbs - sources

Chinese chip design startup Moore Threads agreed a capital raise that brought its valuation to roughly 25 billion yuan ($3.449 billion) shortly before it was hit by U.S. export controls, two sources familiar with the matter said. The U.S. in October added the graphic processing unit designer to...

finance.yahoo.com

There's no need to be too competitive when the only GPUs sold in China will be domestic cards (a race spearheaded by MTT), and the sanctions are shaping exactly this scenario.

bit_user · Dec 4, 2023

ivan_vy said:
"The GPU's clock speed is set at 1.8 GHz, and maximum compute performance has been measured at 14.2 TFLOPS. A 256-bit memory bus grants a bandwidth transfer rate of 448 GB/s. PC Watch notes that the card's support for PCIe Gen 5 x 16 (offering up to 128 GB/s bandwidth) is quite surprising, given the early nature of this connection standard."

That's not evidence. You're just quoting specs. I'm looking for proof the hardware isn't hopelessly borked. Are there any synthetic tests that show it nearing even a single one of its theoretical performance limits?
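One way to frame that question: a synthetic test yields a measured throughput, and the interesting number is what fraction of the theoretical peak it reaches. A minimal Python sketch; the measured value would have to come from an actual benchmark run, so the functions just take it as an input, and the 80% "nearing the limit" threshold is an arbitrary, hypothetical choice.

```python
def fraction_of_peak(measured: float, theoretical_peak: float) -> float:
    """Return measured throughput as a fraction of the theoretical peak (same units)."""
    if theoretical_peak <= 0:
        raise ValueError("theoretical_peak must be positive")
    return measured / theoretical_peak

def nears_limit(measured: float, theoretical_peak: float, threshold: float = 0.8) -> bool:
    """Hypothetical criterion: does the result reach at least `threshold` of peak?"""
    return fraction_of_peak(measured, theoretical_peak) >= threshold

# Hypothetical input, not a real measurement: a result of 224 GB/s against a
# 448 GB/s spec would be a 2x shortfall.
print(fraction_of_peak(224.0, 448.0))   # 0.5
```

By this yardstick, a card that "underperforms its specs by about 2x" is reaching roughly 50% of its theoretical peak.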

ivan_vy said:
drivers are improving

And yet the card still performs like hot garbage.

ivan_vy said:
Remember Arc by Intel?

These guys aren't Intel - they haven't earned any benefit of the doubt. Intel had over a decade of experience building iGPUs, so there's a fair chance that even if it messed up a few things with Alchemist, it got most of the important stuff right. And yet, the A770 underperforms its specs by about 2x.

ivan_vy said:
MTT was founded by Zhang Jianzhong, the former global vice president of Nvidia and general manager of Nvidia China. The team is not a bunch of guys in a garage, or the likes of Ouya.

Right now, they look like a bunch of absolute clowns, especially when you go back and look at the promises and hype they created leading up to its launch.

ivan_vy said:
There's no need to be too competitive when the only GPUs sold in China will be domestic cards

That doesn't look set to happen any time soon.

ivan_vy · Dec 4, 2023

bit_user said:
Are there any synthetic tests that show it nearing even a single one of its theoretical performance limits?

"In some very specific synthetic benchmarks, the MTT S80 GPU does better than an RTX 3060."

The MTT S80 Chinese GPU has the performance of a GeForce GTX 1060

Earlier on we discussed the announcement of this GPU series, Moore Threads, a Chinese company, has released the first graphics card with PCIe 5.0 support. The MTT S80 gaming GPU just came out this we...

www.guru3d.com

Again, drivers are the biggest problem.

bit_user said:
they look like a bunch of absolute clowns,

That's more of an opinion. MTT poached AMD and Nvidia engineers; you don't go after the worst if you want to attract talent.

There's no need to be too competitive when the only GPUs sold in China will be domestic cards

bit_user said:
That doesn't look set to happen any time soon.

Looks like it will get worse before it gets better (if it ever does).

US govt warns that sanctions swerving GPUs will fall under their 'control the very next day'

Secretary Raimondo said she didn’t want to specifically call out Nvidia.

www.tomshardware.com

Yes, I know she is talking about AI, but GPUs are so entrenched in AI that it will inevitably limit availability in China, as is already happening.

bit_user · Dec 4, 2023

ivan_vy said:
Again, drivers are the biggest problem.

How do you actually know that?

ivan_vy said:
That's more of an opinion. MTT poached AMD and Nvidia engineers; you don't go after the worst if you want to attract talent.

Intel poached Raja Koduri from AMD. Doesn't mean anything.

Unless you've worked in these organizations, you don't know which engineers actually know their stuff and which are full of hot air.

ivan_vy · Dec 4, 2023

bit_user said:
How do you actually know that?

If it were the hardware, performance simply wouldn't improve once the card was produced.

bit_user said:
Raja Koduri

As much as we make fun of Raja, he has been with S3, Apple, AMD, and Intel; at AMD, he landed the winning GPU designs for the PS5 and Xbox. I don't think that many people could have been wrong about him.
I understand your point. HW blueprints can be copied (stolen?), or you can read a bunch of whitepapers and so on; the success is in the software. Look at CUDA, how the PS3's weird architecture shined at the end of its lifecycle, how AMD ages like fine wine, etc. These fast-moving times are unrelenting, and competition is hard.
The hardware race is tight, and China will eventually catch up; software will be the deciding factor in the years to come.

bit_user · Dec 4, 2023

ivan_vy said:
If it were the hardware, performance simply wouldn't improve once the card was produced.

My point wasn't that the drivers weren't bad, just that even perfect drivers won't result in the level of performance suggested by the card's specs.

I've programmed some buggy hardware in my career. I also know that GPUs have their share of bugs, and sometimes the workarounds for these bugs affect performance. That's my core contention - either through bottlenecks in the design or implementation bugs, I think those GPUs won't ever perform well.

The next generation will be their shot at redemption. I will wait and see.
