News New Chinese GPGPU arrives to challenge Nvidia's AI dominance but falls woefully short - Loongson unveils AI and HPC GPU, up to 1 Tflops of performa...

Status
Not open for further replies.
I just tried uploading that slide image to Google Translate, which seems to work surprisingly well. It revealed basically the same information included in the article, so maybe the author already did exactly that. Something funny: it seems they refer to their multi-node link as "Dragon Chain technology".

The article does contain a glaring error:
Nvidia's H100 delivers 67 FP64 TFLOPS.
No, that's the fp32 rate. Its fp64 throughput is exactly half of that, at 33.5 TFLOPS. That is a lot, and a huge leap from even the previous generation's A100 (which managed only 9.7 TFLOPS). I'm guessing they got embarrassed by AMD leap-frogging them on this metric in the previous generation.
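As a quick sanity check, here is the same arithmetic using only the figures quoted above (67 TFLOPS fp32 for H100, 9.7 TFLOPS fp64 for A100). The "fp64 is exactly half of fp32" step is the assumption from this post, not a measured number:

```python
# Figures quoted in this post, in TFLOPS.
H100_FP32 = 67.0
A100_FP64 = 9.7

# Assumption from the post: H100 runs fp64 at exactly half its fp32 rate.
h100_fp64 = H100_FP32 / 2

# Generational leap in fp64 throughput, A100 -> H100.
leap = h100_fp64 / A100_FP64

print(f"H100 fp64: {h100_fp64:.1f} TFLOPS")      # 33.5 TFLOPS
print(f"A100 -> H100 fp64 leap: {leap:.1f}x")    # roughly 3.5x
```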

Also:
Loongson calls its LG200 accelerator a "GPGPU," which certainly implies that the part not only supports AI, HPC, and graphics workloads but can also perform general-purpose computing. But we can only wonder what the company means by general-purpose computing.
No, "GPGPU" just means it's more general than graphics. That extends as far as HPC and AI, but usually not much beyond. Anything GPU-like is still practically limited to workloads that are number-crunching, highly parallel, and SIMD-friendly.
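To illustrate what "SIMD-friendly" means, here is a toy cost model. It is my own simplification, not how any particular GPU actually schedules work: lanes are processed in fixed-width groups (the width of 32 is just an assumed default), and if lanes in one group take different branches, the group has to step through each path in turn while the other lanes sit idle.

```python
from typing import Callable, Sequence


def simd_efficiency(data: Sequence[int], branch: Callable[[int], bool], width: int = 32) -> float:
    """Toy model: fraction of SIMD lane-slots doing useful work.

    Assumes each group of `width` lanes executes every branch path taken by
    any of its lanes, one after another, and that each path costs one step.
    """
    useful = 0
    issued = 0
    for start in range(0, len(data), width):
        group = data[start:start + width]
        paths = {branch(x) for x in group}  # distinct branch outcomes in this group
        steps = len(paths)                  # a divergent group serializes its paths
        issued += steps * width             # every step occupies all lanes
        useful += len(group)                # each element needs exactly one step
    return useful / issued


uniform = list(range(0, 2048, 2))  # all even: every lane takes the same path
mixed = list(range(1024))          # alternating even/odd: every group diverges

print(simd_efficiency(uniform, lambda x: x % 2 == 0))  # 1.0
print(simd_efficiency(mixed, lambda x: x % 2 == 0))    # 0.5
```

Uniform, branch-free work keeps every lane busy, while data that splits each group down two paths halves the useful throughput in this model. That's the sense in which GPU-like hardware stays limited to regular, highly parallel number-crunching.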

GPGPU is an old term, going back at least 20 years. It just referred to using a GPU for computations other than graphics.

Sorry to bother you, but I'm sure @JarredWaltonGPU can set Anton straight on these points.
 
It should be noted that teraflops aren't everything. But assuming that the top of the 256 GFLOPS to 1 TFLOPS range is the fp32 rate (although I'd guess it's int8), that puts this card's compute power on par with the Nvidia Quadro 6000, which was released back in 2010.

I love competition in markets with few suppliers, but they've got a long, long way to go to be competitive, especially when the RTX 4080, which can still be shipped into China, has an FP32 rate of 48.74 TFLOPS, and even a GTX 1060 can pull 4.38 TFLOPS. So even if all GPU shipments into China were banned, the government would be better off commandeering every gamer's graphics card, however old, than using this chip.
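Figures like 48.74 TFLOPS are theoretical peaks, which is part of why teraflops aren't everything. A minimal sketch of how such peaks are usually derived, assuming the RTX 4080's published 9728 CUDA cores and 2.505 GHz boost clock, with each fused multiply-add counted as 2 FLOPs. The comparison uses the top of Loongson's quoted range as a best case:

```python
def peak_fp32_tflops(shader_cores: int, boost_clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes each shader core retires one fused multiply-add (counted as
    2 FLOPs) per clock, which is how peak FP32 is usually quoted.
    """
    gflops = shader_cores * 2 * boost_clock_ghz
    return gflops / 1000.0


rtx_4080 = peak_fp32_tflops(9728, 2.505)  # RTX 4080: 9728 CUDA cores, 2.505 GHz boost
loongson_best = 1.0                       # top of the quoted 256 GFLOPS to 1 TFLOPS range

print(f"RTX 4080: {rtx_4080:.2f} TFLOPS")                       # 48.74 TFLOPS
print(f"~{rtx_4080 / loongson_best:.0f}x the best-case figure")  # ~49x
```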
 
Think of this as a proof of concept: it shows it can be done, and boy, they will do it. Loongson CPUs are improving at a surprising rate, and GPUs from MooreThreads are great on the HW side (awful on SW). This means the low-end and mid-range segments will be served. High-end users can still buy and import from third parties, but soon they will be fed too. How? Huge R&D budgets, paid for by huge mid-range and low-end consumer markets held captive (by the Chinese government and US sanctions).
 
assuming that the 256 giga flops to 1 Teraflops has the Teraflops as the fp32 rate (although I'd guess it's int8) that puts this card's compute power on par with the Nvidia Quadro 6000 that was released back in 2010.
Given that int8 performance is called out separately, I think they mean floating-point when they say FLOPS. That is what it stands for, after all!

Typically, there's a 2:1 ratio between fp32 and fp64. Likewise, there's usually a 2:1 ratio between fp16 and fp32. So, that would neatly cover the spread of 256 GFLOPS to 1 TFLOPS, where the low end is fp64 and the high end is fp16.
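Here is the same ladder worked through, under the assumption stated above that 256 GFLOPS is the fp64 rate and each step down in precision doubles throughput:

```python
FP64_GFLOPS = 256  # low end of the quoted range, assumed here to be fp64

# Assumed 2:1 steps: fp32 = 2 x fp64, fp16 = 2 x fp32.
ladder = {"fp64": FP64_GFLOPS}
ladder["fp32"] = ladder["fp64"] * 2
ladder["fp16"] = ladder["fp32"] * 2

for precision, gflops in ladder.items():
    print(f"{precision}: {gflops} GFLOPS")
# fp16 lands at 1024 GFLOPS, i.e. roughly the quoted 1 TFLOPS
```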

the RTX 4080 which can still be shipped into China has a FP32 of 48.74 Teraflops and even a GTX 1060 can pull 4.38 Teraflops. So even if all GPU shipments into China were banned, the government will be better off commandeering every single gamers graphics card even if it was old. Compared to currently using this chip.
They do talk about multi-node scalability, although that's going to be more for AI than most other workloads.

Anyway, this is a start. It's probably a good sign to see them put forth some plausible-sounding claims, rather than over-promising and under-delivering the way MooreThreads did. Even if they delivered far more competitive hardware today, it would take a while for the software to catch up. If they deliver something modest now, it can at least be used as a development vehicle to build up the software stack and ecosystem.
 
GPUs from MooreThreads are great on HW side (awful on SW)
Show me evidence the hardware is actually good. I've seen their claims about its specs, but not a single benchmark even remotely close to proving them.

I suspect the MTT S80 hardware is probably full of bugs and bottlenecks that the software has to work around, and that's one reason it's taken them so long to get the drivers in half-decent shape. Even with the best drivers, those GPUs probably won't perform anything like their specs suggest.

huge R&D budgets paid by a captive (by chinese government and US sanctions) huge medium and low consumer markets.
Didn't they just have a round of layoffs? I'm having some trouble finding the article, but it was within the last month.

As for consumers propping up the low-end, that only works if the cards are cost-competitive with options from Intel, AMD, and Nvidia, which the S80 isn't (even after all the markdowns).
 
Show me evidence the hardware is actually good.
"The GPU's clock speed is set at 1.8 GHz, and maximum compute performance has been measured at 14.2 TFLOPS. A 256-bit memory bus grants a bandwidth transfer rate of 448 GB/s. PC Watch notes that the card's support for PCIe Gen 5 x 16 (offering up to 128 GB/s bandwidth) is quite surprising, given the early nature of this connection standard."
Quite a beast on paper... not so good in real-world scenarios, but drivers are improving.
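For what those quoted specs mean on paper, here is the standard back-of-the-envelope arithmetic. The 14 Gbps per-pin data rate is derived from the quoted numbers (448 GB/s over a 32-byte-wide bus), not taken from the article, and the PCIe figures assume the PCIe 5.0 rate of 32 GT/s per lane with 128b/130b encoding:

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak DRAM bandwidth in GB/s: bus width in bytes x per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def pcie5_gbs_per_direction(lanes: int = 16) -> float:
    """PCIe 5.0: 32 GT/s per lane, 128b/130b encoding, converted to GB/s."""
    return 32 * lanes * (128 / 130) / 8


# A 256-bit bus needs 14 Gbps memory to reach the quoted 448 GB/s.
print(memory_bandwidth_gbs(256, 14))  # 448.0

# The quoted 128 GB/s is the raw figure for both directions combined.
one_way = pcie5_gbs_per_direction()
print(f"{one_way:.1f} GB/s per direction, {2 * one_way:.1f} GB/s total")
```

None of this says anything about achieved performance; these are ceilings, which is exactly why spec sheets alone don't settle whether the hardware is good.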

Remember Arc by Intel? A 750% boost. Why is that relevant? It was their first discrete GPU in decades.


MTT was founded by Zhang Jianzhong, the former global vice president of Nvidia and general manager of Nvidia China. The team is not a bunch of guys in a garage, or the likes of Ouya.

About the layoffs: a single-digit percentage (possibly as little as 1%) doesn't seem so dramatic.
"Moore Threads Intelligent Technology Beijing Co. plans to cut a single-digit percentage of its roughly 1,000 employees,"
That's for a company valued at 3.4 billion.

There's no need to be too competitive when the only GPUs sold in China will be domestic cards (a race spearheaded by MTT), and the sanctions are shaping exactly this scenario.
 
"The GPU's clock speed is set at 1.8 GHz, and maximum compute performance has been measured at 14.2 TFLOPS. A 256-bit memory bus grants a bandwidth transfer rate of 448 GB/s. PC Watch notes that the card's support for PCIe Gen 5 x 16 (offering up to 128 GB/s bandwidth) is quite surprising, given the early nature of this connection standard."
That's not evidence. You're just quoting specs. I'm looking for proof the hardware isn't hopelessly borked. Are there any synthetic tests that show it nearing even a single one of its theoretical performance limits?

drivers are improving
And yet the card still performs like hot garbage.

remember ARC by Intel?
These guys aren't Intel - they haven't earned any benefit of the doubt. Intel had over a decade of building iGPUs, so there's a fair chance that even if Intel messed up a few things about Alchemist, they probably at least got most of the important stuff right - and yet, the A770 underperforms its specs by about 2x.

MTT founded by Zhang Jianzhong the former global vice president of NVIDIA and general manager of Nvidia China, the team is not a bunch of guys in a garage or the likes from Ouya.
Right now, they look like a bunch of absolute clowns, especially when you go back and look at the promises and hype they created leading up to its launch.

No need to be too competitive when the only GPU sold in China will be domestic cards
That doesn't look set to happen any time soon.
 
Are there any synthetic tests that show it nearing even a single one of its theoretical performance limits?
"In some very specific synthetic benchmarks, the MTT S80 GPU does better than an RTX 3060."
Again, drivers are the biggest problem.
they look like a bunch of absolute clowns,
That's more of an opinion. MTT poached AMD and Nvidia engineers; you don't go after the worst people if you want to attract talent.

No need to be too competitive when the only GPU sold in China will be domestic cards
That doesn't look set to happen any time soon.
Looks like it will get worse before it (if ever) gets better.
Yes, I know she is talking about AI, but GPUs are so entrenched in AI that it will inevitably limit their availability in China, as is already happening.
 
again, drivers are the biggest problems.
How do you actually know that?

this is more an opinion. MTT poached AMD and Nvidia engineers, you don't look for the worse if you want to attract talent.
Intel poached Raja Koduri from AMD. Doesn't mean anything.

Unless you've worked in these organizations, you don't know which engineers actually know their stuff and which are full of hot air.
 
How do you actually know that?
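Because if the hardware were the main problem, it simply wouldn't improve once produced, yet performance keeps improving with driver updates.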
Raja Koduri
As much as we make fun of Raja, he has been with S3, Apple, AMD, and Intel; at AMD, he landed the winning GPU designs for the PS5 and Xbox. I don't think that many people were wrong about him.
I understand your point. Hardware blueprints can be copied (stolen?), and you can read a bunch of whitepapers and so on, but success lies in the software: look at CUDA, at how the PS3's weird architecture shined at the end of its lifecycle, at how AMD ages like fine wine, etc. These fast-moving times are unrelenting and competition is hard.
The hardware race is tight and China will eventually catch up; software will be the deciding factor in the years to come.
 
simply wouldn't improve once produced.
My point wasn't that the drivers weren't bad, just that even perfect drivers won't result in the level of performance suggested by the card's specs.

I've programmed some buggy hardware in my career. I also know that GPUs have their share of bugs, and sometimes the workarounds for these bugs affect performance. That's my core contention - either through bottlenecks in the design or implementation bugs, I think those GPUs won't ever perform well.

The next generation will be their shot at redemption. I will wait and see.
 