This article is frustratingly thin on details, especially when tempting us with this morsel:
The Article said:
Interestingly, the 910C may be able to outperform Nvidia’s upcoming Blackwell-based B20 according to a prediction made by SemiAnalysis’s Dylan Patel.
I took a quick look at the SemiAnalysis home page, but didn't see any recent article specifically about Huawei's accelerators. If anyone has more information about them, please share.
I did find this, but note that the 910C specs are just guesses:
Each time that the United States has figured out that it needed to do export controls on massively parallel compute engines to try to discourage China
www.nextplatform.com
I'm pretty sure their fp32 TFLOPS overshadow Nvidia's by so much because they're Tensor TFLOPS, whereas Nvidia only supports vector operations on fp32. The closest corresponding spec for Nvidia's H100 would be 989 TF32 TFLOPS with sparsity (495 without). TF32 is a specialized number format with the range of fp32, but only the precision of fp16. So, it's not quite equivalent, but it should get us into the ballpark when trying to interpret the above data.
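To make the "range of fp32, precision of fp16" point concrete, here's a rough Python sketch that emulates TF32 by keeping an fp32 value's sign and 8-bit exponent but only the top 10 of its 23 mantissa bits. The helper name `truncate_to_tf32` is just something I made up for illustration, and plain truncation is an assumption on my part; real hardware may round differently.

```python
import struct

def truncate_to_tf32(x: float) -> float:
    """Emulate TF32 storage: fp32 sign + 8-bit exponent, top 10 mantissa bits.

    Assumption: the low 13 mantissa bits are simply truncated (hypothetical
    helper for illustration, not how any particular GPU is documented to round).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # fp32 bit pattern
    bits &= 0xFFFFE000                                   # clear low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

FP16_MAX = 65504.0  # largest finite fp16 value

# Range: 1e30 is far beyond fp16's maximum, but survives with fp32's exponent.
big = truncate_to_tf32(1.0e30)
print(big > FP16_MAX)

# Precision: with 10 mantissa bits, the spacing just above 1.0 is 2**-10,
# the same as fp16. A step of 2**-11 is lost; a step of 2**-10 is kept.
print(truncate_to_tf32(1.0 + 2**-11) == 1.0)
print(truncate_to_tf32(1.0 + 2**-10) == 1.0 + 2**-10)
```

So a "TF32 TFLOPS" figure counts operations on inputs carrying roughly fp16-level precision, which is why I'd treat it as a ballpark comparison against a true fp32 number rather than an exact match.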