News China's secretive Sunway Pro CPU quadruples performance over its predecessor, allowing the supercomputer to hit exaflop speeds

Status
Not open for further replies.
Each Sunway SW26010 Pro has a maximum FP64 throughput of 13.8 TFLOPS, which is massive. For comparison, AMD's 96-core EPYC 9654 has a peak FP64 performance of around 5.4 TFLOPS.
It's a false comparison, though. The SW26010 Pro is a hybrid CPU/GPU. As a hybrid, it packs more raw compute than a CPU, but can't handle general-purpose computation as well. Nor does it have as much raw compute as a GPU like AMD's MI250X (which packs 28 to 48 FP64 TFLOPS). As such, the best point of comparison is probably something like Fujitsu's A64FX.

Another way to think of it is as something like 6 Cell processors on one chip. Like the Cell, the bulk of its compute lies in 2-way in-order cores that work out of scratchpad memory. Programming these is probably a lot more like programming a GPU than a CPU. In fact, I wouldn't be surprised if they used OpenCL to drive them in much the same way.

a dual-channel DDR4-3200 (51.2 GB/s) memory subsystem is barely enough for 64 cores, each featuring a 512-bit vector FPU and capable of up to 16 FP64 FLOPS/cycle.
Big mistake not to use GDDR memory for this.
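
To put a rough number on how thin that memory bandwidth is, here's a back-of-envelope sketch. It assumes the chip has 6 groups of 64 compute cores (going by the "6 Cell processors" picture above) and that the quoted 51.2 GB/s is per group; neither is confirmed by the quoted text, and the function names are just mine for illustration.

```python
# Back-of-envelope machine balance for one core group.
# Figures taken from the thread: 13.8 TFLOPS FP64 per chip, 64 cores,
# 16 FP64 FLOPS/cycle per core, dual-channel DDR4-3200.
# ASSUMPTION: 6 core groups per chip, each with its own 51.2 GB/s of DDR4.

PEAK_FP64_TFLOPS = 13.8      # whole chip
CORE_GROUPS = 6              # assumption, not stated in the quoted text
CORES_PER_GROUP = 64
FLOPS_PER_CYCLE = 16         # FP64, per core

DDR4_MEGATRANSFERS = 3200    # DDR4-3200
BYTES_PER_TRANSFER = 8       # one 64-bit channel
CHANNELS = 2


def ddr_bandwidth_gbs(megatransfers, bytes_per_transfer, channels):
    """Peak DRAM bandwidth in GB/s."""
    return megatransfers * 1e6 * bytes_per_transfer * channels / 1e9


def bytes_per_flop(bandwidth_gbs, peak_tflops):
    """Bytes of DRAM traffic available per peak FP64 operation."""
    return (bandwidth_gbs * 1e9) / (peak_tflops * 1e12)


bandwidth = ddr_bandwidth_gbs(DDR4_MEGATRANSFERS, BYTES_PER_TRANSFER, CHANNELS)
group_peak_tflops = PEAK_FP64_TFLOPS / CORE_GROUPS
balance = bytes_per_flop(bandwidth, group_peak_tflops)

# Clock speed implied by the peak figure under the same assumptions.
implied_clock_ghz = (
    PEAK_FP64_TFLOPS * 1e12
    / (CORE_GROUPS * CORES_PER_GROUP * FLOPS_PER_CYCLE)
    / 1e9
)

print(f"DRAM bandwidth per group: {bandwidth:.1f} GB/s")
print(f"Peak FP64 per group:      {group_peak_tflops:.2f} TFLOPS")
print(f"Machine balance:          {balance:.4f} bytes/FLOP")
print(f"Implied clock:            {implied_clock_ghz:.2f} GHz")
```

Under those assumptions you get roughly 0.02 bytes of DRAM bandwidth per peak FLOP, so a kernel has to do on the order of 45 FP64 operations per byte fetched to keep the vector units busy. That's why anything without heavy reuse out of the scratchpads will be bandwidth-bound.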
 