News: AMD Announces the Instinct MI100 GPU, CDNA Breaks 10 TFLOPS Barrier

thGe17

First of all: the A100 values are a little bit mixed up.
With Tensor Cores:
bfloat16 or FP16 = 312 TFlops (with sparsity up to 624 TFlops)
TF32 = 156 TFlops (with sparsity up to 312 TFlops); "FP32-precision-equivalent" matrix ops for training
INT8 = 624 TOPS (with sparsity up to 1,248 TOPS)
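
If it helps to see the pattern, here's a minimal sketch (not from the article; the dense figures are Nvidia's published A100 peaks, and the 2x factor is the on-paper gain from the 2:4 structured-sparsity feature):

```python
# Nvidia's published A100 Tensor Core peaks; 2:4 structured sparsity
# doubles the dense rate on paper.
DENSE_PEAKS = {
    "bfloat16/FP16": (312, "TFlops"),
    "TF32":          (156, "TFlops"),
    "INT8":          (624, "TOPS"),
}
SPARSITY_FACTOR = 2  # claimed speedup from 2:4 structured sparsity

for fmt, (dense, unit) in DENSE_PEAKS.items():
    sparse = dense * SPARSITY_FACTOR
    print(f"{fmt:>13}: {dense:5d} {unit} dense, {sparse:5d} {unit} with sparsity")
```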

Additionally, the regular base FP64 rate is 9.7 TFlops, but Ampere can also run FP64 MMA ops on the Tensor Cores at full precision (19.5 TFlops peak), and Nvidia has extended its CUDA-X libraries to use this transparently, so the effective FP64 throughput for many (perhaps most) workloads should be well above 9.7 TFlops.
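
For what it's worth, no special API is needed to hit that path: on an A100, cuBLAS routes large double-precision GEMMs to the FP64 Tensor Cores on its own. A rough timing sketch, assuming a CUDA build of PyTorch (the matrix size and single-run timing are arbitrary choices of mine):

```python
import torch

# Plain double-precision GEMM: on an A100, cuBLAS dispatches this to the
# FP64 Tensor Core (DMMA) path without any code changes.
n = 4096
a = torch.randn(n, n, dtype=torch.float64, device="cuda")
b = torch.randn(n, n, dtype=torch.float64, device="cuda")

_ = a @ b  # warm-up: triggers cuBLAS init and kernel selection
torch.cuda.synchronize()

start, end = (torch.cuda.Event(enable_timing=True) for _ in range(2))
start.record()
c = a @ b
end.record()
torch.cuda.synchronize()

seconds = start.elapsed_time(end) / 1e3  # elapsed_time() returns milliseconds
print(f"FP64 GEMM: {2 * n**3 / seconds / 1e12:.1f} TFlops")
```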

In the end it seems that the MI100 is no match for Ampere, especially not for AI workloads.
 
I guess, but that's still based on using the Tensor Cores on top of the raw FP performance. If you're after raw FP throughput, the MI100 has the higher numbers on paper. The share of FP workloads that can be accelerated through CUDA and Nvidia's various ML libraries does seem to be quite large, but whether that helps is still up to the individual company/researcher to determine, so it's still useful to compare the "base" FP64/FP32 results; see the quick side-by-side below.
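
Something like this is what I mean by "on paper" (numbers pulled from the two vendors' published spec sheets, not measured):

```python
# Peak base (vector, non-matrix-engine) rates from the published spec sheets.
SPECS = {  # TFlops
    "MI100": {"FP64": 11.5, "FP32": 23.1},
    "A100":  {"FP64": 9.7,  "FP32": 19.5},
}

for precision in ("FP64", "FP32"):
    mi100, a100 = SPECS["MI100"][precision], SPECS["A100"][precision]
    print(f"{precision}: MI100 {mi100} TFlops vs A100 {a100} TFlops "
          f"(MI100 ahead {mi100 / a100:.2f}x on paper)")
```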

Seems like AMD's making strides, but Nvidia will still have the upper hand with their years of investment into CUDA and various ML workloads. It's gonna take AMD more than just decent hardware to make up the difference.
 

thGe17

Yes, so it seems. Additionally, Nvidia announced an upgraded A100 with 80 GiB of HBM2E at 2 TB/s of bandwidth, plus 400G InfiniBand. It looks as if they were just waiting for AMD to make the first move. ;-)

So hopefully AMD will have more luck with these new cards than with the last gen, which also looked promising on paper.