I have a general question about AI chips. I wasn't sure which forum to post it in and decided to put it under Graphics Cards, because my example touches on the new Nvidia Turing chips.
I am reading up on specialized chips for machine learning and AI, such as the Tensor cores on the RTX Turing cards.
What kind of performance number is usually used to compare the computational power of different AI chips?
For example, Nvidia claims that the TU102’s Tensor cores deliver up to 114 TFLOPS for FP16 operations, 228 TOPS for INT8, and 455 TOPS for INT4. The FP16 multiply with FP32 accumulate operations used for deep learning training are supported as well, but at half the speed of FP16 accumulate.
Let's say I want to compare that to what the Neural Engine of the Apple A12 can offer for machine learning operations. Apple claims that the 8-core Neural Engine in the A12 chip can do 5 trillion operations per second, but doesn't specify which kind of operations or at what precision.
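To make the problem concrete, here is a rough sketch of the naive comparison I could do with just the numbers quoted above. It assumes every figure is a vendor-claimed peak, and it treats Apple's "5 trillion operations per second" as if it were comparable to each Nvidia figure, which is exactly the assumption I am unsure about. The dictionary layout and the names `PEAK_TOPS` and `naive_ratios` are just my own illustration.

```python
# Vendor-claimed peak throughput, in trillions of operations per second.
# These are marketing figures quoted in this post, not measurements.
PEAK_TOPS = {
    ("TU102", "FP16"): 114.0,
    ("TU102", "INT8"): 228.0,
    ("TU102", "INT4"): 455.0,
    # Apple does not state the operation type, so it is labeled as unknown.
    ("A12 Neural Engine", "unspecified"): 5.0,
}

# FP16 multiply with FP32 accumulate is quoted at half the FP16-accumulate rate.
tu102_fp16_fp32_acc = PEAK_TOPS[("TU102", "FP16")] / 2


def naive_ratios(baseline=("A12 Neural Engine", "unspecified")):
    """Divide every other peak figure by the baseline's peak figure."""
    base = PEAK_TOPS[baseline]
    return {key: value / base for key, value in PEAK_TOPS.items() if key != baseline}


if __name__ == "__main__":
    print(f"TU102 FP16 with FP32 accumulate: {tu102_fp16_fp32_acc} TFLOPS")
    for (chip, precision), ratio in naive_ratios().items():
        print(f"{chip} {precision}: {ratio:.1f}x the A12 figure")
```

Depending on which Nvidia number I pick, the same two chips come out anywhere between roughly 23x and 91x apart, which is why a single "operations per second" figure doesn't tell me much without knowing the precision behind it.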
What would be the best way to compare the computational capabilities of those two chips when it comes to machine learning? Or with other AI chips? It's more of a general question, with those two chips as an example.