[SOLVED] Question about Nvidia Tensor / or AI chips in general

mjbn1977

Distinguished
I have a general question in regards to AI chips. I wasn't sure which forum to post it in and decided to put it under Graphics Cards because my example touches on the new Nvidia Turing chips.

I am reading up on specialized chips for machine learning and AI, such as the Tensor cores on the RTX Turing cards.

What kind of performance number is usually used for these kinds of chips in order to compare the computational power of different AI chips?

For example, Nvidia claims that the TU102’s Tensor cores deliver up to 114 TFLOPS for FP16 operations, 228 TOPS of INT8, and 455 TOPS of INT4. FP16 multiply with FP32 accumulate, the mode used for deep learning training, is supported as well, but at half the rate of FP16 accumulate.
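Doing the arithmetic myself, those three figures look like the same hardware counted at different precisions: INT8 is double the FP16 rate and INT4 is double again. Here is my own back-of-the-envelope sketch in Python; the core count and clock are my assumptions based on the RTX 2080 Ti Founders Edition configuration (544 Tensor cores, 1635 MHz boost), not numbers Nvidia states alongside the TOPS claims:

# Back-of-the-envelope for Nvidia's headline Tensor core numbers, assuming
# the RTX 2080 Ti Founders Edition: 544 Tensor cores (68 SMs x 8) at a
# 1635 MHz boost clock. Each Tensor core performs a 4x4x4 FP16 matrix
# multiply-accumulate per clock, i.e. 64 FMAs = 128 floating-point ops.

tensor_cores = 544
boost_clock_hz = 1.635e9
fmas_per_core_per_clock = 64   # one 4x4x4 matrix multiply-accumulate
ops_per_fma = 2                # a multiply plus an add

fp16_tflops = tensor_cores * boost_clock_hz * fmas_per_core_per_clock * ops_per_fma / 1e12
print(f"FP16 peak: {fp16_tflops:.0f} TFLOPS")    # ~114
print(f"INT8 peak: {fp16_tflops * 2:.0f} TOPS")  # ~228, twice the FP16 rate
print(f"INT4 peak: {fp16_tflops * 4:.0f} TOPS")  # ~455, four times the FP16 rate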

Let's say I want to compare that to what the Neural Engine of the Apple A12 can offer when it comes to machine learning operations. Apple claims that the 8-core Neural Engine that is part of the A12 chip can do 5 trillion operations per second, but doesn't specify what kind of operations.

What would be the best way to compare the computational capabilities of those two chips when it comes to machine learning? Or of other AI chips... it's more a general question, with those two chips as an example.
 
Solution
FYI, the ARM cores on the Xavier are also custom. In neural networks it isn't the CPU core that delivers the real performance so much as the GPU. You will want to look at the Volta 512-core GPU used on Xavier, in combination with the CUDA 10 specs. Here is where I suggest you start for that:
https://elinux.org/Jetson_AGX_Xavier

Here's the CUDA URL:
https://developer.nvidia.com/cuda-toolkit

More URLs are shown from the eLinux.org link, e.g., TensorRT information.

Second, do you want a low-power-consumption solution? If this matters, I can't imagine anything else being better (other than perhaps older generations if you don't need that much performance). So far as SoCs go, you won't find anything from anyone with that much GPU power...
I'm not all that familiar with the Apple A12. Do you mean this Apple A12?
https://en.wikipedia.org/wiki/Apple_A12

It says it is ARMv8.3-A, which is very current. The equivalent from NVIDIA would be the Xavier. I can't claim to know for sure, but from what I've seen Xavier will run circles around anything else in the same genre. Take for example that the A12 is listed with 4GB of RAM, while the Xavier has 16GB...all such current chips share RAM between CPU and GPU, and neural networks are a good reason to have more RAM. You'd have to go all the way back to the NVIDIA TX1 to find only 4GB of RAM. There are a lot of models you simply can't run with that little RAM.
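As a rough illustration of why that RAM gap matters (a sketch of mine; the parameter counts are approximate and from memory, so treat them as ballpark figures):

# Memory needed just to hold a model's weights, ignoring activations,
# workspace, and framework overhead, for a few well-known networks.

models = {
    "ResNet-50":    25.6e6,   # ~25.6M parameters
    "Inception-v3": 23.8e6,   # ~23.8M parameters
    "VGG-16":       138e6,    # ~138M parameters
}

for name, params in models.items():
    fp32_gb = params * 4 / 1e9   # 4 bytes per FP32 weight
    fp16_gb = params * 2 / 1e9   # 2 bytes per FP16 weight
    print(f"{name}: {fp32_gb:.2f} GB FP32 weights, {fp16_gb:.2f} GB FP16 weights")

Weights are only part of the story; activations and the framework itself usually multiply the real footprint several times over, which is how a 4GB shared-RAM device runs out of room.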

The place where the NVIDIA versions really distinguish themselves is usually in the amount of power consumed for a given level of computation. I don't know what the A12 does in that regard, but if power consumption matters to you then you'll probably end up with the NVIDIA version.
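To make the efficiency angle concrete, performance-per-watt is the number to compute. A minimal sketch, using Nvidia's published 32 TOPS INT8 claim for the AGX Xavier at its 30W power mode; Apple doesn't publish the Neural Engine's power draw, so that side stays a placeholder:

# Peak-efficiency sketch: raw TOPS claims only become comparable once you
# divide by power. The Xavier figures are Nvidia's published claims; the
# A12 Neural Engine's power draw is unpublished, so it is left unfilled.

xavier_int8_tops = 32.0   # Nvidia's claim for Jetson AGX Xavier (GPU + DLA)
xavier_watts = 30.0       # maximum power mode (10W and 15W modes also exist)
print(f"Xavier: {xavier_int8_tops / xavier_watts:.2f} TOPS/W claimed peak")

a12_tops = 5.0            # Apple's claim for the A12 Neural Engine
a12_npu_watts = None      # unpublished; fill in if you ever find a figure
if a12_npu_watts is not None:
    print(f"A12:    {a12_tops / a12_npu_watts:.2f} TOPS/W")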

It really looks to me like the A12 is intended for smartphones. Xavier is built from the start for autonomous machines running some sort of CUDA-based neural network.
 

2sidedpolygon

Prominent


What on earth are you talking about? This isn't an answer to their question. You can't just go out and buy the A12. They're asking what benchmarks of AI performance there are. Also, "I"m not aware of the A12, but I know the A12"? What? Is this a joke reply?
 


You didn't read that from me when you say:
"I"m not aware of the A12, but I know the A12"

Who said that? Not me. Please read. All I did was give a view of some things to consider. Perhaps you shouldn't reply if you're going to misquote.
 

2sidedpolygon

Prominent


"I'm not all that familiar with the Apple A12. Do you mean this Apple A12?"
 

mjbn1977

Distinguished
Well, we're still far away from the answer. Anyway... Apple claims that the 8-core Neural Engine built into the A12 (next to its 6-core CPU and 4-core GPU) can do 5 trillion operations per second. So... what I'm trying to figure out is how I can compare that to, in my example, the Tensor cores on the Turing chip. Nvidia is throwing so many different numbers around that I don't know which one I should use for comparison.

In the Wikipedia article that LinuxDevice linked, it says: "5 trillion 8-bit operations per second". Would that compare to Nvidia's 228 TOPS of INT8 claim?
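If both figures really are peak 8-bit rates, I suppose the comparison reduces to a simple ratio (my own sketch; that Apple's number is genuinely INT8 is an assumption resting only on that Wikipedia wording):

# Comparing peak 8-bit claims directly; only meaningful if both vendors
# are counting the same kind of operation at the same precision.

turing_int8_tops = 228.0   # Nvidia's TU102 Tensor core claim
a12_tops = 5.0             # Apple's Neural Engine claim, 8-bit per Wikipedia

print(f"TU102 / A12 peak ratio: {turing_int8_tops / a12_tops:.0f}x")  # ~46x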
 

You mentioned 8-bit operations. There are enormous differences in some cases depending on data type/precision. Be sure to know what your data type is before you compare.
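To show why data type matters so much, here is a minimal sketch (mine, not from any vendor documentation) of the symmetric INT8 quantization that tools like TensorRT apply when a model is dropped from FP32 to 8-bit; the 8-bit peak TOPS only help if your model survives this:

import numpy as np

# Symmetric INT8 quantization: FP32 weights are mapped onto [-127, 127]
# with one scale factor, then mapped back. The round-trip error is the
# price paid for running at the higher 8-bit throughput.

rng = np.random.default_rng(0)
weights_fp32 = rng.normal(0.0, 0.02, size=1000).astype(np.float32)

scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)
weights_back = weights_int8.astype(np.float32) * scale

err = np.abs(weights_fp32 - weights_back).max()
print(f"scale = {scale:.6f}, max round-trip error = {err:.6f}")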

I have a generation of Tegra SoC boards sitting next to me, ranging from the old Tegra 3 through Xavier. Other people are still catching up to the old generations. From what I see on Wikipedia, the Apple chip is not designed for full AI and autonomous machines...I am guessing it is intended for smartphones and perhaps tablets. Variations exist of all the Tegra SoCs (starting at the K1) in formats for different environments, e.g., industrial or automotive.
 
Solution
If you want some details on Xavier, there is a similar question on benchmarks here:
https://devtalk.nvidia.com/default/topic/1046147/jetson-agx-xavier/instructions-and-models-to-duplicate-jetson-agx-xavier-deep-learning-inference-benchmarks/

You can also ask questions there, though you have to register (this does not result in spam).

The forum listing for developers of most of the embedded NVIDIA products is here:
https://devtalk.nvidia.com/default/board/139/embedded-systems/

The developer forum specific to Xavier is here:
https://devtalk.nvidia.com/default/board/326/jetson-agx-xavier/
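If you end up benchmarking yourself, the measurement pattern behind those links is always the same: warm up, time a fixed number of inferences, divide. A framework-free sketch of that pattern (a numpy matmul stands in for a real model here):

import time
import numpy as np

# Generic inference-throughput measurement: warm-up runs first so caches
# and allocators settle, then time a fixed number of runs.

def fake_model(x, w):
    return np.maximum(x @ w, 0.0)   # one dense layer with ReLU

x = np.random.rand(64, 1024).astype(np.float32)   # a batch of 64 inputs
w = np.random.rand(1024, 1024).astype(np.float32)

for _ in range(10):                 # warm-up
    fake_model(x, w)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    fake_model(x, w)
elapsed = time.perf_counter() - start

print(f"{runs * x.shape[0] / elapsed:.0f} inferences/sec")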