Nvidia's Tesla P4 And P40 GPUs Boost Deep Learning Inference Performance With INT8, TensorRT Support

Status
Not open for further replies.

bit_user

"For training, it can take billions of TeraFLOPS to achieve an expected result over a matter of days (while using GPUs)"
Hmmm... Using round numbers, GPUs top out at about 20 TFLOPS (fp16). So, 2 billion TFLOPS would take a single top-end GPU approximately 100 million seconds, or about 3.2 years. Furthermore, neural nets don't scale very well to multi-GPU configurations, meaning no more than about 10 GPUs (or a DGX-1) would be used, for training a single net. So, I think we can safely say this overshot the mark, a bit.
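That back-of-the-envelope arithmetic checks out. A quick sanity-check script, assuming the round numbers above (20 TFLOPS fp16 sustained, and "2 billion TFLOPs" of total work):

```python
# Sanity-check the "billions of TeraFLOPS" figure from the article.
# Assumptions: a top-end GPU sustains ~20 TFLOPS (fp16), and the total
# workload is 2 billion TFLOPs of floating-point operations.
total_ops = 2e9 * 1e12        # 2 billion TeraFLOPs, in raw FLOPs
gpu_rate = 20e12              # 20 TFLOPS sustained, in FLOP/s
seconds = total_ops / gpu_rate
years = seconds / (3600 * 24 * 365)
print(f"{seconds:.0f} s = {years:.1f} years")   # 100000000 s = 3.2 years
```

So even on one of the fastest GPUs, "a matter of days" and "billions of TeraFLOPS" don't fit in the same sentence.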

So, I take it these are basically the GP104 (P4) and GP102 (P40), or tweaked versions thereof? Does the P100 have INT8 support?
 

hst101rox

How well would a 1060 or 1080 do at this sort of workload? I take it these don't support the lower-precision INT8 math, but do support up to the same precision levels the deep-learning cards are capable of? Would they not be nearly as efficient as the tailored GPUs?

Then there are the CAD GPUs, the Quadros. Which of the three types of Nvidia GPUs works best across all three applications?
 

bit_user

I'm also curious how they compare. I didn't find a good answer, but I did notice that one of the most popular pages on the subject has been updated to cover recent hardware through the GTX 1060:

http://timdettmers.com/2014/08/14/which-gpu-for-deep-learning/

I've taken the liberty of rewriting his performance equation in a way that's mathematically correct:
GTX_TitanX_Pascal = GTX_1080 / 0.7 = GTX_1070 / 0.55 = GTX_TitanX / 0.5 = GTX_980Ti / 0.5 = GTX_1060 / 0.4 = GTX_980 / 0.35

GTX_1080 = GTX_970 / 0.3 = GTX_Titan / 0.25 = AWS_GPU_g2 (g2.2 and g2.8) / 0.175 = GTX_960 / 0.175
So, in other words, a Pascal GTX Titan X is twice as fast as the original Titan X or GTX 980 Ti, in his test. That said, I don't know that his test can exploit fp16 or int8.
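Those chained equations are easier to read as a lookup table. Normalizing everything to the Pascal Titan X = 1.0 (the second equation is relative to the GTX 1080, so those entries get rescaled by 0.7):

```python
# Relative deep-learning throughput from Dettmers' equations above,
# normalized so GTX_TitanX_Pascal = 1.0. Divisors are his numbers.
relative = {
    "GTX_TitanX_Pascal": 1.0,
    "GTX_1080": 0.7,
    "GTX_1070": 0.55,
    "GTX_TitanX": 0.5,
    "GTX_980Ti": 0.5,
    "GTX_1060": 0.4,
    "GTX_980": 0.35,
    # The second equation is relative to GTX_1080, so rescale by 0.7:
    "GTX_970": 0.3 * 0.7,
    "GTX_Titan": 0.25 * 0.7,
    "AWS_GPU_g2": 0.175 * 0.7,
    "GTX_960": 0.175 * 0.7,
}

# E.g. Pascal Titan X vs. original Titan X:
print(relative["GTX_TitanX_Pascal"] / relative["GTX_TitanX"])   # 2.0
```

This makes the 2x claim above easy to verify, and puts a GTX 970 at roughly 0.21 of a Pascal Titan X in his benchmark.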
 
