Nvidia's Tesla P4 And P40 GPUs Boost Deep Learning Inference Performance With INT8, TensorRT Support

Lucian Armasu · Sep 13, 2016

Nvidia announced two new inference-optimized GPUs for deep learning, the Tesla P4 and Tesla P40. The two bring support for lower-precision INT8 operations as well as Nvidia's new TensorRT inference engine, which significantly improve the chips' performance.

bit_user · Sep 13, 2016

For training, it can take billions of TeraFLOPS to achieve an expected result over a matter of days (while using GPUs)

Hmmm... Using round numbers, GPUs top out at about 20 TFLOPS (fp16). So, 2 billion TFLOPS would take a single top-end GPU approximately 100 million seconds, or about 3.2 years. Furthermore, neural nets don't scale very well to multi-GPU configurations, meaning no more than about 10 GPUs (or a DGX-1) would be used for training a single net. So, I think we can safely say this overshot the mark a bit.
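For anyone who wants to check that arithmetic, here's a quick sketch. It assumes the article's "TFLOPS" figure is meant as total work (units of 10^12 operations) and uses the round 20 TFLOPS rate from above. The `scaling_efficiency` knob and the function name are made up for illustration, and the 10-GPU case assumes perfect scaling, which real training wouldn't achieve.

```python
# Back-of-the-envelope check of the training-time estimate.
# Assumption: "2 billion TFLOPS" is read as total work, in units of 10^12 operations.
TOTAL_WORK_TFLOP = 2e9          # 2 billion tera-operations
GPU_RATE_TFLOP_PER_S = 20.0     # round-number fp16 peak for a top-end GPU
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def training_seconds(num_gpus=1, scaling_efficiency=1.0):
    """Time to finish the work; scaling_efficiency is a hypothetical knob (1.0 = perfect)."""
    effective_rate = GPU_RATE_TFLOP_PER_S * num_gpus * scaling_efficiency
    return TOTAL_WORK_TFLOP / effective_rate


single_gpu_seconds = training_seconds()
single_gpu_years = single_gpu_seconds / SECONDS_PER_YEAR
ten_gpu_days = training_seconds(num_gpus=10) / SECONDS_PER_DAY

print(f"1 GPU:   {single_gpu_seconds:.3g} s, about {single_gpu_years:.1f} years")
print(f"10 GPUs: about {ten_gpu_days:.0f} days (assuming perfect scaling)")
```

Even in the ideal 10-GPU case that is months rather than days, so the quoted figure still looks too high.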

So, I take it these are basically the GP106 (P4) and GP102 (P40), or tweaked versions thereof? Does the P100 have INT8 support?

hst101rox · Sep 14, 2016

How well would a 1060 or 1080 do at this sort of workload? I take it these don't support the lower-precision INT8 operations, but do support the same higher precision levels that the deep learning cards are capable of? Would they not be nearly as efficient as the tailored GPUs?

Then we have the CAD GPUs, Quadro. Which of the 3 types of Nvidia GPUs work best with all 3 applications?

bit_user · Sep 14, 2016

I'm also curious how they compare. I didn't find a good answer, but I did notice one of the most popular pages on the subject has been updated to include recent hardware through the GTX 1060:

http://timdettmers.com/2014/08/14/which-gpu-for-deep-learning/

I've taken the liberty of rewriting his performance equation, in a way that's mathematically correct:

GTX_TitanX_Pascal = GTX_1080 / 0.7 = GTX_1070 / 0.55 = GTX_TitanX / 0.5 = GTX_980Ti / 0.5 = GTX_1060 / 0.4 = GTX_980 / 0.35

GTX_1080 = GTX_970 / 0.3 = GTX_Titan / 0.25 = AWS_GPU_g2 (g2.2 and g2.8) / 0.175 = GTX_960 / 0.175

So, in other words, a Pascal GTX Titan X is twice as fast as the original Titan X or GTX 980 Ti, in his test. That said, I don't know that his test can exploit fp16 or int8.
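If it helps, here's a small sketch that puts both equations on one scale, with the Pascal Titan X as 1.0. The numbers are just the factors from the two lines above; the second line is chained through GTX_1080 = 0.7. The dictionary and function names are my own, purely for illustration.

```python
# Relative performance, normalized so GTX_TitanX_Pascal = 1.0.
# Factors are taken from the two equations above; nothing here is measured.
REL_TO_TITAN_X_PASCAL = {
    "GTX_TitanX_Pascal": 1.0,
    "GTX_1080": 0.7,
    "GTX_1070": 0.55,
    "GTX_TitanX": 0.5,
    "GTX_980Ti": 0.5,
    "GTX_1060": 0.4,
    "GTX_980": 0.35,
}

# Second equation: these cards are expressed relative to the GTX 1080.
REL_TO_GTX_1080 = {
    "GTX_970": 0.3,
    "GTX_Titan": 0.25,
    "AWS_GPU_g2": 0.175,
    "GTX_960": 0.175,
}

# Chain the second table through the GTX 1080's factor to get one common scale.
perf = dict(REL_TO_TITAN_X_PASCAL)
for card, factor in REL_TO_GTX_1080.items():
    perf[card] = factor * REL_TO_TITAN_X_PASCAL["GTX_1080"]


def speedup(card_a, card_b):
    """How many times faster card_a is than card_b, per the equations above."""
    return perf[card_a] / perf[card_b]


for card, value in sorted(perf.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{card:20s} {value:.4f}")
```

For example, `speedup("GTX_TitanX_Pascal", "GTX_TitanX")` gives the 2x figure mentioned above, and the GTX 970 lands at roughly 0.21 of a Pascal Titan X.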
