I may or may not be the only deep learning engineer here. The thing is, Pascal has been designed with deep learning in mind from the get-go. In layman's terms, deep learning is the breakthrough that hit artificial intelligence 5-7 years back, and to get faster computing power, researchers started using GPUs. See how Facebook uses 8 Tesla M40s in their Big Sur servers: http://arstechnica.com/information-technology/2015/12/facebooks-open-sourcing-of-ai-hardware-is-the-start-of-the-deep-learning-revolution/
Baidu, a Chinese company, is also heavily invested in GPU-cluster-based supercomputing. Take a read here: http://www.nextplatform.com/2015/12/11/inside-the-gpu-clusters-that-power-baidus-neural-networks/ The Titan X is the cheapest card suitable for this, so instead of Tesla M40s they have chosen the Titan X. Each server has 8 GPUs in it. These machines run for weeks just to solve one problem. That time could be cut to days, even hours, with better interconnectivity between the cards.
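To put a rough number on that claim, here is a back-of-the-envelope sketch in C. All figures are illustrative assumptions on my part, not from the articles above: in data-parallel training each GPU has to exchange its full gradient every step, so the interconnect puts a floor under step time.

```c
#include <stdio.h>

int main(void) {
    /* Assumed figures, for illustration only. */
    double params    = 140e6;        /* a VGG-16-sized model, ~140M weights */
    double bytes     = params * 4.0; /* fp32 gradients, 4 bytes each */
    double pcie_bw   = 12e9;         /* ~12 GB/s practical PCIe 3.0 x16 */
    double nvlink_bw = 80e9;         /* ~80 GB/s aggregate Pascal NVLink */

    printf("gradient size: %.0f MB\n", bytes / 1e6);
    printf("per-step sync over PCIe:   %.1f ms\n", bytes / pcie_bw * 1e3);
    printf("per-step sync over NVLink: %.1f ms\n", bytes / nvlink_bw * 1e3);
    return 0;
}
```

A few tens of milliseconds per step sounds small, but multiplied by millions of steps across 8 GPUs, that gap is exactly the weeks-versus-days difference.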
But the main issue lies in the interconnect between multiple CPU nodes with 8 GPUs each. For a single PC with 4-8 GPUs this isn't an issue, but the moment you add a second PC, the bandwidth bottleneck kicks in. This has been worked around using products from Mellanox, but having NVLink between the cards lets us bypass all that. It is truly remarkable.
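If you want to see this ceiling on your own box, here is a minimal sketch using the CUDA runtime API (device IDs 0 and 1 and the 256 MiB buffer size are my own choices for illustration). It asks whether two GPUs can talk to each other directly (peer-to-peer) and times a direct copy between them; on a PCIe-only machine, the bandwidth it prints is the limit your multi-GPU job lives under, and NVLink exists to raise exactly that number.

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int n = 0;
    cudaGetDeviceCount(&n);
    if (n < 2) { printf("need at least 2 GPUs\n"); return 0; }

    /* Can GPU 0 and GPU 1 reach each other directly? */
    int p01 = 0, p10 = 0;
    cudaDeviceCanAccessPeer(&p01, 0, 1);
    cudaDeviceCanAccessPeer(&p10, 1, 0);
    printf("peer access 0->1: %d, 1->0: %d\n", p01, p10);
    if (!p01 || !p10) return 0;

    /* Enable direct access in both directions. */
    cudaSetDevice(0); cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1); cudaDeviceEnablePeerAccess(0, 0);

    /* One 256 MiB buffer per GPU. */
    size_t bytes = (size_t)256 << 20;
    void *src, *dst;
    cudaSetDevice(0); cudaMalloc(&src, bytes);
    cudaSetDevice(1); cudaMalloc(&dst, bytes);

    /* Time a direct GPU0 -> GPU1 copy with CUDA events. */
    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("GPU0 -> GPU1: %.2f GB/s\n", bytes / (ms / 1e3) / 1e9);

    cudaSetDevice(0); cudaFree(src);
    cudaSetDevice(1); cudaFree(dst);
    return 0;
}
```

The same measurement across two PCs would have to cross the network instead, which is why people reach for Mellanox gear there and why NVLink within a node is such a big deal.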