"In the Nvidia-provided chart below, the Tesla P40 even seems to be twice as fast as Google’s TPU for inferencing" — isn't that chart only about inferences/sec under a <10 ms latency constraint? Because in total inferencing throughput, Google's TPU appears to be roughly twice as fast as Nvidia's Tesla P40: 90 INT8 TOPS vs. 48 INT8 TOPS.