should be "v2 is 60% slower than ... v3".
> Google said then its 2nd-generation TPU could achieve 180 teraflops (TFLOPS) of floating-point performance, or six times more than Nvidia's latest Tesla V100 accelerator for FP16 half-precision computation. The Cloud TPU also had a 50% advantage over Nvidia's Tensor Core performance.

Google was talking about a board with four v2 TPUs on it, which each crank out 45 TFLOPS. So, the entire board was 50% faster than a single (120 TFLOPS) V100.
The v3 also appears to feature boards with four TPUs on them. So, presumably, single-TPU perf jumped to 105 TFLOPS - almost equal to a V100.
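A quick back-of-the-envelope sketch of the arithmetic in both comments. Note the 420 TFLOPS v3 board total is inferred from the 105 TFLOPS-per-chip estimate above, not a figure stated here:

```python
# All numbers are peak FP16/bfloat16 TFLOPS claims, not measured performance.
v100 = 120.0           # Tesla V100 peak Tensor Core TFLOPS
v2_board = 4 * 45.0    # four v2 chips per board -> 180 TFLOPS
v3_board = 420.0       # assumed v3 board total (4 chips x 105 TFLOPS)

print(v2_board / v100)   # 1.5 -> the v2 board is 50% faster than one V100
print(v3_board / 4)      # 105.0 -> per-chip v3, just shy of a single V100
```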
> - Models with no custom TensorFlowo perating inside the main training loop
> - Models that train for weeks or months

I think it should be "... no custom TensorFlow operating inside ..."
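Whatever the intended wording, the constraint is that every op in the training loop must be a stock TensorFlow op that XLA can compile for the TPU. A minimal sketch, assuming the modern tf.distribute.TPUStrategy API (the era discussed here used TPUEstimator, but the constraint is the same):

```python
import tensorflow as tf

# Standard TPU bring-up with the tf.distribute API. On a Cloud TPU VM
# or Colab TPU runtime, an empty tpu="" resolves to the attached TPU.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build the model under the strategy scope using only stock Keras
# layers, so every op in the training loop lowers cleanly through XLA.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
# model.fit(...) then runs the whole loop on the TPU; a custom Python
# op inside that loop would force a fallback off the device.
```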
> Google has recommended against using TPUs for applications such as linear algebra programs that require frequent branching and workloads that access memory in a sparse manner or require high-precision arithmetic.

Huh. They even had to say that? You'd think anyone clever enough to port code to a TPU would be clueful enough to foresee such limitations. I guess I can see them getting lots of questions about general-purpose applicability, with such lofty performance numbers.
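For a concrete sense of what those three categories look like, a hypothetical sketch (these snippets run anywhere; the point is that each pattern maps poorly onto the TPU's dense matrix unit):

```python
import tensorflow as tf

x = tf.random.normal([1024, 1024])

# Data-dependent branching: which branch runs depends on runtime values,
# which defeats the static scheduling the TPU compiler relies on.
y = tf.cond(tf.reduce_sum(x) > 0.0,
            lambda: x @ x,
            lambda: x + 1.0)

# Sparse memory access: gathering scattered rows produces irregular
# memory traffic instead of the streaming reads the matrix unit expects.
rows = tf.gather(x, tf.constant([3, 997, 12, 640]))

# High-precision arithmetic: the v2/v3 matrix unit multiplies in
# bfloat16; float64 work falls off the fast path entirely.
z = tf.cast(x, tf.float64) @ tf.cast(x, tf.float64)
```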