Article: Google Now Offering Pods With 1,000 Cloud TPUs to the Public


the Cloud TPU v2 is 60% slower than the Cloud TPU v2
should be "v2 is 60% slower than ... v3".

Google said then its 2nd-generation TPU could achieve 180 teraflops (TFLOPS) of floating-point performance, or six times more than Nvidia’s latest Tesla V100 accelerator for FP16 half-precision computation. The Cloud TPU also had a 50% advantage over Nvidia’s Tensor Core performance.
Google was talking about a board with four v2 TPUs on it, each of which cranks out 45 TFLOPS (4 x 45 = 180). So, the entire board was 50% faster than a single (120 TFLOPS) V100.

A year later, in 2018, the company announced version 3 of its TPU with a performance rated at 420 TFLOPS.
That also appears to feature boards with four TPUs on them. So, presumably, the single-TPU perf jumped to 105 TFLOPS (420 / 4), which is almost equal to a V100's 120 TFLOPS.
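
For anyone who wants to check the arithmetic in the two replies above, here is a quick Python sketch. The board figures (180 and 420 TFLOPS) and the 120 TFLOPS V100 number are the ones quoted in this thread; the four-chips-per-board count is the commenter's reading, not a confirmed spec, and the helper names are made up for illustration.

```python
# Back-of-the-envelope check of the per-chip numbers discussed above.
# Assumption: four TPU chips per board (the commenter's reading, not a confirmed spec).
CHIPS_PER_BOARD = 4
V100_TENSOR_TFLOPS = 120.0  # V100 Tensor Core figure quoted in the thread

# Board-level ratings quoted from the article.
BOARD_TFLOPS = {"TPU v2": 180.0, "TPU v3": 420.0}


def per_chip(board_tflops, chips=CHIPS_PER_BOARD):
    """Split a board-level rating evenly across its chips."""
    return board_tflops / chips


def ratio_to_v100(tflops):
    """Express a throughput figure as a multiple of one V100."""
    return tflops / V100_TENSOR_TFLOPS


for name, board in BOARD_TFLOPS.items():
    chip = per_chip(board)
    print(
        f"{name}: board {board:.0f} TFLOPS ({ratio_to_v100(board):.2f}x V100), "
        f"per chip {chip:.0f} TFLOPS ({ratio_to_v100(chip):.3f}x V100)"
    )
```

Under those assumptions the v2 board comes out at 1.5x a V100 (the "50% advantage"), while a single v3 chip lands at 0.875x, which is the "almost equal" above. These are peak ratings only and say nothing about real training throughput.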

  • Models with no custom TensorFlowo perating inside the main training loop
  • Models that rain for weeks or months
I think it should be "... no custom TensorFlow operating inside ..."
and "Models that train for weeks or months".

Google has recommended against using TPUs for applications such as linear algebra programs that require frequent branching, and for workloads that access memory in a sparse manner or require high-precision arithmetic.
Huh. They even had to say that? You'd think anyone clever enough to port code to a TPU would be clueful enough to foresee such limitations. I guess I can see them getting lots of questions about general-purpose applicability, with such lofty performance numbers.