jimmysmitty :
What's interesting is this got me thinking of Intel's TeraScale CPU. It could run at 3.13 GHz while using only 24 W. They had it at 6.26 GHz, pushing 2 TFLOPS of performance on 80 cores and using about the same power as a Core 2 Quad. Wonder why that didn't pan out.
According to this, it actually burned 62 W @ 1 TFLOPS and maxed out at 265 W, for only 1.81 TFLOPS.
https://en.wikipedia.org/wiki/Teraflops_Research_Chip#Statistics_[21]
Remember the Cell Processor, of PS3 fame? That originally ran at 3.2 GHz on a 90 nm node (same as later Pentium 4s), delivering 230 GFLOPS in 2006. According to Wikipedia, that first generation drew 170 - 200 W (total system power).
https://en.wikipedia.org/wiki/PlayStation_3_technical_specifications#Form_and_power_consumption
Subtracting what the GPU, optical drive, HDD, Bluetooth, HDMI, fan, etc. burned, you might estimate the CPU accounted for only about 100 W of that.
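If you want to see where that 230 GFLOPS figure comes from, here's a quick back-of-the-envelope sketch in Python; the per-SPE throughput is the commonly quoted number, and lumping the PPE's VMX unit in as one more SPE-equivalent is my assumption:

# Rough single-precision peak for Cell at 3.2 GHz
clock_ghz = 3.2
flops_per_cycle = 8                         # one 4-wide fused multiply-add per cycle = 8 flops
spe_peak = 8 * clock_ghz * flops_per_cycle  # 8 SPEs on the die: ~204.8 GFLOPS (the PS3 only enables 7)
ppe_peak = clock_ghz * flops_per_cycle      # PPE's VMX unit adds roughly another 25.6
print(spe_peak + ppe_peak)                  # ~230 GFLOPS theoretical peak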
Compare that to the Coffee Lake i7-8700K, which can deliver only somewhere in the ballpark of 200 GFLOPS from its AVX units, at a roughly similar TDP, but on 14 nm and more than a decade later. And it costs as much as an entire PS3 (super-slim).
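Same arithmetic for the 8700K, purely as a theoretical ceiling; the sustained all-core AVX clock is my guess, and real measured throughput lands a fair bit below numbers like this:

# Rough double-precision peak for the i7-8700K (AVX2 with two FMA ports)
cores = 6
avx_clock_ghz = 3.7           # assumed sustained all-core AVX clock; varies with workload and power limits
flops_per_cycle = 16          # 4 doubles per 256-bit vector * 2 flops (FMA) * 2 ports
print(cores * avx_clock_ghz * flops_per_cycle)   # ~355 GFLOPS on paper; sustained figures come in lower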
Why? Probably owing largely to the in-order SPE cores and DMA-driven (rather than cache-mediated) memory access. Not unrelated to that, the Cell was notoriously hard to program. Assuming the Teraflops Research Chip is similar (except that it uses a mesh instead of a ring bus), I think you have your answer.
As the world discovered with GPUs, you can reach high efficiency when your cores are simple and in-order, and your cache hierarchy is relatively flat and small (or non-existent). This is good for things like large matrix multiplies and deep learning, but not so good for general-purpose computation.