Anton, are you for real?
I have a ton of respect for Anton, but every once in a while he says something he should really know better than to believe. All of TSMC's estimates are based on a mountain of assumptions and organized around a theoretical (or actual?) reference CPU core. This dictates the mix of different cells and doesn't permit designers to adapt the microarchitecture to the process node; instead, it looks only at the impact of directly porting the core between nodes. I've previously seen articles mention specifically which ARM CPU core TSMC uses for these estimates, but I'm having trouble digging them up.
Essentially, when they quote a performance number, they're just looking at how much higher you could clock the same design at the same power when doing a direct port. But the thing is, people rarely do direct ports from one node to the next, especially if the nodes are in different families.
So, the first point of divergence between TSMC's and Nvidia's numbers is that GPUs differ from CPUs in their mix of cells. Secondly, if you use the new node's additional density and timing budget to increase IPC, then you can definitely beat TSMC's performance estimates. As I said, the way they estimate performance gains is essentially just by looking at how much you could increase the clock speed. Yet, in most cases, you can do better with a mix of IPC and clock speed improvements. This is especially true of something like a GPU or NPU, where a feature like "tensor cores" really falls into the category of an IPC increase.
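To put rough numbers on the IPC vs. clock speed point, here's a minimal sketch that treats performance as simply IPC times clock frequency. The `speedup` helper and every percentage in it are made up purely for illustration; none of them come from TSMC or Nvidia.

```python
def speedup(clock_gain: float, ipc_gain: float = 0.0) -> float:
    """Overall speedup from fractional clock and IPC gains (0.10 means +10%).

    Assumes performance = IPC * clock frequency, which ignores memory
    bottlenecks and everything else that complicates real chips.
    """
    return (1.0 + clock_gain) * (1.0 + ipc_gain)


# Hypothetical numbers, chosen only to illustrate the argument.
# Direct port: same design on the new node, all of the gain taken as clock.
direct_port = speedup(clock_gain=0.15)

# Redesign: give up some clock gain and spend the extra density on IPC.
redesigned = speedup(clock_gain=0.10, ipc_gain=0.20)

print(f"direct port: {direct_port:.2f}x")
print(f"redesigned:  {redesigned:.2f}x")
```

With these invented inputs, the redesigned chip comes out ahead even though it clocks lower than the direct port, which is the whole reason a vendor's figure can exceed the foundry's.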