What is the significance and scale of "time to first token"?
Is it something like a multi-second workload taking 1 µs to get started instead of 12 µs? As in, a workload that took 2,000,012 µs would now take 2,000,001 µs? Because that kind of "12x" improvement would be insignificant, uninteresting, and within the margin of error.
That's what it sounds like, or does it mean something else? It definitely doesn't mean the entire job gets done 12x faster, because AMD is only claiming ~2.1x overall throughput.
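To put numbers on that hypothetical (these are my made-up figures from the question above, not anything AMD published), here's a rough Python sketch of how little a 12x faster startup would matter end to end if the rest of the job is unchanged. The function name and the microsecond values are just for illustration.

```python
def end_to_end_speedup(startup_before_us, startup_after_us, rest_us):
    """Ratio of total time before to total time after, when only startup changes."""
    before = startup_before_us + rest_us
    after = startup_after_us + rest_us
    return before / after

# Hypothetical figures: startup drops from 12 to 1 microseconds,
# followed by 2,000,000 microseconds of work that does not change.
startup_ratio = 12 / 1
speedup = end_to_end_speedup(12, 1, 2_000_000)

print(f"startup speedup:    {startup_ratio:.0f}x")
print(f"end-to-end speedup: {speedup:.7f}x")
```

Under those assumptions the end-to-end number comes out barely above 1x, which is the whole point of the question: a big ratio on a tiny slice of the total doesn't move the total.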
But they got their "12x more betterer" headlines, I guess, which is all they wanted.
Plus it's still annoying to see companies' marketing departments say things like "12 times faster" to mean "the task gets done in one-twelfth the time", because that's a pretty backwards way of using multiplication. Granted, people working in marketing probably flunked math.