InvalidError :
bit_user :
Moot point, given that it hasn't got tensor cores. Okay, I'm assuming it doesn't have tensor cores, but I'm pretty sure she'd have mentioned them if it did.
A "tensor core" is just a fancy name for a fixed-function matrix multiply-accumulate unit. AMD could probably tweak the shader architecture to achieve comparable performance without dedicating a large chunk of die area to fixed-function math, albeit at the expense of power efficiency on tensor-heavy workloads.
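To be concrete, the op a tensor core hardwires is a small fused matrix multiply-accumulate - on V100 it's D = A*B + C over 4x4 tiles, fp16 inputs with fp32 accumulation. A shader can express the exact same math, just without the dedicated silicon (numpy sketch, tile size illustrative):

```python
import numpy as np

# One 4x4 tensor-core-style tile op: D = A @ B + C,
# fp16 operands, fp32 accumulate (V100-style; tile size is illustrative).
A = np.arange(16, dtype=np.float16).reshape(4, 4)
B = np.eye(4, dtype=np.float16)     # identity, so the product is easy to check
C = np.ones((4, 4), dtype=np.float32)

# Accumulate in fp32, as the tensor core does:
D = A.astype(np.float32) @ B.astype(np.float32) + C
```

Done in plain shader ALUs, that's the same 64 multiply-adds per tile - it's the dedicated datapath, not novel math, that buys the throughput.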
Vega already had packed fp16 math, and (as I implied) I've already seen enough of the LLVM patches for the new instructions to know that they won't significantly change its fp16 throughput.
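For context, "packed" fp16 just means two half-precision values share one 32-bit register lane, so each instruction does double duty. A quick illustration of the packing itself (little-endian layout assumed):

```python
import numpy as np

# Two fp16 values packed into a single 32-bit lane
# (little-endian layout assumed; 1.5 and -2.0 are exactly representable in fp16).
pair = np.array([1.5, -2.0], dtype=np.float16)
packed = pair.view(np.uint32)[0]          # one 32-bit "register" holding both halves

# Unpack: reinterpret the same 32 bits as two fp16 values.
unpacked = np.array([packed], dtype=np.uint32).view(np.float16)
```

One packed instruction operating on such a lane performs both fp16 ops at once - which is why Vega already had 2x fp16 rate, and why the new instructions don't move that needle.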
So, the only way it gets more than the stated 35% performance boost @ training is by some fixed-function hardware that wasn't mentioned - a pretty big deal to gloss over, but it's possible they're keeping that bit under wraps. Otherwise, the V100 will still be over 3x as fast.
As for inference, their new 8-bit instructions net them a mere 67 TOPS, compared with the V100's 110 TFLOPS. I also doubt its efficiency improved enough to sustain 67 TOPS at a mere 150 W, which is roughly what they'd have to hit to reach parity with the V100's efficiency. Plus, lots of fixed-function hardware targeting inference is coming to market (or is already in use, such as Google's TPUv2).
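That ~150 W figure is just proportional scaling of the headline numbers - assuming a 250 W board power for the V100 (the PCIe part; SXM2 is 300 W, which would make the bar even lower):

```python
# Back-of-envelope efficiency-parity check (250 W V100 board power assumed).
v100_ops = 110            # headline throughput quoted above
v100_watts = 250
amd_ops = 67

# Power at which 67 TOPS matches the V100's ops-per-watt:
parity_watts = amd_ops / (v100_ops / v100_watts)
print(round(parity_watts))   # → 152
```

So "about 150 W for 67 TOPS" is exactly the parity point - anything above that and it loses on efficiency as well as raw throughput.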
Interestingly, the new chip has packed 4-bit arithmetic, which we'll probably be hearing more about. However, 4 bits is so coarse that you'd probably have to compensate for the quantization noise by adding significantly more nodes to the layers using it.
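Rough intuition for why: int4 gives you only 16 representable levels, so the worst-case rounding error on any weight is half a quantization step - a big chunk of noise for the network to absorb. Toy symmetric quantizer (illustrative only, not whatever scheme AMD would actually ship):

```python
import numpy as np

# Toy symmetric 4-bit quantization of Gaussian "weights".
rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)

scale = np.abs(w).max() / 7            # map the range onto int4 levels -7..7
q = np.clip(np.round(w / scale), -8, 7)  # at most 16 distinct codes
w_hat = q * scale                       # dequantized weights
err = np.abs(w - w_hat).max()           # worst-case error ≈ scale / 2
```

With weights spanning ±3 sigma or so, that per-weight error is on the order of 20% of a standard deviation - hence needing wider layers (or careful retraining) to claw the accuracy back.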