AMD Demos The World's First 7nm GPU, Radeon Instinct Vega

Status
Not open for further replies.

Moot point, given that it hasn't got tensor cores. Okay, I'm assuming it doesn't have tensor cores, but I'm pretty sure she'd have mentioned them if it did. So, even with the 35% performance boost, it still can't touch the training performance or efficiency of Nvidia's V100.

BTW, the new instructions will help for inference, but not training.
 
AMD skipping 12nm and going straight to 7nm? That brings a tear to the eye in a very good way!!

Nothing but praise for the way Lisa Su is running the company, placing technical improvements just as high as profits!
 

A tensor core is only a fancy name for matrix multiply-add. AMD could probably tweak the shader architecture to achieve comparable performance without dedicating a large chunk of die area to fixed-function math, albeit at the expense of power efficiency when running tensor-intensive workloads.
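For anyone unfamiliar with the operation, a matrix multiply-add computes D = A×B + C in one step. Here is a minimal pure-Python sketch of it, assuming small 4×4 tiles; the function name is made up for illustration and says nothing about how either vendor implements it in hardware:

```python
def matrix_multiply_add(a, b, c):
    """Return D = A*B + C for square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]

# Illustrative 4x4 inputs (ASSUMPTION: tile size chosen for the example).
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
ones = [[1] * 4 for _ in range(4)]

# identity * ones leaves `ones` unchanged, then `ones` is added on top.
d = matrix_multiply_add(identity, ones, ones)
```

A general-purpose shader runs the inner sum as a sequence of separate multiply-add instructions, whereas a fixed-function unit processes the whole tile at once, which is where the die area versus efficiency trade-off comes from.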
 
AMD have sailed their chips over the narrow sea. The big war is coming; who will prevail? Will it be the Advanced Mother of Dragons, the Nviannisters or the Intelwalkers? Winter is coming. "The night is dark and full of terrors, old man, but the fire burns them all away." "Look to your sins, Lord Renly, the night is dark and full of terrors."
 

Vega already had packed fp16 math, and (as I implied) I've already seen enough of the LLVM patches for the new instructions to know that they won't significantly change its fp16 throughput.

So, the only way it gets more than the stated 35% performance boost @ training is by some fixed-function hardware that wasn't mentioned - a pretty big deal to gloss over, but it's possible they're keeping that bit under wraps. Otherwise, the V100 will still be over 3x as fast.

As for inference, their new 8-bit instructions net them a mere 67 TOPS, compared with V100's 110 TFLOPS. I doubt its efficiency improved enough to sustain 67 TOPS at a mere 150 W, which is nominally what they'd have to achieve to reach parity with V100's efficiency. Plus, lots of fixed-function hardware that targets inference is coming to market (or is already in use, such as Google's TPUv2).
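The 150 W parity figure can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes a 250 W board power for the V100, which is not stated in this post, and it treats TOPS and TFLOPS as directly comparable, as the comparison above does.

```python
V100_TENSOR_TFLOPS = 110.0   # figure quoted in the post
V100_POWER_W = 250.0         # ASSUMPTION: not stated in the post
VEGA_7NM_INT8_TOPS = 67.0    # figure quoted in the post

def parity_power_watts(throughput, ref_throughput, ref_power_w):
    """Power budget at which `throughput` matches the reference ops-per-watt."""
    ref_efficiency = ref_throughput / ref_power_w
    return throughput / ref_efficiency

parity_w = parity_power_watts(VEGA_7NM_INT8_TOPS, V100_TENSOR_TFLOPS, V100_POWER_W)
print(f"Parity power budget: {parity_w:.0f} W")
```

Under that assumed 250 W, the budget comes out at roughly 152 W, in line with the "mere 150 W" above.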

Interestingly, the new chip has packed 4-bit arithmetic, which we'll probably be hearing about. However, that's so coarse that you probably need to compensate for the quantization noise by adding significantly more nodes in the layers using it.
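To give a feel for how coarse 4 bits is, here is a small sketch that measures rounding error for a plain uniform quantizer over [-1, 1]. The quantization scheme and the function names are assumptions picked for illustration; the post does not say how the hardware or any framework would actually map values to 4 bits.

```python
def quantize(x, bits, lo=-1.0, hi=1.0):
    """Round x to the nearest of 2**bits evenly spaced levels in [lo, hi]."""
    steps = 2 ** bits - 1                 # number of gaps between the levels
    step = (hi - lo) / steps
    clamped = min(max(x, lo), hi)
    return lo + round((clamped - lo) / step) * step

def rms_error(bits, samples=1001):
    """RMS quantization error over evenly spaced test values in [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return (sum((x - quantize(x, bits)) ** 2 for x in xs) / samples) ** 0.5

err4, err8 = rms_error(4), rms_error(8)
print(f"4-bit RMS error: {err4:.4f}, 8-bit RMS error: {err8:.4f}")
```

With only 16 levels available, the 4-bit error is more than an order of magnitude larger than the 8-bit error in this toy setup, which is the noise extra nodes would have to make up for.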
 

I thought AMD sold off all their fabs? So wouldn't the fact that they're first to 7nm just mean they were willing to pay more to GlobalFoundries or TSMC or whoever so they could be first in line?
 
Right, so basically Vega has no need for the special-purpose cores, as each of the standard cores already supports mixed precision. That's a benefit of the more general-purpose cores AMD designed: less need for fixed-function hardware.
 



Hodor, Hodor, Hodor...

 

No... I wouldn't say there's no need.

Nvidia's P100 was first with the type of packed fp16 math that Vega added*. That netted it about 19 TFLOPS. Then came the V100 and showed what a benefit can be derived from a special-purpose engine, delivering 110 TFLOPS from its Tensor cores. That's completely separate from the 27.6 TFLOPS that V100 delivers on the same packed-float instructions that the P100 supported.

Again (extrapolating from what they've said), Vega's approach is only good for about 67 fp16 TFLOPS in the 7 nm chip. That's not exactly matching Nvidia, much less leap-frogging it. Maybe the purchase price on the new chip will be lower by enough to offset the difference in operating efficiency, but it's a little hard for me to see how this will gain AMD very much traction in the deep learning market.

Where we can agree is that the approach taken by Pascal and Vega has substantial potential benefits to gamers and certain other applications, whereas the tensor cores don't help with much beyond their target application of machine learning.

* In fact, Intel's Broadwell-era HD Graphics were actually first with this capability, but they lack enough shaders & memory bandwidth to be very interesting for the sort of training workloads that require fp arithmetic.
 
@HALBE It depends on the material and the gate oxide layer. At 7nm this is a problem; at 5nm it is a hard problem. Intel will use a 3D gate and sink solution; AMD, I do not know. For 3nm down to 1nm I have no clue. I think 2nm will be the final stop for shrinking silicon.
 
Quantum tunneling is already a problem, but your question is also a bit broad, as this depends on materials and temperature. Current transistors require internal barriers of about a nanometer to keep current leakage from quantum tunneling at reasonable levels, although on the research side of things, a team at Lawrence Berkeley National Laboratory has successfully built a functional transistor with a 1-nanometer-long gate.
 


We have, at the bottom of this article.
https://www.tomshardware.com/news/intel-ceo-amd-server-market,37273.html



 

THG rarely covers AMD and Intel SoC products for the domestic and worldwide markets since most THG readers have no interest in soldered-on-motherboard CPUs aside from x86 laptops.

Dhyana is an SoC server product exclusively for the Chinese market, which means most THG readers won't be able to get one even if they wanted to.
 