News: Nvidia Unveils Its Next-Generation 7nm Ampere A100 GPU for Data Centers, and It's Absolutely Massive

It will be interesting to see what happens with GA102. It's not really possible to determine what GA102 will have just by looking at GA100. And looking back at Pascal can lead to some poor assumptions.

The GP100 was the data center / deep learning version, and it had up to 60 SMs with 64 FP32 CUDA cores per SM (plus 32 FP64 CUDA cores per SM). That's 3840 total FP32 CUDA cores. GP102 by comparison had up to 30 SMs with 128 FP32 CUDA cores per SM. It had the same total number of FP32 cores (3840), but got there in a very different fashion. So you might think GA102 will do the same thing relative to GA100 -- and it might -- but they're going to be very different beasts.
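For anyone who wants to check that math, here it is in a couple of lines of Python (the SM and per-SM core counts are the published Pascal figures):

```python
# Two very different routes to the same FP32 total (published Pascal figures)
gp100_sms, gp100_fp32_per_sm = 60, 64    # GP100 also has 32 FP64 cores per SM
gp102_sms, gp102_fp32_per_sm = 30, 128   # GP102 drops the dedicated FP64 blocks

print(gp100_sms * gp100_fp32_per_sm)     # 3840
print(gp102_sms * gp102_fp32_per_sm)     # 3840
```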

GA102 will of necessity have RT cores -- there's no way Nvidia can walk back ray tracing support. In fact, the RT cores are supposedly enhanced and could be up to 4X faster than the Turing RT cores, which probably means they'll use a lot more die space. I'd assume the enhanced Ampere Tensor cores will also make an appearance, though that might not be the case. If the ratios are the same as in GA100, though, the GA102 SMs will ditch FP64 and add RT cores, which might not be much of a difference in size when all is said and done. And that's the problem.

There's no way Nvidia is going to make an ~800 mm² 'consumer' GPU on 7nm right now. It's just way too expensive. GA100 works out to roughly $18,750 per card based on the $199,000 asking price of the DGX A100 ($50K for all the server hardware, ~$150K for the eight GA100 parts). GP100 was a 610 mm² part with a very high price, but GP102 was only 471 mm² -- and still went into $1,000+ graphics cards.
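The per-card estimate is just the system price minus the assumed server cost, split eight ways (the $50K server figure is my rough guess, not an Nvidia number):

```python
dgx_price = 199_000      # DGX A100 asking price
server_parts = 50_000    # assumed value of everything that isn't a GPU
per_gpu = (dgx_price - server_parts) / 8
print(per_gpu)           # 18625.0 -- call it ~$18,750 with the GPU budget rounded to $150K
```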

So, given the massive size of GA100, I suspect Nvidia is going to be aiming closer to 500 mm² or less for GA102. And to get there, it will have to trim down the SM and core counts. Where GA100 has up to 128 SMs, 80 SMs in GA102 seems far more plausible, giving a maximum of 5120 cores. Clocks could be higher, though, and power will of necessity be less than 300W (and probably 250W for the RTX 3080 Ti). That's my bet, anyway.
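Here's what that guess works out to next to a full GA100, assuming GA102 keeps the same 64 FP32 cores per SM (speculation, not an announced spec):

```python
cores_per_sm = 64          # assuming GA102 keeps GA100's per-SM FP32 count
ga100_sms = 128            # full GA100 die
ga102_sms_guess = 80       # my speculation for GA102

print(ga100_sms * cores_per_sm)        # 8192 FP32 cores in a full GA100
print(ga102_sms_guess * cores_per_sm)  # 5120 -- the hypothetical GA102 maximum
```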
Yes, you are correct, but a few things can be argued -- like the die size, for starters. NVIDIA may take a completely different approach for GA102, but there is also the possibility of a 700+ mm² chip, as it did with Turing: TU102 was roughly 750 mm², while Volta GV100 was roughly 800 mm².

And when it comes to price, that is the markup they charge for. Initially the Quadro RTX 8000 was priced at $10,000, while the same chip with a lower memory configuration was offered as the Quadro RTX 6000 for $6,000.

NVIDIA may try limiting the die size to 500 mm², but there is nothing stopping it from offering a 700 mm² chip, and that would not be as expensive as you think. The only limitation would be the initial quantity available.
 
The thing is that TU102 was made on a mature 12nm process. 7nm may be more mature now than it was last year, but yields are clearly not as high as on 12nm. Which is why GA100 will currently ship with only 108 of its 128 SMs enabled, and one of the six HBM2 stacks disabled. That's 15.6% of the SMs disabled, and 16.7% of the expensive HBM2!

By comparison, GV100 had 80 of 84 SMs enabled on the top Tesla models -- only 4.8% disabled -- and all four HBM2 stacks were in use. (Models with fewer SMs and only three HBM2 stacks did exist, however.) And GP100 had models with 12.5% of SMs disabled. TU102 meanwhile shipped with just 5.5% of its SMs disabled in the 2080 Ti, while the Titan RTX and Quadro RTX 6000/8000 use the fully enabled chip.
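Those percentages come straight from the enabled-versus-total unit counts (I've left GP100 out since its shipping configurations varied):

```python
# Share of SMs fused off in each shipping configuration
chips = {
    "GA100 (A100)":      (108, 128),  # SMs enabled, SMs on the full die
    "GV100 (top Tesla)": (80, 84),
    "TU102 (2080 Ti)":   (68, 72),
}
for name, (enabled, total) in chips.items():
    print(f"{name}: {100 * (1 - enabled / total):.1f}% disabled")
# GA100: 15.6%, GV100: 4.8%, TU102: 5.6%
```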

So: 1st-gen 16nm had 12.5% of SMs disabled on a big chip, 2nd-gen (12nm was basically just refined 16nm) had 5.5% disabled, and now we're back to a 1st-gen process at 7nm with 15.6% of SMs disabled. Lower yields mean more SMs disabled and more chips wasted, which means higher costs -- and the larger the chip, the harder it is to manufacture.
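To illustrate why die size hurts so much on a young process, here's a toy Poisson yield model -- the defect densities are made-up illustrative values, not TSMC data:

```python
import math

def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

# Assumed defect densities: 0.1/cm² for a mature node, 0.3/cm² for a young one
for name, area in [("~500 mm² die", 500), ("TU102-class ~750 mm²", 750), ("GA100-class ~820 mm²", 820)]:
    print(f"{name}: mature {poisson_yield(area, 0.1):.0%}, young {poisson_yield(area, 0.3):.0%}")
# ~500 mm²: 61% vs 22%; ~750 mm²: 47% vs 11%; ~820 mm²: 44% vs 9%.
# A GA100-sized die on a young process yields under 10% perfect chips,
# which is exactly why you salvage dies by fusing off SMs.
```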

This is why I think it's extremely unlikely for Nvidia to do a super large GA102 chip. They'll save that for post-Ampere and 2nd gen 7nm (or maybe just move to 1st gen 5nm, which may not be as big of a jump as the number change would suggest, sort of like 16nm -> 12nm).

But we shall see. Nothing is certain right now, but there are lots of opinions and speculation. If GA102 ends up as a 500-600 mm² GPU, though, I expect pricing for the RTX 3080 Ti to be astronomical -- like $2,000. Or Nvidia could reserve GA102 for Quadro RTX, with a smaller GA103 (80 SMs total) for the 3080 Ti, a trimmed-down implementation with maybe 64-68 SMs for the RTX 3080, GA104 for the RTX 3070, etc.
 
You are right, but the reason for A100 using that cut-down version of the GA100 chip with disabled HBM2 is obvious. The point to consider here is that, unlike GA100, GA102 will be using TSMC's 7nm EUV process, which will enable NVIDIA to push out large volumes and get better yield and binning ratios -- enough to satisfy Quadro/Titan requirements as well as ship huge numbers of RTX 3080 Ti GPUs.

In fact, I think it's fairly obvious that NVIDIA will launch a full-fledged variant of A100 at a later date, probably after it completely shifts production to 7nm EUV.

That shift would cut production costs tremendously.
 
Do we know Nvidia is using 7nm EUV on GA102, though? I thought that was just speculation as well. N7P is not the same as N7+, for sure. But I thought there was also a renaming of N7+ to N6 or something? I'm having trouble digging up exact sources right now. My recollection is that TSMC has N7 and N7P, which are both DUV; then N7+, which adds EUV; and also some form of N6 that's supposed to be design-compatible with N7 or N7P.

One of the 'smaller' nodes is supposed to let manufacturers drop in an existing design and have it just work. Moving from N7P to N7+, on the other hand, requires reworking some elements because of the EUV layers. So a full A100 made on N7+ wouldn't actually be A100 anymore; it would be a new, tweaked design. I'm not sure if that really matters, but I think it does.

Probably there will be an update in a year or so where Nvidia moves to the smaller N6 / N6P or whatever it's called without reworking the logic, to improve yields and clocks but not shrink die size -- like AMD did with Polaris 30 on 12nm.
 
Yes, you are correct that there are core design changes when going from N7P to N7+. But besides those design changes, the core performance it offers is the same. Initially N7+ had consistency issues, which are now being resolved to some extent. N6 is the next-gen EUV node, which has yet to be brought into actual production; it is expected to enter production this year, so let's see how much of an improvement it is over N7+.

But yeah, it is pretty much obvious that NVIDIA will use N7+ or N6 for the lower-end chips, as it needs huge yields to meet market requirements. Otherwise it will be a big hit on NVIDIA's profits.
 

bit_user

I know, right? 400W each, and you could probably run 16 with Epyc -- 6,400W of juice per 2U or 3U. That rack will start glowing.
I'm pretty sure it's limited to 8 GPUs per system - constrained by NVLink, if nothing else.

Also, take a good look at the box -- it looks to me to be somewhere in the ballpark of 6U.

Still, it's probably a good 4 kW, which is quite substantial. I'm just not sure how it compares with blade servers.
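That 4 kW guess is just simple addition, with an assumed figure for everything besides the GPUs:

```python
gpus, gpu_tdp = 8, 400   # A100's stated 400W TDP
rest_of_system = 800     # assumed: CPUs, RAM, NVMe, NICs, fans
print(gpus * gpu_tdp + rest_of_system)  # 4000 W
```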
 

bit_user

In fact, I think it's fairly obvious that NVIDIA will launch a full-fledged variant of A100 at a later date, probably after it completely shifts production to 7nm EUV.
Is there any recent historical precedent of them respinning a chip on a different process?

Even though 7 nm and 7 nm EUV sound similar, you can't just move a design from one to the other by flipping a switch. There are non-trivial costs and effort involved, which you might as well spend on targeting an updated design to the new manufacturing node rather than porting an existing one.
 
Where I work, we just replaced an HP blade center that had 1st-gen Xeon E5s. The blade center had 6x 2000W PSUs in a 2x2x2 configuration. The new system is a 2U4N with dual Epyc 7502s and 1TB of RAM per node, and the PSUs are just dual 2200W. The difference in efficiency and density is crazy.
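The raw PSU capacity comparison (ignoring the redundancy configuration, which muddies a direct comparison):

```python
blade_center_psus = 6 * 2000   # old HP blade center
new_2u4n_psus = 2 * 2200       # new dual-Epyc 2U4N box
print(blade_center_psus, "W vs", new_2u4n_psus, "W")  # 12000 W vs 4400 W
```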
 

bit_user

If the ratios are the same as in GA100, though, the GA102 SMs will ditch FP64 and add RT cores, which might not be much of a difference in size when all is said and done.
First of all, as with all of their recent consumer GPUs, TU102 has an fp64:fp32 ratio of just 1:32. So, they're not going to save much space by completely nixing it.

https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_20_series

Second, they won't do this because graphics APIs (and therefore a good amount of existing software) require fp64. Although it can be emulated in software, that might suddenly bottleneck even some games that could use fp64 for a few select computations.
 
Even though 7 nm and 7 nm EUV sound similar, you can't just move a design from one to the other by flipping a switch.
There will be differences in how it is designed, but those are minor design changes, not major ones. The core performance will remain the same. It is not like a completely different node; it is just a different approach to how the final result is achieved. Yes, there will be a few differences that require a few minor tweaks to the design, but nothing major that will affect performance in any way. This will simply enable NVIDIA to get far greater yields.
 
First of all, as with all of their recent consumer GPUs, TU102 has an fp64:fp32 ratio of just 1:32. So, they're not going to save much space by completely nixing it.
I don't mean ditch it completely, but GA100 has FP64 support on the Tensor cores plus 32 FP64 CUDA cores per SM. That will likely not be part of GA102 and lower-spec Ampere. Instead it will be no FP64 on the Tensor cores and just the token 1:32 FP64 rate, like on Pascal, Turing, Maxwell, etc. consumer GPUs. So yeah, I think they could save quite a bit of space by axing most of the FP64 on consumer cards, but then adding in RT cores offsets that at least somewhat. Nvidia hasn't specifically said that "FP64 adds 20% more transistors per SM" or anything like that, nor has it said how much space an RT core requires, but we know in both cases that the feature isn't 'free.'
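Putting rough numbers on that per-SM FP64 difference (Tensor core FP64 excluded, unit counts per Nvidia's published SM layouts):

```python
ga100_fp64_per_sm = 32                 # GA100: 32 FP64 cores next to 64 FP32 cores
consumer_fp32_per_sm = 64              # Turing-style SM
consumer_fp64_per_sm = consumer_fp32_per_sm // 32   # 1:32 ratio -> 2 token units

print(ga100_fp64_per_sm / consumer_fp64_per_sm)     # 16.0x the FP64 per SM on GA100
```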