Is AMD Vega's Package Construction A Problem? A Closer Look


FormatC

Distinguished
If you have a good and realistic forecast backed by stable contracts, and not just the hope that a component or a production resource may become available at some point from someone, you can run a tight schedule and keep every date and milestone in the workflow. If not, you run into exactly this kind of chaos. You always have to act, not just react the way AMD does.
 

Braindead154

Reputable
Ultimately, when you use multiple manufacturers, you're going to have slight differences in the output. This isn't specific to GPUs either (e.g., see the iPhone 7's release). One of the biggest lessons they teach in industrial/lean engineering is to limit the number of manufacturers you build relationships with, ideally to one. Unfortunately, these manufacturers just aren't capable of the output required to meet world demand. Until they improve their capacity and upgrade their factories, downstream production/assembly processes will have to compensate.
 

Rob1C

Distinguished
That 3rd variant's photo (with "Made in Korea" on the bottom) looks a little crooked, as if they tried to save a fraction of a penny on the molding compound.

Consistency is necessary, even in mass production.

Waiting until all the problems are sorted into relevant and irrelevant piles is always a good idea, but obviously not an option for Testers/Reviewers.

If I can wait for the reviews to slow down, that's my preferred route. It's unfortunate that Vega has had more problems than Epyc, Ryzen, and Threadripper; it's tarnishing a wonderful comeback.
 

cmi86

Distinguished
So basically this article is a speculative piece, with no confirmation from any AIB partner of any issues, current or pending. I can tell you from my experience as a production CNC machinist and tool and die maker that a .003" variance between manufacturers on a reference dimension (which I can almost guarantee this is) is not much at all: about the thickness of a human hair, which is easily negated by a thermal paste application. I would be willing to bet that if we could measure the thickness of individual die packages as they exit the production line, a variance of several thousandths, likely +/- .010" from top to bottom, would easily be visible. I appreciate the discovery of the variance, but the way this article is presented seems to cry wolf.
 

cmi86

Distinguished
@FormatC, can you elaborate on that? I didn't really see anywhere in this article where any one particular AIB was mentioned. I'm not saying it isn't true, I'm just saying that package thickness and heat spreader thickness/flatness are probably quite variable across almost every CPU/GPU ever produced. Given the way these things are assembled, I think it would be difficult to ensure consistency down to <.001", and with thermal transfer paste between the two surfaces, minor inconsistencies are probably irrelevant and more expensive to eliminate than it's worth. Just my 2 cents.
 

alextheblue

Distinguished


LOL I was just thinking to myself, if the HBM is sitting lower, stuff a piece of aluminum foil in there. How much heat does the HBM need to dissipate? Can't be much. It's an odd situation but I can't see it causing a ton of hand-wringing among GPU manufacturers.
 

InvalidError

Titan
Moderator

A whole DIMM is about 2W, so I'd guess less than 5W per stack, and most of the HBM's temperature simply comes from the substrate itself sitting at 70+°C from the GPU die's heat.
 

bit_user

Polypheme
Ambassador

Partly on the basis of Igor's excellent roundup of GTX 1080s, I got one of these for work about a month ago. I guess I got lucky, because it's stayed cool and quiet.

Mostly, I've been using it for deep learning, but I did run a few benchmarks on it ...just to make sure everything was working properly. ;)
 

bit_user

Polypheme
Ambassador

Link: http://www.tomshardware.com/reviews/evga-gtx-1080-ftw2-icx-cooler,4925.html
 

bit_user

Polypheme
Ambassador

I wonder why you think it's appropriate to estimate an HBM2 stack on the basis of something with less than 1/10th the bandwidth that's designed to drive signals hundreds of times farther at much higher frequencies. Seems like apples vs. oranges to me.

Here's some data for HBM (v1), indicating about 3.65 W/stack (per AMD) or 3.3 W/stack (per Hynix).

https://semiaccurate.com/2015/05/19/amd-finally-talks-hbm-memory/

In that article (circa 2015) they use a figure of 2 W for a single GDDR5 chip. Feel free to post updated figures on HBM2, if you find them.
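
For what it's worth, here's a quick back-of-the-envelope comparison in Python using those figures; the per-stack HBM1 bandwidth (1 GT/s x 1024 bits) and the 7 GT/s, 32-bit GDDR5 chip are assumptions on my part, not numbers from the article:

# Rough power-per-bandwidth comparison from the cited figures.
# Assumed bandwidths: HBM1 stack = 1 GT/s x 1024 bits, GDDR5 chip = 7 GT/s x 32 bits.
hbm1_w, hbm1_gbs = 3.65, 1.0 * 1024 / 8        # 3.65 W (AMD figure), 128 GB/s per stack
gddr5_w, gddr5_gbs = 2.0, 7.0 * 32 / 8         # 2 W per chip, 28 GB/s per chip
print(f"HBM1:  {1000 * hbm1_w / hbm1_gbs:.0f} mW per GB/s")    # ~29 mW per GB/s
print(f"GDDR5: {1000 * gddr5_w / gddr5_gbs:.0f} mW per GB/s")  # ~71 mW per GB/s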
 

InvalidError

Titan
Moderator

The fundamental technology behind DRAM hasn't changed since the 1980s. Regardless of what external interface you slap on top of it, a large chunk of its internal active power still goes to the sense amplifiers that convert the DRAM cells' analog capacitor voltage to binary, and to the column drivers that rewrite the DRAM cells when you're done modifying the memory row.

On a per-die basis, GDDR5X is 12 GT/s x 32 bits = 48 GB/s, while HBM2 is 2 GT/s x 128 bits = 32 GB/s, which makes HBM2 33% SLOWER per die despite having a 4X wider interface. For a 256-bit GDDR5X interface, you get 384 GB/s vs. 256 GB/s for a full HBM2 stack. In both cases, you have eight memory dies involved.

The main difference between HBM and previous memory technologies is that HBM's closer proximity and lower frequencies enable the bus to omit termination resistors. That's where HBM gets ALL of its power savings. Doubling the frequency of an unterminated near-zero-length bus won't increase its power draw by a huge amount and ~1W on top of HBM2 sounds about right.
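
To make the per-die arithmetic explicit, here's a minimal Python sketch that just restates the numbers above (GB/s = GT/s x bus width in bits / 8):

# Per-die bandwidth math: GB/s = GT/s * bus_width_bits / 8
def bw_gbs(gt_per_s, bus_bits):
    return gt_per_s * bus_bits / 8

print(bw_gbs(12, 32))    # 48.0  GB/s per GDDR5X die (12 GT/s x 32 bits)
print(bw_gbs(2, 128))    # 32.0  GB/s per HBM2 die   (2 GT/s x 128 bits)
print(bw_gbs(12, 256))   # 384.0 GB/s for a 256-bit GDDR5X interface
print(bw_gbs(2, 1024))   # 256.0 GB/s for a full 8-die HBM2 stack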
 

bit_user

Polypheme
Ambassador

You're comparing 256-bit GDDR5X, where even the 11 GT/s version on the GTX 1080 only reaches 352 GB/s (nominal), to a single stack of HBM2? Vega 64 uses two stacks of HBM2 to achieve a nominal bandwidth of 484 GB/s.
 

alextheblue

Distinguished

That sure is fascinating to think about. However, HBM2 remains faster on a per-package basis. HBM has disadvantages (primarily cost), but bandwidth isn't exactly a weakness.

Edit: Ah that's what I get for never refreshing tabs. Bit already pointed out that a per-die comparison isn't necessarily of great relevance.
 

InvalidError

Titan
Moderator

So what? Still doesn't change the fact that GDDR5X is 50% faster die-for-die while currently costing ~60% less per GB than HBM2.

Arguing about the number of stacks is pointless: we've had GDDR interfaces up to 512 bits wide on GPUs costing less than $400 in the past. If Nvidia felt threatened in any way by Vega or dual HBM2 stacks, it could relatively easily put together a 384-bit GDDR5X GPU for half of the extra manufacturing cost of going with HBM2. As evidenced by how the 1070 and 1080 manage to outperform the Vega 56/64 much of the time, though, Pascal doesn't appear to need the extra bandwidth, so Nvidia isn't bothering to widen the bus on upper-end mainstream GPUs.

BTW, the 1080 Ti has a 352-bit-wide bus, and at 11 GT/s that's 484 GB/s too. If you are about to say that the 1080 Ti is a more expensive card, keep in mind that HBM2 currently contributes approximately $170 to Vega's manufacturing cost and may actually make Vega the more expensive GPU to manufacture of the two.
 

bit_user

Polypheme
Ambassador

Except that when you start adding more GDDR5X, you're burning more power.

TBH, I'm not even sure what your point is. This was about power, which is highly relevant to this article about HBM2 and cooling. Now it seems you're attacking the value proposition of HBM2 instead. Whatever reasons AMD had for using HBM2, I'm sure their analysis was at least as thorough as yours.
 

InvalidError

Titan
Moderator

You are the one who brought bandwidth into the argument by saying that HBM2 had more bandwidth than GDDR5X, when in fact GDDR5X is faster on a per-die/per-interface basis. (An HBM stack has eight DRAM channels, so a per-channel comparison is fair. If you are going to argue otherwise, the same die-stacking trick used to pack multiple channels into a smaller footprint could be applied to GDDR5X too, if there were market demand for such a thing.)

Power-wise, Vega may be saving 30-50W by using HBM2, but that matters little when the whole card still winds up using ~100W more to barely keep up with a stock GTX 1070/1080 in most titles, despite the 1070/1080 having a 30% memory bandwidth handicap.

As for the P100/V100, these have FOUR stacks of HBM2 and matching that using GDDR5X would be impractical at best, not to mention that those chips wind up on $8000+ cards where manufacturing costs don't matter anywhere near as much. Two stacks as in Vega's case merely match the older 1080Ti's memory bandwidth with the 1080Ti beating Vega silly in most benchmarks while using 70-100W less power despite its GDDR5X power handicap.

AMD betting on HBM2's lower interface power requirements to offset Vega's still sub-par power efficiency and performance issues definitely looks like a costly mistake at this point. Vega isn't high-end enough to justify the cost, and AMD might be taking losses on Vega sales at the moment.
 

bit_user

Polypheme
Ambassador

You were the one who compared a stack of HBM2 to a DIMM. I simply asked how that's a reasonable comparison, pointing out that the DIMM would have less than 1/10th the bandwidth of HBM2, in addition to other factors, to highlight one of the many variables interfering with your extrapolation.

So, no. I was not talking about bandwidth of HBM2 vs. GDDR5X, and I will not follow you down this rathole, nor be put on the defensive about AMD's choice to use HBM2.

And I still don't care for your guesstimates of HBM2 dissipation. What matters is data.
 

InvalidError

Titan
Moderator

If you want data, go study basic DRAM operating principles and you'll find that the baseline power draw and operating principles of DRAM of any sort have remained largely unchanged since PC66 SDR. Typical power draw was around 2W per DIMM back then and is still around 2W per DDR4-2400 DIMM today, despite a roughly 40X bandwidth increase. After accounting for the 2.5X reduction in operating voltage (about a 6X reduction in switching power via the V^2 term), there's still roughly a 6X throughput-per-watt improvement left over, achieved mainly by reducing IO and trace capacitance from the memory die to the CPU die.

Even if ALL of HBM1's power draw were due to bus IO, doubling the bus frequency would still no more than double the power draw, to about 7W. But ~100mW per die goes to internal housekeeping and analog processes, just as in any other DRAM chip, so that takes almost 1W off that worst-case figure. In all likelihood, HBM2 also tightened its IO and trace capacitance specs somewhat, so there goes another 200-500mW, and we're down to my 5W-ish 'guesstimate'.

Just google how to estimate the power consumption of CMOS chips and busses. You'll find a simple equation (P = C * V^2 * f/2) which tells you that, all else being equal, doubling the frequency no more than doubles the frequency-dependent power.
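
As a rough illustration, here's a small Python sketch of that equation; the capacitance and voltage values are placeholders chosen for the example, not measured HBM figures:

# Dynamic (switching) power of a CMOS bus: P = C * V^2 * f / 2
def dynamic_power(c_farads, v_volts, f_hz):
    return c_farads * v_volts ** 2 * f_hz / 2

base     = dynamic_power(1e-9, 1.2, 1e9)  # 0.72 W with placeholder C = 1 nF, V = 1.2 V, f = 1 GHz
double_f = dynamic_power(1e-9, 1.2, 2e9)  # 1.44 W: doubling f no more than doubles the power
half_v   = dynamic_power(1e-9, 0.6, 2e9)  # 0.36 W: halving V cuts it 4X via the V^2 term
print(base, double_f, half_v)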
 

bit_user

Polypheme
Ambassador
I had a thought. Is there any chance the Vegas with taller HBM2 stacks actually have 16 GB? Perhaps there were some partially defective parts or production surplus from the 16 GB run that they sold as 8 GB units.
 

FormatC

Distinguished
The memory differs. The 16 GB Vega FE has Samsung, the RX Vega 64 8 GB has mostly Samsung (if molded), and the Vega 56 only the slower SK Hynix (unmolded). This is my latest info from the factories. No packages are available at the moment. Horrible.

Maybe the RX Vega 64 with the molded package uses downgraded modules, but the Vega 56 definitely uses different memory modules.
 

Gastec

Commendable
As a metrology engineer at a major car manufacturer, I can tell you that a 40 µm deviation/difference in height is unbelievable for such sensitive computer components. For crying out loud, we sometimes work with tighter tolerances than that, and I'm talking about metal car parts here. These chips are supposed to be virtually perfect, especially nowadays when they are made with such precise machines!
U N A C C E P T A B L E!
 

InvalidError

Titan
Moderator

The HBM stacks are dissipating 7W at most, more likely closer to 5W. They don't need perfect coupling with the heatsink the way the 300W GPU die next to them does, and the HBM stacks sticking out by 10 microns would be far worse than sitting 40 microns under, since protruding would prevent the heatsink from sitting level on the GPU.

While the dies themselves need to be practically perfect, the same doesn't apply to their macroscopic dimensions. Having somewhere around 40 microns of slack provides room for tolerances between chip (DRAM/CPU/GPU) fabs, between wafer manufacturers, between die stacking processes, micro-BGA ball thickness used in the stacks and under the GPU, tolerances on the embedded silicon interposer, etc.

If you work in the automotive field, then you should be familiar with "non-critical dimensions" where tolerances can exceed 10%. Think of the thickness of the washers on the bolts that fasten seats to the frame: anything between the minimum thickness needed to handle crash forces and whatever the spare threads on the bolt can accommodate, which can be 2-3X that minimum, is acceptable. HBM stack height is non-critical; it doesn't really matter what it is as long as it doesn't stick out above the GPU.

Also, with each HBM die being less than 1/5th the thickness of the GPU for a quad-die stack and ~1/10th for an octo-die one, the top die is that much more susceptible to mechanical damage. I'd say that's one more reason to make sure it sits a safe distance below the GPU height.
 