AMD's HBM Promises Performance Unstifled By Power Constraints


xenol

Distinguished
Jun 18, 2008
216
0
18,680
How about replacing system RAM with HBM as well, not just on graphics cards? Wouldn't that make the overall system much, much faster and less power hungry?

Just imagine the system spec:

RAM: 8 GB HBM
GFX: 8 GB HBM

:p
Then we would have people up in arms that they can no longer upgrade their RAM, since the very nature of HBM requires the RAM to basically be soldered in. It works great for GPUs because there hasn't been an ecosystem to upgrade VRAM in over 20 years or so.

We could go back to the days of Fast/Slow RAM, but I think that'll just cause developers headaches.
 

InvalidError

Titan
Moderator

Developers do not necessarily need to know about it since the OS can take care of most of it: put frequently accessed pages in HBM space (count memory page hits - 8GB of HBM should be more than enough to host the bulk of the active data set), use external RAM for less frequently accessed data like the disk cache and background processes, dump the rest to SSD/HDD as usual. It merely adds a new tier of paging between HBM and swapfile.

Most applications are not aware of their memory pages' physical location, all that they care about is that the OS makes them available on access.
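
To make that concrete, here is a minimal Python sketch of the hit-count tiering described above (hypothetical capacities and names, not code from any actual OS):

```python
# Minimal sketch of hit-count-based page tiering: hottest pages go to
# HBM, cooler ones to external DRAM, the rest to swap. Capacities and
# names are illustrative assumptions, not from any real OS.

HBM_PAGES = 2    # toy capacity of the fast on-package tier, in pages
DRAM_PAGES = 2   # toy capacity of the external RAM tier

def place_pages(hit_counts):
    """hit_counts: dict of page_id -> access count. Returns page_id -> tier."""
    ranked = sorted(hit_counts, key=hit_counts.get, reverse=True)
    tiers = {}
    for i, page in enumerate(ranked):
        if i < HBM_PAGES:
            tiers[page] = "HBM"    # bulk of the active data set
        elif i < HBM_PAGES + DRAM_PAGES:
            tiers[page] = "DRAM"   # disk cache, background processes
        else:
            tiers[page] = "swap"   # the new tier still ends at SSD/HDD
    return tiers

print(place_pages({"A": 900, "B": 120, "C": 40, "D": 5, "E": 2}))
# {'A': 'HBM', 'B': 'HBM', 'C': 'DRAM', 'D': 'DRAM', 'E': 'swap'}
```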
 
We could go back to the days of Fast/Slow RAM, but I think that'll just cause developers headaches.
They could just do that, since at first HBM will be limited to 4 GB, which AFAIK will be too low for a high-end card.
So maybe we will see 4 GB HBM and 4-8 GB GDDR5.
Also, production of the cards will become more complex. For now, manufacturers just "buy" the chip and put the rest together themselves. This would change: either the GPU including the HBM memory will be shipped directly by AMD, the complete package will be put together by the card manufacturers, AMD will produce the cards themselves (remember when 3dfx tried that?), or they will need to invent a procedure that allows assembling the GPU and HBM memory together afterwards.
 

codePanda

Reputable
May 20, 2015
1
0
4,510
If AMD's launch of new GPUs fails to be a success, will Apple buy AMD to manufacture APUs with HBM for their new line of Macs instead of the rumored A-chip? :) Maybe the Radeon in the iMac Retina 5K is the start of something?
 

InvalidError

Titan
Moderator

4GB will be enough for high-end cards in most cases since most of the extra space is simply used to host duplicate copies of resources across multiple channels to increase bandwidth. With HBM providing nearly 3X as much bandwidth, this trick becomes that much less important.
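
For rough context, peak bandwidth is just bus width in bytes times per-pin data rate. The figures below are my own assumptions from public GDDR5/HBM spec material, not numbers from this thread:

```python
# Back-of-the-envelope peak bandwidth: bus bits / 8 * Gbps per pin.
# Device figures are assumed from public GDDR5/HBM spec material.

def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s."""
    return bus_bits / 8 * gbps_per_pin

print(bandwidth_gbs(32, 7.0))        # 28 GB/s per GDDR5 device
print(bandwidth_gbs(1024, 1.0))      # 128 GB/s per first-gen HBM stack
print(bandwidth_gbs(512, 5.0))       # 320 GB/s, 512-bit GDDR5 card
print(bandwidth_gbs(4 * 1024, 1.0))  # 512 GB/s with four HBM stacks
```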

As for GPU cards becoming "more complex," my bet is AMD will be supplying fully assembled GPUs, HBM included. This means AiB partners have nothing left to do other than provide the VRM, a heatsink and a PCB in the colors and shapes of their choosing. All the other performance-critical parts are on the silicon interposer under AMD's control. The GPU chip assembly will be more complex, but the board-level design should become a trivial four-layer job since there will be no more 600+ signal memory bus to fan out around the GPU BGA, only power, 32 PCIe signal pairs and the display outputs.

I wonder how much longer it will be until AMD starts working on their own flavor of integrated voltage regulation. To improve their CPUs' and GPUs' power efficiency through faster power state transitions, they will have to do it eventually. In a multi-chip package, they can afford to design a voltage regulator on a coarse process that takes a straight 12V input and skips the intermediate voltage used by Intel. Imagine how empty GPU and motherboard PCBs would look.
 
The sad thing is, until AMD has a working "retail" card at a store to buy, all this is still hearsay/rumor/hype. I see they have now changed the so-called release date again.

Until one of these cards is in hand, it's "see it to believe it." All this has been going on for 6-8 months now, if not longer, and I still would not be surprised if we don't see a card until Thanksgiving, just in time for Black Friday sales. News flash: postponed until the second half of 2016 [next year].

Just to add: I can understand the 390X with the new memory being delayed, because it's all new and things happen, but I don't get why the refreshed 200-series cards are delayed.

They say it's because the old stock of cards needs to sell, but Nvidia just carried their 700-series cards over to make their 970s, got them released fast, and eliminated that issue for themselves, right?
 

rav_

Distinguished
Jul 24, 2011
38
1
18,530
@notsleep

That is what AMD is doing!!! A Zen APU with three levels of cache, using the interposer as the system memory bus and integrating 8-16 gigs ON-DIE as system RAM.

Sell your memory stock!!!
 

deuce_23

Distinguished
Nov 18, 2009
63
0
18,630
Why didn't Radeon time the release with The Witcher 3???
The Witcher 3 has 1 million pre-orders, and a lot of people upgraded just for it.
I have always used Radeon, but I feel like they have lost their way. I was really pissed that they did not support HDMI 2.0 last time around. It seems they are always playing catch-up.

Why be so secretive about this new card? Throw some benchmarks out there and people will wait to buy it.

Oh well, let's see.
 
Why be so secretive about this new card? Throw some benchmarks out there and people will wait to buy it.

I guess you've got to have a real live retail card to do all that, right? Can't bench rumored, next-year pipe dreams. AMD lately has a lot of good talk, but they can't seem to put their money where their mouth is.
 

xenol

Distinguished
Jun 18, 2008
216
0
18,680

I think it does matter, considering developers didn't like the XDR/GDDR3 split in the PS3 (the GPU could use both) and they're not liking the DDR3/eSRAM split in the XB1.

But even then, it also just adds another layer of complexity for the entire system, something I don't think we need regardless. Besides that, I don't think the CPU is close to hitting the memory wall (considering faster RAM doesn't improve the performance of most operations), so it's probably something we'll worry about when we move on from silicon.
 

InvalidError

Titan
Moderator

Wrong way to think about it.

Think of it more like Intel's Crystalwell: a sort of L4 cache. The CPU and IGP both access the same total memory pool and the OS can re-allocate memory pages between RAM and eDRAM. Applications do not need to know about it. While RAM might not be a bottleneck for the CPU, it is definitely a major bottleneck for IGPs. HBM effectively eliminates the main bottleneck to IGP performance. When AMD decides to put HBM on 14nm APUs, we might see high-end APUs with IGPs on par with an R9-290X.
 

Not sure I'd go that far. Memory bandwidth is a big deal, but raw shader count is at least equally important. Getting that many shaders on the die in addition to the normal CPU cores will take a lot of space. We'd likely need to see a big paradigm shift to fully embrace HSA and other shared resource processing to reach that level.

APUs on par with a 260X? That should be doable pretty soon.
 

InvalidError

Titan
Moderator

From 28nm to 14nm, they can afford to put about four times as many resources in the iGPU while preserving the same CPU-GPU die area ratio. And unless they also quadruple resources in the CPU area (I seriously doubt they will do more than double them, if even that), they will have even more die area to sink into the iGPU.

HBM IGPs at R7-260X levels? Maybe for the A6. An IGP six times as powerful as the A10-7850K would land in R9-280X territory.
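
The ~4x figure is plain area scaling, and the six-times iGPU estimate falls out of the same arithmetic. A toy calculation, assuming nominal scaling and a 50/50 CPU/iGPU die split (real processes and products deviate):

```python
# Toy density math behind "about four times as many resources".
# Nominal feature-size scaling and an assumed 50/50 die split only.

old_node, new_node = 28, 14              # nm feature sizes
density_gain = (old_node / new_node) ** 2
print(density_gain)                      # 4.0x transistors per unit area

cpu_share, igp_share = 0.5, 0.5          # assumed 28nm die area split
cpu_new = cpu_share * 2 / density_gain   # die fraction if CPU only doubles
igp_new = 1.0 - cpu_new                  # leftover area goes to the iGPU
print(igp_new * density_gain / igp_share)  # 6.0x the old iGPU resources
```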
 

gwiddle

Reputable
May 3, 2015
40
0
4,560
This seems great, but I am not sure how AMD will be able to keep this technology away from Nvidia and force them to make their own memory modules. What will prevent Nvidia from starting to make this? (It's a legitimate question I have.)
 

Mitrovah

Honorable
Feb 15, 2013
144
0
10,680


Oh, so you are saying the processes and the chips are too expensive to mass-produce GPU cards cheaply and quickly, is that correct?
 

serendipiti

Distinguished
Aug 9, 2010
152
0
18,680

It's a fragmentation issue. If you need to access lots of textures, shaders, etc., then memory latency becomes an important factor (because every access has to wait for the data to arrive). Although there are some tricks (caches) to improve this, look at the trend in current graphics cards: high-speed memory (GHz) with relatively narrow data paths. (Doubling to 256- or 512-bit-wide memory buses is probably something board makers can do, likely raising the cost to more than double, but not prohibitively.)
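
To illustrate the latency point, the standard average-memory-access-time formula shows how a cache hides DRAM latency, and how a fragmented access pattern that lowers the hit rate erodes that (all numbers below are illustrative assumptions):

```python
# Average memory access time (AMAT) with a cache in front of DRAM.
# Latencies and hit rates are illustrative assumptions.

def amat_ns(hit_rate: float, cache_ns: float, dram_ns: float) -> float:
    """Expected access latency given a cache hit rate."""
    return hit_rate * cache_ns + (1 - hit_rate) * dram_ns

print(amat_ns(0.95, 5, 200))  # 14.75 ns with a 95% hit rate
print(amat_ns(0.80, 5, 200))  # 44.0 ns once fragmentation cuts hits
```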
 

InvalidError

Titan
Moderator

HBM is an obvious invention, and similar concepts have been used by chip manufacturers before, albeit without the benefits of through-silicon vias to effectively eliminate trace length. The only differences today are that mainstream GPUs and IGPs require the new capabilities enabled by HBM to progress any further, and the prerequisite technologies to make HBM economically viable have already entered the mainstream, in large part motivated by extreme space constraints in mobile devices.

HBM is little more than the combination of multi-chip package technology from 20+ years ago with through-silicon vias that became popular a few years ago. The two key enabling aspects of TSV here would be circuit density through stacking and the ability to tightly couple otherwise incompatible semiconductor processes - DRAM does not mix with high-speed CMOS logic.


Supply and demand. HBM is new, so it will likely take a while for availability to scale up.

Once nearly everyone and everything starts using HBM, HBM should become cheaper to produce than standard DRAM chips due to eliminating most of the packaging and distribution overhead: instead of sending fully packaged individual ICs to countless distributors, board and system integrators, they can ship whole wafers to CPU/GPU/APU/SoC manufacturers. Much lower cost overheads per sale.

 

somebodyspecial

Honorable
Sep 20, 2012
1,459
0
11,310


The main problem for AMD is they can't AFFORD much of anything (least of all mistakes). Another problem for AMD is that most customers don't read hardware sites (or forums) to find out that 6 isn't always better than 4 (it still needs to be proven that 4GB is OK anyway, but you get the point). IE, there are many people who STILL think 64-bit is 2x faster than 32-bit...LOL. AMD's second problem here is cost. IF the cost of rev1 HBM is so high they can't make a dime when trying to compete at X price for the card, they will be screwing their own tech here. I really think they should have gone one more rev on GDDR5, as bandwidth wasn't an issue and it looks like it won't be for another gen at least. NV took the route that allows them to price competitively easily and still make profits. Then again, I think part of why AMD went this way is a lack of R&D funding to get watts down IN the GPU itself. Maybe it was their only move, rather than working on better compression and better perf/watt in the GPU.

I don't expect AMD to be able to defeat perception or cost here, and at the end of this gen (380/390s) I expect either break-even or a loss again. You have too many things here that can go wrong and hike prices: new memory, a new process, and a new GPU. Nvidia can take a risk like that and eat low yields etc. for a quarter or three (heh), but AMD can't. That is a major problem even outside of the perception issues around the amount of RAM, after years of being told more is better. I'm not even sure you couldn't get two more gens out of GDDR5 with die shrinks (lowering volts some while upping speeds again) and, if necessary, going to a 512-bit bus again for top cards. It's funny that AMD's charts here have no dates on them for when the crap hits the fan. Meaning, at what point is this a problem? It isn't now, at least for Nvidia. Will it be eventually? Yes, surely. IE, when AMD went ON-DIE with the memory controller, that was severely needed and gave a massive boost to perf, causing Intel some REAL headaches. Since we're not constrained now, as shown by currently getting far more fps by upping GPU clocks vs. memory clocks (looking at OC results), you're adding the equivalent of blue crystals (in Intel speak...LOL) for the time being, which just costs money and hurts your ability to compete on price.

I could be wrong, but I don't see AMD (or NV) being massively bandwidth constrained TODAY, or probably next gen either, when trying to stay above 30fps (we need more GPU). You don't get massive perf improvements from OCing the memory on anything, because we need more GPU perf first to stay at 30fps where either of these matters. As noted, you can also widen the bus to get even more if 384-bit isn't enough (in NV's case, that is). NV has used 448- and 512-bit before (IE, 690x etc.), though AMD is there now. Clearly NV has better compression tech if AMD needs this (looking at AMD's recent cards and Maxwell's compression efficiency). If you can point to something that IS bandwidth constrained, my guess is you'll find the GPU lacking and be under 30fps too. We'll soon see if HBM does anything more than drop ~18-20W on a card. But it sure seems like what the consoles stole from R&D on CPUs, and now GPUs, is running its course (which BTW was the exact reason Nvidia passed - less R&D for CORE products if they tried consoles).

One more point: AMD needs to make a gaming card like NV has to compete. Scoring well in F@H (or coin mining etc.) and synthetic crap like that does nothing for fps in games. NV will continue to have an easier time in benchmarks until AMD decides to FOCUS on games by stripping out crap most don't use and using that die space to amp up stuff they DO use (FP32 for now). Jack of all trades doesn't work here now that Nvidia is splitting things correctly for max perf/profit. AMD is wisely (FINALLY) going this route with ZEN. IE, stripping the GPU crap and making a dominant high-IPC CPU dedicated to destroying Intel's pricing power for a while (catering to GAMERS who disable the iGPU seconds after the build anyway - and these people buy DISCRETE). I hope they do it to GPUs soon too. I wonder how long it will take Intel to respond to a Zen with no GPU. Hopefully AMD pockets a few billion on ZEN first (with pricing power and perf) before Intel can put out a new enthusiast core, which we haven't had since the GPU got on board ;) which was coincidentally basically the day AMD gave up on CPUs. Hopefully ZEN is under ~100W while doing it, but even at 125W they'll command a premium as long as they WIN. What we don't want is 200W and a win. That's too high even in a desktop and means OCing would be ridiculous.
 

HBM is coming around because GDDR is no longer able to push the GB/W expected in modern applications. With all GPU designers likely to switch over to HBM, it seems unlikely anyone will bother pushing for GDDR6/7. The problem with off-package memory is that you end up wasting tons of power on bus termination, de-skew, signal equalization, etc. and this gets worse as speeds go up. With HBM, the bus lines are too short for most of these to matter. At least for now.

The next step after HBM would be direct TSV connection between the RAM and GPU/CPU/whatever - skip the interposer and intermediate glue logic altogether.

There are alternatives to HBM though:

http://www.extremetech.com/computing/197720-beyond-ddr4-understand-the-differences-between-wide-io-hbm-and-hybrid-memory-cube

So there is no 100% guarantee that Nvidia will go HBM if they have another option that may better suit their needs/wants. I guess we will have to wait and see how that plays out.

It is not the only way to go. Also, stacking RAM on the GPU die is not impossible. Intel has had on-package RAM for quite a while:

http://www.extremetech.com/extreme/171678-intel-unveils-72-core-x86-knights-landing-cpu-for-exascale-supercomputing

I am sure they are still working on integrating it on die. The problem I can see with that is that if the RAM goes bad, you have to replace the entire CPU. A GPU is fine since, well, you can't replace the RAM on it anyway.
 

InvalidError

Titan
Moderator

HBM is not for current GPUs, it is for future GPUs. Making a PCB for a 512-bit-wide memory bus is expensive and inefficient; going HBM on-package eliminates the costly PCB from the equation. This may not seem like an immediate necessity, but at 14nm, you will have about four times as much compute in the same die area, and that will require a matching increase in memory bandwidth that GDDR5 simply cannot keep up with. Think of the R9-390X as a proof-of-concept/experimental project: apply HBM to a product before it becomes absolutely necessary. As AMD said, HBMv2 next year will have twice the bandwidth, which means their first-gen HBM is likely being clocked very conservatively. BTW, if memory bandwidth were not a concern, AMD and Nvidia would not have bothered implementing so many techniques to reduce dependence on it, such as the new texture compression that enabled last year's memory bus reduction. With more raw memory bandwidth available, AMD will be able to spend more resources on compute rather than memory compression tricks and their associated overhead.
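
A rough sanity check on that scaling argument, with all figures being my assumptions (a 320 GB/s GDDR5 card today, GDDR5 topping out around 8 Gbps per pin, HBM2 at roughly 2 Gbps per pin across four 1024-bit stacks):

```python
# Rough scaling check: if 14nm brings ~4x compute, bandwidth should
# scale similarly, which is beyond GDDR5's plausible reach.
# All figures are assumptions, not AMD's numbers.

def bw_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits / 8 * gbps_per_pin

today = bw_gbs(512, 5.0)           # 320 GB/s, 512-bit GDDR5 card
needed = today * 4                 # ~1280 GB/s to match 4x compute
gddr5_ceiling = bw_gbs(512, 8.0)   # ~512 GB/s even at 8 Gbps pins
hbm2 = bw_gbs(4 * 1024, 2.0)       # ~1024 GB/s from four HBM2 stacks
print(needed, gddr5_ceiling, hbm2)
```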

As for Zen, AMD's claim of 40% better IPC is unqualified. I would not be surprised if it turned out to be something like 20% better IPC for a single thread + 20% from simultaneous multi-threading. That would put AMD about on par with Sandy/Ivy Bridge i7 for IPC in 2016. On Intel's side though, Skylake is coming. Zen might sound good on paper but by the time it launches, it may do little more than claw back the additional performance gap Broadwell+Skylake created.
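
For what it's worth, that speculative split combines like this (illustrative arithmetic only, not AMD's breakdown):

```python
# Combining the two guessed ~20% gains multiplicatively.
single_thread_gain = 1.20  # assumed per-thread IPC improvement
smt_gain = 1.20            # assumed throughput gain from SMT
print((single_thread_gain * smt_gain - 1) * 100)  # 44.0 -> "about 40%"
```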

You say AMD cannot afford mistakes. Well, they cannot afford playing it safe either.
 


I would assume that Bulldozer was AMD's way of not playing it safe. A completely new uArch not based on the K8/K10/K10.5 uArch? A module idea that requires high clocks and high power but has low IPC? That seems pretty risky to me, especially considering how they made fun of Intel for doing the same thing during Netburst (high clocks/power, low IPC).
 

InvalidError

Titan
Moderator

The article only lists three "alternatives" and points out that HBM is just a GPU-specific flavor of Wide-IO, which means they do not really count as alternatives and it would make no sense for Nvidia to pick Wide-IO.

Besides that, the article says Nvidia is one of the backers behind HBM, along with AMD and Hynix.

HBM is also a JEDEC standard (JESD235), which means AMD, Nvidia and whoever else decides to use HBM should eventually be able to source the HBM DRAMs from multiple manufacturers.
 

InvalidError

Titan
Moderator

They repeated some of Intel's mistakes with Netburst, tried a few other things of their own, much of it did not work as planned, and now they are going back to basics. Much like Intel did when they folded what was worth salvaging from Netburst into the P3 to create the Core2, turning the tables on AMD almost overnight.
 


HMC is not a variation of Wide I/O, though. My point is Nvidia might go with HMC, or they might have something else cooking.

Who knows.



That was my point. AMD took risks back then, but it backfired on them. Some risks are good to take; GCN is a great example, as most GCN cards will support DX12. Other risks, especially those that others have already taken and lost on, are not.
 


The whole Bulldozer fiasco really made no sense to me. You'd think AMD would've known better than to do that when their own T-Bred chips completely outclassed the P4.
 