[SOLVED] What happened to HBM memory?

fordongreeman · Apr 19, 2021

What happened to HBM memory?

Ralston18 · Apr 19, 2021

In what context?

I did a quick unrestricted/unfiltered internet search via Google.

One older link I found:

https://semiengineering.com/whats-next-for-high-bandwidth-memory/

There were other links as well.

For example, this link from around April 15th, from within this forum:

https://www.tomshardware.com/reviews/glossary-hbm-hbm2-high-bandwidth-memory-definition,5889.html

InvalidError · Apr 19, 2021

Still being used on niche hardware like datacenter GPGPUs, network switching gear, fabric switches, special CPUs, etc.

On the consumer side of things, AMD and Nvidia have decided to increase GPU buffer and cache sizes to stretch the viability of GDDR6(X) some more and shave some costs.

dorsai · Apr 19, 2021

Simply put...cost. It costs more to implement than it's worth.

gamerk316 · Apr 19, 2021

Simply put: Cost/Performance. Most games aren't limited by memory bandwidth, so it didn't make financial sense to use HBM over GDDR6X.

InvalidError · Apr 19, 2021

gamerk316 said:
Most games aren't limited by memory bandwidth, so it didn't make financial sense to use HBM over GDDR6X.

They would be without the buffed buffers and caches. It just turns out that increasing on-chip SRAM, which is itself ludicrously expensive, to get by on GDDR6(X) is still cheaper overall than HBM and good enough in the consumer space at least for now.

That said, the main reason HBM costs more is low volume. If there were mass adoption of HBM2/3, it would be only marginally more expensive than GDDR6.

gamerk316 · Apr 19, 2021

InvalidError said:
That said, the main reason HBM costs more is low volume. If there were mass adoption of HBM2/3, it would be only marginally more expensive than GDDR6.

This applies to literally every product ever. It's the chicken-and-egg problem: to make a product affordable, you need people who want to purchase it, which requires that the product be affordable.

hotaru.hino · Apr 19, 2021

I may not be able to confirm this, but one of the main problems I see with HBM is that it decreases the tolerance for defects. Once you bind an IC onto the package substrate, it's probably really hard to correct it. And HBM-based systems have more points of failure per substrate than a single die on a substrate, and all of these points have to pass for it to work.

Also if the Semiengineering article posted before is correct, only Samsung and SK Hynix make HBM stacks. Unless you can get Samsung to make everything and produce the final package, you have to buy HBM stacks separately, find someone to assemble the final package, and hope that they have a good track record for doing so.

So basically, I think HBM is simply too complex to bring to scale and remain profitable.

InvalidError · Apr 19, 2021

hotaru.hino said:
Once you bind an IC onto the package substrate, it's probably really hard to correct it. And HBM-based systems have more points of failure per substrate than a single die on a substrate, and all of these points have to pass for it to work.

Dies are usually tested while they are still on the wafer to avoid wasting time cutting and handling defects, so there shouldn't be many defects making it into HBM stacks and the stacks themselves get tested again before shipping to the customer.

On the points-of-failure side of things, I think AMD's Zen 2/3 Ryzen/TR/EPYC lineups have proven that having thousands of signals between CCD and IOD is not a major yield or reliability issue, and Intel's crazy 47-tile monster shows that Intel is fairly confident in its fancy packaging abilities.

hotaru.hino · Apr 19, 2021

InvalidError said:
Dies are usually tested while they are still on the wafer to avoid wasting time cutting and handling defects, so there shouldn't be many defects making it into HBM stacks and the stacks themselves get tested again before shipping to the customer.

The defects may not be there when they arrive for final assembly, but something may happen during the final assembly.

InvalidError said:
On the points-of-failure side of things, I think AMD's Zen 2/3 Ryzen/TR/EPYC lineups have proven that having thousands of signals between CCD and IOD is not a major yield or reliability issue, and Intel's crazy 47-tile monster shows that Intel is fairly confident in its fancy packaging abilities.

While you do bring up a point about MCM manufacturing being sufficient for Zen 2/3, I don't think there are "thousands of signals". Most of the diagrams I've seen say IF is a 32-bit wide bus in each direction, so 128 lines total (64 for data, 64 for ground) per CPU die. So at most this is 1024 lines (half of which are ground) for an 8-die EPYC. The other thing is that the package itself, from what I can tell, is no different from a bog-standard PCB. The interposer for HBM-based devices is typically silicon based.
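For anyone who wants to check the line count above, here is a minimal sketch of the arithmetic. It assumes the figures in this post (32 bits per direction, two directions, one ground line per data line, 8 CPU dies); the function name `if_link_lines` is made up for illustration and none of this is from AMD documentation.

```python
def if_link_lines(bits_per_direction=32, directions=2, ground_per_data=1):
    """Estimate the lines for one CPU-die link under the post's assumptions."""
    data_lines = bits_per_direction * directions    # 32 bits each way
    ground_lines = data_lines * ground_per_data     # assumed: one ground per data line
    return data_lines + ground_lines


per_die = if_link_lines()
cpu_dies = 8                                        # the 8-die EPYC case from the post
total = per_die * cpu_dies

print(f"lines per CPU die: {per_die}")
print(f"lines for {cpu_dies} dies: {total} ({total // 2} of them ground)")
```

Changing `bits_per_direction` or `ground_per_data` shows how quickly the estimate moves if the real link is wider than the diagrams suggest.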

I was also going to comment on the cache thing in GPUs. I would argue the primary reason for increasing cache is that NVIDIA and later AMD went to a tiled rasterization rendering scheme. This has the benefit of not requiring as much bandwidth between the GPU and VRAM, since only 32x32 tiles are being passed around, and having a sufficiently large cache can let you hide memory latency.
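To put a rough number on the tile point, here is a small sketch comparing the footprint of one 32x32 tile with a full frame. The 4 bytes per pixel and the 1920x1080 resolution are assumptions picked for illustration, and `buffer_bytes` is a hypothetical helper; real GPUs keep more than just color per pixel.

```python
def buffer_bytes(width, height, bytes_per_pixel=4):
    """Size in bytes of a width x height pixel buffer (assumed 4 bytes per pixel)."""
    return width * height * bytes_per_pixel


tile = buffer_bytes(32, 32)            # one 32x32 tile, as mentioned in the post
frame = buffer_bytes(1920, 1080)       # assumed 1080p render target

print(f"one tile:   {tile} bytes")
print(f"full frame: {frame} bytes")
print(f"frame is about {frame / tile:.0f} tiles' worth of data")
```

Under these assumptions a tile is a few KiB, which is small enough to sit in on-chip cache, while the full frame is several MiB.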

InvalidError · Apr 19, 2021

hotaru.hino said:
Most of the diagrams I've seen say IF is a 32-bit wide bus in each direction

Not much information out there besides the 32B/fclk read and 16B/fclk write on marketing slides. Something is clearly not symmetrical there.

Well, CPUs are going to be the small fries of fancy packaging once chiplet GPUs become a thing. Those will need far more than 64GB/s to spare developers the same or similar headaches to what they had with the various SLI/CF modes.
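As a rough sketch of how the per-clock figures turn into bandwidth: bytes per fabric clock times the fabric clock gives bytes per second. The 2.0 GHz fclk below is an assumption, chosen only because it happens to reproduce the 64GB/s figure above; the thread doesn't state a clock, and `link_bandwidth_gb_s` is a made-up helper name.

```python
def link_bandwidth_gb_s(bytes_per_fclk, fclk_ghz):
    """Bandwidth in GB/s for a link moving bytes_per_fclk bytes per fabric clock."""
    return bytes_per_fclk * fclk_ghz   # bytes/cycle * 1e9 cycles/s = GB/s


FCLK_GHZ = 2.0                         # assumed fabric clock, not stated in the thread

read_bw = link_bandwidth_gb_s(32, FCLK_GHZ)    # 32B/fclk read
write_bw = link_bandwidth_gb_s(16, FCLK_GHZ)   # 16B/fclk write

print(f"read:  {read_bw} GB/s")
print(f"write: {write_bw} GB/s")
```

A lower fclk scales both numbers down proportionally, so the actual figures depend on how the fabric is clocked on a given system.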
