SK Hynix's new 12-layer HBM3 modules to enable extreme performance and capacity.
SK Hynix Samples 24GB HBM3 Modules: Up to 819 GB/s
Quote: "Or is my understanding flawed, and these HBM3 modules actually have to be on the GPU die? Which would mean only one, maybe two, modules per GPU die?"
Yeah, they have to be in the same package as the GPU die. The article mentions that the interface per stack is 1024 data bits, which is only feasible to route and drive through an interposer. That compares with 32 bits per GDDR6 chip. However, HBM runs at a much lower per-pin data rate, and you don't have as many stacks as you typically have GDDR6 chips. So, at the card level, it's only about 3-5 times the total bandwidth, rather than 32x.
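A quick back-of-the-envelope check of the "32x wider, but only 3-5x faster" point. Peak bandwidth is just bus width times per-pin rate divided by 8. The 6.4 Gb/s HBM3 rate is the one implied by the article's 819 GB/s over 1024 bits. The 20 Gb/s GDDR6 speed grade and the "5 stacks vs. 12 chips" card layouts are illustrative assumptions, not figures from the article.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: width (bits) * per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8

# Per-device figures. 6.4 Gb/s reproduces the article's 819 GB/s per 1024-bit stack;
# 20 Gb/s is an assumed (top-end) GDDR6 speed grade.
hbm3_stack = peak_bandwidth_gbs(1024, 6.4)
gddr6_chip = peak_bandwidth_gbs(32, 20.0)

# Hypothetical card layouts: 5 HBM3 stacks vs. 12 GDDR6 chips (a 384-bit bus).
hbm3_card = 5 * hbm3_stack
gddr6_card = 12 * gddr6_chip

print(f"width ratio per device:     {1024 // 32}x")
print(f"bandwidth ratio per device: {hbm3_stack / gddr6_chip:.1f}x")
print(f"bandwidth ratio per card:   {hbm3_card / gddr6_card:.1f}x")
```

The width advantage is 32x, but the lower per-pin rate and the smaller number of stacks shrink it to roughly 4x at the card level under these assumptions.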
Quote: "With how things are going, I am seeing that consumer versions get the GDDR6X treatment, and professional GPUs just have the GDDR6X replaced with HBM3 for higher bandwidth and more VRAM."
Nothing about it is a drop-in replacement, though. The memory controllers are very different between the two. It's just one of many things that differentiate the AI/HPC processors from their rendering-oriented GPU cousins.
Quote: "I don't expect to see HBM in the consumer space anytime in the near future..."
Necessity will probably bring HBM, or something HBM-like with fewer or narrower channels, to the consumer GPU and CPU space within the next five years. It'll be the only practical way to meet bandwidth requirements without stupidly high external memory bus power and the related PCB costs.
Quote: "Nothing about it is a drop-in replacement, though. The memory controllers are very different between the two. It's just one of many things that differentiate the AI/HPC processors from their rendering-oriented GPU cousins."
The HBM interface is very similar to DDR5, apart from having separate access to the RAS and CAS lines, an optional half-row activation feature (if you want to split each sub-channel into two more semi-independent channels), only one DQS per 32 bits, and no bus termination at either end. I'd say that makes HBM simpler overall.
The only genuinely problematic difference IMO is needing eight of those slightly modified DDR5 controllers per stack.
Quote: "The HBM interface is very similar to DDR5, apart from having separate access to the RAS and CAS lines, an optional half-row activation feature (if you want to split each sub-channel into two more semi-independent channels), only one DQS per 32 bits, and no bus termination at either end. I'd say that makes HBM simpler overall."
At some level, DRAM is DRAM. I get it. But HBM3 has 32-bit sub-channels, which means you have 32 of those per stack, rather than the 2 that you get per DDR5 DIMM. So, that's a pretty big deal, and not something you can just gloss over.
Quote: "Take a look at the 3070 and the 4070, for example: they reduced the memory bus from 256-bit to 192-bit, yet the throughput remained about the same, at 448 and 504 GB/s, by using GDDR6 vs. GDDR6X."
They also increased the amount of L2 cache by about 10x. We saw how much Infinity Cache helped RDNA2, so it's a similar idea.
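The 3070/4070 figures follow directly from bus width times per-pin rate. The per-pin rates used here, 14 Gb/s GDDR6 for the 3070 and 21 Gb/s GDDR6X for the 4070, are the commonly published specs for those cards and are not stated in the thread. Treat this as a sanity check rather than an official spec sheet.

```python
# Peak bandwidth (GB/s) = bus width (bits) * per-pin data rate (Gb/s) / 8.
# Per-pin rates are the commonly published card specs (assumed, not from the thread).
cards = {
    "RTX 3070 (GDDR6)": (256, 14.0),
    "RTX 4070 (GDDR6X)": (192, 21.0),
}

bandwidth = {name: width * rate / 8 for name, (width, rate) in cards.items()}

for name, gbs in bandwidth.items():
    print(f"{name}: {gbs:.0f} GB/s")

# A 25% narrower bus is more than offset by a 50% faster per-pin rate.
print(f"ratio: {bandwidth['RTX 4070 (GDDR6X)'] / bandwidth['RTX 3070 (GDDR6)']:.3f}")
```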
Quote: "I highly doubt that will be the case. If they do that, it would be for the halo models."
HBM is still fundamentally the same technology as any other DRAM. The main reason it is more expensive is relatively low-volume production. All that would be necessary to bring the price down is for GPU and DRAM manufacturers to coordinate a hard switch.
Quote: "At some level, DRAM is DRAM. I get it. But HBM3 has 32-bit sub-channels, which means you have 32 of those per stack, rather than the 2 that you get per DDR5 DIMM. So, that's a pretty big deal, and not something you can just gloss over."
Nothing forces you to deploy independent memory controllers all the way down to the finest sub-banking option. You can also operate each chip in the stack as a single 128-bit-wide channel, and you can use stacks with fewer than eight chips if you don't need the largest capacity configuration.
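The disagreement above comes down to how many independent channels, and therefore memory controllers, a stack needs at a given granularity. This sketch counts them for a 1024-bit stack and a 64-bit (non-ECC) DDR5 DIMM. The shorter-stack widths assume the reply's "one 128-bit channel per die" arrangement, which is the poster's proposal rather than a confirmed HBM3 operating mode.

```python
STACK_WIDTH_BITS = 1024    # full HBM3 stack interface
DDR5_DIMM_WIDTH_BITS = 64  # DDR5 DIMM data bits, excluding ECC

def channels(total_width_bits: int, channel_width_bits: int) -> int:
    """Number of independent channels (one controller each) for a given width."""
    return total_width_bits // channel_width_bits

# Finest granularity discussed in the thread: 32-bit sub-channels.
print("32-bit sub-channels per stack:    ", channels(STACK_WIDTH_BITS, 32))
print("32-bit sub-channels per DDR5 DIMM:", channels(DDR5_DIMM_WIDTH_BITS, 32))

# Coarser option from the reply: one 128-bit channel per die.
print("128-bit channels per stack:       ", channels(STACK_WIDTH_BITS, 128))

# Hypothetical shorter stacks, assuming each die contributes one 128-bit channel.
for dies in (2, 4, 8):
    print(f"{dies}-high stack: {dies * 128}-bit interface")
```

Coarser channels cut the controller count from 32 to 8 per stack. As the next post points out, though, that assumption does not obviously map onto a 12-high stack.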
Quote: "You can also operate each chip in the stack as a single 128-bit-wide channel, and you can use stacks with fewer than eight chips if you don't need the largest capacity configuration."
I'm not convinced that's how it works. In this article, they talk about a 12-high stack with a 1024-bit interface.
Quote: "I'm not convinced that's how it works. In this article, they talk about a 12-high stack with a 1024-bit interface."
The standard HBM interface is 128 bits per interface, intended to be one or two interfaces per die in the stack. What the manufacturer probably did to get to a 12-high stack while maintaining a 1024-bit interface is design its DRAM dies so that a pair of single-ported 2GB dies can share a port with the two halves of a dual-ported 2GB die, with the triplet operating as a pair of 3GB dies.
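The triplet arrangement proposed above can be checked for consistency with the article's 24GB / 1024-bit figures. The grouping itself is the poster's speculation, not a documented SK Hynix design. The constants below only encode that hypothesis.

```python
# Hypothetical arrangement from the post above (not confirmed by SK Hynix):
# 12 dies of 2 GB each, grouped into triplets. Two single-ported dies each share
# a port with one half of a dual-ported die, so a triplet looks like two 3 GB dies.
DIES = 12
DIE_CAPACITY_GB = 2
DIES_PER_TRIPLET = 3
PORTS_PER_TRIPLET = 2    # "a pair of 3GB dies", one port each
PORT_WIDTH_BITS = 128    # standard HBM interface width cited in the post

triplets = DIES // DIES_PER_TRIPLET
interface_bits = triplets * PORTS_PER_TRIPLET * PORT_WIDTH_BITS
capacity_gb = DIES * DIE_CAPACITY_GB
capacity_per_port_gb = DIES_PER_TRIPLET * DIE_CAPACITY_GB / PORTS_PER_TRIPLET

print(f"{triplets} triplets -> {interface_bits}-bit interface, "
      f"{capacity_gb} GB total, {capacity_per_port_gb:.0f} GB per port")
```

Under these assumptions the numbers line up: four triplets with two 128-bit ports each give 1024 bits, and 12 x 2 GB gives the advertised 24 GB.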
Quote: "I'm talking about actual HBM3, not simply what's plausible to do with stacked DRAM. Sure, you can dream up lots of plausible options, but once you start cutting the width of the HBM stack's interface, it comes directly at the expense of its bandwidth."
When you want to cut costs and drive adoption, sometimes sacrifices must be made. Not every application needs 1TB/s of bandwidth per stack. On-package memory for a relatively high-performance APU or lower-end GPU would be perfectly fine at 250-300GB/s.
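Both sides of this exchange can be put in numbers: at a fixed per-pin rate, bandwidth scales linearly with interface width. This sketch estimates how wide an HBM3-speed interface must be to hit the 250-300GB/s target. It assumes the 6.4 Gb/s rate implied by the article's 819 GB/s figure and rounds up to whole 128-bit channels, matching the channel width used earlier in the thread. Both choices are assumptions for illustration.

```python
import math

HBM3_GBPS_PER_PIN = 6.4  # rate implied by 819 GB/s over a 1024-bit stack
FULL_STACK_BITS = 1024

def width_for_bandwidth(target_gbs: float,
                        gbps_per_pin: float = HBM3_GBPS_PER_PIN,
                        granularity: int = 128) -> int:
    """Smallest interface width, in whole channels of `granularity` bits, reaching target_gbs."""
    raw_bits = target_gbs * 8 / gbps_per_pin
    return math.ceil(raw_bits / granularity) * granularity

for target in (250, 300):
    bits = width_for_bandwidth(target)
    peak = bits * HBM3_GBPS_PER_PIN / 8
    print(f"{target} GB/s target -> {bits}-bit interface "
          f"({bits / FULL_STACK_BITS:.1%} of a full stack, {peak:.1f} GB/s peak)")
```

Under these assumptions, a bit over a third of a full stack's width covers the proposed APU/low-end GPU target. That is consistent with both posts: the narrower interface gives up bandwidth in direct proportion to the width cut, but still lands in the stated range.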