News SanDisk's new HBF memory enables up to 4TB of VRAM on GPUs, matches HBM bandwidth at higher capacity

The article said:
HBF will never match DRAM in per-bit latency
Yes, read latency is certainly a major issue they're having to contend with. The only way I see it being viable is if reads have enough locality and predictability that you can basically stream data out of the NAND, rather than waiting for each read command to arrive before starting to fetch the data.

The article said:
SanDisk didn't touch on write endurance. NAND has a finite lifespan that can only tolerate a certain number of writes.
Also reads, AFAIK. I recall hearing that the old floating-gate cell design, which I think Micron/Crucial and Intel were the last to use (about 5 years ago), was particularly susceptible to wear-out from reads. That said, I'm still using some Crucial TLC SSDs of that vintage and they're holding up fine.


Even if less so, I'd suspect charge-trap cells are still susceptible to read wear (I think I've even seen read-endurance estimates for modern 3D NAND, but I don't recall where), unless someone knows otherwise.

The article said:
NAND is also typically written to at block granularity, whereas memory is bit-addressable. That's another key challenge.
Uhhh... no. HBM is just DRAM, and it's not bit-addressable. That's one of the main differences between DRAM and SRAM. HBM3e has a minimum transaction size of a 32-byte burst (i.e. per 32-bit pseudo-channel, not per 1024-bit stack).
It's not even specific to HBM, but that seemed the most relevant spec to cite. It fundamentally comes down to how DRAM reads & writes work, which activate an entire row and then transfer data in bursts. I'm surprised Anton didn't seem to know that.
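For anyone wanting to sanity-check that 32-byte figure, here's a trivial back-of-the-envelope sketch (assuming a burst length of 8 per pseudo-channel, which is my recollection of HBM3/3e - treat it as an assumption):

Code:
# Minimum HBM3/3e transaction size per pseudo-channel (assumed BL8)
pseudo_channel_width_bits = 32   # one half of a 64-bit channel
burst_length = 8                 # beats per access (assumption)
bytes_per_access = pseudo_channel_width_bits * burst_length // 8
print(bytes_per_access, "bytes per access")   # 32 bytes - nowhere near bit-granular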

Also, I found this diagram pretty hilarious:

[attached image]

No, there needs to be some actual DRAM!


[attached image]

That's more believable.
: )
 
Given how often one would think the data would change, how long would this flash memory last?
As long as the GPU has some regular DRAM (i.e. HBM, GDDR, DDR, LPDDR, etc.), the HBF can be used just to hold AI models. I think it's a reasonable assumption that the set of models a given machine needs to run inference on changes somewhat infrequently. So, this usage would be classed as "write rarely, read mostly". That said, if such GPUs are used in rentable cloud machines with frequent turnover, write endurance could be an issue.

While we're talking about endurance, I guess another thing that should be mentioned is the sensitivity of NAND flash to temperature. Stacking a bunch of it next to a big, hot GPU die should mean more frequent background data refreshes (rewrites) are necessary, which could take a toll on endurance. I assume they've accounted for that, while considering the viability of their solution.

BTW, I suspect they're probably using a lower bit-density per cell, maybe even pSLC. That could really help with endurance, as would the abundant capacity + wear-leveling.
 
I likes it!
HBM with dynamic RAM that needs constant refreshes is a dumb, dumb design.
Put in 100 MB of SRAM and 4 TB of flash.
Winner.
DRAM? Not on your GPU module, no sir, no ma'am.

... unless HBM4, 5, or 17 already specified something like this, which it should.
 
While we're talking about endurance, I guess another thing that should be mentioned is the sensitivity of NAND flash to temperature. Stacking a bunch of it next to a big, hot GPU die should mean more frequent background data refreshes (rewrites) are necessary, which could take a toll on endurance. I assume they've accounted for that, while considering the viability of their solution.
With the AI market being as hot as it is, the HBF-equipped product might be thrown away after 3-5 years anyway.
 
HBM with dynamic RAM that needs constant refreshes is a dumb, dumb design.
Why is that dumb? Modern DRAM has refresh commands that the memory controller can simply send, rather than forcing the host to read out data it doesn't need or want. AFAIK, the overhead imposed by refresh is almost a non-issue in DRAM.
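To put rough numbers on that claim (these are typical DDR4-class figures from memory, so treat them as assumptions):

Code:
# Rough fraction of time a DRAM device is busy with refresh
tREFI_ns = 7800   # average interval between refresh commands (assumed DDR4-ish)
tRFC_ns  = 350    # time one refresh command ties up the device (assumed, 8Gb die)
print(f"~{tRFC_ns / tREFI_ns:.1%} of time spent refreshing")   # ~4.5%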

As mentioned in the article, you cannot use NAND the same way as DRAM. This design was made for memory that you mostly just read from. It can't be used for training or HPC, because the NAND would burn out in probably a matter of days or weeks.
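Just to show where "days or weeks" comes from, here's a back-of-the-envelope, with every number below being an assumption purely to illustrate the order of magnitude:

Code:
# Time to exhaust write endurance if HBF were hammered with writes like HBM
capacity_tb   = 4          # per the article
pe_cycles     = 100_000    # optimistic SLC-class endurance (assumption)
write_rate_tb = 1.0        # sustained TB/s of writes (assumption)

seconds = capacity_tb * pe_cycles / write_rate_tb   # assumes perfect wear-leveling
print(f"{seconds / 86_400:.1f} days")                # ~4.6 days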

Finally, NAND has access latencies probably well into the realm of microseconds. This further restricts it to use with regular and predictable access patterns.
 
Finally, NAND has access latencies probably well into the realm of microseconds. This further restricts it to use with regular and predictable access patterns.
NAND's latency is roughly 3 orders of magnitude higher than DRAM. Optane was ~2 orders of magnitude. Would've been a much better fit for this. Maybe called HBO - High Bandwidth Optane. Alas, Intel killed the only original technology they've had in a looong time.
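For a sense of scale, here are the ballpark device-level read latencies I have in mind (assumptions, not measured numbers):

Code:
# Very rough read-latency ballparks (assumed, device level)
latencies_ns = {"DRAM": 100, "Optane SSD": 10_000, "NAND flash": 100_000}
for name, ns in latencies_ns.items():
    print(f"{name:>12}: {ns:>7} ns  (~{ns // latencies_ns['DRAM']}x DRAM)")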
 
As long as the GPU has some regular DRAM (i.e. HBM, GDDR, DDR, LPDDR, etc.), the HBF can be used just to hold AI models. I think it's a reasonable assumption that the set of models a given machine needs to run inference on changes somewhat infrequently. So, this usage would be classed as "write rarely, read mostly". That said, if such GPUs are used in rentable cloud machines with frequent turnover, write endurance could be an issue.

While we're talking about endurance, I guess another thing that should be mentioned is the sensitivity of NAND flash to temperature. Stacking a bunch of it next to a big, hot GPU die should mean more frequent background data refreshes (rewrites) are necessary, which could take a toll on endurance. I assume they've accounted for that, while considering the viability of their solution.

BTW, I suspect they're probably using a lower bit-density per cell, maybe even pSLC. That could really help with endurance, as would the abundant capacity + wear-leveling.
I am pretty sure that true SLC would be used here. When it's being used as memory and the tech itself is a ticking time bomb, picking the time bomb that ticks slowest is the best option.
 
I am pretty sure that true SLC would be used here. When it's being used as memory and the tech itself is a ticking time bomb, picking the time bomb that ticks slowest is the best option.
Agreed. The NAND dies they're using must be custom-designed for this application, given the TSVs and their overall organization. In that case, they can dispense with the fancy MLC+ machinery and just implement binary logic to read and write the cells.

Plus, consider that it takes them 8 stacks to reach 4 TB, whereas we can buy 4 TB TLC SSDs implemented on single-sided M.2 boards; the NAND they're using must be considerably lower density.
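Quick math on that, with the dies-per-stack count being purely my assumption:

Code:
# Implied per-die capacity of the HBF stacks vs. a commodity TLC die
total_tb       = 4
stacks         = 8
dies_per_stack = 16      # assumption - stack height hasn't been confirmed to me
gb_per_die     = total_tb * 1024 / stacks / dies_per_stack
print(f"{gb_per_die:.0f} GB (~{gb_per_die * 8:.0f} Gb) per die")   # ~32 GB / ~256 Gb
# vs. the ~1 Tb-class TLC dies you'd find in a 4 TB M.2 drive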