News SanDisk's new HBF memory enables up to 4TB of VRAM on GPUs, matches HBM bandwidth at higher capacity

The article said:
HBF will never match DRAM in per-bit latency
Yes, read latency is certainly a major issue they're having to contend with. The only way I see it being viable is if reads have enough locality and predictability that you can basically stream data out of the NAND, rather than waiting for each read command to arrive before starting to fetch the data.

The article said:
SanDisk didn't touch on write endurance. NAND has a finite lifespan that can only tolerate a certain number of writes.
Also reads, AFAIK. I recall hearing that the old floating-gate cell design, which I think Micron/Crucial and Intel were the last to use (about 5 years ago), was particularly susceptible to wear-out from reads. That said, I'm still using some Crucial TLC SSDs of that vintage and they're holding up fine.


Even if less so, I'd suspect charge-trap cells are still susceptible to read wear (I think I've even seen read-endurance estimates for modern 3D NAND, but I don't recall where), unless someone knows otherwise.

The article said:
NAND is also typically written to at block granularity, whereas memory is bit-addressable. That's another key challenge.
Uhhh... no. HBM is just DRAM, and it's not bit-addressable. That's one of the main differences between DRAM and SRAM. HBM3e has a minimum transaction size of a 32-byte burst (i.e. per 32-bit pseudo-channel, not per 1024-bit stack).
It's not even specific to HBM, but that seemed the most relevant spec to cite. It fundamentally comes down to how DRAM reads & writes work, which activate an entire row and then transfer data in bursts. I'm surprised Anton didn't seem to know that.
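For anyone wanting to sanity-check that 32-byte figure, here's a trivial back-of-the-envelope sketch (assuming a burst length of 8 per pseudo-channel, which is my recollection of HBM3/3e - treat it as an assumption):

Code:
# Minimum HBM3/3e transaction size per pseudo-channel (assumed BL8)
pseudo_channel_width_bits = 32   # one half of a 64-bit channel
burst_length = 8                 # beats per access (assumption)
bytes_per_access = pseudo_channel_width_bits * burst_length // 8
print(bytes_per_access, "bytes per access")   # 32 bytes - nowhere near bit-granular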

Also, I found this diagram pretty hilarious:

[attached image]

No, there needs to be some actual DRAM!


[attached image]

That's more believable.
: )
 
Given how often one would think the data would change, how long would this flash memory last?
As long as the GPU has some regular DRAM (i.e. HBM, GDDR, DDR, LPDDR, etc.), the HBF can be used just to hold AI models. I think it's a reasonable assumption that the set of models a given machine needs to run inference on changes somewhat infrequently. So, this usage would be classed as "write rarely, read mostly". That said, if such GPUs are used in rentable cloud machines with frequent turnover, write endurance could be an issue.

While we're talking about endurance, I guess another thing that should be mentioned is the sensitivity of NAND flash to temperature. Stacking a bunch of it next to a big, hot GPU die should mean more frequent background data refreshes (rewrites) are necessary, which could take a toll on endurance. I assume they've accounted for that, while considering the viability of their solution.

BTW, I suspect they're probably using a lower bit-density per cell, maybe even pSLC. That could really help with endurance, as would the abundant capacity + wear-leveling.
 
I likes it!
HBM with dynamic RAM that needs constant refreshes is a dumb, dumb design.
Put in 100 MB of SRAM and 4 TB of flash.
Winner.
DRAM? Not on your GPU module, no sir, no ma'am.

... unless HBM4, 5, or 17 already specified something like this, which it should.
 
While we're talking about endurance, I guess another thing that should be mentioned is the sensitivity of NAND flash to temperature. Stacking a bunch of it next to a big, hot GPU die should mean more frequent background data refreshes (rewrites) are necessary, which could take a toll on endurance. I assume they've accounted for that, while considering the viability of their solution.
With the AI market being as hot as it is, the HBF-equipped product might be thrown away after 3-5 years anyway.
 
HBM with dynamic RAM that needs constant refreshes is a dumb, dumb design.
Why is that dumb? Modern DRAM has refresh commands that the memory controller can simply send, rather than forcing the host to read out data it doesn't need or want. AFAIK, the overhead imposed by refresh is almost a non-issue in DRAM.
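To put rough numbers on that claim (these are typical DDR4-class figures from memory, so treat them as assumptions):

Code:
# Rough fraction of time a DRAM device is busy with refresh
tREFI_ns = 7800   # average interval between refresh commands (assumed DDR4-ish)
tRFC_ns  = 350    # time one refresh command ties up the device (assumed, 8Gb die)
print(f"~{tRFC_ns / tREFI_ns:.1%} of time spent refreshing")   # ~4.5%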

As mentioned in the article, you cannot use NAND the same way as DRAM. This design was made for memory that you mostly just read from. It can't be used for training or HPC, because the NAND would burn out in probably a matter of days or weeks.
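Just to show where "days or weeks" comes from, here's a back-of-the-envelope, with every number below being an assumption purely to illustrate the order of magnitude:

Code:
# Time to exhaust write endurance if HBF were hammered with writes like HBM
capacity_tb   = 4          # per the article
pe_cycles     = 100_000    # optimistic SLC-class endurance (assumption)
write_rate_tb = 1.0        # sustained TB/s of writes (assumption)

seconds = capacity_tb * pe_cycles / write_rate_tb   # assumes perfect wear-leveling
print(f"{seconds / 86_400:.1f} days")                # ~4.6 days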

Finally, NAND has access latencies probably well into the realm of microseconds. This further restricts it to use with regular and predictable access patterns.
 
Finally, NAND has access latencies probably well into the realm of microseconds. This further restricts it to use with regular and predictable access patterns.
NAND's latency is roughly 3 orders of magnitude higher than DRAM. Optane was ~2 orders of magnitude. Would've been a much better fit for this. Maybe called HBO - High Bandwidth Optane. Alas, Intel killed the only original technology they've had in a looong time.
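For a sense of scale, here are the ballpark device-level read latencies I have in mind (assumptions, not measured numbers):

Code:
# Very rough read-latency ballparks (assumed, device level)
latencies_ns = {"DRAM": 100, "Optane SSD": 10_000, "NAND flash": 100_000}
for name, ns in latencies_ns.items():
    print(f"{name:>12}: {ns:>7} ns  (~{ns // latencies_ns['DRAM']}x DRAM)")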
 
As long as the GPU has some regular DRAM (i.e. HBM, GDDR, DDR, LPDDR, etc.), the HBF can be used just to hold AI models. I think it's a reasonable assumption that the set of models a given machine needs to run inference on changes somewhat infrequently. So, this usage would be classed as "write rarely, read mostly". That said, if such GPUs are used in rentable cloud machines with frequent turnover, write endurance could be an issue.

While we're talking about endurance, I guess another thing that should be mentioned is the sensitivity of NAND flash to temperature. Stacking a bunch of it next to a big, hot GPU die should mean more frequent background data refreshes (rewrites) are necessary, which could take a toll on endurance. I assume they've accounted for that, while considering the viability of their solution.

BTW, I suspect they're probably using a lower bit-density per cell, maybe even pSLC. That could really help with endurance, as would the abundant capacity + wear-leveling.
I am pretty sure that true SLC would be used here. When it's being used as memory and the tech itself is a ticking time bomb, picking the time bomb that ticks slowest is the best option.
 
I am pretty sure that true SLC would be used here. When it's being used as memory and the tech itself is a ticking time bomb, picking the time bomb that ticks slowest is the best option.
Agreed. The NAND dies they're using must be custom-designed for this application, given the TSVs and their overall organization. In that case, they can dispense with the fancy MLC+ machinery and just implement binary logic to read and write the cells.

Plus, consider that it takes them 8 stacks to reach 4 TB, whereas we can buy 4 TB TLC SSDs implemented on single-sided M.2 boards; the NAND they're using must be considerably lower density.
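Quick math on that, with the dies-per-stack count being purely my assumption:

Code:
# Implied per-die capacity of the HBF stacks vs. a commodity TLC die
total_tb       = 4
stacks         = 8
dies_per_stack = 16      # assumption - stack height hasn't been confirmed to me
gb_per_die     = total_tb * 1024 / stacks / dies_per_stack
print(f"{gb_per_die:.0f} GB (~{gb_per_die * 8:.0f} Gb) per die")   # ~32 GB / ~256 Gb
# vs. the ~1 Tb-class TLC dies you'd find in a 4 TB M.2 drive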