News Samsung's revived Z-NAND targets 15x performance increase over traditional NAND — once a competitor to Intel's Optane, Z-NAND makes a play for AI d...

Regular SSDs, by contrast, typically use page sizes in the 8 to 16 KB range.
8 KiB is what you get if you're lucky. You tend to get that with TLC run in pSLC mode. 16 KiB is the norm nowadays. Some pSLC SSDs made from QLC NAND will do 4 KiB pages.

If Samsung's targets hold true, its next-generation Z-NAND technology will be 15x more performant than today's NVMe SSDs.
What we’d really like to know is what differentiates Z-NAND from SLC. Because SLC and pSLC are far more accessible now, why would anyone want Z-NAND?
 
Moving SSDs directly onto GPUs, bypassing the CPU and RAM... this makes me think that the System-on-GPU concept is potentially getting closer to reality.
 
Question I have is, what purpose does Z-NAND and Optane serve now or in the near future?
Nvidia has said they want vastly higher-performing SSDs, which probably caught Samsung's interest.

Also, Sandisk is proposing to use flash in a HBM-like fashion for direct streaming of AI model weights. Perhaps Samsung is also heading in that direction, with Z-NAND?

Because if there is demand, Intel could sell off their Optane IP and let someone else make it.
I doubt it. Optane fell short of expectations on multiple levels. Perhaps the final nail in its coffin was its failure to scale effectively in 3 dimensions, as Intel had initially hyped. IIRC, the first-gen Optane dies were only 2 layers and the second gen only 4? Meanwhile, NAND is at something like 300 layers these days.

From what I've heard, Optane also has an efficiency problem on writing, which sounds like another potential scaling bottleneck.

Regardless, if this technology had legs, I'm sure Intel would've sold it. I think the issue must've been that Intel never turned a profit on Optane, the trends pointed in the wrong direction, and there was probably nobody interested in buying it. The fact that XL-Flash and Z-NAND are competitive on performance, while certainly being much higher density, shows that Optane was probably a dead-end technology.
 
Moving SSDs directly onto GPUs, bypassing the CPU and RAM... this makes me think that the System-on-GPU concept is potentially getting closer to reality.
That's not what they said. It's still going to be decoupled and accessed via PCIe or CXL. They're just talking about the GPU going straight to the SSD, without having to involve the CPU. This is one of the things CXL is designed to do, so I hope that's what they have in mind.
 
That's not what they said. It's still going to be decoupled and accessed via PCIe or CXL. They're just talking about the GPU going straight to the SSD, without having to involve the CPU. This is one of the things CXL is designed to do, so I hope that's what they have in mind.
I think it would be interesting to see the "SSG" concept revived though. It probably would only be done for AI, and looks like they're already getting started.
 
I think it would be interesting to see the "SSG" concept revived though. It probably would only be done for AI, and looks like they're already getting started.
It only makes sense for that sort of "HBF" concept, because you cannot fit that much bandwidth over normal I/O interfaces.

Otherwise, it's a bad idea to put NAND on a GPU. NAND doesn't like high temperatures and is higher-latency than DRAM. That makes it a poor fit next to a 1 kW chip, and the latency impact of going over the interconnect fabric to reach it should be negligible. Furthermore, moving it off the card lets you scale capacity and bandwidth more easily, while also giving multiple accelerators more symmetrical access to it.

It's like two great flavors that are worse, together.
 
That's not what they said. It's still going to be decoupled and accessed via PCIe or CXL. They're just talking about the GPU going straight to the SSD, without having to involve the CPU. This is one of the things CXL is designed to do, so I hope that's what they have in mind.
Oh I know. Still, it seems like one step closer and something that will have to happen eventually to meet the insatiable demand of AI compute. It's still many steps away for sure.

CXL is a great standard in the datacenter.
 
Oh I know. Still, it seems like one step closer and something that will have to happen eventually to meet the insatiable demand of AI compute. It's still many steps away for sure.
As mentioned, "HBF" is the one approach that has to be integrated tightly with the GPUs.

As for everything else, there'd be tons of headroom merely by somehow incorporating the SSDs into the NVLink fabric.
 
As far as I recall, Z-NAND is no different from SLC. If the underlying technology is still pretty much the same, I don’t think it will materially improve latency over conventional NAND solutions. Between Optane and Z-NAND, I believe the former is much better when it comes to endurance and latency. But it’s too costly, draws too much power and produces too much heat, hence not suitable for mass adoption at scale. I’m still using Optane drives as OS drives and, despite the much slower sequential transfer rate, they’re very fast and make the system more responsive.
 
As far as I recall, Z-NAND is no different from SLC. If the underlying technology is still pretty much the same, I don’t think it will materially improve latency over conventional NAND solutions.
I'm sure it's similar to Kioxia's XL-Flash. In both cases, they're doing organizational optimizations to reduce latency and increase bandwidth, such as using smaller page sizes and more planes. Kioxia's next-gen XL-Flash will allegedly support 10M 512-byte read IOPS.

Importantly, you can't get there by using pSLC on conventional 3D NAND. Achieving such good 512B performance requires a custom design.
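Some rough arithmetic on what that claim would imply (the 10M 512-byte IOPS figure is Kioxia's claim from above; the page sizes are illustrative, not a statement about any specific part):

```python
# Bandwidth implied by a claimed 10M 512-byte read IOPS.
iops = 10_000_000
block_size = 512  # bytes
bandwidth_gbps = iops * block_size / 1e9
print(bandwidth_gbps)  # 5.12 GB/s of pure 512 B random reads

# Why page size matters: a NAND plane senses a whole page per read,
# so a 512 B request against a 16 KiB page occupies the plane for
# 32x more data than was asked for. Smaller pages (and more planes)
# cut that read amplification and raise small-block parallelism.
amp_16k_page = 16 * 1024 // 512  # 32
amp_4k_page = 4 * 1024 // 512    # 8
```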

Between Optane and Z-NAND, I believe the former is much better when it comes to endurance and latency.
Kioxia claims latency of 3-5 usec, whereas I get about 4.3 usec for 512-byte random read latency from my P5800X. So, if they deliver on that promise, then they'll be up there with the last, best Optane SSDs Intel ever made.

As for endurance, we need look no further than existing XL-Flash. The 800 GB DapuStor Xlenstor2 X2900P has a rated endurance of 100 DWPD, which is the same as Intel's 800 GB P5800X.
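For scale, a DWPD rating converts to total lifetime writes like this (the 5-year warranty period is an assumption, typical for enterprise drives; the 800 GB / 100 DWPD figures are the ones cited above):

```python
# Convert a DWPD rating into total data written over the warranty.
capacity_gb = 800
dwpd = 100           # drive writes per day, from the rated endurance
warranty_years = 5   # assumed; typical for enterprise drives
total_gb_written = capacity_gb * dwpd * 365 * warranty_years
print(total_gb_written / 1e6)  # ~146 PB written over the warranty
```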


However, that first gen XL-Flash really couldn't match the P5800X's QD1 read latency. So, we'll have to see if Kioxia makes good on its promises.
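As a sanity check, QD1 latency maps directly onto QD1 throughput, since only one I/O is in flight at a time (using the ~4.3 µs figure measured above; Kioxia's claimed 3-5 µs range brackets it):

```python
# At queue depth 1, IOPS is just the reciprocal of per-I/O latency.
latency_us = 4.3                   # measured P5800X 512 B random read
qd1_iops = 1_000_000 / latency_us
print(round(qd1_iops))             # ~232,558 IOPS at QD1

# Kioxia's claimed 3-5 us range would land between these bounds:
low_iops, high_iops = 1_000_000 / 5, 1_000_000 / 3  # 200K to ~333K
```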

I’m still using Optane drives as OS drives and, despite the much slower sequential transfer rate, they’re very fast and make the system more responsive.
I can read about 7.4 GB/s from my P5800X, which is plenty.
 
That's not what they said. It's still going to be decoupled and accessed via PCIe or CXL. They're just talking about the GPU going straight to the SSD, without having to involve the CPU. This is one of the things CXL is designed to do, so I hope that's what they have in mind.
Will the malware scanner (Defender, etc.) be able to scan the data moving directly to the GPU in the background?
 
Will the malware scanner (Defender, etc.) be able to scan the data moving directly to the GPU in the background?
I think the idea is that you wouldn't use it in ways/places where that would be a concern.

It's a good point that some malware might be able to propagate itself by routing writes via the GPU, though. It'd certainly be something to think about, if this ever sees use outside of specialized AI training and inference contexts.

I think the way security software could try to avoid that is by:
  1. Preventing executable files and loadable libraries from being opened for the GPU to write (the GPU can't open files - only read & write ones specially prepared for it by the host OS).
  2. Scanning every non-executable file that's changed to an executable file type, at the time of the change.

For all I know, they already do the second one. In either case, once you go to execute the file, it should get scanned.
 