WD Blue SN5000 4TB SSD review: Western Digital launches a 4TB mainstream drive

1_rick

Distinguished
Mar 7, 2014
112
51
18,670
Please explain "folding" and "pseudo-folding" state.
Sounds like gibberish to me. (Actually I found a Reddit post that purported to explain it, but "Folding as used in this sense is the combining of three separate SLC blocks into a single TLC block." requires some thought. I guess it means "clearing the pSLC cache by writing to the NAND in TLC/QLC mode," which is probably going to be slow.)

The mention of QLC flash probably elicits sighs of its own. It’s best not to be too hasty, as the performance specifications of this flash rival those of earlier 3D TLC flash generations.

And the sighs are because, as expected, once you exhaust the pSLC cache, the performance drops drastically.

The drive's probably fine until you do something like "copy Diablo IV to a different PC rather than download 90GB again" and you have to decide which one is actually going to be slower.
 

Notton

Commendable
Dec 29, 2023
868
766
1,260
That's so weird. Why did they use 2x QLC for the 4TB model, when the 2TB model only used 1x TLC chip?
There was plenty of space on the PCB for a second 2TB TLC chip.
 
Sounds like gibberish to me. (Actually I found a Reddit post that purported to explain it, but "Folding as used in this sense is the combining of three separate SLC blocks into a single TLC block." requires some thought. I guess it means "clearing the pSLC cache by writing to the NAND in TLC/QLC mode," which is probably going to be slow.)
A lot of Reddit threads and posts on this topic are by me, so I can summarize here as briefly as possible. There are different granularities or units of storage depending on what you're doing with flash, the most important being the smallest read/write unit (pages) and the smallest erase unit (blocks). The latter is what we're talking about with folding, since you're "folding" multiple blocks into one, and garbage collection works at the block level as well. The smaller blocks are pseudo-SLC, which means the native flash (in this case, QLC) acting in single-bit mode like SLC, so four blocks of SLC/pSLC "fold" into one block of QLC. This is required to free up space since a block in SLC/pSLC mode holds only one-fourth of its "real" QLC capacity.
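To put numbers on the ratio described above, here is a minimal Python sketch. The function names are made up for illustration, and it assumes the simple model from the post: a block in pSLC mode stores one bit per cell, so it holds 1/N of what the same block holds in its native N-bit mode. Real drives usually cap the cache well below the upper bound computed here.

```python
# Bits stored per cell in each mode (standard definitions).
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def blocks_folded_per_native_block(native_mode: str) -> int:
    """Number of pSLC blocks whose data fits in one block of native flash."""
    return BITS_PER_CELL[native_mode] // BITS_PER_CELL["SLC"]


def max_dynamic_pslc_gb(free_native_gb: float, native_mode: str) -> float:
    """Upper bound on pSLC cache size if ALL free native flash ran in SLC mode.

    This is an idealized ceiling, not what any particular drive exposes.
    """
    return free_native_gb / BITS_PER_CELL[native_mode]


print(blocks_folded_per_native_block("QLC"))  # 4 pSLC blocks fold into 1 QLC block
print(blocks_folded_per_native_block("TLC"))  # 3 pSLC blocks fold into 1 TLC block
print(max_dynamic_pslc_gb(4000, "QLC"))       # 1000.0 (GB, idealized, empty 4TB drive)
```

This is also why the cache shrinks as the drive fills: every gigabyte of user data stored in QLC removes four gigabytes' worth of potential pSLC space in this model.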
 
I did at least learn something useful from my searches. I had always wondered how empty space on SSD can be used as cache, when writing to such "cache" would be same as direct-writes. Per the reddit post below, apparently a dynamic cache of TLC/QLC can be used in SLC mode (and speed) at the penalty of requiring 3x or 4x space.

https://reddit.com/r/hardware/comments/kn7can/how_does_slc_cache_work_in_ssds

Check my reply just above as it applies here (and I did have some posts in that Reddit thread). In any case, direct writes means writing straight to the native flash. This can happen when the cache is full but doesn't take up all of the free capacity of the drive. That is, the cache is not using all of the drive's flash for the SLC mode. You usually want to avoid direct writes by using a pseudo-SLC mode because the impact of random writes is higher, but for sequential writes going to native is fine. Also, by writing first to SLC and then folding/copying to native you reduce write amplification since SLC writes out sequentially.

Regardless, if you outrun the cache's capacity you are forced into a "folding" state, which is slower because you're writing to SLC first, reading from it, rewriting to native flash (TLC or QLC), and confirming the write before erasing the SLC (to free up space), while already-written data also needs to be copied (which doesn't count towards new writes). However, background copying can occur in any mode by taking some of the die time, so it's possible to get back to SLC mode (or to direct-to-QLC writes instead of folding), which causes a jump in performance. This can make for inconsistent speeds after the cache is exhausted.
 
That's so weird. Why did they use 2x QLC for the 4TB model, when the 2TB model only used 1x TLC chip?
There was plenty of space on the PCB for a second 2TB TLC chip.
2TB one is probably the same as the 2TB SN770, which uses 1Tb dies (versus 512Gb at lower capacities). You can get that in one 16DP. Normally you'd do that for a drive that has to accommodate M.2 2230 (SN740/SN770M), maybe they are just repurposing. The QLC on the 4TB is 1Tb as well and so needs two packages though.
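The package math in the reply above can be checked with a short Python sketch. The helper names are hypothetical, capacities are nominal (no over-provisioning or decimal/binary rounding), and the 16 dies per package figure comes from the "16DP" mentioned in the post.

```python
import math


def dies_needed(capacity_tbyte: float, die_tbit: float) -> int:
    """Dies required for a nominal capacity (1 byte = 8 bits, no spare area)."""
    return math.ceil(capacity_tbyte * 8 / die_tbit)


def packages_needed(dies: int, dies_per_package: int = 16) -> int:
    """Packages required, assuming 16-die packages (16DP) as in the post."""
    return math.ceil(dies / dies_per_package)


for capacity_tb in (2, 4):
    dies = dies_needed(capacity_tb, die_tbit=1.0)  # 1Tb dies
    print(capacity_tb, "TB:", dies, "dies,", packages_needed(dies), "package(s)")
# 2 TB: 16 dies, 1 package(s)
# 4 TB: 32 dies, 2 package(s)
```

With 512Gb dies (`die_tbit=0.5`) the same 2TB would need 32 dies, which is why the larger die is what makes a single-package 2TB drive possible.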
 
>The result is that the SN5000 writes at almost 5 GB/s in pSLC mode before hitting a pseudo-folding state at around 544 MB/s.

>What’s most likely happening is that the SN5000 is alternating between direct-to-QLC and folding.

Please explain "folding" and "pseudo-folding" state.
Check my replies above, since someone linked a Reddit thread I was in and I was summoned as if by magic. However, to be quick: folding is when the drive has to copy over already-written data from SLC to QLC in order to free up space/capacity. In some cases, the drive forces writes to SLC first as this can reduce write amplification/wear, in other cases it will go to writing directly to the QLC or can be forced to fold. As space is freed up, including in the background, this can lead to spikes/fluctuations in write speed.
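For a rough feel of what this means for a large copy, here is a simple two-phase estimate in Python. It uses the two speeds quoted from the review (about 5 GB/s in pSLC mode, about 544 MB/s afterwards). The cache size is NOT given anywhere in this thread, so the 50GB used below is purely a placeholder, and the model ignores the background folding and speed fluctuations described above.

```python
def write_time_seconds(total_gb: float,
                       cache_gb: float,
                       cache_speed_gbps: float,
                       post_cache_speed_gbps: float) -> float:
    """Estimate time for one sustained write: fast until the cache is full,
    then a flat slower rate. A deliberately crude two-phase model."""
    in_cache = min(total_gb, cache_gb)
    after_cache = total_gb - in_cache
    return in_cache / cache_speed_gbps + after_cache / post_cache_speed_gbps


# 90GB game copy that fits entirely in the cache (hypothetical 100GB cache).
print(write_time_seconds(90, 100, 5.0, 0.544))  # 18.0

# Same copy with a hypothetical 50GB cache: the last 40GB dominate.
print(round(write_time_seconds(90, 50, 5.0, 0.544), 1))
```

In the second case most of the time is spent on the portion written after the cache runs out, which is the behavior people are complaining about with sustained writes.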
 
@Maxxify, I appreciate your shedding some light on the subject. I get now where the term "folding" comes from. Some more questions if you don't mind:

. How do DRAM and DRAM-less compare in performance? Do DRAM NVMe's generally draw more power, and thus generate more heat?

. Does DRAM act as a tier 1 cache before data overflows into a SLC cache (if any) as a tier 2, and to pSLC as a tier 3?

. How has SSD performance improved over succeeding generations, and how has that improvement filtered down from premium SSDs to mainstream to value offerings? (This last question is a bit expansive, so you can just summarize the highlights.)
The term "folding" comes from SanDisk and later their "nCache" technology. You can find articles on this from 10+ years ago, I think. In any case the diagram/graphic they used to explain it at the time was basically showing a DMA-like (direct memory access) operation where 3 blocks of SLC/pSLC compacted into 1 block of TLC. It's more complex than this today as there are different ways to merge blocks but essentially that is the idea. One advantage that was noted is that this can be done on-die without controller interaction, which means you don't have the overhead that killed host (incoming) I/O. You're still limited by simultaneous die operations, though.

DRAM-less drives are more often 4-channel so pull less power as a result, but if you're comparing like-for-like (and this did happen more in the past, e.g. SM2263 v SM2263XT) then DRAM-less technically pulls less power as it does away with external DRAM (which pulls some power) and reliance on a DRAM memory controller for it. However, performance could be worse in some cases, which could make it less efficient in some workloads/scenarios.

DRAM can but usually does not act as a write cache (or if it does, not in the way an HDD's DRAM cache does) but rather as a metadata cache for mapping, wear management, etc. FYI I explain this on my subreddit with my SSD Basics, which although outdated covers some of this. SSDs do use a volatile write cache but you don't need a lot of memory for that when you're accessing at a superpage level (e.g. 16KiB x 4 planes/die x 4 dies/channel x 4 channels). It makes more sense to take advantage of DRAM's latency for logical page (4KiB) mapping and other things. There can be multi-tier non-volatile caching though. Static SLC -> Dynamic SLC -> native is very common (e.g. static is FIFO, since it has different wear than dynamic) and it's possible to do pSLC -> pMLC/pTLC -> TLC/QLC and other things, but not at all common.
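The superpage example above works out to 1MiB, which a few lines of Python can confirm. The second half is an illustration only: the 4-byte mapping entry is an assumption (entry size is not stated in the post), used to show why a full 4KiB-granularity map is the kind of thing DRAM gets spent on.

```python
KIB = 1024


def superpage_bytes(page_kib: int = 16, planes_per_die: int = 4,
                    dies_per_channel: int = 4, channels: int = 4) -> int:
    """Superpage size for the example geometry given in the post."""
    return page_kib * KIB * planes_per_die * dies_per_channel * channels


def logical_pages(capacity_bytes: int, logical_page_bytes: int = 4 * KIB) -> int:
    """Number of 4KiB logical pages that need a mapping entry."""
    return capacity_bytes // logical_page_bytes


print(superpage_bytes() // KIB, "KiB")  # 1024 KiB, i.e. 1 MiB

ENTRY_BYTES = 4  # ASSUMPTION: 32-bit mapping entries, not stated in the thread
entries = logical_pages(4 * 10**12)     # nominal 4TB drive
print(entries)                          # 976562500
print(entries * ENTRY_BYTES / 10**9, "GB of mapping data under that assumption")
```

A write buffer only has to cover a handful of 1MiB superpages, while the map under this assumption runs to gigabytes, so a DRAM-less drive has to keep only part of it close at hand.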

SSD performance has improved at the controller level and at the flash level (and for DRAM too, but mostly power efficiency). Controllers are more efficient, have way higher IOPS (and queues), better error correction (necessary for denser flash over time), more intelligent algorithms, etc. Flash has also improved a lot but people often say it hasn't. They cite that 4KiB random still feels the same, but in reality at the flash level there's been significant improvements in latency as well as power efficiency, throughput, etc. Today's DRAM-less NVMe SSDs are insanely fast and efficient as a result.

Not to advertise, but I only post here from time to time (and mostly just in the Memory/Storage forum); you can find resources at my subreddit, including a Discord. Not to derail the thread: the SN5000 is a good example of the above, since the QLC on the 4TB has specifications that would've been pretty good with Gen3 TLC drives. So people saying "flash hasn't improved" should surely take no issue with this drive, but then they want those juicy sustained graphs. It looks otherwise to be an SN770, which was a great drive born from WD's experience with the hardware.
 
$350!? What the hell are they thinking?

You can get a 4TB 990 Pro for under $300, which beats the SN5000 in every way.
I think this is just the initial MSRP, because you don't want to set a price that's "too low." It's relatively easy to come down on price, and harder to go up on price. Plus, there are places that will now be able to say, "Was $349.99, now only $249.99!" for semi-permanent sales.

In fact, even WD itself now shows the base price of the 4TB as $289.99, with a "sale" to $279.99:
https://www.westerndigital.com/products/internal-drives/wd-blue-sn5000-nvme-ssd?sku=WDS400T4B0E

I expect over time the SN5000 drives will trend downward to compete with similar performance drives from other companies, and from WD itself. The SN580 2TB costs $119.99 now, while the 2TB Black SN770 and Blue SN5000 are $139.99. Note that the 2TB SN5000 uses TLC and really is the same basic hardware as the SN770, if I've got my facts right, so it makes sense for them to be priced the same.
 
Holy hell! The predictions of SSD storage skyrocketing in price were no joke. A 4TB NVMe costs $300CAD at the least expensive. I bought both of my 2TB NVMe drives for only $90CAD each a little over a year ago. Now the 2TB drives start at $135CAD and go up from there. That's literally a 50% increase in price from a year ago.

Things were so much better a year ago. When I went into Memory Express to buy a 256GB WD Black SN770 PCIe4 NVMe to use as my new system drive, it cost $40CAD at the time, but the salesperson used some "sales wizardry" to get me to spend $45CAD on a 512GB version (that was somehow faster than the 256GB model) instead. :giggle:

The prices today are just guano-insane.
 
Holy hell! The predictions of SSD storage skyrocketing in price were no joke. A 4TB NVMe costs $300CAD at the least expensive. I bought both of my 2TB NVMe drives for only $90CAD each a little over a year ago. Now the 2TB drives start at $135CAD and go up from there. That's literally a 50% increase in price from a year ago.

Things were so much better a year ago. When I went into Memory Express to buy a 256GB WD Black SN770 PCIe4 NVMe to use as my new system drive, it cost $40CAD at the time, but the salesperson used some "sales wizardry" to get me to spend $45CAD on a 512GB version (that was somehow faster than the 256GB model) instead. :giggle:

The prices today are just guano-insane.
Yup. The NAND makers cut production, plus the AI cycle is apparently encouraging businesses to purchase larger drives for model storage. Or at least I've heard that's happening. Those two things have combined to help massively inflate prices. We were seeing good quality 2TB TLC drives for $90 USD and lower last year, now the best options cost at a minimum $120, with some models starting at $150.
 
Yup. The NAND makers cut production, plus the AI cycle is apparently encouraging businesses to purchase larger drives for model storage. Or at least I've heard that's happening. Those two things have combined to help massively inflate prices. We were seeing good quality 2TB TLC drives for $90 USD and lower last year, now the best options cost at a minimum $120, with some models starting at $150.
The NAND makers cut production while demand skyrockets because of the AI cycle... Capitalism at its worst, I tell ya.

It just goes to show you that even when a market isn't saddled by a small oligopoly, collusion is still possible. I mean, there are something like 26 manufacturers of NAND Flash Memory and they're all behaving contrary to the so-called "invisible hand" of the market. That means collusion because, with 26 competitors, at least one would move to take advantage of the surplus demand if collusion wasn't preventing it.

That the so-called "Competition Bureaus" of the world allow this to occur is a symptom of the staggering corruption that they're involved with. If they're not going to do their jobs and properly regulate the market, what's the point of their existence?
 

1_rick

A lot of Reddit threads and posts on this topic are by me, so I can summarize here as briefly as possible. There are different granularities or units of storage depending on what you're doing with flash, the most important being the smallest read/write unit (pages) and the smallest erase unit (blocks). The latter is what we're talking about with folding, since you're "folding" multiple blocks into one, and garbage collection works at the block level as well. The smaller blocks are pseudo-SLC, which means the native flash (in this case, QLC) acting in single-bit mode like SLC, so four blocks of SLC/pSLC "fold" into one block of QLC. This is required to free up space since a block in SLC/pSLC mode holds only one-fourth of its "real" QLC capacity.
Thanks! This extra info makes a lot of sense; I was familiar with the size differences between pages and blocks.