[SOLVED] Seagate Firecuda 520: SMART's "Percentage Used" is Above "Data Units Written" / Drive's TBW

May 24, 2020
I have a Firecuda 520. This is the relevant section of smartctl

(screenshot: smartctl SMART/Health Information section - 2,239 power-on hours, 996,095,194 data units / 510 TB written, Percentage Used 25%)


2239 hours is 8,060,400 seconds
510 TB is 534,773,760 MB
534,773,760 / 8,060,400 is ~66 MB/s, which is roughly the write rate (MB/s) that live monitoring reports, 24/7.

The problem: this drive's TBW is 3600.
But 510 / 3600 is 14.2% used, so how can Percentage Used show 25%?!
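
Here's the same arithmetic spelled out (a quick Python sketch using only the figures from the screenshot above - nothing else comes from the drive):

```python
# Reproducing the arithmetic above from the smartctl readout.
power_on_hours = 2239
host_tb_written = 510      # "Data Units Written", as reported in TB
rated_tbw = 3600           # endurance rating for this model

seconds = power_on_hours * 3600                        # 8,060,400 s
avg_mb_per_s = host_tb_written * 1024**2 / seconds     # ~66 MB/s, 24/7

host_based_used = host_tb_written / rated_tbw * 100    # ~14.2% if based on host writes
print(avg_mb_per_s, host_based_used)                   # ...yet SMART says 25% used
```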
 
Solution
TBW or total bytes written (not terabytes written by origin, but effectively the same now) is a measurement taken for warranty purposes. That is: a certain amount of writes or a warranty period, whichever comes first. Usually it's used to determine the drive writes per day (DWPD) value for write-intensive workloads; for consumers, not so much. When I say "health" below I mean the inverse of "percentage used." (e.g., 73% health = 27% used)

With that in mind, how a vendor/manufacturer determines TBW and (sometimes separately) drive health can vary. Most often the writes you see listed are host writes which is a measure of how much has been written by the system, but that does not reflect how many writes were actually done to the flash/NAND.
I can't tell how they calculate TBW from just that readout. However, total sector writes does not equate to total NAND writes. The 520 is an E16-based drive, which means full-drive dynamic SLC caching. Dynamic SLC caching has an additive effect on write amplification of between 0 (none) and 3 (the number of bits per cell of the underlying/native TLC flash). This is because the SLC and TLC share the same wear zone, as the SLC converts to/from TLC as needed based on how much space is in use. So you write to SLC, it gets moved to TLC, and you've written the same piece of data multiple times. It's a side effect of the drive's design.
 
510 terabytes / 996,095,194 data units = 512,000 bytes per data unit

So each "data unit" is 1,000 512-byte sectors.

My very next sentence included, "total sector writes does not equate to total NAND writes" - with a sector being implied as 512 bytes (512e). That's clearly not how they calculate TBW - it's how they calculate host writes - since his health % does not line up with that, and I was explaining that's because it does not account for write amplification, which is potentially increased by dynamic SLC caching. NAND writes are often given separately, and from that you can determine the WAF. Ergo, they might be calculating TBW ("percentage used") by NAND, which is WAF * host. This is not commonly the case; however, with full-drive SLC caching the entire wear zone applies to both SLC and TLC modes, as I explained above.

(not trying to sound like a jerk with this, just elaborating on what I meant - some vendors do judge TBW/health on host writes, I'm explaining why that would be a poor metric for a drive with full-drive dynamic SLC caching)
 
I mean, you're right, most will adjust to 512-byte sectors for that reading (unless formatted as 4Kn) and you can determine the host writes precisely from it. Further, most manufacturers use that value or a similar reading to determine TBW/health. I only meant to observe that perhaps, because of this drive's unique SLC configuration, they track it with another value that takes write amplification into consideration. The difference usually isn't large enough to matter, but the E16 drives have a very high TBW rating for a consumer drive combined with an unusually large dynamic SLC cache, so they may be hedging their bets.
 
Here it is today:
(screenshot: updated smartctl output - now showing roughly 27% Percentage Used and ~538 TB written)


I understood 90% of what was said here. Basically, what I want to know is whether I am 27% of the way or 538/3600 (~15%) of the way towards the 3600 TBW?

Also, do I decommission the drive after Percentage Used reaches 100% or after the available spare drops to 5%? Let's not consider playing it safe here (backups, importance of data/service, etc.) - in black and white, which of the two signals that it's time to replace the drive?

@mdd1963 it's a heavy use server. (obviously not web)

@USAFRet I have daily backups to RAID 10 metal, and the data can be re-downloaded (could take a long time to do so, but downtime would be minimal as I can spin up the spares that are kept synced)

@Maxxify "they might be calculating TBW ("percentage used") by NAND which is WAF * host. This is not commonly the case" - If TBW was NOT host writes * WAF then WAF would not matter. Where would it play a role if not exactly there? I am confused about the "This is not commonly the case" part.

Side note: I must add that I cannot run fstrim on this drive unless I stop the services on this machine. fstrim causes them to fail (because it locks the disk for writes, and maybe for reads too?). This is current Debian LTS on modern hardware; don't ask me why I can't run fstrim, I just learned to live with it. I even tried drives from two other reputable manufacturers before this one: fstrim = services fail. Amazingly enough, the one drive that can run fstrim without services failing is the Intel 660p, which as we know is on the lower end.
 
TBW or total bytes written (not terabytes written by origin, but effectively the same now) is a measurement taken for warranty purposes. That is: a certain amount of writes or a warranty period, whichever comes first. Usually it's used to determine the drive writes per day (DWPD) value for write-intensive workloads; for consumers, not so much. When I say "health" below I mean the inverse of "percentage used." (e.g., 73% health = 27% used)
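
To put a rough number on the DWPD relationship (this assumes the 2 TB model and a 5-year warranty period - both assumptions on my part, purely for illustration):

```python
# DWPD = TBW / (capacity in TB * warranty period in days)
tbw = 3600                       # TB, the rating being discussed here
capacity_tb = 2                  # assumed 2 TB model
warranty_days = 5 * 365          # assumed 5-year warranty
dwpd = tbw / (capacity_tb * warranty_days)
print(round(dwpd, 2))            # ~0.99 drive writes per day
```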

With that in mind, how a vendor/manufacturer determines TBW and (sometimes separately) drive health can vary. Most often the writes you see listed are host writes which is a measure of how much has been written by the system, but that does not reflect how many writes were actually done to the flash/NAND ("NAND writes"). This is because of something called write amplification where more than a single write is done for a given piece of data; the ratio between NAND and host writes is known as the write amplification factor (WAF).

If you're following that so far: your drive has full-drive SLC caching, which means all of the TLC can act in single-bit mode (more or less). The SLC cache acts as a temporary write cache, but it will shrink as the drive is filled since it's converted to TLC. This is dynamic SLC (different from real/native SLC, and also different from static SLC), which has an additive effect on wear. So let's look at two sources for more information here.

Dynamic Write Acceleration (DWA) is Micron's term for dynamic SLC. On page 5 we see: "Provided conditions occur such that a given piece of user data is written as SLC and is neither trimmed nor rewritten before the later migration to MLC, the additive factor in WAF for that data would be two." (that's for 2-bit MLC) Next, ADATA's page on DWPD: "the PE Cycle in the Dynamic SLC zone should be the same as the TLC, and it will perform Wear Leveling with the TLC block; consequently, it is necessary to combine the calculation of TBW." (meaning actual TBW, not host writes) The combination of these factors means you will have higher write amplification, and if the manufacturer measures health/used by NAND writes rather than host writes, it will be higher than the listed host writes suggest.
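
As a toy model of that additive effect (the fraction and factor below are made-up inputs, not values read from the drive - this only illustrates the mechanism described above):

```python
# Data that lands in dynamic SLC and is later folded into TLC picks up an additive
# WAF factor (between 0 and 3 for TLC, as noted earlier in the thread).
additive_factor = 3.0    # worst case for TLC-backed dynamic SLC
folded_fraction = 0.25   # hypothetical share of host data that gets folded this way

waf = 1.0 + folded_fraction * additive_factor
print(waf)   # 1.75 - the same ballpark as the ~1.76 implied by 25% used vs 14.2% host-based
```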

Considering the high TBW of the E16 drives and the fact that they have full-drive SLC caching, it would be a pretty poor idea for them to base health on host writes. So the health/used % might give a better indication of NAND/actual writes even if they warranty under host writes. By no means does that mean the drive will die at 100% used or at the rated writes.

Most often with SMART, the value given for writes (and often used for TBW/warranty) is host writes! Some vendors measure health by the average block erase count as well, but that will track closely with the NAND writes value. Host writes will also usually be counted in terms of the logical sector size (512 B here; NVMe reports them in data units of 1,000 sectors).

To answer your other questions:

That drive may and likely will survive far more writes than the % or the writes suggest. I don't have exact numbers as it depends on WAF/workload. Once a drive starts tapping into spare blocks - and that means any - I would suggest decommissioning it. This is because modern drives wear fairly evenly, so once you see one failed block they tend to go down like dominoes.

Also, to give you a different example:

I have drives with static SLC. With static SLC, the SLC zone has its own wear and TBW separate from the TLC zone, as its endurance is usually an order of magnitude higher. However, TLC TBW or host writes doesn't take that into account at all. That falls under health - if you check my 2nd link/source above you'll see this. In other words, the listed writes on consumer drives aren't really an accurate indicator. Most often health will be based on P/E cycles or erases, which mirrors NAND writes rather than host writes. In the case of a static SLC drive the "real" TBW would be the worse of two separate TBWs, neither of which maps to host writes!

tl;dr - it's not atypical for SSDs to match "TBW" to host writes when actual health or P/E will be based on NAND writes
 
Solution
@Maxxify "they might be calculating TBW ("percentage used") by NAND which is WAF * host. This is not commonly the case" - If TBW was NOT host writes * WAF then WAF would not matter. Where would it play a role if not exactly there? I am confused about the "This is not commonly the case" part.

Read my other reply above this one for full details, but I thought I'd clarify real quick with a separate reply as I realize reading through my other post that it's...lengthy.

Basically, TBW as defined should account for NAND writes, and it's possible % used here refers to that value and/or average erase count (for example). That accounts for write amplification. The problem is that consumer drives tend to take and list the host writes value and many people take that to be "TBW" (including many manufacturers who will warranty by it). This is troublesome with consumer TLC drives because SLC caching can have additive or separate TBW (many drives will list SLC writes separately in fact). So % used is probably more accurate than "data units written."
 
Thank you for the lengthy and proper explanation. I think it is clear now: Seagate's % used accounts for both SLC and TLC wear, and it is the most reliable indicator of health.

Question: "Once a drive starts tapping into spare blocks" is when "available spare" drops to 99%, correct?

Also, when you say "Provided conditions occur such that a given piece of user data is written as SLC and is neither trimmed nor rewritten before the later migration to MLC, the additive factor in WAF for that data would be two.", I have to ask whether this means that the fact that I can't fstrim this drive is contributing to the discrepancy between % used and "Data Units Written"?

Separately, I thought (lack of) fstrim wouldn't impact a drive's lifetime until the drive is getting full, or at least 50% full. This drive is 30% full. Does not running fstrim impact lifetime for an overprovisioned drive?

PS: the partition is 2 TB but I only use 500 GB.
 
I don't know that Seagate is tracking NAND writes (or average erase count) with that value; it just makes sense that they would. I haven't worked with any E12/E16 drives, so I don't know their precise SMART values. However, "percentage used" usually means an estimate of the device's life used based on usage and the manufacturer's method of predicting lifespan. Therefore my assumption is that they track device health with NAND writes or something else that takes write amplification into consideration, such as average block erase count. So this might be TBW or it might not be. What I stated previously is that some devices will go by host writes; however, actual TBW should be host writes times the WAF (that is, NAND writes). The reason it's important with this drive is that its SLC cache design encourages higher NAND wear. This differs from drives with no SLC or with static SLC, because the latter have separate TBW values and the actual one is the worse of the two.

Yes, once a single spare block is used ("99%" remaining) the chance of failure goes up drastically. I can illustrate that with one image, where a site tested multiple drives with writes and recorded the results: here is the Crucial MX500, which is a reasonably good drive with some dynamic SLC. You'll see that the amount of writes in TB (the x-axis) from first to last spare block usage is approximately 928 TB to 1067 TB, or about 15%. This sounds like a lot, but you start getting errors with newly-written data. If we check the SU800, which has the same controller but with a large SLC cache, we see it survives far fewer writes and goes to hell pretty quickly at the end.

TRIM, as an ATA command (SCSI has the analogous UNMAP, and NVMe has Deallocate), will reduce write amplification. If you're not using the full drive, its controller/FTL is capable of using the extra space for dynamic overprovisioning (which reduces WA, but benefits from TRIM). fstrim, from my understanding, is just on-demand TRIM, with the discard mount option being the continuous alternative. I don't feel fstrim is absolutely necessary as modern controllers/drives are quite good at GC, but not doing it regularly may increase write amplification. What Micron means in their DWA document is that if you write data to SLC that later gets migrated to TLC before being discarded or rewritten, it increases write amplification. And since dynamic SLC shares a wear zone with TLC, the TRIM/GC/wear-leveling is likewise balanced across it.

Of course if I'm right about the values your WAF is what, ~1.765? Not sure what you're using the drive for but that's not particularly high.
 
"Of course if I'm right about the values your WAF is what, ~1.765? Not sure what you're using the drive for but that's not particularly high. "

I like reading your stuff, man. It's dense. I just discovered how to calculate WAF. It is approx 0.27/0.15, or 1.8. That's not bad. I guess I am chewing through this drive, but at least there is nothing wrong; the WAF is what it is.

Thanks again for all the clarification. This is my go-to site for HW questions now. You guys rock.
 
That's a predicted WAF, since I'm not sure how they are calculating percentage used. It's quite possible they are tracking the average block erase count instead. If they are, then 25% relates to an average erase count of 450 (3600 / 2 TB = 1800 P/E, 1800 * 0.25 = 450 P/E). Since blocks are erased before being rewritten, it does give you a general idea of NAND writes, so you can still loosely calculate the WAF.
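
For example, running those numbers through (using the 25% / 510 TB figures - this is only a loose estimate, as stated above):

```python
# Rated P/E if the 3600 TBW were spread evenly over the 2 TB of TLC:
capacity_tb = 2
rated_pe = 3600 / capacity_tb       # 1800 P/E cycles
avg_erases = 0.25 * rated_pe        # 450 average block erases at 25% used

# Each average erase corresponds to roughly one full capacity of NAND writes:
nand_tb = avg_erases * capacity_tb  # ~900 TB written to the NAND
host_tb = 510                       # TB, from "Data Units Written"
waf = nand_tb / host_tb             # ~1.76
print(round(waf, 2))
```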

Also, although I categorize the E16-based drives as Prosumer in my SSD buying guides, it's mainly because they fill a specialized niche for sequential performance. You do not want a drive with full-drive SLC caching for steady-state workloads because it won't perform as consistently, may perform worse for the desired workloads, and will additionally have more wear. Traditionally the E12/E16 drives came with high TBW ratings, but that is a value for warranty purposes and shouldn't be read as more than that.