Question Crucial MX500 500GB SATA SSD - - - Remaining Life decreasing fast despite only a few bytes being written to it ?

Lucretia19 · Feb 5, 2020

The Remaining Life (RL) of my Crucial MX500 ssd has been decreasing rapidly, even though the pc doesn't write much to it. Below is the log I began keeping after I noticed RL reached 95% after about 6 months of use.

Assuming RL truly depends on bytes written, the decrease in RL is accelerating and something is very wrong. The latest decrease in RL, from 94% to 93%, occurred after writing only 138 GB in 20 days.

(Note 1: After RL reached 95%, I took some steps to reduce "unnecessary" writes to the ssd by moving some frequently written files to a hard drive, for example the Firefox profile folder. That's why only 528 GB have been written to the ssd since Dec 23rd, even though the pc is set to Never Sleep and is always powered on. Note 2: After the pc and ssd were about 2 months old, around September, I changed the pc's power profile so it would Never Sleep. Note 3: The ssd still has a lot of free space; only 111 GB of its 500 GB capacity is occupied. Note 4: Three different software utilities agree on the numbers: Crucial's Storage Executive, HWiNFO64, and CrystalDiskInfo. Note 5: Storage Executive also shows that Total Bytes Written isn't much greater than Total Host Writes, implying write amplification hasn't been a significant factor.)

My understanding is that Remaining Life is supposed to depend on bytes written, but it looks more like the drive reports a value that depends mainly on its powered-on hours. Can someone explain what's happening? Am I misinterpreting the meaning of Remaining Life? Isn't it essentially a synonym for endurance?


Crucial MX500 500GB SSD in desktop pc since summer 2019
Date	Remaining Life	Total Host Writes (GB)	Host Writes (GB) Since Previous Drop
12/23/2019	95%	5,782
01/15/2020	94%	6,172	390
02/04/2020	93%	6,310	138
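
The acceleration in the log above can be made explicit by dividing host writes by the drop in Remaining Life for each interval. This is only a rough sketch: it assumes the drive started at 100% with 0 GB written (not stated in the log), and the name `gb_per_percent` is made up for illustration.

```python
# Rows from the log above: (date, remaining life %, total host writes in GB).
log = [
    ("12/23/2019", 95, 5782),
    ("01/15/2020", 94, 6172),
    ("02/04/2020", 93, 6310),
]

def gb_per_percent(rows):
    """Return host GB written per 1% drop in Remaining Life, one value per interval."""
    result = []
    # Assumption: the first interval runs from a new drive (100%, 0 GB).
    prev_rl, prev_gb = 100, 0
    for _date, rl, gb in rows:
        result.append((gb - prev_gb) / (prev_rl - rl))
        prev_rl, prev_gb = rl, gb
    return result

rates = gb_per_percent(log)
for (date, rl, _gb), rate in zip(log, rates):
    print(f"{date}: reached {rl}% at about {rate:.0f} host GB per 1% of life")
```

Under that assumption, the first five percent cost roughly 1,156 GB each, while the latest two cost 390 GB and 138 GB.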

Lucretia19 · Feb 9, 2020

@fzabkar: Regarding whether my MX500 is a fake, all I can say is that I bought it at Amazon from Amazon. The Amazon Orders history says: "Sold by: Amazon.com Services LLC."
Perhaps Crucial purchased some fake controller chips?

Maxxify · Feb 9, 2020

At this stage my advice would be: Resource Monitor, Performance Monitor, etc. See what's engaging the drive. You can track this. As for power on hours: the drive will still go to sleep/idle and I do have some drives that only track active controller time. This inherently doesn't impact WAF though.

In response to fzabkar's comments: drives without compression can never have a WAF lower than 1. Dynamic SLC as used by Micron has an additive value to write amplification between 0 and 2 (for MLC). DRAM-less drives (which the MX500 is not) do have a higher WAF due to committing more NAND writes to update metadata and being less able to defer these changes.

fzabkar · Feb 9, 2020

I still don't see any answer to my observation regarding the WA that is inherent in SLC caching. Perhaps I'm not explaining myself well, or maybe I'm just having a brain fart, or maybe the answer isn't out there?

IIUC, each TLC (triple level cell) stores 3 bits of information whereas an SLC stores a single bit. Ultimately each bit received from the host will be committed to a TLC. However, for reasons of speed, the data are initially written to an SLC cache which consists of TLCs running in SLC mode. That is, the same TLC that would normally store 3 bits now stores only 1. Therefore, instead of directly programming a single cell with those three bits, the SSD has had to program 3+1 cells. This is before we even take garbage collection into account.

If this makes any sense at all (maybe not??), and if every write is buffered in SLC cache, then ISTM that the minimum WAF must be 4:1.

Edit:

Maybe the SSD only utilises SLC cache if the 512MB DRAM buffer becomes full?

Maxxify · Feb 9, 2020

Micron describes the impact of dynamic write acceleration (dynamic SLC) on write amplification (and the lack of impact on GC) here. With TLC it will have an additive factor between 0 and 3. DRAM isn't used as a write cache on modern SSDs.

fzabkar · Feb 9, 2020

Thanks.

I see that the reason for the range from 0 to 3 is that it accounts for those cases where data are TRIM-ed from the SLC cache without being committed to TLC. That makes more sense now. 🙂

Edit:

Or does it?

ISTM that the additive factor should range from 1 to 2, i.e. the WAF should range from 2 to 3 for MLC NAND, and 3 to 4 for TLC. The additive factor of 0 only applies in cases where there is a sustained workload that bypasses SLC cache.

Dynamic write acceleration may contribute to WAF because data may first be written as SLC and later be rewritten as MLC. The magnitude of the difference in WAF is an additive factor between zero and two, depending on runtime conditions. Provided conditions occur such that a given piece of user data is written as SLC and is neither trimmed nor rewritten before the later migration to MLC, the additive factor in WAF for that data would be two. If the user data was rewritten or trimmed before SLC-to-MLC migration, or if the data was originally written as MLC (as would be the case for sustained workloads), the additive factor for that data would be zero.
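
The additive factor in the quoted passage can be averaged over a workload with a small calculation. This is a sketch under stated assumptions, not Micron's model: it takes the per-datum additive factor to be 2 for MLC (from the quote) and 3 for TLC (the figure given earlier in this thread), and the function name and `fraction_migrated` parameter are hypothetical.

```python
def expected_waf_additive(fraction_migrated, bits_per_cell):
    """Average additive WAF term from dynamic SLC caching.

    fraction_migrated: share of host data that is written as SLC and later
        migrated to multi-bit NAND without being trimmed or rewritten first.
    bits_per_cell: 2 for MLC, 3 for TLC.

    Assumption: data that migrates contributes an additive factor equal to
    bits_per_cell (2 for MLC per the quote, 3 for TLC per this thread);
    data that is trimmed, rewritten, or written directly contributes 0.
    """
    if not 0.0 <= fraction_migrated <= 1.0:
        raise ValueError("fraction_migrated must be between 0 and 1")
    return fraction_migrated * bits_per_cell

# If every byte goes through SLC and is later folded into TLC, the WAF
# before garbage collection would be 1 + 3 = 4, the 4:1 figure raised above.
worst_case_tlc = 1 + expected_waf_additive(1.0, 3)
# If half the data is trimmed or rewritten before migration:
half_migrated_tlc = 1 + expected_waf_additive(0.5, 3)
print(worst_case_tlc, half_migrated_tlc)
```

Either way, SLC caching alone is bounded at a small additive term, so it cannot account for a WAF in the tens.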

Maxxify · Feb 9, 2020

Yes. When it writes out from SLC it's done sequentially (folding - three SLC blocks into one TLC block) which reduces wear, however with dynamic SLC it converts to/from TLC which has less endurance than static SLC (used on many drives). But SLC can defer writes and the mechanism by which you write SLC imposes less wear on the cell structure (coarser voltage pulses), although erases still impact all the underlying TLC blocks with dynamic SLC. So P/E as a metric is a bit loose especially when other factors are taken into consideration (e.g. read disturb, OP, etc.) but I guess we're focused on the WAF here. Typically consumer WAF will be in the 1.5-3.0 range (I usually find it to be 1.5-2.0) and the additive factor of SLC is limited, so there's clearly an issue here (and the MX500 of course has DRAM).

Drives with SLC caching can often bypass the SLC with direct-to-TLC mode which increases wear generally. One reason I suspect power issues as a source is because it will commit to TLC in that case, but the data is temporary by nature (it's a copy of system RAM) so is quickly trimmed. Therefore if you got into a cycle of hibernate-wake (wherein the system can't hibernate properly and immediately awakens), say every fifteen minutes, you're doing a ton of committed writes for no lasting data. Obviously this shouldn't be happening in his case but I'm explaining the reasoning. Related issues would be "versioning" or metadata differential (as on APFS for example).

Lucretia19 · Feb 10, 2020

fzabkar said:
Have you seen this thread?

https://forums.anandtech.com/threads/what-up-with-my-crucial-ssds.2576727/

@fzabkar: I hadn't seen that thread before you mentioned it. Assuming I didn't make a typo, the Crucial M4 ssd that was apparently being killed by BitDefender had a lifetime WAF of 25, and its Remaining Life reached 9% after 24 TB host bytes written (in 3.5 years). That's a very large WAF, and I don't understand how antivirus software could cause a large WAF. The people in that thread called BitDefender buggy because uninstalling it apparently stopped the rapid growth in Average Block Erase Count, but it seems to me a large WAF should be blamed on the ssd, not blamed on software running on the pc. Nevertheless, I could try disabling Comodo for a while to see what happens.

[I thought I posted this yesterday, but apparently I forgot to click the Post Reply button.]

Lucretia19 · Feb 10, 2020

WAF has been significantly better the last 24 hours (but still poor): 10.97. (See the red column in the log below.) The improvement may be the result of one of the three actions I took yesterday:

1. Exited CrystalDiskInfo and set it not to autostart with Windows.
2. Exited HWiNFO and set it not to autostart with Windows.
3. Shut down the pc for about 10 seconds, in case the ssd's internal processor had gone rogue after many weeks without a power off/on reset.

I plan to log WAF for another day or two before re-enabling CrystalDiskInfo or HWiNFO. Today's improvement might just be a one-off.

Here's the latest log:

Date	Total Host Writes (GB)	S.M.A.R.T. F7	S.M.A.R.T. F8	WAF = 1 + F8/F7	Total Amplified Writes (GB)	ΔF7 (1 row)	ΔF8 (1 row)	Recent WAF = 1 + ΔF8/ΔF7
02/06/2020	6,323	219,805,860	1,229,734,020	6.59	41,698
02/07/2020	6,329	220,037,004	1,242,628,588	6.65	42,071	231,144	12,894,568	56.79
02/08/2020	6,334	220,297,938	1,252,694,764	6.69	42,351	260,934	10,066,176	39.58
02/09/2020	6,342	220,575,966	1,269,273,190	6.75	42,836	278,028	16,578,426	60.63
02/10/2020	6,351	220,857,490	1,272,080,434	6.76	42,931	281,524	2,807,244	10.97
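
For anyone who wants to repeat the calculation, the two WAF columns in the log can be reproduced from the raw F7 and F8 counters. A minimal sketch using the formulas from the column headers (WAF = 1 + F8/F7, Recent WAF = 1 + ΔF8/ΔF7); the function names are made up for illustration.

```python
# (date, S.M.A.R.T. F7 raw value, S.M.A.R.T. F8 raw value) copied from the log above.
rows = [
    ("02/06/2020", 219_805_860, 1_229_734_020),
    ("02/07/2020", 220_037_004, 1_242_628_588),
    ("02/08/2020", 220_297_938, 1_252_694_764),
    ("02/09/2020", 220_575_966, 1_269_273_190),
    ("02/10/2020", 220_857_490, 1_272_080_434),
]

def lifetime_waf(f7, f8):
    """WAF over the drive's whole life: 1 + F8/F7."""
    return 1 + f8 / f7

def recent_waf(prev_row, curr_row):
    """WAF over the interval between two log rows: 1 + dF8/dF7."""
    d_f7 = curr_row[1] - prev_row[1]
    d_f8 = curr_row[2] - prev_row[2]
    return 1 + d_f8 / d_f7

recent = [recent_waf(prev, curr) for prev, curr in zip(rows, rows[1:])]

for row, r in zip(rows[1:], recent):
    date, f7, f8 = row
    print(f"{date}: lifetime WAF {lifetime_waf(f7, f8):.2f}, recent WAF {r:.2f}")
```

Because the lifetime figure is dominated by months of history, the per-interval figure is the one that shows a change within a day.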

fzabkar · Feb 10, 2020

Yesterday I examined a dozen SMART reports from different MX500 SSDs covering 3 different firmware versions. The WAF varied from about 1.1 to 15. There were new and old drives in the mix.

Lucretia19 · Feb 10, 2020

@fzabkar: Are those MX500 SMART reports on the web, publicly available? Is 500GB the size of each drive?

fzabkar · Feb 10, 2020

Lucretia19 said:
@fzabkar: Are those MX500 SMART reports on the web, publicly available? Is 500GB the size of each drive?

Yes, they are public. All but one were 500GB. The other was 1TB. It, too, had a large WAF.

Sorry, I should have posted the URLs. :-(

Here are a few of them:

https://www.smartmontools.org/ticket/1227
https://forum.ubuntuusers.de/topic/smartmontools-nerven/
https://debian-facile.org/viewtopic.php?id=23316
https://debian-facile.org/viewtopic.php?id=25700
https://forums.cpanel.net/threads/current-pending-sector-emails.646573/
https://forum.proxmox.com/threads/smartd-false-positive-ssd-currentpendingsector.44579/
https://www.smartmontools.org/ticket/1130

hang-the-9 · Feb 10, 2020

Lucretia19 said:
The Remaining Life (RL) of my Crucial MX500 ssd has been decreasing rapidly, even though the pc doesn't write much to it. Below is the log I began keeping after I noticed RL reached 95% after about 6 months of use.

My thing with using computers, is use them, if things fail, replace them. If you said Crucial already worked with you and won't replace the drive you are just spending time and worry needlessly on this. Instead of spending your time on the computer worrying that something will fail, just use the drive and keep backups. When the drive fails, replace it and copy your stuff from the back-ups. If Crucial won't exchange it there you are just worried about stuff you have no control over. It's like buying a car then keeping it in the garage most of the time just measuring how much the tires are wearing down or if the gas mileage has gone down by .3 mpg in the last month or not.

Lucretia19 · Feb 10, 2020

@hang-the-9: You're free to have that "I don't care; I've got plenty of money to buy new parts" attitude, but it's not a one-size-fits-all attitude.

I never said Crucial refused to replace the drive. I haven't asked them yet to replace the drive. I think it's wise to understand the problem before asking them to replace it... in particular, understanding what causes WAF to increase. Otherwise a replacement ssd is reasonably likely to suffer the same fate. Also, the better I understand the problem, the more likely it is that Crucial can be persuaded that warranty replacement is reasonable, and perhaps persuaded that they have a firmware bug they should devote resources to fixing.

Your analogy about keeping a car in the garage seems poor. Perhaps you should have read more of this thread. The problem isn't so much that WAF has averaged about 7 over the ~6 months the ssd has been in service, nor that the Remaining Life decreased to 94% after writing about 6 TBytes; the bigger problem is that WAF increased to about 50 when looking at the ssd's recent behavior. In other words, the decrease of Remaining Life has been accelerating. Writing 390 GB caused RL to decrease from 95% to 94%. Writing 138 GB caused RL to decrease from 94% to 93%. Doing nothing about it seems like a last resort, not an optimal strategy.

USAFRet · Feb 10, 2020

It's not a case of "just buy new"...rather it is a case of replace when this drive dies. Some may die faster than others.
This particular one will die when it is ready to, no matter how much you mess with it, or fret over the specific numbers.

If it does within the warranty period..hey, free drive.
Otherwise, replace when your budget allows.

There is pretty much nothing you can do to cause the WAF to change to your benefit.

hang-the-9 · Feb 11, 2020

Lucretia19 said:
@hang-the-9: You're free to have that "I don't care; I've got plenty of money to buy new parts" attitude, but it's not a one-size-fits-all attitude.

I never said Crucial refused to replace the drive. I haven't asked them yet to replace the drive. I think it's wise to understand the problem before asking them to replace it... in particular, understanding what causes WAF to increase. Otherwise a replacement ssd is reasonably likely to suffer the same fate. Also, the better I understand the problem, the more likely it is that Crucial can be persuaded that warranty replacement is reasonable, and perhaps persuaded that they have a firmware bug they should devote resources to fixing.

Your analogy about keeping a car in the garage seems poor. Perhaps you should have read more of this thread. The problem isn't so much that WAF has averaged about 7 over the ~6 months the ssd has been in service, nor that the Remaining Life decreased to 94% after writing about 6 TBytes; the bigger problem is that WAF increased to about 50 when looking at the ssd's recent behavior. In other words, the decrease of Remaining Life has been accelerating. Writing 390 GB caused RL to decrease from 95% to 94%. Writing 138 GB caused RL to decrease from 94% to 93%. Doing nothing about it seems like a last resort, not an optimal strategy.

You did say that Crucial gave up trying to help which sounds like they are not going to replace the drive. If their support checks what you have and then decides to replace the drive, they will do it, or if you think there is an issue, then start the return process and buy another brand or exchange for the same drive. My point is that just tracking how the drive is doing week by week is a waste of time vs just working with their support to replace it.

Lucretia19 · Feb 11, 2020

USAFRet said:
It's not a case of "just buy new"...rather it is a case of replace when this drive dies. Some may die faster than others. This particular one will die when it is ready to, no matter how much you mess with it, or fret over the specific numbers.

If it does within the warranty period..hey, free drive. Otherwise, replace when your budget allows.

There is pretty much nothing you can do to cause the WAF to change to your benefit.

No, that advice is indeed "just buy new" when the drive dies (if it's past the warranty period). I believe the modern woke response to "replace when your budget allows" is "check your privilege," because you're not considering the consequences if the drive dies when one's budget is tight, nor the loss of productivity if one doesn't have a spare drive immediately available.

Although you say "free drive" if the ssd dies within the warranty period, you're neglecting the value of one's time to deal with the replacement hassle and the loss of productivity during that time. Also, the replacement drive isn't guaranteed to perform better; it seems more likely that it would eventually suffer the same fate, yet not be covered by a fresh warranty.

Please cite the basis for your opinion that nothing can be done to improve WAF. For example, what if the abnormally high WAF is caused by a firmware bug that could be patched by Micron if users bring it to their attention? What if it's a firmware bug that a user could mitigate by periodically shutting off the pc (resetting the ssd's controller)? What if it's a firmware bug that the user could mitigate by, paradoxically, writing more to the ssd? (Note: the user who is reporting the same issue at the anandtech.com forum also has written very little to his MX500 -- the host pc writing only 3961 GB has led to a 93% Remaining Life -- and he's currently checking to see whether the decrease in Remaining Life is accelerating as it has with my MX500.) What if the issue is exacerbated by the pc failing to properly cache writes, so the ssd is receiving small random writes instead of large sequential writes, and the solution is to figure out why the pc isn't caching writes properly? (Note: Crucial/Micron tech support gave up trying to figure out why Momentum Cache won't activate on my pc, after I did the obvious steps their tech support script suggested. They should have promised to escalate the issue to the attention of Micron's software engineers, but instead they suggested I try substituting third party software, and neglected to specify which third party software to try.)

Please stop discouraging other commenters from trying to help understand the cause of the problem.

Lucretia19 · Feb 11, 2020

@hang-the-9: You wrote:

You did say that Crucial gave up trying to help which sounds like they are not going to replace the drive. If their support checks what you have and then decides to replace the drive, they will do it, or if you think there is an issue, then start the return process and buy another brand or exchange for the same drive. My point is that just tracking how the drive is doing week by week is a waste of time. You are also assuming this remaining life statistic is accurate for the drive.

Tech support did not VOLUNTEER to replace the drive, in their most recent email. I don't see why that sounds to you like Crucial refuses to replace the drive. Their email included the standard invitation "let us know if you have further questions." They didn't literally say they gave up and that there's nothing more they will do.

I don't plan to track the drive "week by week" indefinitely, and I'm not "just" tracking the drive. This is a short term effort to understand the problem and hopefully fix it, which is not necessarily a waste of time (and besides, it's interesting to think about). The very low rate of writing to my MX500 and to Charlie98's MX500 may be an important clue, along with the strangely low Power On Hours (991 after more than 5 months when the pc has been set to Never Sleep). There are some theories that can be tested. For example, it's possible that writing more to the ssd will actually slow the decrease in its Remaining Life, for example by greatly reducing the frequency of transitions between low power state and normal state, if each transition causes some writing to NAND.

Regarding whether the Remaining Life stat is accurate... The stat appears to be based entirely on the Average Block Erase Count (ABEC) and the Lifetime Erases rating. I believe the raw ABEC values are accurate because Storage Executive, HWiNFO and CrystalDiskInfo all report the same numbers. The Lifetime Erases rating is just Micron's rating, appears to be about 1500 per block, and hopefully is a conservative estimate.
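
The relationship described above can be written as a one-line formula. This is a sketch of the apparent behaviour, not documented firmware logic: it assumes a rating of 1500 erases per block (the approximate figure mentioned above) and assumes the percentage used is rounded down to a whole number. The function name is hypothetical.

```python
def remaining_life_percent(avg_block_erase_count, rated_erases=1500):
    """Estimate Remaining Life from the Average Block Erase Count.

    Assumptions (not confirmed by Crucial/Micron): life used is
    ABEC / rated_erases, truncated to a whole percent, and the
    result never drops below 0.
    """
    used_percent = (avg_block_erase_count * 100) // rated_erases
    return max(0, 100 - used_percent)

# With a 1500-erase rating, each 15 erases of average wear costs 1%.
# The ABEC values below are illustrative, not readings from the drive.
for abec in (0, 75, 90, 105):
    print(abec, remaining_life_percent(abec))
```

If this model holds, Remaining Life depends only on NAND erases, so a high WAF lowers it even when host writes are small.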

hang-the-9 · Feb 11, 2020

Lucretia19 said:
No, that advice is indeed "just buy new" when the drive dies (if it's past the warranty period). I believe the modern woke response to "replace when your budget allows" is "check your privilege," because you're not considering the consequences if the drive dies when one's budget is tight, nor the loss of productivity if one doesn't have a spare drive immediately available.

Well if someone can't afford to replace a drive when it fails, there is not much they can do when it fails no matter what causes it to fail, some hardware fault or if a dragon eats it. Same thing if anything breaks or if they really want to buy that $40 steak but only have $10. Used to be what you have depends on how much you worked for it, now people just want stuff handed to them and if it's not they come up with terms like "privileged" for those that have better jobs, are smarter, work harder or just lucky and have richer parents. Back in the USSR in the 80s we used to make fun of people by calling them rich, it was bad to have more money than someone else, calling someone rich was actually an insult in USSR. Welcome to the US of Communism LOL

fzabkar · Feb 11, 2020

With regard to SMART attributes, CrystalDiskInfo always reports the correct numbers. Its only point of difference with other tools is in the names that it gives to each attribute. These names are mostly vendor specific, but are not generally publicly documented. The drive does not report attribute names, only attribute IDs.

Each raw value consists of 56 bits. These bits can be arranged as a single number, or as a multi-byte or multi-word data set. CDI reports these bits as is, but other tools attempt to interpret them, often incorrectly. If you understand these raw data, you can often interpret them better than the SMART tools.
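
To illustrate why the same raw bits can be shown differently by different tools, here is a small sketch that decodes one raw field both as a single number and as a set of 16-bit words. It assumes the 7 bytes are stored least significant byte first; the byte values are made up for illustration and do not come from any drive in this thread.

```python
def decode_raw(raw):
    """Decode a 7-byte (56-bit) SMART raw field in two ways.

    Returns (as_number, words, high_byte) where:
      as_number is all 7 bytes read as one little-endian integer,
      words are three 16-bit little-endian words from the low 6 bytes,
      high_byte is the remaining seventh byte.
    Assumption: bytes are ordered least significant first.
    """
    if len(raw) != 7:
        raise ValueError("expected exactly 7 bytes")
    as_number = int.from_bytes(raw, "little")
    words = [int.from_bytes(raw[i:i + 2], "little") for i in (0, 2, 4)]
    return as_number, words, raw[6]

# Hypothetical raw bytes, e.g. an attribute packing three small values.
example = bytes([0x1E, 0x00, 0x14, 0x00, 0x2D, 0x00, 0x00])
number, words, high = decode_raw(example)
print(number)  # one large, meaningless-looking number
print(words)   # three readable values
```

A tool that prints the field as one number shows 193274839070 here, while reading it as words gives 30, 20 and 45.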

Lucretia19 · Feb 11, 2020

@hang-the-9: It seems inappropriate that a "Moderator" would choose to inject his/her personal political opinions into what's supposed to be a tech discussion, but regardless of that faux pas, I'll just say that your thought processes continue to appear broken. (1) It should be obvious that there's a difference between never being able to afford to buy a new drive and not being able to afford to buy a new drive every couple of years. (2) You're neglecting the theme of this thread, which is about what might cause an ssd's write amplification to go crazy. (3) No one in this thread has written anything that implies a desire for free things to be handed out, so your rant is a non sequitur.

If you continue to write off-topic, the appropriate response will be to start treating you like a troll and ignore you.

fzabkar · Feb 11, 2020

Would there be any point in examining the SMART data for other vendors' SSDs which are based on the same SM2258 controller? I guess it would depend on whether each OEM writes their own firmware (I believe Intel does this for one of their own Silicon Motion rebranded controllers).

FWIW, Intel's 540s SSDs use an SM2258 controller. SMART attributes F1h and F9h should allow the WAF to be calculated.

https://www.intel.com.au/content/da...oduct-specifications/ssd-540s-series-spec.pdf

Lucretia19 · Feb 11, 2020

fzabkar said:
Would there be any point in examining the SMART data for other vendors' SSDs which are based on the same SM2258 controller? I guess it would depend on whether each OEM writes their own firmware (I believe Intel does this for one of their own Silicon Motion rebranded controllers).

FWIW, Intel's 540s SSDs use an SM2258 controller. SMART attributes F1h and F9h should allow the WAF to be calculated.

https://www.intel.com.au/content/da...oduct-specifications/ssd-540s-series-spec.pdf

Yes, I think there'd be a point. In particular, I'd be interested in looking at "WAF versus host write rate" to see if very low write rate causes WAF to go crazy. But is such data readily available? Just getting data for MX500 ssds would be a good start, assuming some of the ssds had a very low write rate.

I looked at the SMART data of the 7 MX500 ssds you linked above. Here are my notes:

MX500 model	WAF (approximate)	Power On Hours	Total Host Writes (TB, approximate)
2 TB	unavailable	unavailable	unavailable
1 TB	4.5	1402	5.70
500 GB	1	13	0.04
500 GB	5.5	3468	21.38
250 GB	2	437	1.76
1 TB	13	561	2.33
500 GB	2	85	0.36

The values for the 2 TB drive were unavailable because the SMART column was truncated. Its ABEC was only 3 and its Power Cycles was only 9... a very young drive that probably didn't have meaningful data.

It would probably be wrong to assume those drives maintained a fairly steady host write rate. And I see no way to estimate the percentage of time that the ssd spent in a low power state, in order to calculate the actual hours. So I have no clue what the host write rates were, and maybe nothing relevant can be deduced from those snapshots of SMART data. It may be necessary to see multiple snapshots per drive, taken at known intervals of time.

Another issue when trying to analyze the data is the unknown write cache on the host pc.

fzabkar · Feb 11, 2020

FWIW, here are SMART stats for an Adata SU800, also based on the SM2258 controller:

https://pastebin.com/B6gG9fgy

I suspect that attributes 241 and 245 represent the total data written by the host and FTL in 32MiB increments. Therefore the ratio should be the WAF, in this case about 1.5.

According to the datasheet, the SU800 also uses SLC caching:

https://www.adata.com/upload/downloadfile/Datasheet_SU800_EN_20180503.pdf

Lucretia19 · Feb 12, 2020

@fzabkar: The Adata SU800 also has dram read/write cache.

Below is some evidence that your guesses about SMART 241 and 245 are basically correct, depending on whether you're saying 245 is the total TLC writes or only the portion of TLC writes caused by the FTL background process. If the latter, add 1 to the WAF calculation, to get 2.5.

That ssd wrote about 1 TB during 222 so-called Power On Hours. Unfortunately, I don't see enough stats about that ssd to be helpful. There's no way to know the pc's power-on hours to know whether the ssd frequently transitioned to a low power state, nor the average write rate and how write rate varied, nor how WAF varied with write rate. Its WAF is another data point that suggests WAF should be much lower than mine, but it doesn't help me figure out why mine is so much higher.

Although I couldn't find any Adata document that lists SU800 SMART attributes, I found the following at https://www.smartmontools.org/browser/trunk/smartmontools/drivedb.h?rev=4934#L1934 :

"-v 241,raw48,Host_Writes_32MiB "

"-v 245,raw48,TLC_Writes_32MiB "

Those are for a different Adata ssd (SP550) that has a different SiliconMotion controller (SM2246EN) but they match your guesses. I found the link to that webpage at https://www.smartmontools.org/ticket/954 where a user was requesting inclusion of support for the SU800.

Note: That smartmontools database also says 246 is SLC Writes (for the Adata SP550).
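
The two interpretations of attribute 245 discussed above give different WAF figures from the same pair of counters. A minimal sketch, assuming both attributes count 32 MiB units as the smartmontools entries suggest; the counter values below are placeholders chosen to give a ratio of 1.5 and are not the actual values from the SU800 report.

```python
UNIT_BYTES = 32 * 1024 * 1024  # 32 MiB per count, per the drivedb.h entries above

def counts_to_gib(count):
    """Convert a 32 MiB-unit counter to GiB."""
    return count * UNIT_BYTES / 2**30

def waf_estimates(attr_241, attr_245):
    """Return the WAF under both readings of attribute 245.

    total_nand: 245 counts all TLC writes, so WAF = 245 / 241.
    ftl_only:   245 counts only FTL background writes, so WAF = 1 + 245 / 241.
    """
    ratio = attr_245 / attr_241
    return {"total_nand": ratio, "ftl_only": 1 + ratio}

# Placeholder counters (hypothetical): 32768 units is 1 TiB of host writes.
host_units, tlc_units = 32768, 49152
print(counts_to_gib(host_units))
print(waf_estimates(host_units, tlc_units))
```

Either reading gives a WAF far below the recent figures logged for the MX500.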

fzabkar · Feb 12, 2020

Regarding your concerns about CrystalDiskInfo's contribution to the WAF, be aware that the SMART and Identify Device data each require only 512 bytes. You can see these raw data packets by invoking CDI's Text Copy function. Therefore CDI's contribution is essentially nothing.

I suggest that you reexamine the SMART data by performing well defined tasks.

1. Dump the SMART data.
2. Clone the drive, sector by sector.
3. Recheck SMART. The data should now show 500GB of additional reads but no additional writes.
4. Secure erase the drive. If the drive is not hardware encrypted, then this should consume 1 P/E cycle. It should also reset the FTL. Check SMART.
5. Restore the data from the clone. The SMART data should now show 500GB of additional writes but no additional reads.

If this has the effect of reducing the WAF, then it will offset the penalty of the extra P/E cycle. In any case you will have a better understanding of the SMART numbers.
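
The before/after comparison for each step could be checked with a small helper like the one below. It is only a sketch: the counter names, the snapshot values, and the 5 GB tolerance are hypothetical, and the numbers would need to be taken from whatever the SMART tool reports for host reads and writes.

```python
def smart_delta(before, after):
    """Return after - before for every counter present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}

def check_step(delta, expect_reads_gb, expect_writes_gb, tolerance_gb=5):
    """True if the observed read/write deltas are within tolerance of the expectation."""
    reads_ok = abs(delta["host_reads_gb"] - expect_reads_gb) <= tolerance_gb
    writes_ok = abs(delta["host_writes_gb"] - expect_writes_gb) <= tolerance_gb
    return reads_ok and writes_ok

# Hypothetical snapshots taken around the clone step.
before_clone = {"host_reads_gb": 9000, "host_writes_gb": 6351}
after_clone = {"host_reads_gb": 9500, "host_writes_gb": 6351}

delta = smart_delta(before_clone, after_clone)
# Cloning should add about 500 GB of reads and no writes.
print(delta, check_step(delta, expect_reads_gb=500, expect_writes_gb=0))
```

The same check with the expectations swapped (0 GB reads, 500 GB writes) would apply to the restore step.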
