Question Crucial MX500 500GB sata ssd Remaining Life decreasing fast despite few bytes being written


Lucretia19

Commendable
Feb 5, 2020
163
11
1,595
2
[snip]
From what I have seen, though, the Health value of these affected drives actually rolls over to 100% and starts counting down again, which is not ideal, since you'd expect NAND at 0% health to go into read-only mode.
That's interesting about Remaining Life rollover from 0% back to 100%. Where have you seen this?

I'm unsure whether it's desirable or undesirable for the ssd to continue to be writeable after rollover, because being able to continue to write to the drive may be a benefit.

Do you have any idea whether the ssd's write amplification algorithm changes after rollover? (Or earlier, when Remaining Life is low.) At that point, continuing to run a wear-leveling routine makes no sense to me and would seem entirely self-destructive. Unless my thinking about this has grown fuzzy since I last thought about it two years ago, I think the only desirable write amplification when Remaining Life is low is the minimal amplification that's necessary: copying (and then erasing) an entire block when the host pc partially rewrites the block's contents.
 

Diceman_2037

Distinguished
Dec 19, 2011
45
2
18,535
0
Here
TechPowerUp user posted this



SMART data has exceeded 100% and the value has rolled over from 200, which it shouldn't;
it's actually at 165% lifetime used.
 
The Average Block Erase Count is 0x9AB (2475) and the Percentage Life Used is 0xA5 (165%). This means that the rated number of P/E cycles is ...

(100% / 165%) x 2475 = 1500 P/E cycles

If a 512GiB NAND array is reprogrammed 1500 times, that's a total of ...

1500 x 512GiB = 824 TB

... and for 2475 times the figure is ...

2475 x 512GiB = 1360 TB

CrystalDiskInfo is reporting 22.9TB for the Total Host Writes. Has this figure rolled over, too?
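
The arithmetic in this post can be checked with a few lines of Python. This is only a sketch: it assumes the raw NAND capacity is exactly 512 GiB and reports totals in decimal TB, and the variable and function names are made up for illustration.

```python
GIB = 2**30   # bytes per GiB
TB = 10**12   # bytes per decimal TB

avg_block_erase_count = 0x9AB  # raw SMART value, 2475 decimal
percent_life_used = 0xA5       # raw SMART value, 165 decimal

# If 165% of rated life corresponds to 2475 erases, 100% corresponds to:
rated_pe_cycles = avg_block_erase_count * 100 / percent_life_used

NAND_BYTES = 512 * GIB  # assumption: the whole array is 512 GiB


def nand_writes_tb(cycles):
    """Total bytes programmed into NAND after `cycles` full rewrites, in TB."""
    return cycles * NAND_BYTES / TB


print(f"rated P/E cycles: {rated_pe_cycles:.0f}")
print(f"1500 cycles: {nand_writes_tb(1500):.1f} TB")
print(f"2475 cycles: {nand_writes_tb(2475):.1f} TB")
```

These totals are NAND writes, not host writes, so they are not directly comparable to the Total Host Writes figure.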
 

Lucretia19

Here
TechPowerUp user posted this
<screencapture omitted>
smart data has exceeded 100% and the value has rolled over from 200, which it shouldn't,
it's actually at 165% life time used
Where do you get the "rolled over from 200" idea? The Raw Values column shows Percent Lifetime Used is A5, which in decimal is 165. The Average Block Erase Count (ABEC) value is 9AB, which in decimal is 2475. As is common knowledge, 100% Lifetime Used corresponds to ABEC = 1500, and the ratio of 2475 to 1500 is 1.65, which is the same as the ratio of 165% to 100%. To me, Lifetime Used and ABEC both appear to be counting up normally, with neither having rolled over.

It's unfortunate that the screencapture failed to capture the value of Background Program Page Count. It would be interesting to use it in combination with Host Program Page Count to see the total NAND pages written to the ssd and to calculate the ssd's Write Amplification Factor.

The ssd's Total Host Writes is 22,926 GB (and its Total Host Sector Writes is b31c9b88c, which in decimal is 48,079,943,820). This is so small that it seems a reasonably good bet that the ssd is a victim of the FTL write bug.

It would be interesting to learn how far beyond 165% that ssd reaches before it fails, because if it reaches something like 1000% it might mean the FTL write bug doesn't really exist. The actual bug might be grossly inaccurate counting of several SMART values: ABEC, Lifetime Used, and Background Program Page Count. In other words, perhaps those values can increase without actually writing to the ssd.
 
The ssd's Total Host Writes is 22,926 GB (and its Total Host Sector Writes is b31c9b88c, which in decimal is 48,079,943,820). This is so small that it seems a reasonably good bet that the ssd is a victim of the FTL write bug.

It would be interesting to learn how far beyond 165% that ssd reaches before it fails, because if it reaches something like 1000% it might mean the FTL write bug doesn't really exist. The actual bug might be grossly inaccurate counting of several SMART values: ABEC, Lifetime Used, and Background Program Page Count. In other words, perhaps those values can increase without actually writing to the ssd.
I suspect that the raw values of the various attributes may be affected by roll-over. For example, the real value of Total Host Sector Writes could be 0x10b31c9b88c, which would mean the counter is only 40 bits wide and has wrapped once. The total host writes would then be 587 TB.

https://www.google.com/search?client=opera&q=0x10b31c9b88c+x+512+bytes+in+TB
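
The roll-over conjecture can be worked through in Python. This is a sketch under two assumptions that are not confirmed anywhere in the thread: that a sector is 512 bytes, and that the counter is 40 bits wide. The function name is hypothetical.

```python
SECTOR_BYTES = 512   # assumption: 512-byte logical sectors
COUNTER_BITS = 40    # hypothetical counter width from the conjecture

reported_sectors = 0xB31C9B88C  # Total Host Sector Writes as shown in the screenshot


def host_bytes(raw, wraps=0, bits=COUNTER_BITS):
    """Bytes written by the host if the counter has wrapped `wraps` times."""
    return (raw + wraps * 2**bits) * SECTOR_BYTES


for wraps in (0, 1):
    total = host_bytes(reported_sectors, wraps)
    print(f"wraps={wraps}: {total / 10**12:.1f} TB ({total / 2**30:,.0f} GiB)")
```

With no wrap, the result matches the 22,926 figure only when expressed in GiB, which suggests CrystalDiskInfo's "GB" is binary. With one wrap, the total is the 587 TB mentioned above.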

Where do you get the "rolled over from 200" idea?
I think @Diceman_2037 is referring to attribute 0xAD. The normalised value of this attribute usually counts down from 100, but in this case it appears to be counting down from 200.

Here are other MX500 SMART data:

https://www.techpowerup.com/forums/attachments/xxxx-png.217963/
https://i1.wp.com/www.thessdreview.com/wp-content/uploads/2017/12/Crucial-MX500-1TB-CDI.png?resize=686,768&ssl=1
View: https://i.imgur.com/mmUzcBN.jpeg
 

Lucretia19

<snip>
If a 512GiB NAND array is reprogrammed 1500 times, that's a total of ...
1500 x 512GiB = 824 TB...
<snip>

CrystalDiskInfo is reporting 22.9 TB for the Total Host Writes. Has this figure rolled over, too?
and in a later post:
I suspect that the raw values of the various attributes may be affected by roll-over. For example, the real value of Total Host Sector Writes could be 0x10b31c9b88c, which would correspond to a maximum allocation of 40 bits. The total host writes would then be 587 TB.
An ssd can't be written in small chunks the way ram can, so rewriting a single byte in a block requires copying the entire block: write amplification. Also, the wear-leveling algorithm consumes erases throughout the lifetime of the ssd in order to give every block a similar number of erases (I don't understand why that's worth the erases it consumes). And the FTL write bug is very wasteful, relatively speaking, if the host write rate isn't large. Thus most of the "reprogramming" is done by the ssd controller, not by the host pc. That 824 TB calculation is misleading, even without the FTL write bug.

The FTL write bug might explain why the Total Host Writes is only 22.9 TB, much less than the 180 TB endurance specification advertised by Crucial.

The ssd's 2475 Average Block Erase Count (ABEC) implies ABEC incremented at an average rate higher than once per day, because once per day would correspond to 6.8 years in service and Crucial launched the MX500 in 2018. Before I tamed the bug in my ssd using selftests, the rate at which my ssd's ABEC was incrementing appeared to be accelerating... it was nearly once per day, and perhaps it would have gone much higher if the bug hadn't been tamed.
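
The once-per-day comparison above is easy to reproduce. In this sketch the function name and the `years_in_service` parameter are hypothetical; the thread does not say how long the drive was actually in service.

```python
DAYS_PER_YEAR = 365.25

abec = 2475  # Average Block Erase Count from the screenshot

# Years needed to reach 2475 if ABEC incremented exactly once per day.
years_at_one_per_day = abec / DAYS_PER_YEAR


def erases_per_day(abec, years_in_service):
    """Average ABEC increments per day over a given service life."""
    return abec / (years_in_service * DAYS_PER_YEAR)


print(f"{years_at_one_per_day:.1f} years at one increment per day")
```

Any service life shorter than that figure implies an average rate above one increment per day.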

Your conjecture that Total Host Sector Writes rolled over (and thus Total Host Writes too) seems plausible. One way to check it is with the Host Program Page Count, which is f44f0b22 (4,098,820,898 in decimal) and perhaps hasn't rolled over. That ssd's Total Host Writes and Total Host Sector Writes are about twice my ssd's (11,784 GB and 24,713,381,021), but its Host Program Page Count of 4,098,820,898 is much more than twice my ssd's 445,568,819. It's closer to nine times mine, which suggests the actual value of Total Host Writes may be approximately 9 times mine: 9 x 11,784 GB = 108,000 GB (unless Host Program Page Count also rolled over). 108 TB is much larger than 22.9 TB, so it suggests Total Host Sector Writes did roll over. But 108 TB is much smaller than 587 TB, so it also suggests the true value of Total Host Sector Writes isn't 10b31c9b88c and that the counter isn't stored in 40 bits. On the other hand, I would expect Crucial to have allocated enough bits to track Total Host Sector Writes beyond the published endurance spec of 180 TB, so perhaps Host Program Page Count rolled over too, or isn't a reliable measure of host writing.
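
The scaling argument above can be written out as a short Python sketch. It assumes, as the post does, that host writes are roughly proportional to Host Program Page Count and that the page count itself has not rolled over. The variable names are illustrative, and "GB" is whatever unit CrystalDiskInfo reports.

```python
# Values quoted in the post.
my_page_count = 445_568_819
my_host_writes_gb = 11_784
other_page_count = 0xF44F0B22   # 4,098,820,898 decimal
other_reported_gb = 22_926      # possibly rolled over

ratio = other_page_count / my_page_count

# Assumption: host writes scale linearly with Host Program Page Count.
estimated_host_writes_gb = ratio * my_host_writes_gb

print(f"page count ratio: {ratio:.2f}")
print(f"estimated host writes: {estimated_host_writes_gb:,.0f} GB")
print(f"reported host writes:  {other_reported_gb:,} GB")
```

The estimate lands well above the reported figure and well below the 587 TB that a single 40-bit wrap would imply.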
 
It seems to me that a programmer would be more likely to use an integral number of bytes to store a variable. Any other arrangement would seem to be convoluted and would complicate the programming.

That said, I've been racking my brain to make sense of attribute 0xAD, but nothing leaps out at me. :-?
 

Lucretia19

It seems to me that a programmer would be more likely to use an integral number of bytes to store a variable. Any other arrangement would seem to be convoluted and would complicate the programming.

That said, I've been racking my brain to make sense of attribute 0xAD, but nothing leaps out at me. :-?
Yes, all else being equal, a programmer should prefer to allocate an integer multiple of bytes, or an integer multiple of "words" (a word is often 16 bits or 32 bits or 64 bits, depending on the processor's data bus width) to each variable for the sake of speed & simplicity. But all else isn't always equal... in particular, in some small devices storage for variables might be such a scarce resource that bits must not be wasted. I don't know what kind of storage the MX500 uses and whether it's a scarce resource... perhaps a combination of ram for speed plus occasional copying from ram to NAND for nonvolatility?

Regarding attribute AD (Average Block Erase Count, or ABEC), perhaps the uppermost bits are a rollover count, and the lower 11 bits are the average erases after the most recent rollover. This seems unlikely, but if true it might mean 1500 + 427 = 1927. (1500 = one rollover; 427 = 0x1ab.) It seems unlikely for at least two reasons: (1) it would be simpler just to use all the bits for the count of erases, like 2475; and (2) the 165% ratio of 2475 to 1500 matches the 165 Percent Lifetime Used.
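
Here is a sketch of that split-field idea in Python. The field layout is purely hypothetical. A low field needs at least 11 bits to hold counts up to 1500, and with 11 bits the raw value 0x9AB does split into one rollover plus 0x1AB; a 10-bit low field would give two rollovers instead.

```python
RAW_ABEC = 0x9AB          # 2475 decimal
ERASES_PER_ROLLOVER = 1500


def split(raw, low_bits):
    """Split raw into (upper field, lower field) at a hypothetical bit boundary."""
    return raw >> low_bits, raw & ((1 << low_bits) - 1)


rollovers, remainder = split(RAW_ABEC, 11)
combined = rollovers * ERASES_PER_ROLLOVER + remainder

print(rollovers, remainder, combined)
print(split(RAW_ABEC, 10))  # contrast: a 10-bit low field
```

Under the 11-bit layout the combined count is 1927, which does not match the 165% Percent Lifetime Used, whereas the plain reading of 2475 does.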

Something else to scratch one's brain about is the 191 in the Current and Worst columns for ABEC and Percent Lifetime Used. My ssd lists 89, which corresponds to the formula 100 - Percent Lifetime Used. (100 - 11 = 89.) Using that formula, 100 - 165 is -65, which would look like 191 if one byte of storage is used: 191 = 256 - 65. This seems like additional evidence in support of the theory that ABEC really does equal 2475, even though 2475 is a very high number of erases during only a few years of operation. Perhaps the rate of block erases rapidly increased after Percent Lifetime Used reached 100%, or after some error event associated with an old ssd beginning to fail.
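
The one-byte wraparound described above can be sketched in Python. The formula 100 - Percent Lifetime Used is the one inferred in the post, not a documented firmware behaviour, and the function name is made up.

```python
def normalized_value(percent_life_used):
    """100 - percent used, stored in a single unsigned byte (wraps modulo 256)."""
    return (100 - percent_life_used) % 256


print(normalized_value(11))   # the poster's own drive
print(normalized_value(165))  # the drive in the screenshot
```

Python's `%` always returns a non-negative result for a positive modulus, so -65 wraps to 256 - 65.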

EDIT: Thinking about it some more, I suppose I shouldn't have been surprised by the high ABEC count, 2475. Depending on what the ssd was used for, it could have had a very high rate of host writes. We can presume it really did exceed 100% Lifetime, and 165% is in the same order of magnitude as 100%.
 

Diceman_2037

EDIT: Thinking about it some more, I suppose I shouldn't have been surprised by the high ABEC count, 2475. Depending on what the ssd was used for, it could have had a very high rate of host writes. We can presume it really did exceed 100% Lifetime, and 165% is in the same order of magnitude as 100%.
Precisely.

Plus the disk is already demonstrating erase failures and reallocated blocks.
 

Lucretia19

Precisely. Plus the disk is already demonstrating Erase fails, and reallocated blocks.
The ssd's count of the number of reallocated sectors is 0xA (10 decimal), which is tiny compared to the 500GB ssd capacity (assuming the count is supposed to be interpreted literally).

We cannot deduce from the screenshot when the sector failures occurred. They might have occurred a long time before the ssd reached 100% Lifetime Used... perhaps due to manufacturing defects or lightning surges.

I had a hard drive that was new in 2008, which developed a few bad sectors within a year of being placed in service. It never developed any more bad sectors, and I retired it in 2019 when I built my current pc using the MX500 ssd.

EDIT:
Note: It's still unclear how many of the 2475 average erases per block were caused by the FTL write bug or by an overly aggressive wear leveling algorithm, rather than by host writes. If we assume Host Program Page Count didn't roll over, comparison to my ssd's Host Program Page Count and my 11,800 GB Total Host Writes implies the host pc wrote about 108,000 GB (my calculation is in a recent message). If we estimate the portion that had been written when the ssd reached 1500 average erases per block, by dividing 108,000 GB by 1.65, we get about 65 TB, which is much less than the 180 TB endurance spec.
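
The last step of that estimate, as a Python sketch. It inherits every assumption behind the 108,000 GB figure (no rollover of Host Program Page Count, host writes proportional to page count) and further assumes host writes accumulated in proportion to Percent Lifetime Used.

```python
estimated_host_writes_gb = 108_000  # estimate derived earlier in the thread
percent_life_used = 165
endurance_spec_tb = 180             # Crucial's advertised endurance

# Assumption: host writes grew in step with Percent Lifetime Used.
writes_at_100_percent_tb = estimated_host_writes_gb / (percent_life_used / 100) / 1000

fraction_of_spec = writes_at_100_percent_tb / endurance_spec_tb

print(f"{writes_at_100_percent_tb:.0f} TB at 100% life used")
print(f"{fraction_of_spec:.0%} of the endurance spec")
```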
 
I had a hard drive that was new in 2008, which developed a few bad sectors within a year of being placed in service. It never developed any more bad sectors, and I retired it in 2019 when I built my current pc using the MX500 ssd.
Hard drives in those days were more likely to be affected by bad media than bad heads. Today it's the reverse. If you start noticing reallocations today, then it's likely to be a sign of a degrading disc head.
 

Lucretia19

Hard drives in those days were more likely to be affected by bad media than bad heads. Today it's the reverse. If you start noticing reallocations today, then it's likely to be a sign of a degrading disc head.
Okay, but is that relevant to an ssd, or is it just an interesting side note?
 
I purchased an MX500 1TB SSD earlier this year. I haven't used it much, but now I'll be paying special attention to it. :-(

Here are scans of the PCBs:

http://users.on.net/~fzabkar/SSD/Micron/MX500/

The NAND flash is Micron MT29F2T08EMLEEJ4-QA:E (part marking NY133).

The flash controller is a Silicon Motion SM2259H-AC with a YYWW (Year/Week) date code of 2137 (week 37 of 2021).

The SDRAM is Micron MT41K256M16TW-107:P (part marking D9SHD).

https://www.micron.com/products/dram/ddr3-sdram/part-catalog/mt41k256m16tw-107
https://media-www.micron.com/-/media/client/global/documents/products/data-sheet/dram/ddr3/4gb_ddr3l.pdf?rev=c2e67409c8e145f7906967608a95069f

Firmware is M3CR043.

You can decode NAND part numbers here:

https://nand.gq/#/decode

Click the icon in the top right corner to switch between Chinese and English.
 
Last edited:
Sep 8, 2022
22
2
15
0
Well, this is incredibly interesting!

I've been looking for ways to decode some of these part numbers and now I seem to have a way. Most of the results were not surprising, with one big exception. Is DRAM with the D9SHD marking always 512MiB in capacity? If so, it looks like Crucial has started cheaping out even more than I realized. EVERY single MX500 I've seen has D9SHD printed on the DRAM chip(s). The older 1TB models have two of these chips, which would equal 1GiB DRAM, which is what I expected. However, ALL the newer models I've encountered have only one of these chips. I've seen them in 250GB, 500GB, and 2TB drives. That would mean the 250GB drive has more DRAM than expected, the 500GB has the expected amount, and the 1TB+ drives have less than expected. If true, maybe it's time to start lambasting Crucial along with all the other brands (admittedly, nearly all of them) that have done silent downgrades. If I am wrong about what the D9SHD means, please correct me.

On the other hand, the NAND decoder actually seems to bolster my theory that firmware version M3CR023 (and any that can be upgraded to it) has 64-layer NAND, M3CR033 has 96-layer, and M3CR04x has 176-layer. At least, that appears true of all the drives I've encountered. I find it very concerning that there is now a claim that there may even be a QLC variant of the MX500. I was originally going to claim that that would get them in trouble for false advertising, but I combed over Crucial's site and I can't actually find anywhere that they claim the MX500 uses TLC (only 3D NAND). There are actually no claims about DRAM at all. I guess I've been going on what has historically been reported about these drives. It's still incredibly scummy. I looked at Samsung's site and they do specify TLC (well, 3-bit MLC) and specific DRAM amounts for the 860/870 EVO.

If it is confirmed that QLC variants of the MX500 exist, I won't be recommending it anymore. I'm already not super happy with Crucial because of my terrible experience with one of the presumably TLC versions of the BX500. It's the only SSD I've been dissatisfied with, despite having used plenty of even cheaper DRAM-less drives from far less reputable companies. It's getting really hard to find a SATA SSD to recommend these days. Even before this realization, I've seen a few comments about elevated failure rates on recent MX500s. I've seen even more comments about failures with the Samsung 870 EVO. I'm not even sure what's going on with the Western Digital Blue. The well regarded Blue 3D seems to have been replaced with the Blue SA510. I haven't been able to find any professional reviews of that model but the user reviews are terrible. The people that aren't complaining that it failed within a few weeks/months are saying that it's much slower than its predecessor. Those three were considered the top SATA drives. What's left now?

As for the original point of this thread, I've got a couple questions for the people who are experiencing the rapid decrease in lifespan remaining.

  1. What is the use case for the drive? Is it a system drive (if so, what OS) or just a storage drive?
  2. What filesystems are being used on the drive (not counting EFI partitions)?
  3. Is the drive being used in a way where the proportion of reads:writes is unusual? For example, is it a drive where a lot of data was written once and then the drive was mostly just read from?
  4. Is the drive left powered but idle for large periods of time?
I've got a little speculation on the issue but I'd like some more data points before I consider sharing it.
 

Lucretia19

<snip>
As for the original point of this thread, I've got a couple questions for the people who are experiencing the rapid decrease in lifespan remaining.
  1. What is the use case for the drive? Is it a system drive (if so, what OS) or just a storage drive?
  2. What filesystems are being used on the drive (not counting EFI partitions)?
  3. Is the drive being used in a way where the proportion of reads:writes is unusual? For example, is it a drive where a lot of data was written once and then the drive was mostly just read from?
  4. Is the drive left powered but idle for large periods of time?
I've got a little speculation on the issue but I'd like some more data points before I consider sharing it.
I'll try to answer your questions, even though there are more than a couple:
  1. My MX500 is the Windows 10 system drive C:. My pc also has two internal hard drives that store most of my data and some of my installed programs.
  2. The file system of the C: volume is NTFS.
  3. My pc doesn't write much to the ssd, approximately 0.1 MBytes/second average. (I think most of the writing is by Windows logging.) During the most recent 51 hours (following the most recent pc restart) the pc writes averaged 0.072 MB/s. The pc reads roughly 4 billion sectors every 5 weeks, which is about 0.7 MBytes/second if I didn't make an arithmetic mistake and assuming a sector is 512 bytes.
  4. The pc is powered on essentially all the time... I typically turn it off only for brief hardware maintenance. Every few weeks I also "sleep" the pc for a few seconds, because that power cycles the ssd, which seems to restrain the ssd's excessive FTL writing. I don't know what you mean by "idle," since Windows logs to C: incessantly, I usually have dozens of Firefox tabs open, etc.
Note: My ssd no longer experiences rapid decrease of Remaining Life, due to the nearly nonstop (19.5 minutes of every 20 minutes) ssd selftests that my pc runs. Remaining Life reached 92% on 3/13/2020 and 88% on 11/12/2022, a decrease of 4% over about 2.7 years. That's not a rapid decrease in terms of chronological time, but I expect it will fall short of the 180 TB endurance spec since the 4% RL decrease corresponds to about 5.5 TB written by the pc, which extrapolates to about 140 TB. Before I began the selftests regime, it appeared that the rate of decrease of RL, relative to bytes written by the pc, was accelerating... WAF was roughly 50 during the weeks before I began the selftests regime. WAF has averaged 3.3 under the selftests regime.
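
The figures in this reply can be sanity-checked with a short Python sketch. It assumes 512-byte sectors, reads the dates as US month/day/year, and extrapolates linearly from 4% of Remaining Life to 100%. The variable names are illustrative.

```python
from datetime import date

SECTOR_BYTES = 512  # assumption stated in the post

# Read rate: roughly 4 billion sectors every 5 weeks.
seconds_in_5_weeks = 5 * 7 * 24 * 3600
read_rate_mb_s = 4e9 * SECTOR_BYTES / seconds_in_5_weeks / 1e6

# Remaining Life went from 92% to 88% between these dates (assumed month/day/year).
elapsed_years = (date(2022, 11, 12) - date(2020, 3, 13)).days / 365.25

# Linear extrapolation: 5.5 TB of host writes consumed 4% of life.
host_tb_per_percent = 5.5 / 4
projected_endurance_tb = host_tb_per_percent * 100

print(f"read rate: {read_rate_mb_s:.2f} MB/s")
print(f"elapsed: {elapsed_years:.1f} years")
print(f"projected host writes at 0% Remaining Life: {projected_endurance_tb:.1f} TB")
```

The projection rounds to the "about 140 TB" quoted above, short of the 180 TB spec.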
 

Lucretia19

<snip>
I've got a little speculation on the issue but I'd like some more data points before I consider sharing it.
Are you still planning to share your speculation? Considering how much time has elapsed since you requested more data points, it seems unlikely you'll receive additional data in the future. Mine appears to be the only response to your four questions.
 
