Question: Crucial MX500 500GB SATA SSD - Remaining Life decreasing fast despite only a few bytes being written to it?


Lucretia19

Reputable
Feb 5, 2020
195
15
5,245
The Remaining Life (RL) of my Crucial MX500 ssd has been decreasing rapidly, even though the pc doesn't write much to it. Below is the log I began keeping after I noticed RL reached 95% after about 6 months of use.

Assuming RL truly depends on bytes written, the decrease in RL is accelerating and something is very wrong. The latest decrease in RL, from 94% to 93%, occurred after writing only 138 GB in 20 days.

Note 1: After RL reached 95%, I took some steps to reduce "unnecessary" writes to the ssd by moving some frequently written files to a hard drive, for example the Firefox profile folder. That's why only 528 GB have been written to the ssd since Dec 23rd, even though the pc is set to Never Sleep and is always powered on.

Note 2: After the pc and ssd were about 2 months old, around September, I changed the pc's power profile so it would Never Sleep.

Note 3: The ssd still has a lot of free space; only 111 GB of its 500 GB capacity is occupied.

Note 4: Three different software utilities agree on the numbers: Crucial's Storage Executive, HWiNFO64, and CrystalDiskInfo.

Note 5: Storage Executive also shows that Total Bytes Written isn't much greater than Total Host Writes, implying write amplification hasn't been a significant factor.

My understanding is that Remaining Life is supposed to depend on bytes written, but it looks more like the drive reports a value that depends mainly on its powered-on hours. Can someone explain what's happening? Am I misinterpreting the meaning of Remaining Life? Isn't it essentially a synonym for endurance?


Crucial MX500 500GB SSD in desktop pc since summer 2019

Date | Remaining Life | Total Host Writes (GB) | Host Writes (GB) Since Previous Drop
12/23/2019 | 95% | 5,782 | -
01/15/2020 | 94% | 6,172 | 390
02/04/2020 | 93% | 6,310 | 138
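
To make the trend in the log easier to see, here is a minimal Python sketch that recomputes the days elapsed and the GB written between successive Remaining Life drops. The dates and totals are copied from the log; the names LOG and drops are made up for illustration.

```python
from datetime import date

# (date, remaining life %, total host writes in GB), copied from the log
LOG = [
    (date(2019, 12, 23), 95, 5782),
    (date(2020, 1, 15), 94, 6172),
    (date(2020, 2, 4), 93, 6310),
]

def drops(log):
    """Return (days elapsed, GB written, GB per day) for each 1% drop."""
    rows = []
    for (d0, _, w0), (d1, _, w1) in zip(log, log[1:]):
        days = (d1 - d0).days
        gb = w1 - w0
        rows.append((days, gb, gb / days))
    return rows

for days, gb, rate in drops(LOG):
    print(f"{days} days, {gb} GB written, {rate:.1f} GB/day")
```

This works out to 390 GB over 23 days for the first drop and 138 GB over 20 days for the second, so the two drops took about the same time even though the write rate fell by more than half.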
 

Lucretia19

lifespan = 73%
18,042 GB written (636 GB written compared to previous 627 GB)
write amplification...........still at 1.75......

Would you please stop posting your monthly updates? Maybe switch to annual updates? I don't think the updates are of use here unless the behavior of your ssd changes.

Also, the format of your data isn't as easy to use as it could be... I recommend you create a spreadsheet in which there's one row of cells per (monthly) record of data. (I use the Calc module of LibreOffice, which is free.) You could add columns that automatically calculate values such as the overall write amplification, the bytes and NAND pages written by the host pc and by the ssd controller during the most recent month, the write amplification during just the most recent month, etc. It's easy to paste multiple rows & columns into a message here with a single copy/paste operation, and doing that annually instead of monthly would probably benefit both you and the rest of us.
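
As a rough sketch of what those calculated columns could look like, here is the same idea in Python instead of a spreadsheet. It assumes each record holds the drive's cumulative host and controller NAND page counters, and that write amplification is (host pages + controller pages) / host pages; the Record type and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One logged SMART snapshot (field names are illustrative)."""
    host_pages: int  # NAND pages written on behalf of the host pc
    ftl_pages: int   # NAND pages written by the ssd controller itself

def waf(host_pages: int, ftl_pages: int) -> float:
    """Write amplification, assumed to be (host + FTL pages) / host pages."""
    if host_pages <= 0:
        raise ValueError("host_pages must be positive")
    return (host_pages + ftl_pages) / host_pages

def overall_waf(latest: Record) -> float:
    """WAF over the drive's whole life, from the latest cumulative counters."""
    return waf(latest.host_pages, latest.ftl_pages)

def interval_waf(previous: Record, latest: Record) -> float:
    """WAF for just the period between two records, using counter deltas."""
    return waf(latest.host_pages - previous.host_pages,
               latest.ftl_pages - previous.ftl_pages)
```

The interval figure is the more sensitive of the two: a change in the ssd's behavior shows up there well before it moves the lifetime average.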
 

worstalentscout

Distinguished
Nov 1, 2016
316
9
18,685
roger.................roger...............

my updates were more for myself to take note............:ROFLMAO:
 

Diceman_2037

Distinguished
Dec 19, 2011
56
3
18,535
Checked on the 500GB I put in a Qosmio for the first time in a long while and there's only 31% of life left lmao

ID | Attribute Description | Threshold | Value | Worst | Data | Status
01 | Raw Read Error Rate | 0 | 100 | 100 | 0 | OK: Always passes
05 | Reallocated Sector Count | 10 | 100 | 100 | 0 | OK: Value is normal
09 | Power-On Hours Count | 0 | 100 | 100 | 24418 | OK: Always passes
0C | Power Cycle Count | 0 | 100 | 100 | 201 | OK: Always passes
AB | Program Fail Count | 0 | 100 | 100 | 0 | OK: Always passes
AC | Erase Fail Count | 0 | 100 | 100 | 0 | OK: Always passes
AD | Wear Leveling Count | 0 | 39 | 39 | 926 | OK: Always passes
AE | Unexpected Power Loss Count | 0 | 100 | 100 | 92 | OK: Always passes
B4 | Unused Reserve (Spare) NAND Blocks | 0 | 0 | 0 | 45 | OK: Always passes
B7 | SATA Interface Downshift | 0 | 100 | 100 | 0 | OK: Always passes
B8 | Error Correction Count | 0 | 100 | 100 | 0 | OK: Always passes
BB | Reported Uncorrectable Errors | 0 | 100 | 100 | 0 | OK: Always passes
C2 | Enclosure Temperature | 0 | 58 | 44 | 42 | OK: Always passes
C4 | Re-allocation Event Count | 0 | 100 | 100 | 0 | OK: Always passes
C5 | Current Pending Sector Count | 0 | 100 | 100 | 0 | OK: Always passes
C6 | SMART Off-line Scan Uncorrectable Error Count | 0 | 100 | 100 | 0 | OK: Always passes
C7 | SATA/PCIe CRC Error Count | 0 | 100 | 100 | 1 | OK: Always passes
CA | Percentage Of The Rated Lifetime Used | 1 | 39 | 39 | 61 | OK: Value is normal
CE | Write Error Rate | 0 | 100 | 100 | 0 | OK: Always passes
D2 | Successful RAIN Recovery Count | 0 | 100 | 100 | 0 | OK: Always passes
F6 | Total Host Sector Writes | 0 | 100 | 100 | 7.77 TB | OK: Always passes
F7 | Host Program Page Count | 0 | 100 | 100 | 315412429 | OK: Always passes
F8 | FTL Program Page Count | 0 | 100 | 100 | 15513539116 | OK: Always passes

What I have come to learn is that the pending redirect value (C5) flips during cell refresh cycles. This refresh is performed to keep the disk performant; however, on the original MX500s with the initial SM2258 controller, it is performed far too frequently, leading to excessive wear compared to the SM2259H.
 

Lucretia19

You mean 39% Remaining Life, not 31%.

It's well known that Current Pending Sector Count briefly changes from 0 to 1 during a write burst caused by the ssd's firmware bug. (Read some of the messages about it that were posted in this thread during the spring of 2020.) According to the changes observed in the FTL Program Page Count attribute, each of those write bursts writes approximately 1 GByte to ssd memory over a period of about 5 seconds, or a small multiple of 1 GByte over a small multiple of 5 seconds. I presume the bug is in the ssd firmware's wear leveling routine or in the firmware code that decides when to trigger that routine. You appear to be claiming a new detail about it: that Current Pending Sector Count briefly changes to 1 during a "cell refresh cycle." How are you defining "cell refresh cycle" and where did you learn about this correlation?

The ssd selftests regime that I developed (posted in one of the 2020 messages, as an MS-DOS .BAT file to be run nonstop in the background on a Microsoft Windows pc) mitigates the ssd's bug. It greatly reduces the runtime available to the ssd's buggy routine, which is evidently lower priority than the ssd's selftest routine.

Here are my own numbers: since my pc began running the selftests .BAT file in late February 2020, it has written about 12.5 TB to the MX500 ssd. During this time, the ssd's Average Block Erase Count increased from 118 to 249, which corresponds to a decrease of Remaining Life from 92.13% to 83.4%, a net decrease of 8.73 percentage points. Over that period, its Write Amplification Factor has been 3.02. For comparison, your qosmio has written 7.77 TB to your MX500, and its Remaining Life decreased by 61%, with a Write Amplification Factor of 50.18.
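
The figures in this comparison can be reproduced with a few lines of Python. This is only a sketch: it assumes Remaining Life falls linearly with Average Block Erase Count against a rating of 1500 erase cycles (a value inferred from the percentages quoted here, not taken from a datasheet), and that Write Amplification Factor is (F7 + F8) / F7. The function names are made up for illustration.

```python
RATED_ERASE_CYCLES = 1500  # assumption inferred from the quoted percentages

def remaining_life(avg_block_erase_count: int) -> float:
    """Percent of rated life left, assuming linear wear per erase cycle."""
    return 100.0 * (1 - avg_block_erase_count / RATED_ERASE_CYCLES)

def write_amplification(host_pages: int, ftl_pages: int) -> float:
    """WAF from Host (F7) and FTL (F8) Program Page Counts."""
    return (host_pages + ftl_pages) / host_pages

# My ssd: Average Block Erase Count went from 118 to 249.
before = remaining_life(118)
after = remaining_life(249)
print(f"{before:.2f}% -> {after:.2f}% (drop of {before - after:.2f}%)")

# The qosmio's ssd: F7 and F8 values from the posted SMART table.
print(f"WAF = {write_amplification(315412429, 15513539116):.2f}")
```

These print 92.13% -> 83.40% (drop of 8.73%) and WAF = 50.18, matching the numbers in the comparison.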

As you can see from the comparison, the selftests regime is very effective (for MX500s that have the older controller chip). So if your qosmio runs Windows, you might consider installing the .BAT file. You can set Windows Task Scheduler to start it automatically each time Windows starts. (Task Scheduler could also be set to start it "hidden" so it won't display onscreen or in the Windows taskbar.)

Note: The .BAT expects SMARTCTL.exe to be on the pc. It's available for free as part of SmartMonTools. You'll probably need to edit the first few lines of the .BAT to match your setup, for example the folder where SMARTCTL.exe can be found.

I have another .BAT file that displays the selftest status of the ssd. It executes "SMARTCTL.exe -l selective C:" to let me check whether the ssd is running a selftest. In other words, it lets me verify the hidden selftests .BAT task is running. If I had spare time, I would modify it to parse the SMARTCTL output to automatically check whether the ssd is running a selftest, and not only display the selftest status but also attempt to launch the selftests .BAT task if a selftest isn't running. I would set Windows Task Scheduler to launch it in background soon after Windows Task Scheduler starts the selftests .BAT task.
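
A minimal sketch of that idea, in Python rather than a .BAT file: it runs the same smartctl command and reports whether a selftest is in progress. It assumes smartctl is on the PATH and that the selective log marks a running test with the text Self_test_in_progress; the function names are hypothetical, and the marker should be checked against the output on your own pc. Relaunching the selftests task is left out.

```python
import subprocess

IN_PROGRESS_MARKER = "Self_test_in_progress"  # assumed smartctl status text

def read_selective_log(device: str = "C:") -> str:
    """Return the text printed by 'smartctl -l selective <device>'."""
    result = subprocess.run(
        ["smartctl", "-l", "selective", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses nonzero exit codes for warnings too
    )
    return result.stdout

def selftest_running(log_text: str) -> bool:
    """True if any line of the selective log reports a test in progress."""
    return any(IN_PROGRESS_MARKER in line for line in log_text.splitlines())
```

Calling selftest_running(read_selective_log()) from a scheduled task would give the yes/no answer; what to do on a "no" (for example, starting the selftests .BAT) is up to the caller.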
 

Diceman_2037

The level of trapped charge drops over time, leading to slower reads (because part of the data has to be reconstructed from ECC bits). When this occurs, the C5 counter flips and the drive refreshes data in blocks of 1GB by copying it to another part of the drive. Upon completion, the pending redirect counter changes back without incrementing 05.

On the affected parts, either the charge drops faster than it should (phy defect?) or the algorithm that calculates when a refresh should occur is off by several factors, leading to premature refreshes and abnormal WAF.
 

Lucretia19

What is the source of your information? My understanding is that ssd cells retain data for a very long time, especially if the ssd drive is receiving power and hasn't received much wear.

If trouble reading cells is causing the "excessive writes" problem, I would expect Current Pending Sector Count (C5) to become 1 (or more) during the selftesting, because selftesting should discover weak cells. And then it should stay 1 until the ssd eventually refreshes the cell... which may be as much as 19.5 minutes later, given the selftest timing managed by my .BAT file. But my observation is that C5 becomes 1 only during some of the 30-second pauses between selftests.

Also, I wouldn't expect a revised ssd controller chip to affect the rate at which cells lose charge. But the problem appears to be associated only with MX500s that have the older controller chip. It seems a good bet that the cause of the problem is in the old controller chip. In other words, probably a buggy algorithm... and it's unclear why C5 briefly becomes 1.