Question: Crucial MX500 500GB SATA SSD: Remaining Life decreasing fast despite only a few bytes being written to it?


Lucretia19

Feb 5, 2020
The Remaining Life (RL) of my Crucial MX500 ssd has been decreasing rapidly, even though the pc doesn't write much to it. Below is the log I began keeping after I noticed RL reached 95% after about 6 months of use.

Assuming RL truly depends on bytes written, the decrease in RL is accelerating and something is very wrong. The latest decrease in RL, from 94% to 93%, occurred after writing only 138 GB in 20 days.

Note 1: After RL reached 95%, I took some steps to reduce "unnecessary" writes to the ssd by moving some frequently written files to a hard drive, for example the Firefox profile folder. That's why only 528 GB have been written to the ssd since Dec 23rd, even though the pc is set to Never Sleep and is always powered on.

Note 2: When the pc and ssd were about 2 months old, around September, I changed the pc's power profile so it would Never Sleep.

Note 3: The ssd still has a lot of free space; only 111 GB of its 500 GB capacity is occupied.

Note 4: Three different software utilities agree on the numbers: Crucial's Storage Executive, HWiNFO64, and CrystalDiskInfo.

Note 5: Storage Executive also shows that Total Bytes Written isn't much greater than Total Host Writes, implying write amplification hasn't been a significant factor.

My understanding is that Remaining Life is supposed to depend on bytes written, but it looks more like the drive reports a value that depends mainly on its powered-on hours. Can someone explain what's happening? Am I misinterpreting the meaning of Remaining Life? Isn't it essentially a synonym for endurance?


Crucial MX500 500GB SSD in desktop pc since summer 2019

| Date | Remaining Life | Total Host Writes (GB) | Host Writes (GB) Since Previous Drop |
|---|---|---|---|
| 12/23/2019 | 95% | 5,782 | |
| 01/15/2020 | 94% | 6,172 | 390 |
| 02/04/2020 | 93% | 6,310 | 138 |
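The "since previous drop" column, and the number of days each 1% drop took, can be derived from the logged totals. Here is a minimal Python sketch (the names `LOG` and `drops` are just for illustration) that reproduces the 390 GB and 138 GB figures and the 528 GB total from Note 1. It shows that the second 1% drop took roughly a third of the host writes the first one did.

```python
from datetime import datetime

# (date, Remaining Life %, Total Host Writes in GB), copied from the table above
LOG = [
    ("12/23/2019", 95, 5782),
    ("01/15/2020", 94, 6172),
    ("02/04/2020", 93, 6310),
]

def drops(rows):
    """For each 1% drop in Remaining Life, return (date, days elapsed, GB written)."""
    result = []
    for (d0, _, gb0), (d1, _, gb1) in zip(rows, rows[1:]):
        days = (datetime.strptime(d1, "%m/%d/%Y") - datetime.strptime(d0, "%m/%d/%Y")).days
        result.append((d1, days, gb1 - gb0))
    return result

DROPS = drops(LOG)
for when, days, gb in DROPS:
    print(f"{when}: RL fell 1% after {gb} GB in {days} days ({gb / days:.1f} GB/day)")
print("Written since 12/23/2019:", sum(gb for _, _, gb in DROPS), "GB")
```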
 
I'm sure I'll replace them before then.

The 2TB has 921 days of power-on time and the 1TB has 1336 days. Rather than running self-tests, I have both of them accelerating reads for HDDs using eBoostr, and both are readily replaceable if need be.

I'm sure the SLC write cache has something to do with the WAF to a degree. Dynamic Write Acceleration was first introduced with the MX200, but the controller and firmware have obviously changed between then and now.

All my SSDs have up-to-date firmware, and I'm about to receive another MX500 (500GB this time), which may or may not be the newer revision (03x firmware).
When I got an MX500 for a Toshiba Qosmio about a year ago, it was still old stock with the 023 firmware. The WAF for that one is not pretty; I might actually make use of your script on that system.

My systems are kept online unless a hardware swap is required or there's a power outage. The 1TB dates from early Q2 2017.
 

Your description that the two MX500 drives are "accelerating reads for hard drives using eBoostr" is incomplete, because you didn't mention which application(s) are doing so much reading (and/or writing). I'm assuming a well-designed caching program wouldn't keep a large cache extremely busy unless it's being pushed to do so by one or more apps that are reading or writing the drives at a high rate, and I'm assuming the reason why your Power-On Hours values are so high is that the ssds are being kept extremely busy.

I suspect you're right that heavy SLC mode writing is at least partly responsible for the high WAF. But the host wrote only about 20 TB to each of your MX500s over 3 or 4 years, which doesn't seem excessive. So there may be some other, more significant cause of the high WAF.

How much ram is in your system? Maybe it would make sense to invest in more ram so the OS or eBoostr can manage a larger ram cache, instead of investing in ssds to cache hdds.
 
RAM is already at the 48GB maximum that a Westmere processor supports without registered DIMMs.

The new MX500 I ordered has arrived, and I'm watching the FTL Program Page Count increment with no increments of the Host Program Page Count.

| ID | Attribute Description | Threshold | Value | Worst | Data | Status |
|---|---|---|---|---|---|---|
| 09 | Power-On Hours Count | 0 | 100 | 100 | 0 | OK: Always passes |
| F7 | Host Program Page Count | 0 | 100 | 100 | 0 | OK: Always passes |
| F8 | FTL Program Page Count | 0 | 100 | 100 | 196 | OK: Always passes |


This is one of the newer revisions, with the M3CR043 firmware and the smaller box.

I need to perform some write tests to see if there's anything different about the C4/F8 behavior.

What's the chance that the ssd's own internal management is being performed in SLC mode and triggering SLC > MLC bulk writes?
 

48 GB sounds like plenty of ram, assuming a large portion of it is being used by Windows or by eBoostr to cache the drives. Is it right for us to assume you're running Windows 10? Have you tried benchmarking the performance of your critical apps with and without the use of the ssds to cache the hard drives (so the ram would be the only cache of the drives)?

That SMART data is too early to tell whether F8=196 is excessive. On my MX500, the FTL controller usually writes at least 5,000 pages per hour (while the host usually writes at least 2,000 pages per hour). I think the F7 and F8 values will be more meaningful later, when your Power-On Hours is no longer 0. The AD value (Average Block Erase Count) is relevant to endurance too.

Regarding your question about the chance that the ssd's controller uses SLC mode for internal management, I don't know how to estimate the chance. I imagine that would be unnecessarily inefficient, because it would create additional internal management work later, to convert it to TLC mode. (Unless the data remains in SLC mode permanently, which should not trigger any later conversion to TLC.) I imagine most internal management operations would be low background priority. Can you think of any internal management operations that would require high speed writing to non-volatile memory?
 

My systems are a mix of Windows 7 and 10, for game and hardware compatibility reasons.


The only explanation I can think of (and it's largely an assumption) is that the C4/F8 correlation is due to time-based SLC > MLC migration, and that the migration always happens in 37k-page operations regardless of whether the host has written that many pages.

A hardware debugger would be needed to establish the actual reasons, but I think the internal data is encrypted anyway, which means the only people capable of confirming or ruling anything out are the same people who haven't bothered to look into this issue in 3 years.
 

After 7 days, 16 hours, the FTL count has grown to 32920 and has had no host program writes.
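Those two readings give an idle rate for the FTL Program Page Count (F8). Here is a small Python calculation. It assumes the earlier F8 = 196 reading marks the start of the 7 days 16 hours. The result is roughly 178 FTL pages per hour with zero host pages, far below the 5,000+ pages per hour reported earlier in this thread for a drive in normal use. Because F7 did not move, the WAF ratio is undefined for this period.

```python
# Readings reported in the thread for the new drive (F8 = FTL Program Page Count).
# Assumption: the 196-page reading was taken at the start of the 7 d 16 h period.
START_FTL_PAGES = 196
END_FTL_PAGES = 32_920
ELAPSED_HOURS = 7 * 24 + 16   # 184 hours
HOST_PAGES = 0                # F7 (Host Program Page Count) did not change

ftl_rate = (END_FTL_PAGES - START_FTL_PAGES) / ELAPSED_HOURS
print(f"Idle FTL writes: {ftl_rate:.1f} pages/hour")

# The WAF ratio 1 + F8/F7 needs host pages in the denominator, so with F7 unchanged
# it is undefined; the logger script guards against this divide-by-zero case too.
waf = None if HOST_PAGES == 0 else 1 + (END_FTL_PAGES - START_FTL_PAGES) / HOST_PAGES
print("WAF:", "undefined (no host writes)" if waf is None else f"{waf:.2f}")
```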
 
Hi guys, a "me too" here. (waves)

I put a brand-new 500 GB MX500 in my laptop sometime in January of this year.

It replaced a 128 GB Samsung 830.

Today I decided to check the SMART data out of curiosity and expected maybe 10 erase cycles at most. The laptop rarely has a browser open (so not many browser writes), I don't download games or game on it, and it isn't used for any write-intensive tasks, although occasional OS backups are made to a second partition. However, I checked the size of those backups and they don't account for the data I'm about to share.

Install time Jan 2021
Power on hours 3228 (134 days)
Total host writes 2039 GB (4x drive capacity)
Erase cycles 100
Lifetime left 94%

Also

F7 78341025
F8 1550715824

The laptop is on 24/7 and doesn't even use sleep mode, so the drive is being used in the worst-case scenario you guys have described.

Windows has hidden power settings that can be adjusted either with PowerSettingsExplorer or by unhiding them in the registry so they appear in the Control Panel applet. I'm curious whether disabling all storage power saving would mitigate this issue, but for now I'll probably run the selftest trick and gather data, then compare with what happens when HIPM is disabled.
 

I agree, your F8 is excessive. (WAF = 20.79)
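For reference, this WAF is total NAND pages programmed (host + FTL) divided by host pages, i.e. 1 + F8/F7. That is the formula used in the ssdSMARTLogger.bat posted later in this thread. Here is a short Python check using the lifetime F7 and F8 totals posted above (the function name `waf` is just illustrative):

```python
def waf(host_pages: int, ftl_pages: int) -> float:
    """Write amplification as (F7 + F8) / F7 = 1 + F8 / F7.

    host_pages: SMART 247 (F7), Host Program Page Count
    ftl_pages:  SMART 248 (F8), FTL Program Page Count
    """
    if host_pages == 0:
        raise ZeroDivisionError("F7 did not increase; WAF is undefined")
    return 1 + ftl_pages / host_pages

# Lifetime totals posted above for the laptop's MX500
LIFETIME_WAF = waf(78_341_025, 1_550_715_824)
print(f"WAF = {LIFETIME_WAF:.2f}")  # WAF = 20.79
```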

Your host writing (F7) rate, 2TB over ~8 months, is fairly low; I think typical users write at a much higher rate. A low F7 rate makes the F8 bug much more obvious. (My F7 rate is extremely low: my pc has written only 2.8TB to the ssd in the last ~18 months, because I moved many Windows log files and some apps to a hard drive. My F7 rate is so low that I think the ssd might last longer if the host wrote a little more. During the most recent 12 months the host wrote 1.4TB, and WAF during this period has been 4.07. During the 6 months before that, the host wrote 1.4TB, twice the recent rate, and WAF was 2.48. The increased amplification during the most recent 12 months has more than cancelled the NAND page savings that began 12 months ago, so I should probably undo some of the host write rate reduction I implemented then. In other words, I should move some Windows logs back to the ssd.)

How are your laptop's relevant Windows and BIOS power settings currently set? Is the reason that the laptop isn't using sleep mode that you're running an app 24/7 that generates frequent activity, resetting the sleep timer? Or did you set the laptop to not go to sleep?

By "hidden" power settings, do you mean settings that aren't available by clicking on "Advanced Settings" in the power settings dialog?

Going off on a tangent... Since you're running your laptop 24/7, I assume it's always connected to the charger (or you connect it often). I'm curious whether you somehow prevent the laptop's battery from always (or often) being fully charged. My understanding is that full charges shorten the lifespan of lithium batteries. Unfortunately, few devices provide a way for the user to limit the charge to something healthier, like 80%. (On my android phone an app alerts me if the battery voltage exceeds 4.15V or drops below 3.7V, to help me manually keep the charge within a healthier range.)
 
The laptop BIOS has no options for AHCI power saving. It is an extremely basic BIOS.

My Windows power settings disable both sleep and hibernate, but let the screen go to standby. They also disable storage standby.

I have a program called dslstats which logs data for my DSL modem to a few tiny txt files every minute.

The laptop is connected to the charger, and yep, my battery is likely not optimal now, but the laptop is rarely used on battery anyway.

Yep, the power settings don't show when clicking Advanced; a tool called PowerSettingsExplorer will let you see them all.
Edit: It lets you control DevSleep as well as HIPM/DIPM.
The Windows defaults when connected to a power socket are HIPM in the Balanced profile, HIPM+DIPM in the Power saver profile, and Active in any of the performance profiles. There is a "Lowest" setting which also allows DevSleep, but it doesn't seem to be enabled in any of the default profiles.

Sadly there is no HDD to move the logs to, but I have disabled some things that just log useless noise in Windows, and I also shrank many of the 20MB and 40MB logs down to a smaller size, which should reduce the writes from my logs. (I did this last night, so it has no effect on the stats I posted.)

I am running your selftest script. Would you be able to share one that logs the SMART data as well? It would be awesome if you could.

--edit--

I just checked the diskinfo screenshot I made and, as luck would have it, the Current Pending Sector count was 1.

It is now 0, though.
 
Chrysalis requested my Logger .bat file. Okay. Below is the ssdSMARTLogger.bat file, which appends a comma-delimited row of ssd SMART data to a log file. Below that is the LOOP.bat file, which calls ssdSMARTLogger.bat on a very precise timing loop. Below that is the setCONSTANTS.bat file, which is called once by ssdSMARTLogger.bat to initialize vars to match my ssd and my folder locations. Be sure to edit setCONSTANTS.bat to match your ssd and folders.

ssdSMARTLogger is more complicated than I expect most people will need. You may need to delete some lines so it won't call some .bat files I haven't included here (which use additional Windows Task Scheduler tasks). Or maybe it will work as is, and if you're running it in a non-hidden window you can just ignore any "missing .bat files" errors. Unfortunately, I don't have time to explain which lines to delete, or how the programs work. ssdSMARTLogger could be much simpler if all you want to log is the raw text output of smartctl.exe. It's complex because it parses the smartctl.exe output, calculates increases (deltas) from the previous log period, and appends the data to the log file as a comma-delimited row suitable for importing into a spreadsheet.
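For anyone porting the parsing step, here is a minimal Python sketch (not the .bat's method) of the same idea. Like the .bat, it searches by attribute ID rather than by line number. Unlike the .bat, it splits each line into columns instead of assuming the raw value starts at character 61. It assumes the brief attribute layout that `smartctl -x` prints (ID#, ATTRIBUTE_NAME, FLAGS, VALUE, WORST, THRESH, FAIL, RAW_VALUE). The sample text and the function name `parse_raw_values` are illustrative, not output captured from an MX500.

```python
import re

# Illustrative excerpt in the brief attribute layout printed by "smartctl -x".
# Attribute names are illustrative; raw values are borrowed from numbers in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  9 Power_On_Hours          -O--CK   100   100   000    -    3228
173 Ave_Block-Erase_Count   -O--CK   094   094   000    -    100
247 Host_Program_Page_Count -O--CK   100   100   000    -    78341025
248 FTL_Program_Page_Count  -O--CK   100   100   000    -    1550715824
"""

# ID, name, flags, value, worst, thresh, fail, then the first number of RAW_VALUE
ATTR_LINE = re.compile(r"^\s*(\d{1,3})\s+\S+\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+(\d+)")

def parse_raw_values(text, wanted_ids):
    """Return {attribute_id: raw_value} for each wanted ID found anywhere in text."""
    found = {}
    for line in text.splitlines():
        m = ATTR_LINE.match(line)
        if m and int(m.group(1)) in wanted_ids:
            found[int(m.group(1))] = int(m.group(2))
    missing = set(wanted_ids) - set(found)
    if missing:
        # Same idea as the .bat's "SMART ATTRIBUTES NOT FOUND" check
        raise KeyError(f"attributes not found: {sorted(missing)}")
    return found

RAW = parse_raw_values(SAMPLE, {9, 173, 247, 248})
print(RAW)
```

On a live system the text could come from `subprocess.run(["smartctl", "-x", device], capture_output=True, text=True).stdout`, where `device` is whatever you put in setCONSTANTS.bat for your ssd. smartctl needs Administrator privilege here too.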

LOOP.bat could be reduced to a few lines of code if you don't mind imprecise timing of the log intervals, which could drift a few seconds per interval. To eliminate drift, it was necessary to periodically check the system time, which also meant compensating for midnight rollovers and the twice-a-year change between daylight saving time and standard time.

I set up two Windows Task Scheduler tasks on my pc. One task runs the logger once per day at 12:01pm, and the other runs the logger every 2 hours. In the "every 2 hours" task, the Action contains the following:
Program: N:\fix_ssd_waf\LOOP.bat
Arguments: 7200 0 N:\fix_ssd_waf\ssdSMARTlogger.bat
Start in: N:\fix_ssd_waf
(Note: N:\fix_ssd_waf is the folder on my hard drive where I store the .bat files.)

In the "daily" task, the Action has 86400 instead of 7200:
Arguments: 86400 0 N:\fix_ssd_waf\ssdSMARTlogger.bat
(There are 86400 seconds in one day.)

Note: In the three blocks of code that follow, tomshardware inserted a line with the word "Code:" at the beginning (at least in the Preview before I submitted this post). That line isn't part of my .bat code, so don't copy it.

Here's ssdSMARTLogger.bat :
Code:
@echo off
rem  Usage: LOOP DesiredLoopDuration 0 ssdSMARTlogger.bat
rem     where the loop duration is specified in seconds
rem  Must be run with Administrator privilege.
rem  CHANGELOG: See bottom of this file.

if [%1]==[INIT] (
   call :init
   if errorlevel 1 EXIT /B 1
)

%SMARTCTL% -x %SSD%>%SNAP%
set Datestamp=!date:~4!
set Timestamp=!time: =!
set nAttribs=7
rem  NOTE: Smartctl can't be relied on to output attributes on same lines
rem        each time, so search lines for the 7 attribute id numbers.
for /F "skip=66 delims=" %%L IN (%SNAP%) do (
   if !nAttribs! GTR 0 (
      set line=%%L
      set Attrib=[!line:~0,3!]
      set AttID=  9
      if !Attrib!==[!AttID!] (
         set /A "PowerOnHours=!line:~61!, nAttribs-=1"
      ) else (
         set AttID= 12
         if !Attrib!==[!AttID!] (
            set /A "PowerCycles=!line:~61!, nAttribs-=1"
         ) else (
            set AttID=173
            if !Attrib!==[!AttID!] (
               set /A "ABEC=!line:~61!, nAttribs-=1"
            ) else (
               set AttID=197
               if !Attrib!==[!AttID!] (
                  rem  Bogus_Current_Pend_Sect seems related to MX500 WAF bug, is 1 during FTL burst writes.
                  set /A "C5=!line:~61!, nAttribs-=1"
               ) else (
                  set AttID=246
                  if !Attrib!==[!AttID!] (
                     rem  Skip 61 chars, then prepend blanks to ensure at least 12 chars,
                     rem  and split into a high leading portion and the last 8 digits:
                     set line=            !line:~61!
                     set /A "F6lo=1!line:~-8!-100000000, F6hi=!line:~0,-8!+0, nAttribs-=1"
                  ) else (
                     set AttID=247
                     if !Attrib!==[!AttID!] (
                        set line=            !line:~61!
                        set /A "F7lo=1!line:~-8!-100000000, F7hi=!line:~0,-8!+0, nAttribs-=1"
                     ) else (
                        set AttID=248
                        if !Attrib!==[!AttID!] (
                           set line=            !line:~61!
                           set /A "F8lo=1!line:~-8!-100000000, F8hi=!line:~0,-8!+0, nAttribs-=1"
)  )  )  )  )  )  )  )  )
if !nAttribs! GTR 0 (
   echo ERROR: !nAttribs! OF 7 SMART ATTRIBUTES NOT FOUND. Probably an unexpected ID#
   echo Not all devices supply the same attributes, and smartctl plays a role too.
   pause
)

rem  Get an 8th attribute, Sectors Read, from the extended section of the "smartctl -x" output.
rem  Note: Crucial MX500 ssd has firmware bug: Sectors Read rolls over at 2048 GB.
for /f "tokens=* delims=" %%L in ('findstr /C:"Logical Sectors Read" %SNAP%') do set "Smart=%%L"
set "Smart=            !Smart:~15,15!"
set /A "Readlo=1!Smart:~-8!-100000000"
set /A "Readhi=!Smart:~0,-8!+0"

rem  Concat leading_digits and end8digits_with_leading_zeros:
set zeroslo=00000000!F6lo!
set F6=!F6hi!!zeroslo:~-8!
set zeroslo=00000000!F7lo!
set F7=!F7hi!!zeroslo:~-8!
set zeroslo=00000000!F8lo!
set F8=!F8hi!!zeroslo:~-8!
set zeroslo=00000000!Readlo!
set SectorsRead=!Readhi!!zeroslo:~-8!

rem  Calculate total host writes in GB rounded down to int, by dividing F6 by 2,097,152:
set /A "F6GB=F6hi*hiFactor+(F6hi*GB_mod+F6lo)/2097152"
rem  Calculate total host reads in GB, taking Rollovers into account:
set /A ReadGB="Readhi*hiFactor+(Readhi*GB_mod+Readlo)/2097152+ReadRollovers*2048"

rem  Each month, start a new log.
set month=!date:~4,2!
if NOT !month!==!prevmonth! (
   set prevmonth=!month!
   set Changed=Y
)
rem  Change the log filename if month or params changed.
if !Changed!==Y (
   set Changed=N
   set datetime=!date:~10,4!.!date:~4,2!.!date:~7,2!-!time:~0,2!!time:~3,2!!time:~6,2!
   set datetime=!datetime: =0!
   set "LOG=%PROGDIR%\Logs\%BATNAME%%BATVER%_!datetime!_[%DESIREDSECONDS%_seconds].LOG"
   echo Date,Time,TotalHostSectorsRd,TotalHostSectorsWr,TotalHostWrGB,TotalHostWrPages,TotalFTLPages,PowerOnHours,ABEC,PowerCycles, WAF, HostReadsMB,HostWritesMB,HostPages,FTLPages,C5 > !LOG!
)

echo %magenta%------------%yellow%
if !prevF6hi!!prevF6lo!==00 (
   rem  First pass.  Display less data, and write header row to log files:
   echo Total Host Sectors Read [!SectorsRead!]
   echo Total Host LBAs Written [!F6!]
   echo Total Host NAND Pages Written [!F7!]
   echo Total FTL NAND Pages Written [!F8!]
   echo !Datestamp!,!Timestamp!,!SectorsRead!,!F6!,!F6GB!,!F7!,!F8!,!PowerOnHours!,!ABEC!,!PowerCycles!,,,,,,!C5!>> !LOG!
) else (
   rem  Calculate changes in SMART counts F6,F7,F8,SectorsRead,etc:
   set /A "deltaF6=100000000*(F6hi-prevF6hi)+F6lo-prevF6lo"
   set /A "deltaF7=100000000*(F7hi-prevF7hi)+F7lo-prevF7lo"
   set /A "deltaF8=100000000*(F8hi-prevF8hi)+F8lo-prevF8lo"

   rem  Convert Host_LBAs_Written to MBytes, with two decimal places:
   rem  Note that 1 LBA is 512 bytes.
   set /A "_Int=deltaF6/2048, _Dec=((100*deltaF6)/2048)%%100"
   rem  Ensure decimal is two digits by prepending leading zeros and taking rightmost digits:
   set _Dec=00!_Dec!
   set dF6MB=!_Int!.!_Dec:~-2!

   rem  Note: ssd firmware bug causes Host Sectors Read rollover at 4,294,967,296 (32bit unsigned int)
   if !Readhi! LSS !PrevReadhi! (
      rem  Rollover bug occurred, so compensate by subtracting 4,294,967,296 from prev:
      set /A "ReadRollovers+=1"
      for /L %%H in (1,1,4) do (
         rem  Subtract 1,073,741,824 [one fourth of 4,294,967,296] from prev:
         if !prevReadlo! GEQ 73741824 (
            set /A "prevReadhi-=10, prevReadlo-=73741824"
         ) else (
            set /A "prevReadhi-=11, prevReadlo+=(100000000-73741824)"
            rem  It's okay if prevReadhi goes negative.
         )
      )
      set /A "ReadGB+=2048"
   )
   rem  Convert Host_Sectors_Read to MBytes, with two decimal places (1 sector = 512 bytes):
   set /A "deltaRead=100000000*(Readhi-prevReadhi)+Readlo-prevReadlo"
   set /A "_Int=deltaRead/2048, _Dec=((3*deltaRead)/64 + deltaRead/512)%%100"
   rem  Ensure decimal is two digits by prepending leading zeros and taking rightmost digits.
   set _Dec=00!_Dec!
   set dReadMB=!_Int!.!_Dec:~-2!

   rem  Calculate WAF with two decimal places, fudge if necessary to avoid divide by zero.
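   if !deltaF7! EQU 0  (
      rem  No host pages this period: flag WAF as huge, and set WafInt/WafDec too so the
      rem  tests below don't use stale (or, on the second pass, undefined) values.
      set "Waf=999999.99" & set "WafInt=999999" & set "WafDec=99" & echo Avoided dividing by zero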
   ) else (
      set /A "WafInt=1+(deltaF8/deltaF7)"
      set /A "WafDec=((100*deltaF8)/deltaF7)%%100"
      rem  Ensure decimal is two digits by prepending two leading zeros and taking rightmost two digits.
      set WafDec=00!WafDec!
      set WafDec=!WafDec:~-2!
      set Waf=!WafInt!.!WafDec!
   )

   rem  Display data (if bat file is not being run hidden).
   echo WAF=!Waf! dHostRdMB=!dReadMB! dHostWrMB=!dF6MB! dHostPages=!deltaF7! dFTLPages=!deltaF8! C5=!C5!
   echo TotalHostRdGB=!ReadGB! TotalHostWrGB=!F6GB! TotalHostWrPages=!F7! TotalFTLPages=!F8!
   echo POH=!PowerOnHours! ABEC=!ABEC! PowerCycles=!PowerCycles!

   rem  Append data to log file(s).
   echo !Datestamp!,!Timestamp!,!SectorsRead!,!F6!,!F6GB!,!F7!,!F8!,!PowerOnHours!,!ABEC!,!PowerCycles!, !Waf!, !dReadMB!,!dF6MB!,!deltaF7!,!deltaF8!,!C5!>> !LOG!
   if !WafInt! GTR 9 (
      if !deltaF8! GEQ %LARGEFTL% (
         if NOT exist %FTLLOG% (
            echo Date,Time, WAF, HostReadsMB,HostWritesMB,HostPages, FTLPages, C5> %FTLLOG%
         )
         echo !Datestamp!,!Timestamp!, !Waf!, !dReadMB!,!dF6MB!,!deltaF7!, !deltaF8!, !C5!>> %FTLLOG%
   )  )

   rem  Check whether WAF for long period exceeded threshold, or if ABEC increased, and alert user if so.
   set HighWAF=N
   if %DESIREDSECONDS% GEQ 86400 (
      rem  Alert if daily WAF is at least 3.50:
      if !WafInt! GEQ 3 (
         if !WafInt! GEQ 4 (
            set HighWAF=Y
         ) else (
            if !WafDec! GEQ 50 (
               set HighWAF=Y
   )  )  )  )
   if !HighWAF!==Y  (
      set "_NotifyHeader=The ssd WAF was high."
      set "_NotifyText=!Datestamp! !Timestamp! -- WAF was !WafInt!.!WafDec! during %DESIREDSECONDS% seconds period."
      call :notify_using_PowerShell
      call :logAlertData %WAFALERT%
   )
   if !ABEC! GTR !PrevABEC!  (
      set "_NotifyHeader=The ssd ABEC increased."
      set "_NotifyText=!Datestamp! !Timestamp! -- Average Block Erase Count increased to !ABEC!."
      call :notify_using_PowerShell
      call :logABECIncrease
      call :logAlertData %ABECALERT%
)  )
set prevC5=!C5!
set prevABEC=!ABEC!
set prevF6hi=!F6hi!
set prevF6lo=!F6lo!
set prevF7hi=!F7hi!
set prevF7lo=!F7lo!
set prevF8hi=!F8hi!
set prevF8lo=!F8lo!
set prevReadhi=!Readhi!
set prevReadlo=!Readlo!

EXIT /B 0

:logAlertData
rem  This subroutine is called if ABEC increased or if daily WAF is high.
if not exist %ALERTLOG% (
   echo Date,Time,TotalHostSectorsRd,TotalHostSectorsWr,TotalHostWrGB,TotalHostWrPages,TotalFTLPages,PowerOnHours, ABEC, PowerCycles, WAF, HostReadsMB,HostWritesMB,HostPages,FTLPages,C5 > %ALERTLOG%
)
echo !Datestamp!,!Timestamp!, !SectorsRead!,!F6!,!F6GB!,!F7!,!F8!, !PowerOnHours!, !ABEC!, !PowerCycles!, !Waf!, !dReadMB!,!dF6MB!,!deltaF7!,!deltaF8!,!C5!>>%ALERTLOG%
rem  Create a flag file and assume another .bat will detect the flag and display the data when convenient.
type NUL > "%~1"
EXIT /B 0

:notify_using_PowerShell
rem  This subroutine displays a Windows balloon notification and adds it to Windows' Notification Center.
powershell -Command "&{[reflection.assembly]::loadwithpartialname('System.Windows.Forms'); " ^
   "[reflection.assembly]::loadwithpartialname('System.Drawing'); " ^
   "$notify = new-object system.windows.forms.notifyicon; " ^
   "$notify.icon = [System.Drawing.SystemIcons]::Information; " ^
   "$notify.visible = $true; " ^
   "$notify.showballoontip(10,'%BATNAME%_v%BATVER% -- %_NotifyHeader%','%_NotifyText%',[system.windows.forms.tooltipicon]::None)} "
   set "_NotifyHeader="
   set "_NotifyText="
EXIT /B 0

:logABECIncrease
rem  This subroutine is called if ABEC increased.
if not exist %ABECLOG% (
   echo Date,Time,TotalHostSectorsRd,TotalHostSectorsWr,TotalHostWrGB,TotalHostWrPages,TotalFTLPages,PowerOnHours, ABEC, PowerCycles, WAF, HostReadsMB,HostWritesMB,HostPages,FTLPages,C5 > %ABECLOG%
)
echo !Datestamp!,!Timestamp!, !SectorsRead!,!F6!,!F6GB!,!F7!,!F8!, !PowerOnHours!, !ABEC!, !PowerCycles!, !Waf!, !dReadMB!,!dF6MB!,!deltaF7!,!deltaF8!,!C5!>>%ABECLOG%
EXIT /B 0

:init
set BATNAME=SSD_SMARTLogger
set BATVER=7.7.0
call %~dp0%setCONSTANTS.bat

TITLE %BATNAME%_v%BATVER% [Log SSD SMART data every %DESIREDSECONDS% seconds]
set "SNAP=%TMPDIR%\%BATNAME%%BATVER%_snap_%DESIREDSECONDS%.txt"
for /F "delims=#" %%E in ('"prompt #$E# & for %%E in (1) do rem"') do set "ESCchar=%%E"
set "green=%ESCchar%[92m"
set "yellow=%ESCchar%[93m"
set "magenta=%ESCchar%[95m"
set "cyan=%ESCchar%[96m"
set "white=%ESCchar%[97m"
set "resetcolor=%ESCchar%[0m"
if not exist "%SMARTCTL%" (
   echo %magenta%Aborting: '%SMARTCTL%' not found.%white%
   EXIT /B 1
)

rem  Start a new log file each month.
set prevmonth=0

rem  Embed the start date & time in the LargeFTL log filename and in the ABEC log filename:
set datetime=%date:~10,4%.%date:~4,2%.%date:~7,2%-%time:~0,2%%time:~3,2%%time:~6,2%
set datetime=%datetime: =0%
set "ABECLOG=%PROGDIR%\Logs\%BATNAME%%BATVER%_ABEC_%datetime%_[%DESIREDSECONDS%_second_intervals].LOG"
set "FTLLOG=%PROGDIR%\Logs\%BATNAME%%BATVER%_LargeFTL_%datetime%_[%DESIREDSECONDS%_second_intervals].LOG"
rem  Note: Large FTL write bursts appear to be multiples of about 37000 pages, and
rem     based on experience those do not occur while the ssd is running a selftest.
rem  Write to the LargeFTL log only when FTL Page Writes is "relatively large":
if %DESIREDSECONDS% GEQ 86400 (
   set LARGEFTL=100000
) else (
   if %DESIREDSECONDS% GTR 10 (
      set LARGEFTL=30000
   ) else (
      rem  There are occasional FTL write bursts of about 2400 pages that can
      rem     happen even while the ssd is running a selftest.  Log those too.
      set LARGEFTL=1000
)  )

echo SMARTsnapshotfile [%SNAP%]  AlertLog [%ALERTLOG%]

set prevF6hi=0
set prevF6lo=0
set prevF7hi=0
set prevF7lo=0
set prevF8hi=0
set prevF8lo=0
set prevC5=0
set prevABEC=0
set prevReadhi=0
set prevReadlo=0

rem  Useful constants for converting sectors to GB:
set /A "GB_mod=100000000%%2097152, hiFactor=100000000/(1024*2048)"

set ReadRollovers=0

EXIT /B 0
========================================
CHANGELOG:
  7.7.0 Use PowerShell to "balloon notify" user about high WAF and about increase of ABEC.
        Use new version numbering system (3 numbers separated by periods).
  7.6 Log data to new ABEC file each time ABEC increased if loop period is
         less than one day, so that rows of long term data can be quickly
         pasted to ABEC Increases spreadsheet sheet.
  7.5 Moved definitions of constants to setCONSTANTS.bat
  7.4 Removed version number from .bat filename.
      Increased threshold for WAF Alert from 3 to 3.5.
      Don't create the LargeFTL log file until a large FTL event happens.
      Some code refactoring.
  7.3 Log Large FTL only if WAF is also high, so LargeFTL log will be more meaningful.
TODO:
  Track increases of Total NAND Pages Written (the key to Remaining Life) and alert user when high.
  Add daily check for High WAF to shorter-than-daily logger so a single instance will suffice.
  Save ReadRollovers and date in a file so that data won't be lost, and restore it from the file.
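cmd's `set /A` only handles signed 32-bit integers (up to 2,147,483,647). That is why the logger keeps each large counter as a leading part plus its last 8 digits, and why it does the Sectors Read rollover fix in four 2^30 steps. The Python below is a hypothetical re-implementation of just that arithmetic, handy for checking the .bat's numbers against a spreadsheet. The function names are illustrative and don't exist in the .bat.

```python
# Python mirror of the logger's 32-bit-safe arithmetic (for checking, not for cmd).
BASE = 100_000_000                        # low part = last 8 decimal digits
SECTORS_PER_GB = 2_097_152                # 2**30 bytes / 512-byte sectors
HI_FACTOR, GB_MOD = divmod(BASE, SECTORS_PER_GB)   # 47 and 1_433_856 (hiFactor, GB_mod)
Q_HI, Q_LO = divmod(2**30, BASE)          # 2**30 = 10 * BASE + 73_741_824

def split(n):
    """(hi, lo) as produced by slicing off the last 8 digits."""
    return divmod(n, BASE)

def delta(hi, lo, prev_hi, prev_lo):
    """Increase since the previous sample, like deltaF6=100000000*(F6hi-prevF6hi)+F6lo-prevF6lo."""
    return BASE * (hi - prev_hi) + lo - prev_lo

def to_gb(hi, lo):
    """floor(n / 2**21) without forming n, like F6GB=F6hi*hiFactor+(F6hi*GB_mod+F6lo)/2097152."""
    return hi * HI_FACTOR + (hi * GB_MOD + lo) // SECTORS_PER_GB

def unroll_prev(prev_hi, prev_lo):
    """Subtract 2**32 from the previous reading in four 2**30 steps, keeping 0 <= lo < BASE."""
    for _ in range(4):
        if prev_lo >= Q_LO:
            prev_hi, prev_lo = prev_hi - Q_HI, prev_lo - Q_LO
        else:
            prev_hi, prev_lo = prev_hi - Q_HI - 1, prev_lo + BASE - Q_LO
    return prev_hi, prev_lo

# Host writes of 2039 GB expressed as 512-byte LBAs, then converted back.
HOST_GB = to_gb(*split(2039 * SECTORS_PER_GB))

# Hypothetical readings: Sectors Read wrapped from 4,294,000,000 to 500,000.
prev = split(4_294_000_000)
cur = split(500_000)
if cur[0] < prev[0]:                       # same rollover test as the .bat
    prev = unroll_prev(*prev)
READ_DELTA = delta(*cur, *prev)
print(HOST_GB, READ_DELTA)                 # 2039 1467296
```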

Here's LOOP.bat :
Code:
@echo off
setlocal EnableDelayedExpansion
call :yellow
if [%1]==[] (
   echo USAGE: loop DesiredSecondsPerLoop [LoopCount [progfile [parameters for progfile]]]
   EXIT /B
) else (
   set /A "DESIREDSECONDS=%1"
   rem echo DesiredSecondsPerLoop[%DESIREDSECONDS%]
   set LOOPCOUNT=0
   set "WORKCMD=:simulate"
   set INIT=INIT
)
if [%2]==[] (
   rem echo LoopCount[!LOOPCOUNT!] Workload[!WORKCMD!]
) else (
   set /A "LOOPCOUNT=%2+0"
   rem echo LoopCount[!LOOPCOUNT!]
   if [%3]==[] (
      rem echo Workload[!WORKCMD!]
   ) else (
      set PROGFILE=%3
      rem echo ProgFile [!PROGFILE!]
      if EXIST "!PROGFILE!" (
         set "WORKCMD=!PROGFILE!"
         set "WORKPARMS=%4 %5 %6 %7 %8 %9"
         rem echo Workload[!WORKCMD! !WORKPARMS!]
      ) else (
         echo File %3 not found, aborting.
         EXIT /B
      )
   )
)
if %LOOPCOUNT% EQU 0 (
   rem  Infinite loop if %2 was zero or missing
   echo Will run infinite loop!
   set /A "LOOPSTART=0, LOOPSTEP=0, LOOPEND=0"
   TITLE !WORKCMD! [infinite loop every !DESIREDSECONDS!]
) else (
   echo Will run finite loop.
   set /A "LOOPSTART=1, LOOPSTEP=1, LOOPEND=LOOPCOUNT"
   TITLE !WORKCMD! [!LOOPCOUNT! loops, every !DESIREDSECONDS!]
)

call :green
rem  Since DesiredSeconds might be more than an hour, split loop into
rem     pieces each less than an hour to simplify Daylight Savings tests.
set /A "MAXSECONDSPERLOOP=600"
rem  For more convenient testing, use a small maxloop duration:
rem set /A "MAXSECONDSPERLOOP=30"
rem  There will be zero or more inner loops of maxsecondsperloop duration, plus
rem     a final inner loop that completes the total desired outer loop duration.
set /A "NUMBEROFLOOPS=1+DESIREDSECONDS/MAXSECONDSPERLOOP"
set /A "FINALLOOPSECONDS=DESIREDSECONDS%%MAXSECONDSPERLOOP"
if %FINALLOOPSECONDS% EQU 0 (
   rem Don't let final loop be zero seconds.
   set /A "NUMBEROFLOOPS-=1, FINALLOOPSECONDS=MAXSECONDSPERLOOP"
)
rem echo !time! InnerLoops[%NUMBEROFLOOPS%] FinalInnerLoopSeconds[%FINALLOOPSECONDS%]

rem  Initialize the var Endtime to the starting time, expressed in "seconds after midnight":
for /F "tokens=1-3 delims=:." %%a in ("!time!") do (
   rem Note HH may have leading blank, MM and SS may have leading zero octal confusion.
   set /A "EndTime=3600*%%a+60*(1%%b-100)+1%%c-100"
)
echo !time!  SecondsAfterMidnight[!EndTime!]  STARTING.

FOR /L %%G in (%LOOPSTART%,%LOOPSTEP%,%LOOPEND%) DO (
   call :yellow
   if [!INIT!]==[INIT] (
      call %WORKCMD% INIT %WORKPARMS%
      set INIT=
   ) else (
      call %WORKCMD% %WORKPARMS%
   )
   rem  Now wait long enough so the total elapsed seconds is as desired.
   call :cyan
   for /L %%G in (1,1,%NUMBEROFLOOPS%) do (
      rem  Set EndTime to the desired end of this iteration of the For:
      if %%G LSS %NUMBEROFLOOPS% (
         set /A "EndTime+=MAXSECONDSPERLOOP"
      ) else (
         set /A "EndTime+=FINALLOOPSECONDS"
      )

      rem  To calculate the number of seconds to pause, and to check for
      rem     midnight rollover and for a change to/from Daylight Savings Time,
      rem     we need to know the current time, as seconds after midnight.
      for /F "tokens=1-3 delims=:." %%a in ("!time!") do (
         set /A "CurrentTime=3600*%%a+60*(1%%b-100)+1%%c-100"
      )
      rem  We passed midnight if endtime is much greater than currenttime
      rem     so in that case subtract 24 hours from endtime.
      set /A "TestTime=CurrentTime+43200"
      if !EndTime! GTR !TestTime!  set /A "EndTime-=86400"

      rem  A change to Daylight Savings Time occurred if endtime < currenttime-1800
      rem     so in that case add an hour to endtime
      set /A "TestTime=CurrentTime-1800"
      if !EndTime! LSS !TestTime! set /A "EndTime+=3600"

      rem  A change to Standard Time occurred if endtime>currenttime+3600
      rem     so in that case subtract an hour from endtime
      set /A "TestTime=CurrentTime+3600"
      if !EndTime! GTR !TestTime! set /A "EndTime-=3600"

rem    echo EndTime[!EndTime!]  CurrentTime[!CurrentTime!]
      if !EndTime! GTR !CurrentTime! (
         set /A "SecsToWait=EndTime-CurrentTime"
         echo !time!  [inner loop %%G of %NUMBEROFLOOPS%]  Pausing !SecsToWait! seconds...
         TIMEOUT /t !SecsToWait! /NOBREAK >nul
      ) else (
         if !EndTime! LSS !CurrentTime! (
            call :magenta
            echo !time!  [inner loop %%G of %NUMBEROFLOOPS%]
            echo One or more workloads ran long... skipping pauses until timeline restored.
            call :cyan
         )
      )
   )
)
exit /B

:simulate
rem For testing, simulate a workload that lasts for 0 to 9 seconds:
for /F "tokens=1-4 delims=:." %%a in ("%time%") do (
   set /A "WorkSecs=(1%%d-100)%%10"
)
call :yellow
echo %time%  Simulating workload for approximately !WorkSecs! seconds...
TIMEOUT /t !WorkSecs! /NOBREAK >nul
exit /B

endlocal

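:green
call :setesc
echo | set /p="%ESC%[92m"
exit /B

:cyan
call :setesc
echo | set /p="%ESC%[96m"
exit /B

:yellow
call :setesc
echo | set /p="%ESC%[93m"
exit /B

:magenta
call :setesc
echo | set /p="%ESC%[95m"
exit /B

:resetcolor
call :setesc
echo | set /p="%ESC%[0m"
exit /B

:setesc
rem  Define ESC (character 0x1B) once; ANSI color codes need it in front of "[".
if not defined ESC for /F %%e in ('echo prompt $E ^| cmd') do set "ESC=%%e"
exit /B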
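For readers who would rather not trace the batch code, here is a minimal Python sketch of the timing idea in LOOP.bat. It assumes Python 3. The deadline advances by a fixed step from the start time. Each period is split into pauses of at most 600 seconds, mirroring MAXSECONDSPERLOOP. A workload that runs long shortens the following pauses instead of shifting the whole timeline. The names `split_period` and `run_every` are hypothetical and are not part of the scripts.

```python
import time

MAX_SECONDS_PER_CHUNK = 600  # mirrors MAXSECONDSPERLOOP in LOOP.bat


def split_period(desired_seconds, max_chunk=MAX_SECONDS_PER_CHUNK):
    """Split one loop period into pauses of at most max_chunk seconds:
    zero or more full chunks plus a final, never-zero chunk (as LOOP.bat does)."""
    full, final = divmod(desired_seconds, max_chunk)
    full = int(full)
    if final == 0:
        full, final = full - 1, max_chunk
    return [max_chunk] * full + [final]


def run_every(period_seconds, workload, loops):
    """Call workload() `loops` times, starting one call every period_seconds.

    The deadline advances by a fixed step from the start time, so a slow
    workload shortens the following pause instead of shifting the schedule.
    """
    deadline = time.monotonic()
    for _ in range(loops):
        workload()
        for chunk in split_period(period_seconds):
            deadline += chunk
            remaining = deadline - time.monotonic()
            if remaining > 0:
                time.sleep(remaining)
            # Otherwise the workload ran long: skip this pause to catch up.
```

Using `time.monotonic()` sidesteps the midnight-rollover and Daylight Saving corrections that LOOP.bat must make, because a monotonic clock never jumps when the wall clock does. The 600-second chunking is kept only to mirror the batch logic.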

Here's setCONSTANTS.bat :
Code:
@echo off
rem  This subroutine is called during other .bat scripts' initialization step.
rem  YOU MUST EDIT THE FOLLOWING VARIABLE DEFINITIONS TO MATCH YOUR SYSTEM!

set "TMPDIR=R:"
set "PROGDIR=N:\fix_ssd_waf"
set "SMARTCTL=C:\Smartmontools\smartctl.exe"
set "SSD=/dev/sda"
set /A "SSDSECTORS=1000000000"
set "SELFTESTFLAG=%TMPDIR%\ssdSelftestRunning.txt"

rem  The following are used by the pair of .bat scripts that work together with
rem     the Logger .bat to alert the user if WAF was high or if ABEC increased.
set "ALERTTASK=\Fix_SSD\SSD Alert"
set WAFALERT=%TMPDIR%\WafAlert.txt
set ABECALERT=%TMPDIR%\AbecAlert.txt
set IDLELOG=%TMPDIR%\ssd_IdleChecks.log
set ALERTLOG=%TMPDIR%\SSDAlert.LOG
set ALERTCOPY=%TMPDIR%\SSDAlertCopy.LOG
 
Thanks. I had been manually taking text dumps with CrystalDiskInfo daily, and I also left it running since it creates graphs. I will enable your scripts later.

I ran the self test for 3 days. Since then I power cycled and have had the laptop up for another 2 days without self tests, and the erase cycle count is still at 100.

The ssd is writing a lot even though I have tamed many logs, have no browser running, and put temp on a RAM disk. It is still doing a gig of writes every 3-4 hours, so about 6 gig a day; Windows writes a ton of data.

Process Monitor seems to indicate it's the registry journals getting pounded, especially software1.log. Even so, the erase cycle count has not increased since the evasive actions were taken.

I also adjusted the power explorer settings so DIPM is no longer enabled, and later I will lock the SSD to the active state to see whether its idle mode could be causing problems. I have also disabled slumber.
 

I explored the graph functionality of CrystalDiskInfo and it seems too limited to be useful. It looks like it can display only one SMART attribute at a time, and the only "write" value that looks correct is Total Host Writes, which I assume Crystal derives from the F6 attribute. F7 and F8 display the constant 100.

So I'll assume that where you wrote "a gig of writes every 3-4 hours" you meant the increase of F6 (which loosely correlates with the increase of F7, NAND pages written by the host) and didn't mean F8 or F7+F8. I think you probably don't mean bursts of writing where it writes a gig in a few seconds, every 3-4 hours.

My logs indicate each increment of the Average Block Erase Count (attribute AD) typically corresponds to an increase of F7+F8 by about 14 million NAND pages. The maximum was about 25 million and the minimum about 10 million. The maximum values were in Feb 2020, when I began logging each increment of ABEC but before I began running the selftests regime to tame the F8 bug. Although larger "increase of F7+F8 per ABEC increment" is better all else being equal, not all else is equal in this case: it took only a day to write each of the large amounts, back in Feb 2020. The large amounts were mostly large increases of F8.

That might be a useful hint about the nature of the F8 bug. I/we should ponder why "F7+F8 NAND pages written per block erase increment" was somewhat larger while the bug was untamed than while the bug is tamed with selftests.
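The "NAND pages per ABEC increment" figure can be computed from any log of ABEC, F7 and F8 snapshots. This minimal Python sketch uses a hypothetical function name. It divides the growth of F7+F8 by the growth of ABEC over each interval that ends with an ABEC increase. The result is only approximate when the first snapshot isn't taken right at an increment.

```python
def pages_per_abec_increment(snapshots):
    """snapshots: list of (abec, f7, f8) tuples in time order.

    Returns one value per interval that ends with an ABEC increase:
    (increase of F7+F8) / (increase of ABEC) over that interval.
    """
    results = []
    start_abec, start_f7, start_f8 = snapshots[0]
    for abec, f7, f8 in snapshots[1:]:
        if abec > start_abec:
            pages = (f7 + f8) - (start_f7 + start_f8)
            results.append(pages / (abec - start_abec))
            # The next interval starts at this increase.
            start_abec, start_f7, start_f8 = abec, f7, f8
    return results


# 9/24/2021 .. 9/29/2021 readings posted later in this thread: (ABEC, F7, F8)
snapshots = [
    (100, 78_551_768, 1_551_488_934),
    (100, 78_875_113, 1_551_986_091),
    (100, 79_188_244, 1_553_061_038),
    (100, 79_259_517, 1_554_251_080),
    (100, 79_272_459, 1_555_593_747),
    (102, 79_706_182, 1_577_086_610),
]
print(pages_per_abec_increment(snapshots))  # [13376045.0]
```

Applied to those readings (ABEC 100 to 102), the sketch gives about 13.4 million pages per increment. That is in line with the typical ~14 million mentioned above.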

Here's a tip I plan to follow someday: edit the logger .bat file to change the log filenames' extensions from .LOG to .CSV, so that double-clicking a log file opens it as a spreadsheet (ready for copy-pasting to an analysis spreadsheet). I've gotten tired of right-clicking the .LOG files, then clicking "Open with..." then clicking "LibreOffice Calc." But I don't want to change the double-click association of the .LOG extension from the Notepad app to the spreadsheet app.
 
An update: it seems that after only 3 days of uptime it triggers again. No erase cycles for 5 days, then 2 in one day.

Some data from the SMART snapshots I took. Note they were not taken at the same time each day; I just check my laptop when I finish work and hit save in CrystalDiskInfo.

24 sept host writes 2044GB cycles 100 idle state prevented by self tests
25 sept host writes 2053GB cycles 100 idle state prevented by self tests
26 sept host writes 2059GB cycles 100 idle state prevented by self tests
26 sept (later) host writes 2062GB cycles 100 power cycled
27 sept host writes 2064GB cycles 100 allowed to idle
28 sept host writes 2071GB cycles 100 allowed to idle
29 sept host writes 2076GB cycles 102 allowed to idle

In case anyone is curious why this is still writing several gigs a day after I heavily tamed the logs: it seems to be related to the registry and its journaling. The registry files are 16-25 meg apiece, and they are being written to several times a minute. I have not found a way to safely tame this, as they are a core part of the OS.
 

Are you not logging the SMART F8 attribute (NAND pages written by the ssd's FTL controller)? I think F8, combined with either F7 (NAND pages written by the host pc) or F6 (number of 512-byte sectors written by the host pc), provides the most insight into how badly the amplification bug is behaving, and when.

I assume "cycles 102" on Sept 29 means "Average Block Erase Count = 102." Your ssd's ABEC increase of 2 from Sept 28 to Sept 29 is pretty large for one day. I began daily logging of ABEC on 2/08/2020, and only once has ABEC increased more than 1 in a day: from 114 on 2/19/2020 to 116 on 2/20/2020 (several days before I began experimenting with ssd selftests).

I doubt you can reduce your ssd writing much below the 5 or 6 GB per day you've been observing, assuming you don't relocate frequently written files to a second drive. But I think you need not be concerned, assuming you tame the F8 bug, because at that rate your ssd ought to endure 25 or 30 years (assuming the selftests regime has no as-yet-undiscovered destructive side effects). My pc typically writes about 2 GB per day to the ssd, occasionally 3 or 4 GB, and my analysis spreadsheet is estimating 79 more years before Remaining Life reaches zero if it continues at the same rate as during the last 13 months.
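A projection like "years until Remaining Life reaches zero" can be made by linear extrapolation. This minimal Python sketch assumes Remaining Life keeps falling at the average rate seen between two readings. The function name and the example readings are hypothetical and are not the author's spreadsheet data.

```python
def years_until_zero(rl_now, rl_earlier, days_between):
    """Linearly extrapolate when Remaining Life (in %) reaches 0.

    rl_earlier and rl_now are two readings taken days_between days apart.
    Returns infinity if Remaining Life did not fall in that interval.
    """
    drop_per_day = (rl_earlier - rl_now) / days_between
    if drop_per_day <= 0:
        return float("inf")
    return rl_now / drop_per_day / 365.25


# Hypothetical readings: 90% a year ago, 89% today.
print(round(years_until_zero(89, 90, 365.25), 1))  # 89.0
```

Because Remaining Life is reported in whole percent, the estimate is coarse unless the two readings are far apart.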

Yes, the Windows Registry is written to often. I didn't find a way to move it to hard drive, and for the sake of speed it's probably best kept on the ssd.

Another occasional cause of large brief host writing is updates of Windows and other apps.

An antivirus scanner can write a lot to the ssd if it's set to scan the contents of compressed archive files (.zip, .rar, etc.) and it extracts them to the ssd before scanning the contents. Comodo does this, but the feature can be disabled.

A cause of small but frequent host writes is apps that cache their data. My web browser Firefox used to cache data every few seconds, but that rate can be reduced by editing its config file (and also you could move it to hard drive if you had one). Windows also logs a lot of data associated with web browsing -- I don't know if it does that because Firefox commands Windows to log it or because Microsoft chooses to log it without being commanded -- and I never discovered a way to move that to hard drive.

Another cause of small but frequent host writes is apps that log their status. The software for my CyberPower UPS (uninterruptible power supply) used to log status every few seconds, so I moved the app to hard drive.
 
Thanks for the info. In my case I have no antivirus or UPS writes. I monitored the system with Process Monitor, and at this point I'd say 90% of the writes are the registry. Apparently it used to be even worse; Microsoft optimised the registry writes in an early build of Windows 10.

I have been logging the other values as well.

Since yesterday I have forced the SSD state to active in all power profiles; I just want to see if there is a way to tame this thing without the self tests. If that fails, I am out of ideas and will probably resume the self test loop. I am not keen on RMA'ing it now, as I heard Crucial is likely to send back a refurb, and I expect that would have the same firmware bug.

Of course you are right: on a properly working SSD, even though these writes are very wasteful, it would still be absolutely fine for decades.

My 7-year-old 850 Pro has only 65 erase cycles, and it was even used for 1-2 years in a PS4 Pro, which auto-records game footage. Before that it was the OS drive in my main PC for 5 years.
 
Thank you so much for this forum thread and the comment at the end of the AnandTech review. I bought this drive thinking it would be superior to the WD Blue 3D drive.

Then I found out about this issue and thankfully I was able to cancel my order and get a WD Blue 3D 2TB instead.

I can't believe they've known about this issue for 1.5 years and have not done anything to fix it. My old Samsung 840 EVO has about 10TB of writes and is still at 94% health.

I didn't like the idea of this drive dying in a short time from what appears to be a firmware bug that performs wear leveling at a ridiculously aggressive level.

I don't know what is wrong with this thing, but I just wanted to thank you for saving me the pain and hassle of buying it only to figure out it has a major flaw that I would end up stuck with.
 
Thank you so much for this forum thread [...]

You're welcome. I hope the problem becomes widely known.

It would be interesting to find out whether Crucial's more recent firmware (or hardware) solves the amplification problem.

Their website provides no way to update my ssd's firmware (or older versions) to their newer version. That seems very unusual, and is a hint that either (1) they've changed the hardware design and newer firmware versions are compatible only with the changed hardware, or (2) the firmware in my ssd has a bug (or "feature") that would cause the update routine to fail.
 
The endurance numbers on these new drives are noticeably lower than they were on older SSDs. I have a 250GB 840 EVO that has 14.4TB written and is at 94% health. That is about 58 full drive writes for 6% of its life, so it can withstand roughly 1000 P/E cycles. The WD 2TB is rated for 250 P/E cycles based on the 2TB size and 500TBW endurance rating.

It's a little weird that the 1TB drive has a 400TBW value and the 2TB is rated for 500TBW.

I think using the entire drive as SLC and then moving/converting the data to TLC is the primary reason that these newer drives have lower endurance ratings. They are trading endurance for sustained write performance. Even if I wrote 100GB to the drive every day it would still last just under 14 years. So I think endurance is a non-issue for most. I wouldn't mind if it was higher.
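The arithmetic in this post can be checked with a few lines of Python. This is a sketch with hypothetical function names. It uses decimal units (1 TB = 1000 GB). Like the post, it counts host writes only and ignores write amplification.

```python
def implied_pe_cycles(tb_written, capacity_tb, fraction_of_life_used):
    """Full drive writes so far, scaled up to 100% of rated life."""
    return tb_written / capacity_tb / fraction_of_life_used


def rated_pe_cycles(tbw_rating_tb, capacity_tb):
    """Rated cycles implied by a TBW endurance rating."""
    return tbw_rating_tb / capacity_tb


def years_to_reach_tbw(tbw_rating_tb, gb_per_day):
    """Years of writing gb_per_day until the TBW rating is used up."""
    return tbw_rating_tb * 1000 / gb_per_day / 365.25


print(round(implied_pe_cycles(14.4, 0.25, 0.06)))  # 960: 840 EVO at 94% health
print(rated_pe_cycles(500, 2))                      # 250.0: WD 2TB, 500 TBW
print(round(years_to_reach_tbw(500, 100), 1))       # 13.7: at 100 GB/day
```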

When I received the WD SSD I was surprised to find that it had a 5 year warranty. I registered the drive on WDs site and confirmed the 5 year warranty. The drive was just made in July of 2021 so WD must have switched back to a 5 year warranty.

One thing I considered when I thought about ordering the MX500 (again) was that with it now using 96L TLC flash (maybe) and the SM2259 controller instead of the SM2258 that it might no longer have this problem. In the end there just wasn't a way to guarantee I could avoid issues unless I purchased another brand and so I did. I'm just glad I still got a 5 year warranty.

Links to the info about Crucial now using 96L NAND and the SM2259 controller in the MX500.

SSD Comparison Spreadsheet (notice the cell to the far right on the MX500 line)

Forum where someone opens up their new MX500 about a year ago

Reddit Thread about 96L TLC (B27A) on the MX500 also about a year old.

It might be worth taking Crucial up on their offer to replace your drive with a new one under warranty if you haven't already. Coming from them it should have the SM2259 controller at least and possibly 96L TLC.
 

Your calculations of erase cycles appear to neglect write amplification. Did you look at the actual Average Block Erase Count in your 250GB ssd that has 94% remaining? Is that SMART attribute available?
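Here is why write amplification matters. Wear leveling spreads NAND writes over the whole drive, so the average erase count is roughly host writes × WAF ÷ capacity. This is a minimal Python sketch that assumes ideal wear leveling and ignores overprovisioned spare area. The WAF values in the example are hypothetical, since the 840 EVO's actual WAF isn't known here.

```python
def approx_avg_erase_count(host_tb, capacity_tb, waf):
    """Approximate average block erase count: each host TB becomes
    `waf` TB of NAND writes, spread evenly over the drive's capacity."""
    return host_tb * waf / capacity_tb


# 14.4 TB of host writes on a 250 GB drive, for a few hypothetical WAFs
for waf in (1.0, 1.5, 3.0):
    print(waf, round(approx_avg_erase_count(14.4, 0.25, waf), 1))
```

A WAF of 3, for example, would triple the erase count behind the 6% wear. That is why the drive's own Average Block Erase Count, if it reports one, is a better basis than host writes alone.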

Regarding Crucial's offer in Feb 2020 to exchange my ssd, I lacked confidence they would replace it with a new ssd, and they wouldn't ship the replacement until after they received my ssd. I presume nothing has changed to mitigate either of those concerns. It's also possible that new MX500s still have the old bug. I'd also like to let the selftests experiment keep running, for the sake of learning whether it causes any long term negative side effects.
 
I'm going to be installing the MX500 500GB with MCR44 as a boot drive in a few days. It's possible that this issue was already resolved with the SM2259 MCR33 SSDs, since I've been looking at a few Reddit posts where WAF has held a fairly constant rate on high-uptime systems.
 

I don't think "high uptime" alone would necessarily reveal whether the bug persists in the newer MX500 revisions. Much better at exposing the bug is a low rate of host pc writing (SMART attributes F6 and F7). On systems where the host pc has a high rate of writing to the ssd, the bug's share of the NAND pages written by the FTL controller (F8) is smaller relative to normal write amplification.
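A toy calculation illustrates the point. It rests on two hypothetical assumptions, for illustration only: the bug adds a roughly fixed number of FTL pages per day regardless of host activity, and normal amplification scales with host writes. All the numbers below are made up.

```python
def waf(host_pages, normal_ftl_pages, bug_ftl_pages):
    """WAF = 1 + F8/F7, with F8 split into normal and buggy FTL writes."""
    return 1 + (normal_ftl_pages + bug_ftl_pages) / host_pages


BUG_PAGES_PER_DAY = 1_000_000  # hypothetical fixed bug overhead
NORMAL_RATIO = 0.5             # hypothetical normal FTL pages per host page

for host_pages in (300_000, 3_000_000):  # light vs heavy host writing per day
    normal = NORMAL_RATIO * host_pages
    print(host_pages,
          round(waf(host_pages, normal, 0), 2),                  # without bug
          round(waf(host_pages, normal, BUG_PAGES_PER_DAY), 2))  # with bug
```

With light host writing, the bug more than triples the daily WAF. With ten times the host writing, the same bug adds only about 0.33, which is easy to mistake for normal amplification.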

Another way to detect the bug, at least for the original models, is to log Current Pending Sectors every few seconds and check whether it briefly becomes 1 (which correlates with the 1GB buggy F8 write bursts). This test might not work for the newer revisions because Crucial may have suppressed the Current Pending Sectors behavior in the newer revisions without eliminating the write amplification bug. Many people have complained to Crucial about the frequent pending sectors alerts issued by their SMART monitoring software, and it would be easy to modify the firmware so SMART will continue to output 0 unless the underlying condition persists, a cheat that would prevent the monitoring software from issuing the alerts.
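Here is a minimal Python sketch of that test. It relies on three assumptions:

- smartctl (from smartmontools, as used by the scripts in this thread) is installed and run with administrator rights.
- The MX500 reports Current Pending Sectors as attribute 197 (0xC5).
- That attribute's raw value is a plain integer.

The sketch polls `smartctl -A` every few seconds and reports episodes where the value left 0 and later returned to 0. The function names and polling parameters are hypothetical.

```python
import subprocess
import time

PENDING_SECTORS_ID = 197  # 0xC5; assumed attribute ID for Current Pending Sectors


def raw_value(smartctl_output, attr_id):
    """Raw value of one attribute from `smartctl -A` text, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10+ columns; RAW_VALUE is the 10th.
        if len(fields) >= 10 and fields[0] == str(attr_id):
            return int(fields[9])
    return None


def nonzero_episodes(samples):
    """samples: list of (timestamp, value). Return (start, end) pairs for
    runs where the value became nonzero and later returned to zero."""
    episodes, start = [], None
    for t, value in samples:
        if value and start is None:
            start = t
        elif not value and start is not None:
            episodes.append((start, t))
            start = None
    return episodes


def poll(smartctl="smartctl", device="/dev/sda", interval=5, count=720):
    """Sample the attribute `count` times, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        out = subprocess.run([smartctl, "-A", device],
                             capture_output=True, text=True).stdout
        samples.append((time.time(), raw_value(out, PENDING_SECTORS_ID)))
        time.sleep(interval)
    return nonzero_episodes(samples)
```

The episode timestamps could then be compared with jumps in F8 from the logger, to check whether they coincide with the ~1GB write bursts.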

I hope you'll regularly log the relevant SMART data of your new MX500, and will let us know in a few months how its WAF has behaved.
 
My life has been chaos, but I will now post the full figures requested for the dates I gave data for. Sadly, they are in hexadecimal.

24 sept host writes 2044GB cycles 100 idle state prevented by self tests
F7 4AE9AD8
F8 5C79D7A6
25 sept host writes 2053GB cycles 100 idle state prevented by self tests
F7 4B389E9
F8 5C816DAB
26 sept host writes 2059GB cycles 100 idle state prevented by self tests
F7 4B68544
F8 5C8263C5
26 sept (later) host writes 2062GB cycles 100 power cycled
F7 4B85114
F8 5C91D4AE
27 sept host writes 2064GB cycles 100 allowed to idle
F7 4B9677D
F8 5CA3FD48
28 sept host writes 2071GB cycles 100 allowed to idle
F7 4B99A0B
F8 5CB87A13
29 sept host writes 2076GB cycles 102 allowed to idle
F7 4C03846
F8 5E006E92

From this point forward I will be using the stat collection script you provided.
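Converting hexadecimal SMART values takes one call in most languages. Here's a minimal Python sketch (the `decode` name is hypothetical). It converts the posted F7/F8 pairs to base 10 and computes the cumulative WAF = 1 + F8/F7.

```python
def decode(f7_hex, f8_hex):
    """Return (F7, F8, cumulative WAF) from hexadecimal raw values."""
    f7, f8 = int(f7_hex, 16), int(f8_hex, 16)
    return f7, f8, 1 + f8 / f7


readings = [  # (label, F7 hex, F8 hex) as posted above
    ("24 sept", "4AE9AD8", "5C79D7A6"),
    ("25 sept", "4B389E9", "5C816DAB"),
    ("26 sept", "4B68544", "5C8263C5"),
    ("26 sept (later)", "4B85114", "5C91D4AE"),
    ("27 sept", "4B9677D", "5CA3FD48"),
    ("28 sept", "4B99A0B", "5CB87A13"),
    ("29 sept", "4C03846", "5E006E92"),
]
for label, f7_hex, f8_hex in readings:
    f7, f8, waf = decode(f7_hex, f8_hex)
    print(f"{label:16} F7={f7:>12,} F8={f8:>14,} WAF={waf:.2f}")
```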
 

Below is your 9/24-to-9/29 data in base 10, plus WAF calculations. I entered only the latter of your two 9/26 entries, so each row of the table corresponds to one-ish day. The columns are in the order produced by my logger .bat script, so you'll be able to easily paste it into your log if you wish, for example by pasting it into a spreadsheet and saving it in .csv format. (The logger produces the data for the first ten columns, in .csv format. The spreadsheet into which I paste the logger output calculates the rest of the columns, plus other columns not shown here.)

The Daily WAF column shows a dramatic increase soon after you stopped the selftests.

I don't know what times of day you recorded the values, and I don't assume they were recorded 24 hours apart. Something to keep in mind is that write amplification lags host writing, so host writing near the end of a period of time is unlikely to be amplified until the next period (or even later, depending on the length of a period).

| Date | Time | S.M.A.R.T. Total Host Sectors Read | S.M.A.R.T. F6 Total Host Sectors Written | Total Host Writes (GB) | S.M.A.R.T. F7 | S.M.A.R.T. F8 | Power On Hours | Average Block Erase Count | Power Cycle Count | WAF = 1 + F8/F7 | ΔF7 (1 row) | ΔF8 (1 row) | ΔF7+ΔF8 (1 row) | Daily WAF = 1 + ΔF8/ΔF7 | WAF from 9/24/2021 to row date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 09/24/2021 | | | | 2,044 | 78,551,768 | 1,551,488,934 | | 100 | | 20.75 | | | | | |
| 09/25/2021 | | | | 2,053 | 78,875,113 | 1,551,986,091 | | 100 | | 20.68 | 323,345 | 497,157 | 820,502 | 2.54 | 2.54 |
| 09/26/2021 | | | | 2,059 | 79,188,244 | 1,553,061,038 | | 100 | | 20.61 | 313,131 | 1,074,947 | 1,388,078 | 4.43 | 3.47 |
| 09/27/2021 | | | | 2,064 | 79,259,517 | 1,554,251,080 | | 100 | | 20.61 | 71,273 | 1,190,042 | 1,261,315 | 17.70 | 4.90 |
| 09/28/2021 | | | | 2,071 | 79,272,459 | 1,555,593,747 | | 100 | | 20.62 | 12,942 | 1,342,667 | 1,355,609 | 104.74 | 6.70 |
| 09/29/2021 | | | | 2,076 | 79,706,182 | 1,577,086,610 | | 102 | | 20.79 | 433,723 | 21,492,863 | 21,926,586 | 50.55 | 23.17 |
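The calculated columns in the table all follow directly from F7 and F8: WAF, the one-row deltas, Daily WAF, and WAF since 9/24/2021. Here is a minimal Python sketch of those formulas, with a hypothetical function name, using the base-10 values from the table.

```python
rows = [  # (date, F7, F8) in base 10
    ("09/24/2021", 78_551_768, 1_551_488_934),
    ("09/25/2021", 78_875_113, 1_551_986_091),
    ("09/26/2021", 79_188_244, 1_553_061_038),
    ("09/27/2021", 79_259_517, 1_554_251_080),
    ("09/28/2021", 79_272_459, 1_555_593_747),
    ("09/29/2021", 79_706_182, 1_577_086_610),
]


def derived_columns(rows):
    """Per-row WAF, deltas from the previous row, Daily WAF, and WAF since row 0."""
    _, f7_first, f8_first = rows[0]
    out = []
    for (_, f7_prev, f8_prev), (date, f7, f8) in zip(rows, rows[1:]):
        d7, d8 = f7 - f7_prev, f8 - f8_prev
        out.append({
            "date": date,
            "waf": 1 + f8 / f7,
            "dF7": d7,
            "dF8": d8,
            "dF7+dF8": d7 + d8,
            "daily_waf": 1 + d8 / d7,
            "waf_since_first": 1 + (f8 - f8_first) / (f7 - f7_first),
        })
    return out


for r in derived_columns(rows):
    print(r["date"], f"{r['daily_waf']:.2f}", f"{r['waf_since_first']:.2f}")
```

Daily WAF divides by ΔF7, so it swings wildly on days with little host writing. The 104.74 on 9/28 comes from only 12,942 host pages. The cumulative column is the steadier indicator.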
 