Question: Crucial MX500 500GB SATA SSD -- Remaining Life decreasing fast despite only a few bytes being written to it?


Lucretia19

The Remaining Life (RL) of my Crucial MX500 ssd has been decreasing rapidly, even though the pc doesn't write much to it. Below is the log I began keeping after I noticed RL reached 95% after about 6 months of use.

Assuming RL truly depends on bytes written, the decrease in RL is accelerating and something is very wrong. The latest decrease in RL, from 94% to 93%, occurred after writing only 138 GB in 20 days.

(Note 1: After RL reached 95%, I took some steps to reduce "unnecessary" writes to the ssd by moving some frequently written files to a hard drive, for example the Firefox profile folder. That's why only 528 GB have been written to the ssd since Dec 23rd, even though the pc is set to Never Sleep and is always powered on. Note 2: After the pc and ssd were about 2 months old, around September, I changed the pc's power profile so it would Never Sleep. Note 3: The ssd still has a lot of free space; only 111 GB of its 500 GB capacity is occupied. Note 4: Three different software utilities agree on the numbers: Crucial's Storage Executive, HWiNFO64, and CrystalDiskInfo. Note 5: Storage Executive also shows that Total Bytes Written isn't much greater than Total Host Writes, implying write amplification hasn't been a significant factor.)

My understanding is that Remaining Life is supposed to depend on bytes written, but it looks more like the drive reports a value that depends mainly on its powered-on hours. Can someone explain what's happening? Am I misinterpreting the meaning of Remaining Life? Isn't it essentially a synonym for endurance?


Crucial MX500 500GB SSD in desktop pc since summer 2019

Date          Remaining Life   Total Host Writes (GB)   Host Writes (GB) Since Previous Drop
12/23/2019    95%              5,782                    --
01/15/2020    94%              6,172                    390
02/04/2020    93%              6,310                    138
 

Lucretia19

Here's some log data regarding the beneficial effect of Power Cycling on ssd Write Amplification. The data strongly indicates that the more days have elapsed since the ssd was power-cycled off then on, the worse the amplification.

The obvious remedy -- a manual labor nuisance -- is to shut down the computer more often. That's why I chose to shut down my pc last night, 21 days since the previous shutdown. (Although a nuisance, shutdowns may have additional benefits, since Windows tends to behave worse the longer it's been running. Windows 10 isn't as bad about this as Windows Vista, my previous OS, was.)

In recent weeks, beginning around the end of August, I've focused more on efforts to reduce the sum of SMART attributes F7 & F8 (247 & 248 in base 10) and stopped focusing on the write amplification F8/F7, since I believe the sum F7+F8 more directly affects ssd Remaining Lifetime. I reduced the rate of host pc writing to the ssd (which reduces the rate at which F7 increases) and this analysis doesn't attempt to control for that change. Of course, the ssd selftests regime has been running throughout to try to keep F8 in check.

The first of the two tables here doesn't separate F7 and F8; it's only about the sum F7+F8. Nevertheless, the beneficial effect of power cycling on amplification is suggested by the rightmost two columns. The rightmost column is the increase of F7+F8 since the most recent power cycle, divided by the number of days since the most recent power cycle. There's a strong pattern: at each power cycle, the number in that column plummets, and then it tends to increase each day until the next power cycle.

The log data is for the period from 6/13/2020 (Power Cycle Count reached 108 in the daily log) through today 10/21/2020 (Power Cycle Count reached 118 in the daily log).

After writing the above and pasting the first table (below), I realized I should have analyzed whether the effect on F7+F8 might be mainly an effect on F7, rather than on F8 amplification. I assume an effect on F7 would be due to Windows behaving worse the longer it's been running since a restart. (Note: Not all restarts involve power cycling.) So I created a second table that separates F7 and F8. I see no way to paste more data into the first table, so I pasted the second table below the first. It covers the same range of dates but doesn't include the Date column or some other columns of the first table. It repeats the F7+F8 column and adds a column for F7 and a column for F8. The F7 numbers are the increase of F7 since the most recent power cycle, divided by the days since the most recent power cycle. The F8 numbers are the increase of F8 since the most recent power cycle, divided by the days since the most recent power cycle.

Looking at the second table, it appears that "days since power cycle" does sometimes have a bad effect on F7, but the bad effect on F8 appears stronger and more consistent. I considered deleting the first table but its raw attribute data might be of use to someone.
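To illustrate how the rightmost column is computed, using the first two rows of the first table: on 6/13 the sum F7+F8 was 262,686,703 + 1,473,283,999 = 1,735,970,702, and on 6/14 (1 day after the power cycle) it was 262,910,985 + 1,473,494,607 = 1,736,405,592. The increase since the power cycle is 434,890 pages, which divided by 1 day gives the 434,890 shown in that row.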

TABLE 1 of 2:
Date | F7 = total NAND pages written by host pc | F8 = total NAND pages written by ssd's FTL controller | Power Cycle Count | Days since last Power Cycle | Average Daily Increase of F7+F8 since last Power Cycle
(Each table row below spans one line per column; the last column is omitted on the day of a power cycle.)
06/13/2020
262,686,703
1,473,283,999
108
0​
06/14/2020
262,910,985
1,473,494,607
108
1​
434,890​
06/15/2020
263,158,511
1,473,915,964
108
2​
551,887​
06/16/2020
263,409,107
1,474,189,617
108
3​
542,674​
06/17/2020
263,707,360
1,474,463,037
108
4​
549,924​
06/18/2020
264,027,678
1,474,962,512
108
5​
603,898​
06/19/2020
264,393,018
1,475,285,550
108
6​
617,978​
06/20/2020
264,637,602
1,475,570,003
108
7​
605,272​
06/21/2020
264,982,101
1,475,855,429
108
8​
608,354​
06/22/2020
265,731,603
1,476,265,234
108
9​
669,571​
06/23/2020
266,093,539
1,477,233,559
108
10​
735,640​
06/24/2020
266,432,523
1,477,876,771
108
11​
758,054​
06/25/2020
266,736,954
1,478,364,036
108
12​
760,857​
06/26/2020
267,037,173
1,479,090,682
108
13​
781,319​
06/27/2020
267,452,250
1,479,926,339
109
0​
06/28/2020
267,670,029
1,480,075,115
110
0​
06/29/2020
267,831,467
1,480,373,747
110
1​
460,070​
06/30/2020
268,055,495
1,480,843,566
110
2​
576,959​
07/01/2020
268,326,984
1,481,210,089
110
3​
597,310​
07/02/2020
268,508,455
1,481,530,249
110
4​
573,390​
07/03/2020
268,775,459
1,481,904,177
110
5​
586,898​
07/04/2020
269,097,697
1,482,223,281
110
6​
595,972​
07/05/2020
269,547,282
1,482,570,922
110
7​
624,723​
07/06/2020
269,865,012
1,483,031,681
110
8​
643,944​
07/07/2020
270,135,115
1,483,464,707
110
9​
650,520​
07/08/2020
270,363,723
1,484,133,984
110
10​
675,256​
07/09/2020
270,771,455
1,484,669,872
110
11​
699,653​
07/10/2020
271,198,165
1,485,203,829
110
12​
721,404​
07/11/2020
271,431,890
1,485,821,233
110
13​
731,383​
07/12/2020
271,707,066
1,486,451,312
110
14​
743,802​
07/13/2020
271,997,906
1,486,834,809
110
15​
739,171​
07/14/2020
272,324,624
1,487,297,232
110
16​
742,295​
07/15/2020
273,153,092
1,488,206,919
110
17​
800,875​
07/16/2020
273,416,549
1,488,853,892
110
18​
806,961​
07/17/2020
273,817,659
1,489,272,204
110
19​
807,617​
07/18/2020
274,050,476
1,489,652,733
110
20​
797,903​
07/19/2020
274,283,713
1,490,015,309
110
21​
788,280​
07/20/2020
274,530,619
1,490,354,959
110
22​
779,111​
07/21/2020
275,430,443
1,490,818,821
110
23​
804,527​
07/22/2020
275,682,753
1,491,351,610
110
24​
803,717​
07/23/2020
275,912,458
1,491,814,714
110
25​
799,281​
07/24/2020
276,163,396
1,492,417,946
110
26​
801,392​
07/25/2020
276,449,163
1,493,190,585
111
0​
07/26/2020
276,681,393
1,493,390,752
111
1​
432,397​
07/27/2020
276,913,201
1,493,709,105
111
2​
491,279​
07/28/2020
277,166,220
1,494,108,564
111
3​
545,012​
07/29/2020
277,403,763
1,494,384,251
111
4​
537,067​
07/30/2020
277,643,532
1,494,779,839
111
5​
556,725​
07/31/2020
277,941,002
1,495,192,802
111
6​
582,343​
08/01/2020
278,191,969
1,495,544,822
111
7​
585,292​
08/02/2020
278,567,870
1,495,949,954
111
8​
609,760​
08/03/2020
278,742,032
1,496,598,242
111
9​
633,392​
08/04/2020
278,908,331
1,497,170,767
111
10​
643,935​
08/05/2020
279,068,589
1,497,806,637
111
11​
657,771​
08/06/2020
279,231,233
1,498,262,587
111
12​
654,506​
08/07/2020
279,606,834
1,498,876,719
111
13​
680,293​
08/08/2020
279,848,556
1,499,294,138
111
14​
678,782​
08/09/2020
280,146,248
1,499,766,181
111
15​
684,845​
08/10/2020
280,361,128
1,500,068,175
111
16​
674,347​
08/11/2020
280,588,594
1,500,447,345
111
17​
670,364​
08/12/2020
281,072,906
1,500,913,119
111
18​
685,904​
08/13/2020
281,440,082
1,501,327,447
111
19​
690,936​
08/14/2020
281,833,461
1,501,764,591
111
20​
697,915​
08/15/2020
282,062,167
1,502,242,593
111
21​
698,334​
08/16/2020
282,303,454
1,503,058,695
111
22​
714,655​
08/17/2020
282,433,106
1,503,615,182
111
23​
713,415​
08/18/2020
282,577,538
1,504,068,079
111
24​
708,578​
08/19/2020
283,112,821
1,504,520,466
111
25​
719,742​
08/20/2020
283,293,751
1,505,210,866
111
26​
725,572​
08/21/2020
283,519,047
1,505,901,288
111
27​
732,614​
08/22/2020
283,744,498
1,506,447,632
111
28​
734,014​
08/23/2020
283,953,592
1,506,963,532
111
29​
733,703​
08/24/2020
284,140,881
1,507,622,138
111
30​
737,442​
08/25/2020
284,375,335
1,508,317,499
111
31​
743,648​
08/26/2020
284,616,616
1,508,900,885
111
32​
746,180​
08/27/2020
284,870,529
1,509,449,189
111
33​
747,878​
08/28/2020
285,127,137
1,510,497,763
111
34​
764,269​
08/29/2020
285,419,643
1,511,513,502
112
0​
08/30/2020
285,602,557
1,511,772,947
113
0​
08/31/2020
285,755,692
1,512,116,302
113
1​
496,490​
09/01/2020
285,912,183
1,512,479,611
113
2​
508,145​
09/02/2020
286,116,449
1,512,697,903
113
3​
479,616​
09/03/2020
286,241,394
1,512,978,477
113
4​
461,092​
09/04/2020
286,405,230
1,513,353,713
115
0​
09/05/2020
286,507,048
1,513,622,384
115
1​
370,489​
09/06/2020
286,613,571
1,513,899,954
115
2​
377,291​
09/07/2020
286,775,593
1,514,091,439
115
3​
369,363​
09/08/2020
286,962,315
1,514,532,938
115
4​
434,078​
09/09/2020
287,094,637
1,514,739,354
115
5​
415,010​
09/10/2020
287,176,773
1,515,458,638
115
6​
479,411​
09/11/2020
287,307,151
1,516,474,060
115
7​
574,610​
09/12/2020
287,377,354
1,516,949,786
115
8​
571,025​
09/13/2020
287,473,179
1,517,354,319
115
9​
563,173​
09/14/2020
287,557,148
1,517,635,436
115
10​
543,364​
09/15/2020
287,675,321
1,518,207,113
115
11​
556,681​
09/16/2020
288,306,718
1,518,672,448
115
12​
601,685​
09/17/2020
289,202,742
1,518,971,066
115
13​
647,297​
09/18/2020
289,271,628
1,519,596,805
115
14​
650,678​
09/19/2020
289,348,814
1,520,074,910
115
15​
644,319​
09/20/2020
289,420,533
1,520,720,060
115
16​
648,853​
09/21/2020
289,573,467
1,521,770,541
115
17​
681,474​
09/22/2020
289,651,773
1,523,259,467
115
18​
730,683​
09/23/2020
289,751,249
1,524,816,490
115
19​
779,410​
09/24/2020
289,828,172
1,525,663,097
115
20​
786,616​
09/25/2020
289,921,600
1,526,606,212
115
21​
798,518​
09/26/2020
290,006,153
1,527,946,681
116
0​
09/27/2020
290,313,118
1,528,168,257
116
1​
528,541​
09/28/2020
290,387,065
1,528,563,122
116
2​
498,677​
09/29/2020
290,511,492
1,528,894,695
117
0​
09/30/2020
290,572,303
1,529,066,586
117
1​
232,702​
10/01/2020
290,641,215
1,529,353,854
117
2​
294,441​
10/02/2020
290,700,175
1,529,731,168
117
3​
341,719​
10/03/2020
290,816,455
1,530,217,438
117
4​
406,927​
10/04/2020
290,882,910
1,532,024,503
117
5​
700,245​
10/05/2020
290,946,886
1,532,492,621
117
6​
672,220​
10/06/2020
291,041,878
1,532,824,408
117
7​
637,157​
10/07/2020
291,089,075
1,533,310,821
117
8​
624,214​
10/08/2020
291,228,154
1,533,731,083
117
9​
617,006​
10/09/2020
291,329,136
1,534,273,830
117
10​
619,678​
10/10/2020
291,490,268
1,535,545,502
117
11​
693,598​
10/11/2020
291,635,420
1,536,339,745
117
12​
714,082​
10/12/2020
291,750,877
1,537,376,552
117
13​
747,788​
10/13/2020
291,821,329
1,538,213,497
117
14​
759,189​
10/14/2020
291,875,046
1,538,802,787
117
15​
751,443​
10/15/2020
291,937,972
1,539,852,616
117
16​
774,025​
10/16/2020
292,013,400
1,541,264,724
117
17​
815,996​
10/17/2020
293,032,409
1,542,192,439
117
18​
878,815​
10/18/2020
293,101,812
1,543,189,726
117
19​
888,703​
10/19/2020
293,158,338
1,544,286,934
117
20​
901,954​
10/20/2020
293,218,796
1,545,957,965
117
21​
941,456​
10/21/2020
293,341,664
1,546,632,948
118
0​

TABLE 2 of 2:
Days since last Power Cycle | Average Daily Increase of F7+F8 since last Power Cycle | Average Daily Increase of F7 since last Power Cycle | Average Daily Increase of F8 since last Power Cycle
(Each table row below spans one line per column; the average columns are omitted on the day of a power cycle.)
0​
1​
434,890​
224,282​
210,608​
2​
551,887​
235,904​
315,983​
3​
542,674​
240,801​
301,873​
4​
549,924​
255,164​
294,760​
5​
603,898​
268,195​
335,703​
6​
617,978​
284,386​
333,592​
7​
605,272​
278,700​
326,572​
8​
608,354​
286,925​
321,429​
9​
669,571​
338,322​
331,248​
10​
735,640​
340,684​
394,956​
11​
758,054​
340,529​
417,525​
12​
760,857​
337,521​
423,336​
13​
781,319​
334,652​
446,668​
0​
0​
1​
460,070​
161,438​
298,632​
2​
576,959​
192,733​
384,226​
3​
597,310​
218,985​
378,325​
4​
573,390​
209,607​
363,784​
5​
586,898​
221,086​
365,812​
6​
595,972​
237,945​
358,028​
7​
624,723​
268,179​
356,544​
8​
643,944​
274,373​
369,571​
9​
650,520​
273,898​
376,621​
10​
675,256​
269,369​
405,887​
11​
699,653​
281,948​
417,705​
12​
721,404​
294,011​
427,393​
13​
731,383​
289,374​
442,009​
14​
743,802​
288,360​
455,443​
15​
739,171​
288,525​
450,646​
16​
742,295​
290,912​
451,382​
17​
800,875​
322,533​
478,341​
18​
806,961​
319,251​
487,710​
19​
807,617​
323,559​
484,057​
20​
797,903​
319,022​
478,881​
21​
788,280​
314,937​
473,343​
22​
779,111​
311,845​
467,266​
23​
804,527​
337,409​
467,118​
24​
803,717​
333,864​
469,854​
25​
799,281​
329,697​
469,584​
26​
801,392​
326,668​
474,724​
0​
1​
432,397​
232,230​
200,167​
2​
491,279​
232,019​
259,260​
3​
545,012​
239,019​
305,993​
4​
537,067​
238,650​
298,417​
5​
556,725​
238,874​
317,851​
6​
582,343​
248,640​
333,703​
7​
585,292​
248,972​
336,320​
8​
609,760​
264,838​
344,921​
9​
633,392​
254,763​
378,629​
10​
643,935​
245,917​
398,018​
11​
657,771​
238,130​
419,641​
12​
654,506​
231,839​
422,667​
13​
680,293​
242,898​
437,395​
14​
678,782​
242,814​
435,968​
15​
684,845​
246,472​
438,373​
16​
674,347​
244,498​
429,849​
17​
670,364​
243,496​
426,868​
18​
685,904​
256,875​
429,030​
19​
690,936​
262,680​
428,256​
20​
697,915​
269,215​
428,700​
21​
698,334​
267,286​
431,048​
22​
714,655​
266,104​
448,550​
23​
713,415​
260,171​
453,243​
24​
708,578​
255,349​
453,229​
25​
719,742​
266,546​
453,195​
26​
725,572​
263,253​
462,319​
27​
732,614​
261,848​
470,767​
28​
734,014​
260,548​
473,466​
29​
733,703​
258,773​
474,929​
30​
737,442​
256,391​
481,052​
31​
743,648​
255,683​
487,965​
32​
746,180​
255,233​
490,947​
33​
747,878​
255,193​
492,685​
34​
764,269​
255,235​
509,035​
0​
0​
1​
496,490​
153,135​
343,355​
2​
508,145​
154,813​
353,332​
3​
479,616​
171,297​
308,319​
4​
461,092​
159,709​
301,383​
0​
1​
370,489​
101,818​
268,671​
2​
377,291​
104,171​
273,121​
3​
369,363​
123,454​
245,909​
4​
434,078​
139,271​
294,806​
5​
415,010​
137,881​
277,128​
6​
479,411​
128,591​
350,821​
7​
574,610​
128,846​
445,764​
8​
571,025​
121,516​
449,509​
9​
563,173​
118,661​
444,512​
10​
543,364​
115,192​
428,172​
11​
556,681​
115,463​
441,218​
12​
601,685​
158,457​
443,228​
13​
647,297​
215,193​
432,104​
14​
650,678​
204,743​
445,935​
15​
644,319​
196,239​
448,080​
16​
648,853​
188,456​
460,397​
17​
681,474​
186,367​
495,108​
18​
730,683​
180,364​
550,320​
19​
779,410​
176,106​
603,304​
20​
786,616​
171,147​
615,469​
21​
798,518​
167,446​
631,071​
0​
1​
528,541​
306,965​
221,576​
2​
498,677​
190,456​
308,221​
0​
1​
232,702​
60,811​
171,891​
2​
294,441​
64,862​
229,580​
3​
341,719​
62,894​
278,824​
4​
406,927​
76,241​
330,686​
5​
700,245​
74,284​
625,962​
6​
672,220​
72,566​
599,654​
7​
637,157​
75,769​
561,388​
8​
624,214​
72,198​
552,016​
9​
617,006​
79,629​
537,376​
10​
619,678​
81,764​
537,914​
11​
693,598​
88,980​
604,619​
12​
714,082​
93,661​
620,421​
13​
747,788​
95,337​
652,451​
14​
759,189​
93,560​
665,629​
15​
751,443​
90,904​
660,539​
16​
774,025​
89,155​
684,870​
17​
815,996​
88,348​
727,649​
18​
878,815​
140,051​
738,764​
19​
888,703​
136,333​
752,370​
20​
901,954​
132,342​
769,612​
21​
941,456​
128,919​
812,537​
0​
 
Oct 19, 2020
I think you should bite the bullet and do a secure erase. Chances are your ssd will recover from the buggy state it's in and operate normally and you won't have to run ridiculous workarounds anymore.
 

Lucretia19

I think you should bite the bullet and do a secure erase. Chances are your ssd will recover from the buggy state it's in and operate normally and you won't have to run ridiculous workarounds anymore.

I assume that where you wrote "chances are" you mean you think the probability exceeds 1/2. How have you calculated or estimated the probability? What sources of information or personal experiences lead you to believe a secure erase is likely to help? In my conversations with Crucial's tech support in January/February, they never suggested trying a secure erase.

Do you have any insight into the kind of problem that a secure erase could fix, yet is also tamed by selftests?

I'm not the only person with this excessive write amplification problem. On my ssd, the occasional write bursts by the ssd's FTL controller correlate perfectly with the well-known Crucial ssd "Bogus Current Pending Sectors" bug. Assuming the bursts correlate perfectly also on the ssds of the many people who suffer from the "Bogus Current Pending Sectors" bug, the excessive write amplification is a very widespread phenomenon, which makes me doubt whether a secure erase would solve it.

Because the write amplification got worse over time, my hunch is that a secure erase, if it does help, would only help for a few months. The ssd was new when installed at the end of July 2019. I didn't log SMART data in 2019 -- I was only alerted each time Remaining Life decreased -- so my records aren't conclusive, but I noticed the decreases of Remaining Life accelerated, and since my usage of the computer hadn't radically changed it's reasonable to assume the acceleration was due to increasing write amplification. I reached that conclusion retrospectively after I logged more and learned more. It's consistent with the increasing amplification that I logged beginning 1/15/2020; by then Overall WAF had reached 5.5 and daily WAF was probably an order of magnitude higher. (By "overall" I mean it includes the ssd's entire history: 1 + F8/F7. Not a 1 + ΔF8/ΔF7 snapshot where the deltas are measured over a short period of time, like daily.) The high daily WAF caused Overall WAF to keep climbing and it peaked at 7.26 on 2/23/2020, when I began experimenting with selftests. I think the triggering event that caused the amplification to skyrocket was in late December 2019 when I significantly reduced the writing by the pc (to about 80 kBytes/second average, by moving or symlinking some "hot" folders to my hard drive).

The SMART data from a few other Crucial users suggests write amplification is worse for users whose computers don't write heavily to the ssd. In other words, the lower the rate of writing by the pc, the more frequent and larger are the write bursts by the ssd's FTL controller. It's almost as if the ssd has firmware that was designed to make it fail soon after the warranty expires even if it isn't used much.

If my hunch that a secure erase would be at most a temporary fix is correct, then its manual labor, downtime and risk sound more ridiculous than automated selftests.

On the other hand, the risk associated with nearly nonstop selftests is unknown. I speculate that the 30-second pause between selftests allows the FTL controller's low-priority background processes enough runtime to maintain the health of the ssd, and it would be nice to learn whether this is true. It will remain unknown if no one is willing to continue the selftests regime for many years.

The selftests are either a "ridiculous workaround" as you say, or a sensible workaround of a ridiculous bug.
 
Oct 19, 2020
If my hunch that a secure erase would be at most a temporary fix is correct, then its manual labor, downtime and risk sound more ridiculous than automated selftests.

I disagree. You'll have to copy 500GB of data back and forth once. It's not a hard job as long as you have enough empty space around.

A secure erase will reset the FTL and all the other internal data structures the firmware is using. If your SSD has somehow tainted internal data causing it to go berserk, a secure erase has a good chance of fixing that (preferably coupled with a firmware update).

If that broken internal data was created by an early firmware version, and if by any chance newer firmware versions don't create such invalid internal data, the bug won't relapse.

If secure erase fixes the issue it'll confirm the broken FTL theory.

If it recurs shortly after, it'll prove it wasn't fixed in later firmware versions.

If it doesn't change anything at all it'll prove the problem is not there but somewhere else.

In any case you'll have valuable information about probable causes and the use cases that trigger it. You'll have more data points to provide to Crucial if they get interested in the issue.

I'm not the only person with this excessive write amplification problem. On my ssd, the occasional write bursts by the ssd's FTL controller correlate perfectly with the well-known Crucial ssd "Bogus Current Pending Sectors" bug.

I'm aware you're not the only person with the WAF problem, but we're not seeing people screaming from the rooftops about rapidly decreasing health on MX500 drives, either. So it must still be a rarely occurring problem, which is why I'm suggesting a secure erase.

Also beware that the correlation between write bursts and bogus Current_Pending_Sector might be only one way, i.e. bogus Current_Pending_Sector might be occurring on more drives -- maybe even on all drives (i.e. not necessarily only on the drives plagued with the WAF bug).
 

Lucretia19

I disagree. You'll have to copy 500GB of data back and forth once. It's not a hard job as long as you have enough empty space around.

What I meant is that if the effect of a secure erase is only temporary, it would not be a one-time task; it would need to be repeated, perhaps every few months.

A secure erase will reset the FTL and all the other internal data structures the firmware is using. If your SSD has somehow tainted internal data causing it to go berserk, a secure erase has a good chance of fixing that (preferably coupled with a firmware update). If that broken internal data was created by an early firmware version, and if by any chance newer firmware versions don't create such invalid internal data, the bug won't relapse.

The ssd had the latest firmware when it was new and no update was available in February. There may be a new version now, but Crucial's website says no update is available. It looks like they have a problem figuring out how to allow users to overwrite the drive's existing firmware. (Maybe another bug in the drive?) Here's a tomshardware thread about that issue: https://forums.tomshardware.com/threads/crucial-mx500-firmware-update-error.3648524/
Here's where Crucial says no update is available (without the website knowing what version I have!):
https://www.crucial.com/support/ssd-support/mx500-support

What kind of tainted internal structures would cause the write amplification to go berserk? You haven't yet tried to answer the questions I asked about your sources of information, and your answer about the possible nature of the problem is more vague than I prefer.

If secure erase fixes the issue it'll confirm the broken FTL theory.

If it recurs shortly after, it'll prove it wasn't fixed in later firmware versions.

If it doesn't change anything at all it'll prove the problem is not there but somewhere else.

In any case you'll have valuable information about probable causes and the use cases that trigger it. You'll have more data points to provide to Crucial if they get interested in the issue.

A secure erase may be an experiment worth trying, and I'll consider it someday... but not until after Crucial solves their inability to update the firmware. I don't have time now for the "secure erase, restore from backup" ritual. However, if the short term result is that the excessive amplification appears to have gone away, I think that would be less conclusive than you do. Crucial's bug might not manifest at first, especially if it's intentional.

I would also need to seriously consider whether to trust a firmware update. If Crucial intentionally designed the MX500 to die soon after the warranty expires (by arranging for FTL write bursts to compensate when the pc isn't writing much), they might choose to raise the priority of the background process that's responsible for the write bursts, when that process isn't getting as much runtime as they want it to. In other words, they could prevent selftests from mitigating the problem. If their list of what's fixed by a firmware update doesn't include excessive amplification (or something that seems intimately connected to amplification) then I'd be less inclined to trust the update.

I'm aware you're not the only person with the WAF problem, but we're not seeing people screaming from the rooftops about rapidly decreasing health on MX500 drives, either. So it must still be a rarely occurring problem, which is why I'm suggesting a secure erase.

Also beware that the correlation between write bursts and bogus Current_Pending_Sector might be only one way, i.e. bogus Current_Pending_Sector might be occurring on more drives -- maybe even on all drives (i.e. not necessarily only on the drives plagued with the WAF bug).

I think few people have their pc write to their ssd as little as my pc does. This would explain why we don't see more people complaining about excessive amplification. It's not necessarily a rarely occurring problem; it could simply be that most people aren't aware that their ssd has excess amplification, because for them the ratio of excess amplification to necessary amplification is smaller. They have much more of the necessary amplification. Why would it occur to them that some of their amplification is unnecessary?

On my ssd, the write bursts correlate perfectly with Bogus Current Pending Sectors. In particular, every time it changes briefly to 1, there's a corresponding write burst. Why do you think other people might experience a brief change to 1 without a corresponding write burst? If that's so, what might it imply about the nature of the two behaviors?

It would help if some of the people who've complained about the Current Pending Sectors bug would perform high frequency logging of both Current Pending Sectors and NAND Pages Written by the FTL Controller (F8), preferably in a format such as .csv that would have one row per pair and could be opened by spreadsheet or database software, so that this question could be settled.
 

Lucretia19

Another reason why I'm skeptical about whether a secure erase would reduce the ssd's excess write amplification is my recent analysis (posted here a few days ago) about the beneficial effect of power cycling on write amplification. If excess amplification is caused by corrupt FTL data that a secure erase would clean up, why does it take many days after a power cycle before the corruption resumes causing high amplification?

Below is a table of ssd log data that's similar to the tables I posted a few days ago. (The differences are described in notes further below.) The pattern after the most recent power cycle (7pm Oct 20) continues as before: Daily WAF grows higher when the ssd goes a long time without being power cycled, and each power cycle reduces Daily WAF back to a lower value.
Date | Power Cycle Count | ΔF7 (1-row delta) | ΔF8 (1-row delta) | ΔF7+ΔF8 (1-row delta) | Daily WAF = 1 + ΔF8/ΔF7
09/04/2020
115
163,836​
375,236​
539,072​
3.29
09/05/2020
115
101,818​
268,671​
370,489​
3.64
09/06/2020
115
106,523​
277,570​
384,093​
3.61
09/07/2020
115
162,022​
191,485​
353,507​
2.18
09/08/2020
115
186,722​
441,499​
628,221​
3.36
09/09/2020
115
132,322​
206,416​
338,738​
2.56
09/10/2020
115
82,136​
719,284​
801,420​
9.76
09/11/2020
115
130,378​
1,015,422​
1,145,800​
8.79
09/12/2020
115
70,203​
475,726​
545,929​
7.78
09/13/2020
115
95,825​
404,533​
500,358​
5.22
09/14/2020
115
83,969​
281,117​
365,086​
4.35
09/15/2020
115
118,173​
571,677​
689,850​
5.84
09/16/2020
115
631,397​
465,335​
1,096,732​
1.74
09/17/2020
115
896,024​
298,618​
1,194,642​
1.33
09/18/2020
115
68,886​
625,739​
694,625​
10.08
09/19/2020
115
77,186​
478,105​
555,291​
7.19
09/20/2020
115
71,719​
645,150​
716,869​
10.00
09/21/2020
115
152,934​
1,050,481​
1,203,415​
7.87
09/22/2020
115
78,306​
1,488,926​
1,567,232​
20.01
09/23/2020
115
99,476​
1,557,023​
1,656,499​
16.65
09/24/2020
115
76,923​
846,607​
923,530​
12.01
09/25/2020
115
93,428​
943,115​
1,036,543​
11.09
09/26/2020
116
84,553​
1,340,469​
1,425,022​
16.85
09/27/2020
116
306,965​
221,576​
528,541​
1.72
09/28/2020
116
73,947​
394,865​
468,812​
6.34
09/29/2020
117
124,427​
331,573​
456,000​
3.66
09/30/2020
117
60,811​
171,891​
232,702​
3.83
10/01/2020
117
68,912​
287,268​
356,180​
5.17
10/02/2020
117
58,960​
377,314​
436,274​
7.40
10/03/2020
117
116,280​
486,270​
602,550​
5.18
10/04/2020
117
66,455​
1,807,065​
1,873,520​
28.19
10/05/2020
117
63,976​
468,118​
532,094​
8.32
10/06/2020
117
94,992​
331,787​
426,779​
4.49
10/07/2020
117
47,197​
486,413​
533,610​
11.31
10/08/2020
117
139,079​
420,262​
559,341​
4.02
10/09/2020
117
100,982​
542,747​
643,729​
6.37
10/10/2020
117
161,132​
1,271,672​
1,432,804​
8.89
10/11/2020
117
145,152​
794,243​
939,395​
6.47
10/12/2020
117
115,457​
1,036,807​
1,152,264​
9.98
10/13/2020
117
70,452​
836,945​
907,397​
12.88
10/14/2020
117
53,717​
589,290​
643,007​
11.97
10/15/2020
117
62,926​
1,049,829​
1,112,755​
17.68
10/16/2020
117
75,428​
1,412,108​
1,487,536​
19.72
10/17/2020
117
1,019,009​
927,715​
1,946,724​
1.91
10/18/2020
117
69,403​
997,287​
1,066,690​
15.37
10/19/2020
117
56,526​
1,097,208​
1,153,734​
20.41
10/20/2020
117
60,458​
1,671,031​
1,731,489​
28.64
10/21/2020
118
122,868​
674,983​
797,851​
6.49
10/22/2020
118
103,384​
240,568​
343,952​
3.33
10/23/2020
118
532,575​
481,317​
1,013,892​
1.90
10/24/2020
118
161,794​
184,042​
345,836​
2.14
A few days (Sept 16/17, Oct 17, Oct 23) show a high ΔF7 (a lot of writing by the host pc) because Windows updated itself. Daily WAF was lower on those days, which isn't surprising.

The table columns differ from the tables posted a few days ago in the following ways:
  1. It goes back only to Sept 4th when Power Cycle Count reached 115.
  2. It goes up to today (Oct 24) so it includes data after the most recent (118th) power cycle late on Oct 20.
  3. It shows the Daily WAF, which gives a rough idea at a quick glance.
  4. It's closer to the raw data; it doesn't show the cumulative moving average of the deltas nor the number of days since the most recent power cycle.
 

Lucretia19

My pc's Power Plan is set to "never sleep." Today I did a quick test of sleep mode because it ought to be more efficient to occasionally power cycle the ssd by a brief sleep than by a shutdown/restart. (In order to knock down the WAF that grows after many days of ssd uptime.) The result: Yes... putting the pc to sleep for a few seconds incremented the ssd's Power Cycle Count. Not a surprise. The next time that WAF grows high, I'll verify that brief sleep knocks down WAF.

An unwanted side effect: The brief sleep stopped the ssd selftest that had been running. That was a surprise, since I thought I'd configured the ssd so that after a power cycle it would automatically resume a selftest that had been in progress. I should doublecheck that setting. If I can't get the ssd to reliably resume a selftest that was in progress, I would create a task that launches a selftest when the computer wakes. (I don't think there's a general solution to that, since Windows Task Scheduler doesn't have a "trigger on wake" option. But I don't need a general solution... the sleeps that I'm planning would be automatically scheduled, so the wakes could be scheduled too, using the "wake the computer to run scheduled task" condition, where the scheduled task would simply launch a selftest.)
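If I can't get the selftest to resume reliably, the scheduled wake task could simply run a tiny helper .bat like the sketch below (untested; the folder, drive letter and filename are placeholders copied from my main .bat, so edit them to match your setup):

Code:
@echo off
rem  Hypothetical helper, meant to be launched by a scheduled task at the scheduled
rem  wake time, to restart a selftest that the sleep interrupted.
rem  Assumes smartctl.exe lives in C:\fix_Crucialssd and the Crucial ssd is C:.
set "PROG=C:\fix_Crucialssd\smartctl.exe"
set "SSD=C:"
rem  Start a selective selftest covering the whole drive (the same command my main .bat uses):
%PROG% -t select,0-max -t select,0-max -t select,0-max -t select,0-max -t select,0-max -t force %SSD%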
 
Jan 6, 2021
Hello. I have a 250GB MX500 SSD and noticed that my "life percentage" dropped to 72% after just half a year of usage. I used the SSDLife Pro program to check this and the amount of written data, which now sits at about 11TB. When googling the problem I found this thread describing a similar case. Thanks Lucretia19 for your observations and tests. Today I started using your .bat file to see whether the situation will improve.

I have one question regarding the .bat: should I leave PauseSeconds and SelftestSeconds as they are now for my SSD, since it has a lower capacity than yours?
 

Lucretia19

I have one question regarding the .bat: should I leave PauseSeconds and SelftestSeconds as they are now for my SSD, since it has a lower capacity than yours?

I chose the two timing parameters based on some trial & error experimentation. I don't know whether they're optimal, but they've been good enough to greatly slow the decrease of my 500 GB Crucial ssd's Remaining Life, which reached 92% on 3/13/2020 and 91% on 10/19/2020. (It should be noted, however, that my pc writes less to my ssd than the average pc does, in part because I redirected some frequent writing by Windows and other apps from the ssd to a hard drive.)

My hunch is that the parameters I settled on will be reasonable for the 250 GB ssd too, so I suggest you start with them and check after about a week to see whether they caused a major reduction of your ssd's Write Amplification Factor (WAF). The smaller WAF is, the better.

Do you know how to calculate WAF? You'll need to track two of the ssd's SMART attributes -- F7 (which is the NAND pages written by host pc) and F8 (which is the NAND pages written by the ssd's FTL controller) -- so you can see whether the ratio of the increase of F8 to the increase of F7 is much smaller during the period of time when the selftest bat runs.
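For example (made-up round numbers, just to illustrate the arithmetic): if F7 increases by 200,000 pages over a week while F8 increases by 800,000 pages over the same week, then WAF for that week is 1 + 800,000/200,000 = 5.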

Alternatively, you could ignore WAF and pay attention to the sum F7+F8, which is the total NAND pages written to the ssd. Its rate of increase is a more direct measure of the rate that ssd lifetime is used up.
 
Jan 6, 2021
Yeah, I've read how to calculate WAF.
Right now F7 is 394,578,354 and F8 is 2,943,521,015, resulting in WAF = 8.45. I'll track it over a week to see how those attributes are changing.
 

Lucretia19

Right now F7 is 394,578,354 and F8 is 2,943,521,015, resulting in WAF = 8.45

Did you also record any older F7 and F8 values, so that you can determine what WAF has been DURING A RECENT PERIOD (the days or weeks prior to beginning the selftests regime)? To really see the effect of the selftests, you should compare the changes in F7 and F8 that occur during a "recent period before running selftests" to the changes in F7 and F8 that occur during a "period running selftests."

Before I began testing the effect of nearly-nonstop ssd selftests on 2/23/2020, "Total WAF" over the ssd's entire period of operation since installation (7/28/2019 to 2/23/2020) was 7.26 as shown in the table below. But write amplification had actually grown much, much worse than 7.26. For example, during the two weeks from 2/06/2020 to 2/23/2020, the "Recent WAF" was 38.12. Here are several days of SMART data collected during the days before and after I began the ssd selftests experiment, plus the calculations of Total WAF and Daily WAF:
Date | F7 | F8 | Total WAF = 1 + F8/F7 | ΔF7 (1-row delta) | ΔF8 (1-row delta) | Daily WAF = 1 + ΔF8/ΔF7
02/20/2020
223,801,408
1,383,179,818
7.18
262,998​
18,637,064​
71.86
02/21/2020
224,088,982
1,389,879,812
7.20
287,574​
6,699,994​
24.30
02/22/2020
224,290,793
1,399,713,786
7.24
201,811​
9,833,974​
49.73
02/23/2020
224,566,093
1,406,411,752
7.26
275,300​
6,697,966​
25.33
02/24/2020
224,797,509
1,407,625,231
7.26
231,416​
1,213,479​
6.24
02/25/2020
225,481,679
1,409,269,335
7.25
684,170​
1,644,104​
3.40
02/26/2020
225,799,254
1,409,707,611
7.24
317,575​
438,276​
2.38

Your 8.45 "Total WAF" is based on the 1 + F8/F7 formula. That won't be as revealing as comparing using the 1 + ΔF8/ΔF7 formula, where the delta symbol Δ means "change over the recent period of time." As you can see in the table above, the selftests regime begun on 2/23/2020 caused Daily WAF to decrease by an order of magnitude in the days that followed. There was of course only a tiny effect on Total WAF, decreasing from 7.26 to 7.24. The tiny decrease in Total WAF isn't as revealing as the large decrease in Daily WAF.

You posted a pair of F7 & F8 values and called it "now." I assume that pair was recorded either a little before or very soon after you began running the bat. And I assume you plan to record another pair after running the bat for a week. But to compare using the 1 + ΔF8/ΔF7 formula, you will need a third pair too. Did you record an earlier pair about a week (plus or minus a few days or weeks) before you began running the bat? If not, you can collect the third pair later, by stopping the selftests when you record the second pair, and recording the third pair about a week after stopping the selftests.

The SSDLife Pro website doesn't list a SMART data logging feature. That's unfortunate, since a logging feature could automatically periodically record the pairs for you. I designed a second bat that automatically logs ssd data to a comma-delimited file -- it periodically calls SMARTCTL.exe to read the SMART data, and appends the data (and date & time and other SMART attributes of interest such as Average Block Erase Count and Power Cycle Count) to the file. The log file can be opened as a spreadsheet and pasted into columns in another spreadsheet for analysis. (I actually run two instances of the second bat, one that logs every 2 hours and another that logs daily. The rows posted above were extracted from my daily spreadsheet... just a few of its columns.)
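A minimal sketch of that kind of logging loop is below. It is not my actual logging .bat -- the folder, filename and the 2-hour interval are placeholders, and it appends raw smartctl output lines instead of building comma-delimited rows -- but it shows the idea:

Code:
@echo off
setlocal enabledelayedexpansion
rem  Minimal SMART-logging sketch (placeholder paths and interval).
rem  Each pass appends a timestamp plus the raw smartctl rows for
rem  attributes 247 (F7) and 248 (F8) to a text file.
set "PROG=C:\fix_Crucialssd\smartctl.exe"
set "SSD=C:"
set "LOGFILE=C:\fix_Crucialssd\smart_log.txt"

rem  Infinite loop, using the same FOR /L trick as the selftest .bat:
FOR /L %%G in (0,0,0) do (
   echo !DATE! !TIME! >> "%LOGFILE%"
   rem  Keep only the attribute rows whose IDs are 247 and 248:
   %PROG% -A %SSD% | findstr /C:"247 " /C:"248 " >> "%LOGFILE%"
   rem  Wait 2 hours between samples:
   TIMEOUT /t 7200 /NOBREAK >NUL
)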

Note 1: The Crucial 250 GB ssd has an endurance rating of 100 TB, which is much less than the 180 TB rating of the 500 GB drive. This means the Remaining Life of a 250 GB ssd should be expected to decrease significantly faster than the Remaining Life of a 500 GB ssd, assuming the same writing by the host and the same WAF bug. Your RL drop to 72% after only 11 TB has been written indicates your ssd will not reach its 100 TB rating if nothing is done to mitigate the WAF bug. A 500 GB drive with 11 TB written and the bug would presumably have RL of about 85% (100% - (100%-72%)x(100TB/180TB)) which would indicate it will not reach its 180 TB rating. (Because WAF grows worse over time, as shown by the Daily WAF values that became consistently much worse than the Total WAF values, the actual endurances are much worse than these simplistic calculations indicate.)

Note 2: A recent commenter claimed that a wipe of the ssd using a Secure Erase might eliminate the WAF problem, by resetting the FTL controller's data structures. (After backing up the ssd carefully so you can restore it after the wipe.) If true, that solution would be more elegant than continually running the selftests bat. I haven't tried a wipe because I want to test the long term (years or decades) effect of the selftests regime -- in particular whether the selftesting ssd will reach its 180 TB endurance rating, or at least outlive me -- and because I'm skeptical about the claim that the excessive write amplification is caused by corrupted FTL data structures. (Crucial Tech Support never suggested to me that I try a wipe; they agreed to exchange the drive.) But you may want to consider trying a wipe instead of the selftests. It would be nice to know whether a wipe eliminates the problem, so if you try it please post your results, and if the wipe appears to succeed in the short term, please also come back someday and post your long term results.
 
Jan 6, 2021
Hello again

When I said "now", I had just started to use the bat with selftesting, so that was my initial data. I downloaded CrystalDiskInfo as a better alternative and started monitoring the parameters. Now 4 days have passed; I created an Excel spreadsheet to see the dynamics of the parameters, including the deltas of F7, F8 and WAF. I tried to take measurements at roughly the same time each day. Here are my results so far.

[screenshot: spreadsheet tracking F7, F8, their daily deltas, and WAF]


The delta WAF became lower, but it still sits high. I think I need more days to test this. Or maybe lower bat parameters will help even more.

You mentioned performing a Secure Erase to try to eliminate the WAF bug. I might try it after some time.
 

Lucretia19

The delta WAF became lower, but it still sits high.

Delta WAF became lower compared to what? That statement doesn't make sense to me because you've posted no data that shows what Delta WAF was before you began running selftests. Collect some data during a period of days when the selftests are NOT running, too, so you will have a meaningful comparison.

The comparison to Total WAF isn't meaningful. The Total WAF values of around 8.4 cover the half year since your ssd was first installed, which includes early months when WAF may have stayed low. As I mentioned earlier, WAF grows worse over time. Those early months are irrelevant.

The decrease in your Total WAF during the few days that the selftests have been running, from 8.459915 to 8.40583, is a large enough decrease to suggest that the selftests are helping. It's larger than the decrease of my Total WAF from 7.26 to 7.24 during the first few days of my selftests.

I note that your daily DeltaF7 is much larger than mine. My daily DeltaF7 averages around 150,000. (For most of 2020 my daily DeltaF7 averaged around 250,000, but I lowered it further in August/September by redirecting more Windows logs to a hard drive.) Assuming a NAND page on the 250 GB ssd is approximately the same size as a NAND page on the 500 GB ssd (around 37 kBytes), your pc is writing about 10 times more than mine does. Also, your F7 is about 400,000,000 after about 6 months, which is much more than my F7 of about 309,000,000 after about 18 months. As I wrote earlier, the sum DeltaF7+DeltaF8 is a more direct measure of the rate that the ssd life is being consumed, so consider whether you can reduce your DeltaF7. Assuming you're running Windows, Procmon (Process Monitor) is a free Microsoft tool that can show you which files are being written, the size of each write operation, and which processes are writing them; perhaps you could use some of this information to significantly reduce DeltaF7 without crippling your pc's functionality... for instance by reducing the frequency that your web browser saves data, or reducing the verbosity of logs, or disabling unimportant logging, or relocating frequently written files or folders to a hard drive.
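(As a rough worked example: 150,000 pages/day × ~37 kBytes/page ≈ 5.5 GBytes/day written by my host pc, so a daily DeltaF7 about ten times larger would correspond to roughly 55 GBytes/day.)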

I'm curious... are you manually copying from CrystalDiskInfo to your spreadsheet? I don't see an automatic periodic logging function in CrystalDiskInfo's menus.
 

Lucretia19

Or maybe lower bat parameters will help even more

It occurs to me that I have no idea how long it takes to run an extended selftest on a 250 GB ssd. On my 500 GB ssd, an extended selftest would take about 26 minutes (if not aborted by the bat). You should measure the duration of an extended selftest on the 250 GB ssd, so you can ensure the bat loop duration doesn't exceed the selftest duration. If the loop duration exceeds the selftest duration, there would be extra time during each loop when the selftest isn't running. That would be time during which the write amplification bug wouldn't be blocked by the selftest.
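(To make the constraint concrete with my 500 GB drive's numbers: my loop lasts roughly 1200 seconds, about 20 minutes, which is safely below the roughly 26 minutes, about 1560 seconds, that an extended selftest needs, so the selftest is always still running when the .bat aborts it. If, hypothetically, the 250 GB drive's extended selftest finished in 13 minutes, about 780 seconds, then a 1200-second loop would leave roughly 420 seconds of every loop with no selftest running.)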
 
Jan 6, 2021
I do indeed manually copy from CrystalDiskInfo; it's not a big deal to check once a day.

Here's my update on the situation. First I tried reducing SelftestSeconds to 660 to see whether it would improve daily WAF, but I got worse results than before with that. After a few days I tried secure erasing my ssd and then running for some days without selftesting. As you can guess, it didn't help. I got a very low WAF on the first day, but I guess that's because I backed up and restored a partition with Windows, which currently occupies 182GB of the available 232GB. Then it got worse, with WAF going higher than without selftests. During these days the life percentage dropped from 72% to 71%.
[screenshot: updated spreadsheet of daily F7/F8 deltas and WAF]
 

Lucretia19

Here's my update on the situation.

Thanks for posting that info. It helps to know that a secure erase is ineffective.

Regarding the daily WAF measurements, expect a large amount of day-to-day fluctuation. This means it takes measurements over a much longer period than a day or two to have confidence in a result. I think there's also a lag time -- hours? -- between the F7 writing and the F8 amplification. If you can be patient and let each experiment run for a couple of weeks or so, I think that will be a more reliable trial-and-error method to find reasonably optimal loop timing values.

I suggest letting your pc run a few more days (or a week or two) without selftests in order to gain high confidence in the "no selftests" result, before resuming selftest experiments. Some of today's higher WAF might be due to the large restore from backup even though the restore was performed a few days ago. It may take more days before the daily WAF settles down to more consistent values. I have no experience observing WAF after a large restore; perhaps it could take weeks to settle down. My intuition is that it won't take that long, due to the secure erase that preceded the restore, but I don't know.

Reducing the loop duration (from 1200 seconds to 660 seconds) while holding the pause time constant (30 seconds) means less time is spent selftesting (assuming an extended selftest would last at least 1130 seconds if not aborted) and thus WAF should be expected to increase, as you observed. But you have only two days of measurements at 660 seconds, which I think isn't long enough to have confidence that WAF is truly higher at 660s than at 1200s.
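(Assuming the selftest would otherwise run for the full interval in both cases: with a 30-second pause, a 660-second loop keeps a selftest running for roughly 630 of every 660 seconds, about 95%, while a 1200-second loop keeps one running for roughly 1170 of every 1200 seconds, about 97.5%.)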

My 500 GB ssd has a much higher percentage of free space; only about 100 GB is occupied. My understanding is that having a lot of free space helps reduce WAF, because the ssd's FTL controller's static leveling process doesn't need to move as many bytes to maintain a balanced block erase count after new bytes are written. Perhaps you could gain more free space by deleting temporary files and apps you no longer use.
 

StrikerFX

@NamZIX:

Do you mean the ssd Remaining Life is now 92%, and that the ssd is 6 months old?

Before you conclude that 8% in 6 months is due to the bug I described, you need to check how many bytes your pc has written to the ssd during those 6 months. Decrease of ssd Remaining Life due to writes by the host pc is normal, since each cell of an ssd can be written only a finite number of times. Crucial's specs say the MX500 500GB endurance is 180 TBytes. Since 8% of 180 TB is approximately 14.4 TB, if your pc has written approximately 14 TB during those 6 months then you're getting what should be expected.

One caveat. I'm uncertain whether the 180 TB spec means bytes written by the host pc, or the sum of the bytes written by the host pc and by the ssd's FTL controller. Both of those numbers -- bytes written by the host pc and bytes written by the FTL controller -- can be displayed by free software such as CrystalDiskInfo or Smartmontools... any software capable of monitoring S.M.A.R.T. attributes. The bug I described causes excessive writes by the FTL controller, causing Remaining Life to decrease much faster than it should decrease. Record those two numbers, and then a few days later record them again to see how much each increased, and let me know the numbers. (The S.M.A.R.T. software will show you NAND pages written rather than bytes written, but that's fine since what matters is the ratio of the two increases. In other words, NAND pages written by the FTL controller should not be much much larger than NAND pages written by the host. During the weeks before I tamed the ssd with my selftests .bat file, the ratio was about 38 to 1. During the months that the selftests have been running, the ratio has been about 1.6 to 1. Note: Crucial defines "Write Amplification Factor" as 1 plus that ratio.)

Here's a simplified version of my .bat file. You would put this file and the smartctl.exe utility of Smartmontools in a folder named C:\fix_Crucialssd and run the .bat file with Administrator privileges. If you put the two files in a different folder, edit the .bat accordingly. If your ssd isn't C:, edit the .bat accordingly. You can use Windows Task Scheduler to have the .bat file start automatically when Windows starts, or when a user logs in. (If it starts when Windows starts, it will be hidden and won't appear in your taskbar.) In the Task Scheduler dialog box, be sure to check the checkbox labeled "Run With Highest Privileges."

Code:
@echo off
rem  Edit PROGDIR variable to be the folder containing smartctl.exe
set "PROGDIR=C:\fix_Crucialssd"

rem  Edit SSD variable, if needed, to be the ID of your Crucial ssd
set "SSD=C:"

rem  For simplicity assume smartctl.exe takes 4 secs to start selftest
set /A "PauseSeconds=26, SelftestSeconds=1170"

set "PROG=%PROGDIR%\smartctl.exe"

rem  Infinite loop:
FOR /L %%G in (0,0,0) do (
   rem  Start a selftest with 5 maximal ranges selected
   %PROG% -t select,0-max -t select,0-max -t select,0-max -t select,0-max -t select,0-max -t force %SSD%
   TIMEOUT /t %SelftestSeconds% /NOBREAK
   rem  Abort the selftest
   %PROG% -X %SSD%
   TIMEOUT /t %PauseSeconds% /NOBREAK
)
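For example, a startup task could be registered from an elevated command prompt with something like the following (the task name and .bat filename are placeholders; the Task Scheduler GUI works just as well):

Code:
schtasks /Create /TN "Crucial ssd selftest loop" /TR "C:\fix_Crucialssd\selftest_loop.bat" /SC ONSTART /RU SYSTEM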

Hi! I have a Crucial MX500 500GB ssd, just like you, and I'm also having problems similar to yours and the others'. My ssd was purchased about 3 months ago; during this period it was used for 386 hours, which is equivalent to ~16 days. The problem is the excessive write consumption during that period: 2.2TB.

Both the Crucial software and CrystalDiskInfo already show 99% in the health part of the ssd. Needless to say, I was scared by this value: only 16 days of use and more than 2TB consumed. Yet I don't download anything onto this ssd; all downloads, including photos, texts, videos, audio and so on, go to the hd.

I use Internet Download Manager to download, and it is configured to download everything to the 1TB hd I have here.

The ssd is only for Windows 10. I also have qBittorrent installed, but it is configured to download everything to the hd too, even its logs, settings and so on. I configured defragmentation to be disabled, even before Windows 10 had that problem of high writes to ssds caused by buggy defragmentation.

I did the calculation that you posted in the topic, (F7 + F8) / F7, and the WAF value was 1.74, which is very low. But I don't understand the high writing value, above 2TB, when the ssd is only for Windows 10 and nothing else.

What I use most on the ssd is Firefox, but yesterday I even transferred its profile to the hd to see if that is the problem, though I think it is not. What do you think I should do? I will test your .bat; I want to see if I have the same problems you are having, but I have some doubts. You say to create the folder fix_Crucialssd in C:\ and put both the .bat and smartctl.exe there, but I was in doubt about smartctl.exe.

Can I just copy smartctl.exe from the smartmontools setup, or do I need to have smartmontools installed for the .bat to work? If it works with only smartctl.exe, which one should I use: the one in the bin folder or the one in bin64?

Thank you very much if you can help me here. This made me very sad, because this ssd was expensive to buy and now it presents this serious problem. Crucial disappointed me a lot; I will hardly buy their storage from here on, I don't want to risk it anymore.
 

Lucretia19

I did the calculation that you posted in the topic, (F7 + F8) / F7, and the WAF value was 1.74, which is very low. But I don't understand the high writing value, above 2TB, when the ssd is only for Windows 10 and nothing else.

The very low WAF means the WAF bug is not your problem (yet). You didn't specify whether the 2.2 TB that's been written is the amount written by the host pc to the ssd, or the total amount that also includes the ssd's internal write amplification, and I will assume you mean it's just the amount written by the host pc.

The 500 GB ssd has an endurance rating of 180 TB. If you continue writing 2.2 TB every 3 months, that corresponds to about 20 years of life. That doesn't sound bad or sad.

I assume the 386 hours of "usage" is the ssd's Power On Hours SMART value. That value is very misleading because the ssd normally spends most of its time in a "low power" mode that doesn't get counted in Power On Hours. The ssd is powered on while in the low power mode, so it would be reasonable to say that Power On Hours is buggy too. The ssd is idle while in low power mode, but in a sense it's being used, because it's online and available for reading and writing. So, I think you should pay attention to the 3 months and not pay attention to the 386 hours. (On the other hand, if the pc has been off most of those 3 months, for example if the pc is on for only an hour a day, then the 3 months is misleading too, and the 2.2 TB would then seem excessive.)

If you decide you want to reduce the writing from the host pc to the ssd, what you would need to do first is find out which files are being written to the ssd, and which processes or apps are writing them. A good free tool is Microsoft's Procmon (Process Monitor), which can show you every file write operation (or a subset of the operations, that you can select by setting filter criteria such as "Operation contains 'write'" and "Path begins with 'C:\'" and "Details contains 'length'"), the size (length) of each write operation, etc. Try to figure out which processes and files are responsible for most of the writes.

I run the free version of HWiNFO so I can see in realtime the rate of writing by the host pc to the ssd. By observing when the write rate goes high, you may be able to correlate the high rate with an app or system process that's running. Sometimes, though, there will be a delay between app activity and some system writing that gets triggered later by the app activity.

I assume your ssd is your C: system drive. Windows and some apps write a lot, even when it seems the computer isn't being used much. Windows writes MANY log files and registry updates, and some of those log files can easily be redirected to your hard drive. Apps like Firefox write a lot too (both directly and by triggering later system logging) and you should consider moving the Firefox profile folder to your hard drive. Antivirus apps can write a lot when they download virus signatures and/or scan files, and you might be able to redirect that writing to the hard drive too. My UPS app (CyberPower PowerPanel) logs a lot, so I redirected it to the hard drive.

In case the WAF bug becomes a problem for you someday, the only Smartmontools file that needs to be available to the .bat is smartctl.exe. Use the 64-bit version of smartctl.exe if your pc runs a 64-bit version of Windows.
 

StrikerFX

For now, I will post the CrystalDiskInfo screenshots I took of the ssd yesterday and today, after I started using the .bat. I will also post the one I took yesterday with the Crucial software.

Later I will provide the other information you asked about, but about the 3 months: I used the PC a few hours a day, and there were days I didn't even turn it on, so the total time used was more or less that.

Something I've used a lot since installing the ssd is Firefox; only today, after this problem, did I change its profile to the hd. Do you think Firefox might be able to increase both writing?
Doing the WAF calculation, it still comes out to 1.74; is that still a good value for you?


CDI Yesterday:



CDI Today:



Software Crucial yesterday:

 

Lucretia19

Something I've used a lot since installing the ssd is Firefox; only today, after this problem, did I change its profile to the hd. Do you think Firefox might be able to increase both writing?
Doing the WAF calculation, it still comes out to 1.74; is that still a good value for you?

Your ssd has been behaving okay: 1% of life used after 2.2 TB written is better than the 180 TB endurance rating. The problem that concerns you is not caused by your ssd. It's not like the WAF problem that my ssd experienced before I began running selftests. The problem that my ssd had is described in the title of this thread: "... Remaining Life decreasing fast despite few bytes being written."

You haven't clearly described how much time your pc was on during the 3 months. Where you wrote "I used the PC a few hours a day" I don't know whether you mean the pc was powered on for only a few hours per day, or the pc was powered on for many hours per day but "idle" for most of those hours.

Windows does a lot of writing when the pc seems idle. Also, apps and background services may do work while the pc seems idle. If a Windows pc isn't powered off or sleeping or hibernating, it's writing a lot.

I don't understand your question about whether Firefox can "increase both writing." Both what?

If you use Firefox a lot, moving the Firefox profile to hard drive might significantly reduce writing to the ssd, as I wrote in my previous post. Now that you've moved the Firefox profile to hard drive, keep an eye on how much F7 increases during the next few days, to see whether the rate of increase of F7 is a lot less than the rate that F7 was increasing during the days before the profile was moved.

You could also avoid using Firefox for a few days to see whether the rate of increase of F7 drops a lot.

I recommend you focus on reducing the rate of increase of F7, which is controlled entirely by the pc software, and not be concerned about F8 or WAF. (Your WAF is excellent.) Try using Procmon as I described in my previous post, to analyze what's actually being written to the ssd and which processes are responsible for most of the writing. You may be surprised to see how much writing occurs while the pc seems idle. Since you also have a hard drive, Procmon will help you find a lot of writing that can be redirected to the hard drive. (Not all of the writing can be redirected, but some of it can be, using various techniques.)
 

Lucretia19

Reputable
Feb 5, 2020
195
15
5,245
For people like StrikerFX and myself who also have a hard drive in the computer, the following .bat can be used to redirect Windows' Winevt logs to the hard drive, to reduce writing to the ssd.

Note 1: The .bat must be run as Administrator, or the redirect commands will fail.
Note 2: Windows does a lot of other writing besides the Winevt logs.
Note 3: For people who do not have both an ssd and a hard drive, alternatives are to reduce the verbosity of the logs, or disable some of the logs entirely (see the sketch at the bottom of this post).

Code:
@echo off
rem  For usage, see the bottom of this file.
setlocal enabledelayedexpansion

REM  ==================================================
rem  EDIT THE FOLLOWING LINE SO THE DRIVE LETTER MATCHES YOUR COMPUTER'S HARD DRIVE:
set "_HD=O:"
rem  YOU CAN EDIT THE FOLLOWING IF YOU PREFER DIFFERENT FOLDERNAMES:
set "_SYSLOG_FOLDER=%_HD%\SysLogs"
set "_WINEVT_PATH=%_SYSLOG_FOLDER%\winevt"
REM  ==================================================

rem  The log files to redirect are the .evtx files that by default are written to the C:\Windows\System32\winevt\Logs folder.
set "DEFAULTFOLDER=C:\Windows\System32\winevt\Logs"

if not exist "%_SYSLOG_FOLDER%\" mkdir "%_SYSLOG_FOLDER%"
if not exist "%_WINEVT_PATH%\" mkdir "%_WINEVT_PATH%"

rem  For future reference, save the redirected filenames to a file named WinEvtLogs.txt:
set "LOGS=%_SYSLOG_FOLDER%\RedirectedWinevtLogs.txt"

rem  The following file is temporary:
set "WINEVTLOGS=%_SYSLOG_FOLDER%\WINEVT_filenames.txt"
rem  Write to the temporary file the filenames of all .evtx files that exist in the ssd folder:
dir %DEFAULTFOLDER%\*.evtx /B /O:-S >%WINEVTLOGS%

rem  Redirect each of the .evtx logfiles:
for /F "delims=" %%L IN (%WINEVTLOGS%) do (
   set "_Filename=%%L"
   echo Appending filename "!_Filename!" to file %LOGS% as a record of the redirected files...
   echo !_Filename! >> %LOGS%
   rem  One of the command line parameters of the redirect command is the WinEvt internal name.
   rem     To construct the internal name, replace the chars "%4" with a forward slash
   rem     and trim the .evtx extension from the end:
   set _InternalName=!_Filename:%%4=/!
   set _InternalName=!_InternalName:.evtx=!
   rem  Display and then run the redirect command:
   echo wevtutil sl "!_InternalName!" /lfn:"%_WINEVT_PATH%\!_Filename!"
   wevtutil sl "!_InternalName!" /lfn:"%_WINEVT_PATH%\!_Filename!"
)
del %WINEVTLOGS%
EXIT /B

===================================
To make the ssd drive last longer, this .bat uses Windows' "wevtutil sl" command to
redirect all existing Winevt .evtx log files to a hard drive folder (\Syslogs\winevt).
This .bat will create the \Syslogs and \Syslogs\winevt folders if they don't yet exist.

This .bat must be run as Administrator or the redirect commands will fail.

Windows will remember these redirections.  This means the .bat only needs to be run
once unless a Windows update or app update or new app causes Winevt to start writing
logs that didn't exist when the .bat was run, or something weird happens that causes
Windows to forget the redirections.
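For people without a hard drive to redirect to (Note 3 above), a rough sketch of the alternative is below: use wevtutil to turn off the noisiest channels outright. The channel name is only an example; "wevtutil el" lists every channel on your system so you can pick the ones that are actually large on your pc.

Code:
@echo off
rem  Rough sketch of the Note 3 alternative: instead of redirecting the Winevt logs,
rem  disable individual channels with wevtutil.  Run as Administrator.
rem  The channel named below is only an example; check your own system first.

rem  List every event log channel, so you can decide which ones to disable:
wevtutil el

rem  Disable one chatty channel (it can be turned back on later with /e:true):
wevtutil sl "Microsoft-Windows-Store/Operational" /e:false
EXIT /B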
 
Last edited:
  • Like
Reactions: StrikerFX
Jan 6, 2021
6
0
10
Three more days have passed without selftesting.

[screenshot: 4f9b06b2be.png]


The daily WAF still sits pretty high. The F8 in particular is about 2-3 times higher than it was during selftesting
 

StrikerFX

Commendable
Jan 15, 2021
4
0
1,510
You haven't clearly described how much time your pc was on during the 3 months. [...] I recommend you focus on reducing the rate of increase of F7, which is controlled entirely by the pc software, and not be concerned about F8 or WAF.
My use during the 3 months was very simple: just browsing, watching movies in MPC-HC, and some games. Most of the time I spent just browsing, more than 90% of it in Firefox, with the occasional use of Microsoft Edge (Chromium).

I played a little during this period, but the total hours were much less than I spent using Firefox; I think I spent more hours watching videos in MPC-HC than playing. I sometimes left the pc idle when I needed to do something else, for a few hours at a time, but I can't give you an exact value.

From 01/14 to the present day I have written down all of the F7/F8 values. I don't know if the calculations are correct, but I would like you to check them and tell me whether the results are still good.

I only used the .bat file that runs the selftests between January 15th and 16th; from yesterday to today I decided to stop using it, because I want to see whether there will be a big difference over the next few days. Below are the values I've been monitoring with CrystalDiskInfo:

Date               | Total Host Writes (GB) | S.M.A.R.T. F7 | S.M.A.R.T. F8 | Total WAF = 1 + F8/F7 | ΔF7     | ΔF8     | Daily WAF = 1 + ΔF8/ΔF7 | WAF = (F7+F8)/F7
14/01/2021         | 2,214                  | 77,980,817    | 57,571,802    |                       |         |         |                         |
15/01/2021         | 2,220                  | 78,213,026    | 57,858,609    | 1.74                  | 232,209 | 286,807 | 2.24                    | 1.74
16/01/2021         | 2,224                  | 78,342,191    | 57,988,801    | 1.74                  | 129,165 | 130,192 | 2.00                    | 1.74
17/01/2021         | 2,225                  | 78,379,127    | 58,073,714    | 1.74                  | 36,936  | 84,913  | 3.30                    | 1.74
18/01/2021         | 2,226                  | 78,433,172    | 58,081,212    | 1.74                  | 54,045  | 7,498   | 1.01                    | 1.74
19/01/2021         | 2,228                  | 78,494,824    | 58,127,307    | 1.74                  | 61,652  | 46,095  | 1.74                    | 1.74
20/01/2021 (22:00) | 2,229                  | 78,562,970    | 58,433,826    | 1.74                  | 68,146  | 306,519 | 5.49                    | 1.74
21/01/2021         | 2,230                  | 78,597,101    | 58,493,351    | 1.74                  | 34,131  | 59,525  | 2.74                    | 1.74
22/01/2021         | 2,232                  | 78,648,623    | 58,677,541    | 1.74                  | 51,522  | 184,190 | 4.57                    | 1.74
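To show how the columns were calculated, here is the 15/01 row worked out by hand from the values above (please tell me if I've set the formulas up wrong):

\begin{aligned}
\Delta F7 &= 78{,}213{,}026 - 77{,}980{,}817 = 232{,}209 \\
\Delta F8 &= 57{,}858{,}609 - 57{,}571{,}802 = 286{,}807 \\
\text{Total WAF} &= 1 + \frac{F8}{F7} = \frac{F7 + F8}{F7} = \frac{136{,}071{,}635}{78{,}213{,}026} \approx 1.74
\end{aligned}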

Do you think these values are good at the moment?
 
Last edited:

Lucretia19

Reputable
Feb 5, 2020
195
15
5,245
Do you think these values are good at the moment?

Yes, your deltaF7 and deltaF8 are both very good during the most recent 4 days.

In your most recent row, deltaF7 wasn't near zero, so I'm amazed that the deltaF8 is so low: 7,110 NAND pages. Was that row really recorded 24 hours later than the row above it, or was it for a period of time much shorter than 24 hours? I will speculate that your pc was powered off or asleep for most of that 24 hours, and the lag time between host writing (F7) and write amplification (F8) is what caused deltaF8 to be so much smaller than deltaF7 during that period of time.

The values in your Daily WAF column appear incorrect. You may have forgotten to add 1 in each cell of that column.

My question about how much time your pc was powered on during the 3 months was NOT about how much time you spent using it. Even when you aren't using the pc, Windows does a lot of writing to the system drive (C:) while the pc is powered on and not sleeping or hibernating. So, my question was about how much time your pc was powered on and either in use or idle, and also whether your pc is set to stay awake while it's idle, or go to sleep, or hibernate.

I think you have no need to run the ssd selftests bat. Your WAF is very low. If your WAF grows a lot someday, you can start running the bat.

Note: You can copy & paste highlighted cells directly from your spreadsheet to a message you're composing here. That's better than pasting a screencapture image for two reasons: (1) we can paste your data from your message into our own spreadsheets in order to analyze your data more easily, and (2) screencaptures are larger, which wastes space on the Tom's Hardware server.
 

Lucretia19

Reputable
Feb 5, 2020
195
15
5,245
The daily WAF still sits pretty high. The F8 in particular is about 2-3 times higher than it was during selftesting

Yes, you are gradually accumulating data that indicates your pc will benefit from running the ssd selftests bat all the time.

Also, you may be able to reduce the rate of writing by the host pc (deltaF7). If you can, that would help preserve the life of your ssd too. Some of my recent replies to StrikerFX were about how to reduce the writing by the host pc.