Question: Crucial MX500 500GB SATA SSD - Remaining Life decreasing fast despite only a few bytes being written to it?


Lucretia19

The Remaining Life (RL) of my Crucial MX500 ssd has been decreasing rapidly, even though the pc doesn't write much to it. Below is the log I began keeping after I noticed RL reached 95% after about 6 months of use.

Assuming RL truly depends on bytes written, the decrease in RL is accelerating and something is very wrong. The latest decrease in RL, from 94% to 93%, occurred after writing only 138 GB in 20 days.

Notes:
1. After RL reached 95%, I took some steps to reduce "unnecessary" writes to the ssd by moving some frequently written files to a hard drive, for example the Firefox profile folder. That's why only 528 GB have been written to the ssd since Dec 23rd, even though the pc is set to Never Sleep and is always powered on.
2. After the pc and ssd were about 2 months old, around September, I changed the pc's power profile so it would Never Sleep.
3. The ssd still has a lot of free space; only 111 GB of its 500 GB capacity is occupied.
4. Three different software utilities agree on the numbers: Crucial's Storage Executive, HWiNFO64, and CrystalDiskInfo.
5. Storage Executive also shows that Total Bytes Written isn't much greater than Total Host Writes, implying write amplification hasn't been a significant factor.

My understanding is that Remaining Life is supposed to depend on bytes written, but it looks more like the drive reports a value that depends mainly on its powered-on hours. Can someone explain what's happening? Am I misinterpreting the meaning of Remaining Life? Isn't it essentially a synonym for endurance?


Crucial MX500 500GB SSD in desktop pc since summer 2019

| Date | Remaining Life | Total Host Writes (GB) | Host Writes (GB) Since Previous Drop |
|---|---|---|---|
| 12/23/2019 | 95% | 5,782 | |
| 01/15/2020 | 94% | 6,172 | 390 |
| 02/04/2020 | 93% | 6,310 | 138 |
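For a rough sanity check of the numbers in the table, here is a minimal sketch (an editorial illustration, not from the drive vendor or the poster) of how much wear the host writes alone would account for, assuming the 180 TBW endurance rating that Crucial publishes for the 500GB MX500 and ignoring write amplification:

```python
# Rough estimate of expected SSD wear from host writes alone.
# Assumption: the 500GB MX500 is rated for 180 TBW (Crucial's published spec).
# Real drives also suffer write amplification, which this deliberately ignores.

RATED_TBW = 180  # terabytes written, per the published endurance rating

def expected_life_used(total_host_writes_gb: float, rated_tbw: float = RATED_TBW) -> float:
    """Percent of rated endurance consumed, based only on host writes."""
    return 100.0 * (total_host_writes_gb / 1000.0) / rated_tbw

if __name__ == "__main__":
    # Figures taken from the table above.
    for date, host_gb, reported_remaining in [
        ("12/23/2019", 5782, 95),
        ("01/15/2020", 6172, 94),
        ("02/04/2020", 6310, 93),
    ]:
        used = expected_life_used(host_gb)
        print(f"{date}: {host_gb} GB written -> ~{used:.1f}% used expected, "
              f"drive reports {100 - reported_remaining}% used")
```

Under that assumption the drive should be at roughly 3% used after ~6 TB of host writes, not the 5-7% it reports, which is the discrepancy being asked about.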
 

danyulc

Your calculations of erase cycles appear to neglect write amplification. Did you look at the actual Average Block Erase Count in your 250GB ssd that has 94% remaining? Is that SMART attribute available?

Sadly, the 940 EVO doesn't include any information about WAF or Average Block Erase Count. They wouldn't even give the press an estimated TBW endurance rating when the product was released, as it was the first mainstream product to use TLC. The 840 also used TLC, but wasn't really a big seller like the 840 EVO.

I just based it off the writes made so far and the endurance remaining, which, as you said, is not correct, but it's all I have to go on. The "Wear Leveling Count" raw value reads 62, the threshold is set to 0, and the current (normalized) value is 94, which is the health rating of the SSD according to HD Sentinel.
 

Lucretia19

Sadly the 940 EVO doesn't include any information about WAF or Average Block Erase Count.

I assume you mean 840 EVO, not 940 EVO.

Average Block Erase Count is an ssd SMART attribute that can be read by the host pc using SMART monitoring software, such as CrystalDiskInfo or Smartmontools, both of which are free. The attribute might have a different label depending on the software you use to display it. (Good software may adhere to the manufacturer's choices of labels.) It appears to be attribute 177 (which is B1 in base 16) and is labeled "Wear Leveling Count" at https://www.anandtech.com/show/7173...w-120gb-250gb-500gb-750gb-1tb-models-tested/3 and at https://www.smartmontools.org/ticket/692

WAF is calculated using two other values: (1) NAND pages written by the host pc, and (2) NAND pages written by the ssd's internal controller. The formula is (#2 / #1) + 1. But I don't see either of these values in the 840 EVO's SMART output. Perhaps they're available in "extended" SMART output. Assuming not, here's a well-written article that describes an alternative way to calculate it: Request for Samsung 840 EVO SSD owners (write amplification calculation)
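The MX500 discussed in this thread does expose both counters, as SMART attributes 247 (hex F7, host program page count) and 248 (hex F8, FTL program page count), so the formula can be applied directly. Below is a minimal sketch that reads the two raw values with smartmontools and computes the lifetime WAF; it assumes smartctl 7.0+ (for JSON output) is installed and run with sufficient privileges, and the device path /dev/sda is illustrative.

```python
# Compute write amplification factor (WAF) from the MX500's NAND page counters.
# WAF = 1 + (FTL program pages) / (host program pages)
# Assumes smartctl 7.0+ (JSON output); /dev/sda is illustrative.
import json
import subprocess

def read_raw_attribute(device: str, attr_id: int) -> int:
    """Return the raw value of one SMART attribute via smartctl's JSON output."""
    out = subprocess.run(
        ["smartctl", "-A", "-j", device],
        capture_output=True, text=True, check=True,
    ).stdout
    for attr in json.loads(out)["ata_smart_attributes"]["table"]:
        if attr["id"] == attr_id:
            return attr["raw"]["value"]
    raise KeyError(f"attribute {attr_id} not reported by {device}")

def lifetime_waf(device: str = "/dev/sda") -> float:
    host_pages = read_raw_attribute(device, 247)  # F7: host program page count
    ftl_pages = read_raw_attribute(device, 248)   # F8: FTL program page count
    return 1.0 + ftl_pages / host_pages

if __name__ == "__main__":
    print(f"Lifetime WAF: {lifetime_waf():.2f}")
```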

Here's an interesting article about a problem with the 840 EVO, which might also be relevant to the MX500:
Samsung 840 EVO - how to easily lose your data forever
https://forum.acelaboratory.com/viewtopic.php?t=8735
That article claims a firmware "fix" by Samsung causes a lot of extra NAND writing in order to reduce the risk of data loss. If true, maybe the MX500 firmware does the same "fix"... could it be the cause of the high WAF?
 

danyulc

Yes, I meant the 840 EVO, my apologies.

I know that when they initially released the updated firmware, it was to fix issues with data that had been on the SSD for a long time reading back at very slow speeds. The program SSD Read Tester was written specifically because of the 840 EVO issues; it was meant to test disks and show the impact of read-performance degradation over time. I believe it was an issue with the voltage levels of the TLC memory. Eventually Samsung managed to fix the firmware and released a firmware update. In addition to updating the firmware, you were supposed to run a utility which would re-write all the data on the drive to 'refresh' it.

People were concerned that Samsung's newest firmware would just re-write the data to the drive over and over to keep it fresh, which would of course lower the drive's overall endurance. It appears they actually fixed the issue by adjusting the voltage levels of the cells when the data is written to the drive. I last ran SSD Read Tester on 9-26-2021. Here is an image of the results I received.



When I look at the text-based output, 35 of the 2,313 entries read below 400 MB/s, and 107 are under 450 MB/s. All of those files are smaller in size.

Most of the files on there are 858 days old and read at full speed.

The 840 EVO firmware is incredibly generous with its "life remaining" percentage, possibly because of the increased scrutiny the drive received after the firmware update and the ensuing concerns about reduced endurance. They would likely have done anything to end the 840 EVO saga.

The 840 EVO never had a TBW value published. The 850 EVO did have endurance specs published and the 250GB version is rated for 75TBW. TechReport ran an endurance test on the 840 EVO 250GB and managed to write something like 900+TB before the drive completely failed and stopped functioning.

Judging from the TechReport article on SSD endurance, most of the drives tested are capable of handling far more writes than the specs state. There's no telling whether that holds for newer drives, as I don't think a similar test has been performed. Obviously, once you pass the specified TBW the drive is no longer covered under warranty. My 840 EVO has almost 7 years of power-on time.

I'm going to assume the TBW limit for my 840 EVO is 50TB at the absolute most. In all likelihood I'll never get anywhere near that amount of data written. Based on my past usage, it's much more likely the drive will fail before the NAND begins wearing out. In the last 2 years the drive has seen 1TB of writes; in total I've written 14TB in just under 7 years.

I use the 840 EVO as a scratch drive, so anything on there is either backed up elsewhere or it isn't a problem if I lose it. I did the math for the WAF following the steps suggested in the link, and my WAF is about 1.05. That seems low, but the raw Wear Leveling Count value on my drive is only 62, so 62 x 250 GB = 15,500 GB of NAND writes, and 15,500 / 14,755 GB of host writes = 1.0505. I double-checked with CrystalDiskInfo to make sure the numbers were the same. Below are the SMART stats for the 840 EVO in decimal format. The 11 CRC errors were from a bad cable that has long since been replaced.
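For anyone following along, here is a tiny sketch of the WAF estimate described above. The method comes from the linked thread (approximate total NAND writes as raw Wear Leveling Count times drive capacity); the figures are the ones quoted in this post.

```python
# Approximate WAF for a drive that doesn't expose NAND page counters,
# using the method from the linked thread:
#   total NAND writes ~= raw Wear Leveling Count (avg erase cycles) * capacity
#   WAF ~= total NAND writes / total host writes
erase_cycles = 62        # raw Wear Leveling Count reported by the 840 EVO
capacity_gb = 250        # drive capacity
host_writes_gb = 14_755  # total host writes (~14.7 TB)

nand_writes_gb = erase_cycles * capacity_gb   # 15,500 GB
waf = nand_writes_gb / host_writes_gb         # ~1.05
print(f"Estimated WAF: {waf:.4f}")
```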



In regard to my old 3TB Seagate HDD, I had a 5-year extended warranty that I bought for $10 back in 2017. I finally got a shipping label for it, and the e-mail said that once they receive the drive I'll get a gift card for the purchase price. That being said, it wasn't easy to get the shipping label; I had to jump through some hoops. It was the usual effort to wear the customer down and avoid a warranty claim.
 

chrysalis

Thank you for converting it to a table and calculating WAF.

I am running your script now for data collection; however, I have already decided to buy an EVO for the laptop as a long-term replacement. The drive adds circa 12-13C to its running temperature with continuous SMART checks, and I would rather have a drive that just performs as it should.

I will tell Crucial I would rather have a refund, which would allow them to use the ssd I send back for testing to find their firmware problem. I will keep my other MX500 as a spare.

I will post some more data collected from your script next week.

This all points, to me, to an issue with either wear levelling or moving data from SLC to TLC. Their code probably has some kind of issue that shows itself with high uptime. As I think you have already posted, the SMART checks prevent this maintenance from happening, which may or may not have other consequences down the line. I wonder what would happen if I did a speed test after a week of continuous SMART checks; would I be writing directly to TLC? Hmm.

A scary thought: if this happens on NVMe drives, we wouldn't know, since NVMe SMART data doesn't show erase cycles.
 

dermoth

This seems to be a common theme for the MX500 series. OTOH, I have plenty of other Crucial and Micron SSDs (the latter being an OEM laptop drive sharing the same controller/SMART attributes as the Crucial MX/MB series) that have no issues.

I have logged all SMART attributes every day since I installed the MX500 SSD in my computer, and I also have 4.5 years of data from the 7 years of the previous drive (a Crucial M4, half the size, so it ran almost full until the switch). I remember doing some really heavy writes at the beginning of the M4's life. It doesn't have the same SMART attributes, but its percent-life-used can be summed up as follows:

First 2.5 years: pct life used went from 0 to 26 or 27 (best guess based on the value after reinstalling). You can understand why I stopped doing heavy IO on my SSD.
Next 4.5 years: went from 27 to 35 (only an 8% increase!)

Then I switched to the MX500, same OS, same usage:

First year: pct life used went up to 20%, lifetime WAF of 9.11.
Next 9 months: pct life used went from 20% to 46%, lifetime WAF of 12.78, but 17.79 for the 9-month period alone.

Another interesting comparison: an older BX300 SSD that is slightly smaller and has almost twice as many power-on hours; it was used in a laptop until the laptop died, then in an old desktop. Although I don't have historical SMART data for that one, I matched it against the MX500 at the same wear point, based on host program count (attr 246), which the MX500 reached almost 3 weeks before its 1-year anniversary.

BX300 is at 3% pct lifetime used, MX500 is at 18%
BX300 WAF is at 1.4170 vs. 8.7385 for the MX500
BX300 has written 23.2B LBA sectors (512b), MX500 has written 12.2B

Another interesting metric: my MX500, currently at 48% life used, reports 11.4 TBW (converted from the LBAs, attr. 246). My son's Crucial P5 NVMe, about 6 months old, reports 16.3 TBW (guaranteed 300 TBW or 5 years). If that's accurate, then I can understand how some people ran through an MX500 in its first year.

I contacted Crucial after 9 months, as I could see the drive was, at best, going to fail around the end of the 5-year warranty. They tried to argue it was normal back then, so I contacted them again a year later when the SSD was clearly going to fail much sooner than that. I had to insist to get it escalated and I'm still waiting for an answer, but to be honest I'm tempted to just buy a bigger BX500 (they're pretty cheap these days) and hopefully stop worrying about this one. Ugh... no. It appears the BX500 is much worse than the BX300, especially in the relative write endurance of the 480GB model.
 

Lucretia19

[snip]
I am running your script now for data collection; however, I have already decided to buy an EVO for the laptop as a long-term replacement. The drive adds circa 12-13C to its running temperature with continuous SMART checks [selftests], and I would rather have a drive that just performs as it should.

By "the running temperature" I assume you mean the temperature of the ssd. My desktop pc presumably has much better air flow and cooling than your laptop, and my ssd's temperature rise due to selftests is about 5C.

Is there room in your laptop to add a heat sink onto the case of the ssd? I stuck an M.2 heat sink (about $7 at Amazon) onto the 250GB MX500 M.2 ssd in my laptop -- which doesn't need selftests because the laptop is usually off -- and the heat sink passively cools the ssd by about 5C. (I would have chosen a heat sink with taller cooling fins, but there's only about 1/2 inch of air space between the M.2 ssd and the inside of the laptop case bottom.) If your MX500 is a 2.5" ssd and not an M.2 ssd, you might be able to put a much wider, more effective heat sink on it than I did to my M.2 ssd.

[snip]
This all points, to me, to an issue with either wear levelling or moving data from SLC to TLC. Their code probably has some kind of issue that shows itself with high uptime. As I think you have already posted, the SMART checks [selftests] prevent this maintenance from happening, which may or may not have other consequences down the line.

NOTE: I don't advise truly continuous selftests. My selftest regime aborts each selftest after 19.5 minutes and then pauses 30 seconds before launching the next selftest.

I'm not too concerned about negative consequences, because during most of the 30-second pauses I've seen no indication that accumulated unperformed maintenance operations are finally getting a brief opportunity to run. During most of the pauses, the increase of F8 is tiny (not a write burst).
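For readers who want to try a similar regime, here is a minimal sketch of the loop described above. It is an editorial illustration, not Lucretia19's actual script; it assumes smartctl (smartmontools) is installed and run with sufficient privileges, and the device path is illustrative. The 19.5-minute run time and 30-second pause are the values quoted in this thread.

```python
# Repeatedly run an extended SMART selftest, abort it after 19.5 minutes,
# then pause 30 seconds before starting the next one.
# Assumes smartctl is installed; /dev/sda is illustrative.
import subprocess
import time

DEVICE = "/dev/sda"
SELFTEST_SECONDS = 19.5 * 60   # let each selftest run for 19.5 minutes
PAUSE_SECONDS = 30             # idle gap between selftests

def smartctl(*args: str) -> None:
    subprocess.run(["smartctl", *args, DEVICE], check=True,
                   stdout=subprocess.DEVNULL)

while True:
    smartctl("-t", "long")      # start an extended (long) selftest
    time.sleep(SELFTEST_SECONDS)
    smartctl("-X")              # abort the running selftest
    time.sleep(PAUSE_SECONDS)   # 30-second pause before the next one
```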

I wonder what would happen if I did a speed test after a week of continuous SMART checks [selftests]; would I be writing directly to TLC? Hmm.

The selftest is lower priority than host pc reads & writes, so selftests don't reduce short-term performance. I don't know whether a selftest is also lower priority than compacting to TLC mode any data that was written in high-speed SLC mode, so the ssd write-speed benchmark test that you suggested might be revealing. On the other hand, it could be very sensitive to the host pc write rate during the week of selftests... if the host pc wrote at a low rate, there wouldn't be much SLC data that needs to be compacted to TLC.

Doesn't the MX500 dynamically allocate NAND to SLC write mode as needed? If the ssd has a lot of available space, it could take a long time (a lot of writing) to fill that space in SLC mode. Before the available space is filled, I wouldn't expect the ssd's write performance to suffer -- in other words, I wouldn't expect the ssd to write "direct to TLC" and make the host pc wait.
 

chrysalis

It allocates SLC, which all writes go to, but part of the background activity is moving data from the SLC area to the TLC area. The selftests might be preventing that; it might be that this is the buggy background activity, no idea.

I am doing the 30-second pauses as in your script, and it's a SATA SSD in a very tight fit.

These 500GB MX500s can hit almost the same speed writing directly to TLC anyway, so it wouldn't be much of a performance hit.
 

Lucretia19

It allocates SLC, which all writes go to, but part of the background activity is moving data from the SLC area to the TLC area. The selftests might be preventing that; it might be that this is the buggy background activity, no idea.

Below is the last 24 hours of one of my other logs. Each row corresponds to a 30-second pause between selftests, and it shows the duration of any write burst that occurred during the pause. Most of the pauses -- the rows that say "none" -- don't include a write burst. This is why I think the selftests don't have a negative side effect. In other words, I'm assuming that if unperformed internal maintenance (such as copying SLC to TLC) were accumulating, then the ssd would use most of the pauses to perform some of that maintenance.

The units of the three right-most columns are seconds.
| Date | Time | PauseDuration | BurstStartOffset | BurstDuration |
|---|---|---|---|---|
| 10/20/21 | 10:10:34.10 | 29 | none | |
| 10/20/21 | 10:30:35.15 | 29 | none | |
| 10/20/21 | 10:50:36.12 | 28 | none | |
| 10/20/21 | 11:10:37.12 | 27 | none | |
| 10/20/21 | 11:30:38.13 | 26 | none | |
| 10/20/21 | 11:50:39.11 | 25 | none | |
| 10/20/21 | 12:10:40.16 | 24 | none | |
| 10/20/21 | 12:30:34.14 | 23 | none | |
| 10/20/21 | 12:50:34.13 | 29 | none | |
| 10/20/21 | 13:10:35.11 | 29 | none | |
| 10/20/21 | 13:30:36.20 | 28 | none | |
| 10/20/21 | 13:50:37.19 | 27 | none | |
| 10/20/21 | 14:10:38.17 | 26 | none | |
| 10/20/21 | 14:30:39.14 | 25 | none | |
| 10/20/21 | 14:50:40.11 | 24 | none | |
| 10/20/21 | 15:10:34.20 | 23 | none | |
| 10/20/21 | 15:30:34.19 | 29 | none | |
| 10/20/21 | 15:50:35.14 | 29 | none | |
| 10/20/21 | 16:10:36.20 | 28 | none | |
| 10/20/21 | 16:30:37.16 | 27 | none | |
| 10/20/21 | 16:50:38.15 | 26 | none | |
| 10/20/21 | 17:10:39.13 | 25 | none | |
| 10/20/21 | 17:30:40.14 | 24 | none | |
| 10/20/21 | 17:50:34.12 | 23 | none | |
| 10/20/21 | 18:10:34.11 | 30 | none | |
| 10/20/21 | 18:30:35.12 | 29 | none | |
| 10/20/21 | 18:50:36.10 | 28 | none | |
| 10/20/21 | 19:10:04.19 | 27 | 1 | 5 |
| 10/20/21 | 19:30:38.18 | 26 | none | |
| 10/20/21 | 19:50:39.13 | 25 | none | |
| 10/20/21 | 20:10:40.18 | 24 | none | |
| 10/20/21 | 20:30:34.18 | 23 | none | |
| 10/20/21 | 20:50:34.14 | 29 | none | |
| 10/20/21 | 21:10:35.14 | 29 | none | |
| 10/20/21 | 21:30:36.15 | 28 | none | |
| 10/20/21 | 21:50:04.15 | 27 | 1 | 5 |
| 10/20/21 | 22:10:38.17 | 26 | none | |
| 10/20/21 | 22:30:39.13 | 25 | none | |
| 10/20/21 | 22:50:39.13 | 24 | none | |
| 10/20/21 | 23:10:34.19 | 23 | none | |
| 10/20/21 | 23:30:34.16 | 29 | none | |
| 10/20/21 | 23:50:35.15 | 29 | none | |
| 10/21/21 | 00:10:36.15 | 28 | none | |
| 10/21/21 | 00:30:37.16 | 27 | none | |
| 10/21/21 | 00:50:38.14 | 26 | none | |
| 10/21/21 | 01:10:39.14 | 25 | none | |
| 10/21/21 | 01:30:40.20 | 24 | none | |
| 10/21/21 | 01:50:34.15 | 23 | none | |
| 10/21/21 | 02:10:34.19 | 29 | none | |
| 10/21/21 | 02:30:35.13 | 29 | none | |
| 10/21/21 | 02:50:37.16 | 28 | none | |
| 10/21/21 | 03:10:37.15 | 27 | none | |
| 10/21/21 | 03:30:38.19 | 26 | none | |
| 10/21/21 | 03:50:39.10 | 25 | none | |
| 10/21/21 | 04:10:39.11 | 24 | none | |
| 10/21/21 | 04:30:34.18 | 24 | none | |
| 10/21/21 | 04:50:34.15 | 29 | none | |
| 10/21/21 | 05:10:35.11 | 29 | none | |
| 10/21/21 | 05:30:04.12 | 28 | 2 | 5 |
| 10/21/21 | 05:50:37.19 | 27 | none | |
| 10/21/21 | 06:10:38.17 | 26 | none | |
| 10/21/21 | 06:30:05.19 | 25 | 0 | 5 |
| 10/21/21 | 06:50:40.12 | 24 | none | |
| 10/21/21 | 07:10:34.11 | 23 | none | |
| 10/21/21 | 07:30:01.14 | 29 | 1 | 4 |
| 10/21/21 | 07:50:36.13 | 29 | none | |
| 10/21/21 | 08:10:36.20 | 28 | none | |
| 10/21/21 | 08:30:37.16 | 27 | none | |
| 10/21/21 | 08:50:05.17 | 26 | 1 | 5 |
| 10/21/21 | 09:10:39.16 | 25 | none | |
| 10/21/21 | 09:30:40.19 | 24 | none | |
| 10/21/21 | 09:50:34.12 | 23 | none | |
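For reference, here is a minimal sketch of how a burst log like the one above could be collected: poll the F8 raw value (attribute 248, FTL program page count) once per second during each pause and note when it jumps. This is an editorial illustration, not Lucretia19's actual logging script; it assumes smartctl 7.0+ (JSON output), and the device path and burst threshold are illustrative.

```python
# During an idle pause, poll attribute 248 (F8, FTL program page count) once
# per second and report when a write burst (a large jump in F8) occurs.
# Assumes smartctl 7.0+ for JSON output; /dev/sda and the threshold are illustrative.
import json
import subprocess
import time

DEVICE = "/dev/sda"
BURST_THRESHOLD = 10_000  # raw F8 increase per second that counts as a burst

def read_f8(device: str = DEVICE) -> int:
    out = subprocess.run(["smartctl", "-A", "-j", device],
                         capture_output=True, text=True, check=True).stdout
    for attr in json.loads(out)["ata_smart_attributes"]["table"]:
        if attr["id"] == 248:
            return attr["raw"]["value"]
    raise KeyError("attribute 248 not found")

def log_pause(pause_seconds: int = 30) -> None:
    prev = read_f8()
    burst_offsets = []                    # seconds into the pause with burst-level writes
    for offset in range(1, pause_seconds + 1):
        time.sleep(1)
        cur = read_f8()
        if cur - prev >= BURST_THRESHOLD:
            burst_offsets.append(offset)
        prev = cur
    if burst_offsets:
        print(f"burst: start offset {burst_offsets[0]} s, duration {len(burst_offsets)} s")
    else:
        print("none")

log_pause()
```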
 

chrysalis

I just read a post I think you made on Reddit; here is my reply to it. Tell me what you think.

The post is here:
https://www.reddit.com/r/unRAID/comments/gwighw/solution_to_i_can_not_recommend_crucial_ssds_for/fthxdfv/


My reply:

You might be on to something with this second theory.

Remember the original 840 from Samsung, which I believe was their first-gen planar TLC? They overestimated the NAND's capabilities, and it resulted in data becoming unreadable only a few months after being written, so their eventual fix was to frequently refresh the data. That would have the same side effect as what we are seeing here: excessive internal writes.

As you said, pending sectors are caused by read errors that are not yet confirmed hardware errors. I have had one on a WD spindle before, and it got cleared when the sector was written to.

The only issue I have, though, is that if selftests significantly slow down the frequency of these data refreshes, one would maybe expect the pending-sector counter to sit at a non-zero value for much longer periods, since the corrective work is being prevented from running by the selftests. So I extend your theory: this background activity is perhaps also what detects the soft errors, by routinely checking whether data is still readable. Maybe if the error-correction controller hits a certain workload, or if the pending count ever goes above 0, it triggers the refresh cycle; then it fully makes sense to me.

I also note these drives are very cheap for their market reputation; they had rave reviews everywhere, yet they seem to be constantly on sale. That is what attracted me to buying two in the first place. It almost seemed too good to be true, and now we know it is.

I have offered my MX500 to a reviewer to see if this can get media coverage; he hasn't made a decision yet. I pointed him to the Reddit thread and this thread.

I see you have this as a theory as well; I missed your post earlier, sorry.

https://forums.tomshardware.com/thr...few-bytes-being-written.3571220/post-22477904

The spare MX500, which I will keep, will probably be repurposed as a scratch drive for my video editing.
 

Lucretia19

I just read a post I think you made on Reddit; here is my reply to it. Tell me what you think.
https://www.reddit.com/r/unRAID/comments/gwighw/solution_to_i_can_not_recommend_crucial_ssds_for/fthxdfv/

I also note these drives are very cheap for their market reputation; they had rave reviews everywhere, yet they seem to be constantly on sale. That is what attracted me to buying two in the first place. It almost seemed too good to be true, and now we know it is.

I have offered my MX500 to a reviewer to see if this can get media coverage; he hasn't made a decision yet. I pointed him to the Reddit thread and this thread.

I see you have this as a theory as well; I missed your post earlier, sorry.
https://forums.tomshardware.com/thr...few-bytes-being-written.3571220/post-22477904

I wrote that post elsewhere (on the Unraid forum), and it was copied to Reddit.

Regarding the resetting of the Current Pending Sectors attribute from 1 to 0, you raise a good question. I haven't been running SMART-monitoring software that alerts me when the CPS attribute becomes 1, so I'm not 100% certain that CPS stays 0 during the 19.5-minute selftests (and only changes to 1 during the 30-second pauses between selftests). If it changed to 1 during a selftest, and stayed 1 until a write burst during a pause reset it to 0, then I think my logs would show CPS=1 at the beginning of most pauses; to the contrary, they show CPS=0 at the beginning of nearly all of the pauses. So, here's a 2-part question: If CPS changes briefly to 1 during a selftest, what changes it back to 0? If CPS rarely or never changes to 1 during a selftest, why doesn't it?

If CPS changes to 1 during a selftest, perhaps it gets quickly reset to 0 by a process with higher priority than a selftest, after spawning the lower priority process that writes the huge bursts.

The article about the firmware update to Samsung 840 ssds that caused them to write a lot to refresh cheap NAND didn't say the Samsung writing is as excessive as the Crucial MX500 writing. To totally rewrite a 500GB drive once every few months would require writing only about 5GB per day. The following excerpt from my logs in February 2020 (before I began the selftests regime) shows the FTL controller was writing upwards of 10 million NAND pages per day (the "ΔF8" column), which is roughly 300 GB per day (assuming 1 GB is about 37,000 NAND pages). So the Crucial writing is extremely excessive. Furthermore, only about 100 GB of the ssd was in use; most of the ssd was free space that should not need to be refreshed. If Crucial's goal was to mimic Samsung's firmware "fix," then Crucial's implementation of the algorithm is terrible.
| ΔF7 (1 day) | ΔF8 (1 day) | Daily WAF = 1 + ΔF8/ΔF7 |
|---|---|---|
| 231,144 | 12,894,568 | 56.79 |
| 260,934 | 10,066,176 | 39.58 |
| 278,028 | 16,578,426 | 60.63 |
| 281,524 | 2,807,244 | 10.97 |
| 230,270 | 8,203,271 | 36.62 |
| 269,722 | 14,042,509 | 53.06 |
| 594,613 | 5,228,740 | 9.79 |
| 352,795 | 7,810,689 | 23.14 |
| 144,904 | 12,980,755 | 90.58 |
| 399,835 | 21,234,970 | 54.11 |
| 229,493 | 1,941,470 | 9.46 |
| 237,292 | 8,271,372 | 35.86 |
| 221,996 | 12,748,544 | 58.43 |
| 262,998 | 18,637,064 | 71.86 |
| 287,574 | 6,699,994 | 24.30 |
| 201,811 | 9,833,974 | 49.73 |
| 275,300 | 6,697,966 | 25.33 |

Regarding your "extension" of my theory... I don't think ssds require a background process to detect hard-to-read cells ("soft errors"). I think they're detected whenever ANY read process (host reads, selftests, etc) tries to read them. I don't know whether hard-to-read cells are triggering the bug. Maybe. My theory was that slow-to-read cells trigger the bug... that Crucial pushes cheap NAND to faster speeds than it can reliably handle in order to perform at speeds comparable to the competition.

I don't think selftesting refreshes cell data; I think only writing can refresh it. If the purpose of Crucial's write bursts is to refresh cells -- similar to Samsung's 840 "fix" -- and the selftests are preventing most of the write bursts, and if this prevention is risking data loss, perhaps I could find evidence of data loss by examining the output of a selftest. My selftests software has been throwing away the output of each selftest. It doesn't analyze the selftest output to see if the selftest encountered any errors.
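One way to check this without capturing each selftest's live output is that the drive keeps its own selftest history, which smartctl can read back. Below is a minimal sketch, assuming smartctl 7.0+ (JSON output) and an illustrative device path; the exact status strings can vary slightly between smartctl versions.

```python
# Read the drive's SMART selftest log and flag entries that did not end in a
# clean completion or a host abort (e.g. read failures found during a selftest).
# Assumes smartctl 7.0+ (JSON output); /dev/sda is illustrative.
# Note: status strings may differ slightly between smartctl versions.
import json
import subprocess

def selftest_problems(device: str = "/dev/sda") -> list:
    out = subprocess.run(["smartctl", "-l", "selftest", "-j", device],
                         capture_output=True, text=True, check=True).stdout
    log = json.loads(out).get("ata_smart_self_test_log", {})
    entries = log.get("standard", {}).get("table", [])
    ok_statuses = ("Completed without error", "Aborted by host")
    return [e for e in entries if e["status"]["string"] not in ok_statuses]

if __name__ == "__main__":
    bad = selftest_problems()
    print(f"{len(bad)} selftest log entries reported a problem")
    for entry in bad:
        print(entry["status"]["string"])
```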

Perhaps Crucial's selftest algorithm is smart enough to arrange for hard-to-read "soft error" cells to be refreshed, by appending their addresses to a list of cells that another process will deal with sooner or later. This would be a good thing, because instead of interfering with the discovery & handling of flaky cells, a selftest would aid their discovery.

Here's a wild theory: Perhaps the FTL controller runs a low priority background task that checks for flaky cells by trying to read cells at a speed that's faster than normal. Or something equivalent, such as measuring how long it takes to read cells and considering a cell flaky if the time exceeds some threshold. This theory could explain why other read processes -- host reads, selftests -- don't set CPS to 1, and why CPS goes to 1 (for several seconds) only during pauses between selftests. To prove that the only processes that cause CPS to be set to 1 have lower priority than a selftest, I think someone would need to simultaneously run both of the following for many hours: (1) a selftests regime, and (2) software that monitors CPS at a high polling rate (once per second?), discarding the CPS data that occurs during selftest pauses. Simplest would be to run nonstop selftests (no pauses) for that test, so that no data needs to be discarded (except perhaps during the brief moments between when a selftest is aborted and the next selftest starts nearly immediately). I'm not sure I'll be able to find time to run that test.
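A minimal sketch of the monitoring half of that proposed experiment: poll the Current Pending Sectors attribute (197) once per second and log every change with a timestamp, so it can later be correlated with selftest activity. It assumes smartctl 7.0+ (JSON output); the device path is illustrative.

```python
# Poll SMART attribute 197 (Current Pending Sector Count) once per second and
# log every change with a timestamp, for correlating with selftest activity.
# Assumes smartctl 7.0+ (JSON output); /dev/sda is illustrative.
import json
import subprocess
import time
from datetime import datetime

DEVICE = "/dev/sda"

def read_pending(device: str = DEVICE) -> int:
    out = subprocess.run(["smartctl", "-A", "-j", device],
                         capture_output=True, text=True, check=True).stdout
    for attr in json.loads(out)["ata_smart_attributes"]["table"]:
        if attr["id"] == 197:
            return attr["raw"]["value"]
    raise KeyError("attribute 197 not found")

last = read_pending()
while True:
    time.sleep(1)
    cur = read_pending()
    if cur != last:
        print(f"{datetime.now().isoformat(timespec='seconds')}: "
              f"Current Pending Sectors changed {last} -> {cur}")
        last = cur
```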
 

chrysalis

What I can do is, when I swap the SSD, keep the data on there and, if I have a spare SATA port, put it in one of my PCs and, without using the drive, just see whether the background writes keep increasing even with it being an idle non-OS drive. I'm not sure I will have a spare SATA port though.
 

chrysalis

Ok, I have some interesting news. It is about my 2nd MX500 drive.

This 2nd SSD is not completely unused; it was temporarily used to boot my main PC before I moved my 970 EVO back in as the boot device. Since I stopped using it, it has sat in a drive bay in the case, not connected to any cables, for about 3/4 of a year.

Earlier today I powered it up and found the data is not fully readable: lots of read errors and low throughput on the files it can read. However, it still benchmarks at full speed in CrystalDiskInfo; I believe this is because the benchmark writes fresh data to use for its read test.

I have left it idle in the 2nd machine to see what happens: whether background activity recovers the read speeds of the existing data, and whether erase cycles increase.

Also, the SMART CRC counters don't increase, although there are lots of pending sectors.
 

Pextaxmx

Ok, I have some interesting news. It is about my 2nd MX500 drive.

This 2nd SSD is not completely unused; it was temporarily used to boot my main PC before I moved my 970 EVO back in as the boot device. Since I stopped using it, it has sat in a drive bay in the case, not connected to any cables, for about 3/4 of a year.

Earlier today I powered it up and found the data is not fully readable: lots of read errors and low throughput on the files it can read. However, it still benchmarks at full speed in CrystalDiskInfo; I believe this is because the benchmark writes fresh data to use for its read test.

I have left it idle in the 2nd machine to see what happens: whether background activity recovers the read speeds of the existing data, and whether erase cycles increase.

Also, the SMART CRC counters don't increase, although there are lots of pending sectors.
The JEDEC standard specifies 1 year of retention at 30C, right? It would be interesting to know whether the MX500 fails to satisfy that. Do you have an estimate of the temperature of the box your MX500 has been sitting in?
 

Diceman_2037

It does indeed appear this is resolved in the MX500s with the new controller/firmware.



There haven't been any significant increases to F8 that weren't linked to F7 increases.
 

chrysalis

It does indeed appear this is resolved in the MX500s with the new controller/firmware.

There haven't been any significant increases to F8 that weren't linked to F7 increases.

That's good. When did this come to market? I bought both my MX500s in early 2021. I feel they should have recalled existing unsold stock to replace the firmware. I also think they should offer newer-firmware drives as replacements to those who query their tech support about it; instead I was told tough luck.

The more I think about it, the more I hate what they have done. A new controller should mean a new model number, but then the problem they have is that everyone who knows about the issue would avoid the MX500, knowing it has the old firmware. Instead, people buy and just hope they get the new firmware.
 

fishyjack

Since this thread seems to be somewhat active, I'll ask my question here. I guess I'm parroting chrysalis' question: does anyone know how long this apparent fix has been in circulation?

I'm looking into buying a 2TB MX500 to replace a 2TB 870 EVO that died on me after only 6 months of light usage as a storage drive for games/Photoshop/music/art/etc. I was very close to ordering an MX500 until I stumbled upon this thread. Is it still worth buying, even if it might be luck of the draw whether I get a bugged drive or not?
 

USAFRet

Moderator
Just curious - why do you choose to replace it with another brand instead of going through a warranty replacement? How did it die?
I agree.
Warranty replacement would be the go-to thing here.
The death of a single device is not a condemnation of the entire line or brand.

If that were the case, I'd have no drives at all. WD, Seagate, Toshiba, SanDisk... I've had at least one of each of those die in the last few years. All except the too-old Seagate were replaced under warranty.
 

fishyjack

Just curious - why do you choose to replace it with another brand instead of going through a warranty replacement? How did it die?
That's an interesting story! I did RMA it nearly 2 months ago and got my refund on the 20th. On November 7th I was playing a game on Steam and the game completely froze; I alt-tabbed and killed it via Task Manager, but then started to notice other games on my storage drive were loading quite slowly. I checked CrystalDiskInfo and saw that its bad sector count was at 3, and at every re-scan it was spitting out more and more ECC Error Rate/Uncorrectable Error Count alerts. Within the span of a few hours, the bad sector count went from 3 to 8 to 10, finally stopping at 16, with the error alerts settling around 874. Samsung Magician was singing the same tune and failed repeatedly to 'repair' the drive. Chkdsk didn't help either. I managed to back up all of my important files in time, zeroed the drive and started the RMA. I sent them all of my info and the drive. About 2 weeks later they basically said "yeah, this thing is FUBAR" and refunded me instead of replacing it. All in all, that drive had barely 2 TBW and was at the time about 40% full.

Normally I would just say "welp, I got a lemon, it happens" and order a new one. I bought mine in June; its manufacture date was January 2021. I did a bit of looking around and discovered quite a few people reporting the exact same issue as mine, with eerily identical symptoms. They had all purchased the drive around the same time as me, and the drives all died around the same time too. I don't know if it was just a bad gen-1 batch or what, but the coincidence weirded me out a little.

That being said, I'm using a 256GB 870 EVO as my boot drive and it's been trucking along with no issues so far.
 
Jan 5, 2022
Just wanted to thank Lucretia19, and others in this thread, for all the work they’ve done to expose and document this issue, and also to corroborate that the latest CR043 revision of this drive doesn’t appear to be affected (I’ve been tracking the daily WAF for a few days now).

Ironically, I had switched to an 870 EVO six months ago, mostly due to this, but then that drive started failing on me a few weeks ago (similar to the recent poster, with escalating drive errors). In the process of going back to my CR023 MX500, I noticed some minor issues with that drive, and figured I might as well try to RMA it and see if I get a newer revision. Both were 1TB models used in a home server that's on 24/7.

As an aside, having now done both a Samsung and Crucial RMA for the first time, at the same time, in case anyone was curious: Mailed both out on the same day, received replacements on the same day a week later. Samsung offered free return shipping, and sent back a refurbished-labeled drive. I had to pay to ship the Crucial drive back myself, but received a new-in-box retail drive.

So, if you’re affected by this issue, I highly recommend an RMA if possible, though there’s probably some luck involved with the replacement drive.