Question: Replace drive in RAID, or wait for it to fail?


loonsailor

I have a RAID in an OWC box on a Mac Studio, managed with SoftRAID. It consists of four Samsung 870 EVO 1TB SSDs in a RAID-4 configuration (robust to any single drive failure). The drives still work and SoftRAID shows no errors, but I'm getting "pre-fail" warnings about two of the drives, both from TechTool (which I'm not sure I trust) and from DriveDx (which I think I do trust). DriveDx shows the following for the two drives, both of which have about 10,000 hours on them.


Code:
=== PROBLEMS SUMMARY ===
Failed Indicators (life-span / pre-fail)  : 0 (0 / 0)
Failing Indicators (life-span / pre-fail) : 0 (0 / 0)
Warnings (life-span / pre-fail)           : 1 (0 / 1)
Recently failed Self-tests (Short / Full) : 0 (0 / 0)
I/O Error Count                           : 0 (0 / 0)
Time in Under temperature                 : 0 minutes
Time in Over temperature                  : 0 minutes


=== IMPORTANT HEALTH INDICATORS ===
ID  NAME                                         RAW VALUE                  STATUS
  5 Retired Block Count                          9                          98.9% Warning
177 Wear Leveling Count                          10                         99.0% OK
179 Used Reserved Block Count Total              9                          98.9% OK
181 Program Fail Count                           0                          100% OK
182 Erase Fail Count                             0                          100% OK
241 Total LBAs Written                           10,374,485,073 (5.3 TB)    99.0% OK

And the second drive:

Code:
=== PROBLEMS SUMMARY ===
Failed Indicators (life-span / pre-fail)  : 0 (0 / 0)
Failing Indicators (life-span / pre-fail) : 0 (0 / 0)
Warnings (life-span / pre-fail)           : 1 (0 / 1)
Recently failed Self-tests (Short / Full) : 0 (0 / 0)
I/O Error Count                           : 0 (0 / 0)
Time in Under temperature                 : 0 minutes
Time in Over temperature                  : 0 minutes


=== IMPORTANT HEALTH INDICATORS ===
ID  NAME                                         RAW VALUE                  STATUS
  5 Retired Block Count                          37                         95.6% Warning
177 Wear Leveling Count                          7                          99.0% OK
179 Used Reserved Block Count Total              37                         95.6% OK
181 Program Fail Count                           0                          100% OK
182 Erase Fail Count                             0                          100% OK
241 Total LBAs Written                           10,374,504,317 (5.3 TB)    99.0% OK

I don't know enough about SSDs to judge how serious this is, or how close one or both of these drives might be to failing. The RAID is robust to a single drive failure, but not to two drives failing. So, the question is, should I replace one or both of them now, or wait?
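
In case it helps anyone cross-check, here is a minimal sketch of how the retired-block count could be logged over time with smartmontools (assuming it's installed, e.g. via Homebrew, that the enclosure passes SMART through, and that it's run with sudo; the device paths are placeholders):

Code:
#!/usr/bin/env python3
"""Log the raw value of SMART attribute 5 (reallocated / retired blocks)
for each drive, so you can tell whether the count is still growing.

Assumes smartmontools 7+ is installed (e.g. `brew install smartmontools`),
that the enclosure passes SMART through, and that this runs with sudo.
The device paths are placeholders -- check `diskutil list` for the real ones.
"""
import json
import subprocess
from datetime import datetime

DRIVES = ["/dev/disk4", "/dev/disk5", "/dev/disk6", "/dev/disk7"]  # placeholders

def retired_blocks(device):
    """Return the raw value of attribute 5 for `device`, or None if unreadable."""
    # Some enclosures need an extra `-d sat` here for SMART passthrough.
    out = subprocess.run(["smartctl", "-j", "-A", device],
                         capture_output=True, text=True)
    try:
        report = json.loads(out.stdout)
    except json.JSONDecodeError:
        return None
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        if attr.get("id") == 5:
            return attr.get("raw", {}).get("value")
    return None

if __name__ == "__main__":
    stamp = datetime.now().isoformat(timespec="minutes")
    for dev in DRIVES:
        print(f"{stamp}  {dev}  retired blocks: {retired_blocks(dev)}")

A count that keeps climbing between runs generally means the drive is still remapping bad blocks, which is a stronger warning sign than a static number.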

Thanks for any advice on this!
 
I would argue it's better to replace things sooner rather than later. If one drive is about to fail, another might start to fail shortly after, especially if they were purchased around the same time. The longer you drag this out, the higher the chance you'll have another failure sooner than you'd like.
 
Retired blocks are not a good sign. You'd better have a good backup. With two drives showing signs of failure and only one drive of redundancy, it's very possible that replacing one drive will stress the other to the point of failure during the rebuild and cost you the entire array.
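
To put a very rough number on that rebuild risk, here's an illustrative back-of-envelope calculation; the per-drive failure probabilities are made-up placeholders, not measurements of these drives:

Code:
# Illustrative only: rough odds of losing a single-parity (RAID-4) array
# during a rebuild. The per-drive probabilities are made-up placeholders.

p_fail_warning = 0.05  # assumed chance the other pre-fail drive dies mid-rebuild
p_fail_healthy = 0.01  # assumed chance a clean drive dies mid-rebuild

# With the pulled drive out, three drives remain and any single failure
# among them before the rebuild completes loses the whole array.
remaining = [p_fail_warning, p_fail_healthy, p_fail_healthy]

p_survive = 1.0
for p in remaining:
    p_survive *= 1.0 - p

print(f"Rebuild completes: {p_survive:.1%}")      # ~93.1% with these numbers
print(f"Array is lost:     {1 - p_survive:.1%}")  # ~6.9% with these numbers

Even with these modest placeholder numbers, the chance of losing everything during the rebuild isn't negligible, which is why syncing a fresh backup before touching the array matters.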
 

loonsailor

Yikes! Thanks for the (scary) replies. I back up with IDrive, but I guess it's time to sync to a separate HDD and rebuild. Then I'll send these drives back to Samsung under warranty. They shouldn't have failed after only a year!
 