[SOLVED] Help understanding SMART attributes with drive taken out of service by RAID.

macphoto

Honorable
Dec 8, 2016
I have four WD 4TB Black Enterprise Edition drives running in RAID 5 on a Highpoint RR2721 controller. I checked the status and found the array had been critical since 3/13/20, when there were 15 read errors on my #4 disk over an 8-minute period. That drive was then taken out of service. I removed the drive and ran the extended test using WD's Data Lifeguard. This is supposed to test every block/sector on the drive, and it passed. I reinstalled the drive and the RAID is now rebuilding. My question is about understanding the SMART attributes reported by Highpoint's Storage Health Inspector. The columns are: Threshold, Worst, Value, and Status. All the Statuses are OK.

What does "Threshold" mean? Is this the point below which one starts to have problems?

Is "Value" the current value for the drive? For all but two attributes, Worst and Value are identical. I've read that higher values are better, but I can't reconcile that with most of the Worst and Value entries being identical while the drive status is still OK.

Thanks for your help.


 
Think I've found the answer.

Value = Current Value
Worst = Worst value ever recorded for the drive
Threshold = the point at or below which the drive is considered out of spec
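
For anyone who wants to check the three numbers by hand, here is a minimal Python sketch of that comparison. It assumes the common convention that an attribute counts as failed once its normalized value is at or below the threshold; the function name and the sample numbers are hypothetical and not taken from Highpoint's tool.

```python
def attribute_status(value: int, worst: int, threshold: int) -> str:
    """Classify one SMART attribute from its normalized numbers.

    Assumption: an attribute is out of spec when the normalized
    value is less than or equal to the threshold.
    """
    if value <= threshold:
        return "FAILING NOW"          # current value is out of spec
    if worst <= threshold:
        return "FAILED IN THE PAST"   # recovered, but was out of spec once
    return "OK"


# Hypothetical readings: Value and Worst both 200, Threshold 51.
print(attribute_status(200, 200, 51))
```

Identical Value and Worst numbers still come back as OK here, because only the distance to the threshold matters in this check.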

The reporting range for my drives is 0-255. So when Value and Worst are the same, e.g., 200, it means that the drive has declined from an optimal 255 (it's at its worst) but is still fine.
 
The optimal value for each attribute is variable. For some it is 200, for others it is 100. Then there are attributes such as Spin-Up Time where 100 is a baseline rather than an "optimal" value. Temperature attributes also have their own format. An initial value of 255 (or 253) is used by some drives to indicate that the statistics for that particular attribute are not yet significant.
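
Since there is no single optimum to measure against, one rough way to read a row is to look at the margin above the threshold instead of the distance from 255. The sketch below is only an illustration: the function name and sample numbers are made up, and treating 253 and 255 as "not yet significant" is an assumption that holds for some drives only.

```python
from typing import Optional

# Assumption: these initial values mean "statistics not yet significant".
PLACEHOLDER_VALUES = {253, 255}


def margin_above_threshold(value: int, threshold: int) -> Optional[int]:
    """Return how far the normalized value sits above its threshold.

    Returns None when the value looks like a placeholder, because the
    drive has not collected meaningful statistics for that attribute.
    """
    if value in PLACEHOLDER_VALUES:
        return None
    return value - threshold


# Hypothetical rows: one attribute that starts at 200, one that starts at 100.
print(margin_above_threshold(200, 51))
print(margin_above_threshold(100, 0))
print(margin_above_threshold(253, 0))
```

A margin that stays steady from one check to the next is more informative than the raw number, which depends on where that attribute started.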
 
Thanks, that's very helpful. My set of drives is over five years old, so there is reason to be concerned. As I click from drive to drive, the SMART attribute values barely change. I suppose that's good.

My underlying question was really "Is it safe to keep using this drive?" I've been backing up via Backblaze with a local copy for the most important data. I assumed that I'd probably lose the entire RAID, so before testing the drive in question, I copied everything to an external device. The only glitch was a Windows message that two music files were missing. Maybe those were the read errors from back in March?

The good news is that drive capacities have increased so much that I can replace all of them with one or two disks. I just don't want to spend the bucks this very minute.