Question RAID 10 disk failure and reallocated sectors count?

rcfant89

I noticed my RAID 10 was being slow after a Windows update, very slow. I rebooted and it was still slow. I ran the Dell diagnostics and saw "Hard Drive 6 - S/N xxxxxxx, self test did not complete [Validation Code 70338]". I figured the drive had died.

I replaced the drive and it's automatically rebuilding. I figured I should check the health of the rest of the drives while I was at it. Three of the drives (including the bad one I was replacing) had a yellow caution icon in CrystalDiskInfo, but only one had anything in "Current Pending Sector Count" and "Uncorrectable Sector Count", and that was disk 6, the one that failed or was failing.

After replacing that one, six of the eight are "Good" and two have the "Caution" icon, with a raw "Reallocated Sectors Count" value of 8 on one and 10 on the other. Those numbers seem pretty low, no?

The drives are Seagate Enterprise 10 TB, so it would probably cost another 500 bucks to replace these two as well. How bad are those counts? Are those disks critical, or just slowing down a little?
 
After replacing that one, six of the eight are "Good" and two have the "Caution" icon, with a raw "Reallocated Sectors Count" value of 8 on one and 10 on the other.
Those numbers seem pretty low, no?
8 and 10 reallocated sectors is low, yes.
The problem is that once reallocated sectors start appearing, they tend to grow, and sometimes they grow fast.
You have to watch/monitor them closely.
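If you want to automate that monitoring, here is a minimal sketch. It assumes smartmontools 7+ (for smartctl's --json output) and direct-attached drives at Linux-style paths like /dev/sda; drives behind a hardware RAID controller usually need smartctl's -d option, and the state-file path is just a placeholder:

```python
#!/usr/bin/env python3
"""Log each drive's raw Reallocated_Sector_Ct (SMART attribute 5) and
warn when it grows between runs. A sketch, not a finished tool: run it
from cron as root, daily or so."""
import json
import subprocess
from pathlib import Path

DRIVES = ["/dev/sda", "/dev/sdb"]          # adjust to your system
STATE = Path("/var/tmp/reallocated.json")  # previous counts are kept here

def reallocated_count(device):
    """Return the raw reallocated-sector count, or None if not reported."""
    out = subprocess.run(["smartctl", "-A", "--json", device],
                         capture_output=True, text=True)
    table = json.loads(out.stdout).get("ata_smart_attributes", {}).get("table", [])
    for attr in table:
        if attr["id"] == 5:                # 5 = Reallocated_Sector_Ct
            return attr["raw"]["value"]
    return None

previous = json.loads(STATE.read_text()) if STATE.exists() else {}
current = {dev: reallocated_count(dev) for dev in DRIVES}

for dev, count in current.items():
    old = previous.get(dev)
    if old is not None and count is not None and count > old:
        print(f"WARNING {dev}: reallocated sectors grew from {old} to {count}")
    else:
        print(f"{dev}: reallocated sectors = {count}")

STATE.write_text(json.dumps(current))
```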
 
Any visible bad sectors mean the drive is actively failing. Hard drives ship with a hidden table of spare sectors that the drive uses internally to remap failed sectors invisibly. Only when that spare table is exhausted do the bad sectors become visible. Usually by that time it's well past time to replace and discard the drive, as there is nothing you can do to fix it.
 
After replacing that one, six of the eight are "Good" and two have the "Caution" icon
If 3 out of 8 drives are throwing up errors, the other drives may be on their way out. They might last another 5 years or drop dead tomorrow, especially if all 8 drives were bought from the same supplier and are from the same batch.

I recommend running long S.M.A.R.T. tests on each drive in the array (they take several hours per drive), especially if you don't have the data backed up elsewhere.
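If you'd rather script starting those tests than click through a GUI, here's a minimal sketch, assuming smartmontools is installed and the drives are direct-attached at /dev/sdX paths (adjust for your system; drives behind a RAID controller need smartctl's -d option):

```python
#!/usr/bin/env python3
"""Start SMART extended (long) self-tests on every drive in the list.
The test runs inside each drive's firmware, so this returns right away;
check the results a few hours later with `smartctl -l selftest <dev>`."""
import subprocess

DRIVES = ["/dev/sda", "/dev/sdb", "/dev/sdc"]  # adjust to your array

for dev in DRIVES:
    # -t long queues the extended offline self-test on the drive itself
    subprocess.run(["smartctl", "-t", "long", dev], check=True)
    print(f"started long self-test on {dev}")
```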

Before committing any drive to my TrueNAS Core RAID-Z2 arrays, I perform a full (non-destructive) surface Read test in Hard Disk Sentinel. It takes roughly 2 hours per terabyte and generates a detailed, easy-to-read surface map showing how many retries were necessary on each block.
http://harddiscsentinel.helpmax.net/en/hard-disk-tests/surface-test/

When a drive shows signs of failure, I run a full destructive Hard Disk Sentinel Write test, followed by a Read test, then decide whether to junk the drive or move it to some unimportant, non-critical system.
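If you don't have Hard Disk Sentinel at hand, a crude stand-in for the non-destructive read pass can be scripted. This sketch only reports unreadable regions, not Sentinel's per-block retry map; it assumes a Linux block device (the /dev/sdb in the usage line is a placeholder) and must run as root on a drive that is not in active use:

```python
#!/usr/bin/env python3
"""Crude non-destructive surface check: read a block device end to end
and report regions that fail to read. Just a pass/fail scan."""
import os
import sys

CHUNK = 4 * 1024 * 1024  # read 4 MiB at a time

def scan(device):
    fd = os.open(device, os.O_RDONLY)
    size = os.lseek(fd, 0, os.SEEK_END)
    os.lseek(fd, 0, os.SEEK_SET)
    offset, errors = 0, 0
    while offset < size:
        try:
            data = os.read(fd, min(CHUNK, size - offset))
            if not data:
                break
            offset += len(data)
        except OSError:
            errors += 1
            print(f"read error near byte {offset}")
            offset = min(offset + CHUNK, size)  # skip past the bad region
            os.lseek(fd, offset, os.SEEK_SET)
    os.close(fd)
    print(f"scanned {offset:,} bytes, {errors} unreadable region(s)")

if __name__ == "__main__":
    scan(sys.argv[1])  # e.g. python3 surface_scan.py /dev/sdb
```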

It's not worth risking important data on failing drives, even when you have multiple backups. If your RAID system contains the only copy of your data, you may be living on borrowed time. Keep your (other?) backups current, in case the main RAID dies.
 
Any visible bad sectors mean the drive is actively failing. Only when that spare table is exhausted do the bad sectors become visible.
Bad sectors? Did you mean bad blocks?
A bad block is a file-system concept. It usually means the block sits on a pending sector: the OS could not read the block and has marked it as bad/unavailable in the file system.

Bad blocks do not mean that all spare sectors for reallocation are exhausted. To check that, you have to examine the SMART reallocated-sectors values: if the normalized Reallocated Sector Count value is below its threshold, the spare area is exhausted and the drive is considered failed.
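If you'd rather check that in software than eyeball CrystalDiskInfo, here's a minimal sketch, assuming smartmontools 7+ (for smartctl's --json output) and an ATA drive at a hypothetical /dev/sda:

```python
#!/usr/bin/env python3
"""Flag any SMART attribute whose normalized value has dropped to or
below its vendor-set threshold: the drive's own 'failed' criterion."""
import json
import subprocess

out = subprocess.run(["smartctl", "-A", "--json", "/dev/sda"],
                     capture_output=True, text=True)
table = json.loads(out.stdout).get("ata_smart_attributes", {}).get("table", [])

for attr in table:
    # value  = current normalized health (higher is better);
    # thresh = floor at or below which the attribute counts as failed.
    # A threshold of 0 means the attribute can never fail.
    if attr["thresh"] > 0 and attr["value"] <= attr["thresh"]:
        print(f"FAILED: {attr['name']} "
              f"(value {attr['value']} <= threshold {attr['thresh']})")
```

If you just want the one-line verdict, `smartctl -H /dev/sda` asks the drive for the same overall pass/fail self-assessment.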
 
