Question: RAID 10 disk failure and reallocated sectors count?

rcfant89

I noticed my RAID 10 was very slow after a Windows update. I rebooted and it was still slow. I ran the Dell diagnostics and saw "Hard Drive 6 - S/N xxxxxxx, self test did not complete [Validation Code 70338]". I figured the drive had died.

I replaced the drive and it's automatically rebuilding. I figured I should check the rest of the drives for health while I was at it. Three of the drives (including the bad one I was replacing) had a yellow caution icon in CrystalDiskInfo, but only one had anything in "Current Pending Sector Count" and "Uncorrectable Sector Count", and that was disk 6, the one that failed or was failing.

After replacing that one, six of the eight drives are "Good" and two have the "Caution" icon, with a "Reallocated Sectors Count" raw value of 8 on one disk and 10 on the other. That number seems pretty low, no?

The drives are Seagate Enterprise 10 TB, so it would probably cost another 500 bucks to replace these two as well. How bad are those counts? Are those disks critical, or just slowing down a little bit?
 
After replacing that one, six of the eight drives are "Good" and two have the "Caution" icon, with a "Reallocated Sectors Count" raw value of 8 on one disk and 10 on the other.
That number seems pretty low, no?
8 and 10 reallocated sectors is low, yes.
The problem is that once reallocated sectors start appearing, they tend to grow. And sometimes they can grow fast.
You have to watch/monitor them closely.
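A minimal sketch of that kind of monitoring, assuming smartmontools (`smartctl`) version 7+ for its JSON output and placeholder /dev/sdX paths (run as root, cron it, and diff the log day to day):

```python
#!/usr/bin/env python3
"""Sketch: log the Reallocated_Sector_Ct raw value for each drive so growth
over time is visible. Device paths are assumptions -- adjust to your array."""
import datetime
import json
import subprocess

DRIVES = ["/dev/sda", "/dev/sdb"]  # placeholder: list all 8 array members

for dev in DRIVES:
    # -j = JSON output (smartmontools 7+), -A = SMART attribute table
    out = subprocess.run(["smartctl", "-j", "-A", dev],
                         capture_output=True, text=True).stdout
    table = json.loads(out)["ata_smart_attributes"]["table"]
    realloc = next(a for a in table if a["id"] == 5)  # Reallocated_Sector_Ct
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    print(f"{stamp} {dev} reallocated={realloc['raw']['value']}")
```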
 
Any visible bad sectors mean the drive is actively failing. Hard drives ship with a hidden pool of spare sectors that the drive uses internally to remap failed sectors invisibly. Only when that pool is exhausted do bad sectors actually become visible. By that time it's usually well past time to replace and discard the drive, as there is nothing you can do to fix it.
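For what it's worth, a minimal sketch of checking the counters this thread keeps coming back to, assuming smartmontools 7+ (`smartctl -j`) and a placeholder /dev/sda:

```python
#!/usr/bin/env python3
"""Sketch: flag a drive when any of the sector-health counters is nonzero.
5 = already remapped, 197 = pending remap, 198 = offline uncorrectable."""
import json
import subprocess

WATCHED = {5: "Reallocated_Sector_Ct",
           197: "Current_Pending_Sector",
           198: "Offline_Uncorrectable"}

out = subprocess.run(["smartctl", "-j", "-A", "/dev/sda"],  # placeholder path
                     capture_output=True, text=True).stdout
for attr in json.loads(out)["ata_smart_attributes"]["table"]:
    if attr["id"] in WATCHED and attr["raw"]["value"] > 0:
        print(f'{WATCHED[attr["id"]]}: raw={attr["raw"]["value"]} -> plan a replacement')
```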
 
After replacing that one, six of the eight drives are "Good" and two have the "Caution" icon
If 3 out of 8 drives are throwing errors, the other drives may be on their way out. They might last another 5 years or drop dead tomorrow, especially if all 8 were bought from the same supplier and come from the same batch.

I recommend running long S.M.A.R.T. tests on each drive in the array (they take several hours apiece), especially if you don't have the data backed up elsewhere.
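Something like this sketch can start them all in one go, assuming smartmontools and Linux-style /dev/sdX paths (adjust to your setup; drives behind a hardware RAID controller may need smartctl's `-d` device-type flag):

```python
#!/usr/bin/env python3
"""Sketch: start a long (extended) SMART self-test on each array member.
The test runs inside the drive firmware, so this returns immediately;
check results a few hours later with `smartctl -l selftest /dev/sdX`."""
import subprocess

for dev in [f"/dev/sd{c}" for c in "abcdefgh"]:  # placeholder: 8 members
    subprocess.run(["smartctl", "-t", "long", dev], check=True)
    print(f"extended self-test started on {dev}")
```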

Before committing any drive to my TrueNAS Core RAID-Z2 arrays, I run a full (non-destructive) surface Read Test in Hard Disk Sentinel. It takes roughly 2 hours per terabyte and generates a detailed, easy-to-read surface map showing how many retries were necessary on each block.
http://harddiscsentinel.helpmax.net/en/hard-disk-tests/surface-test/

When a drive shows signs of failure, I run a full destructive Hard Disk Sentinel Write test, followed by a Read test, then decide whether to junk the drive or move it to some non-critical system.

It's not worth risking important data on failing drives, even when you have multiple backups. If your RAID system contains the only copy of your data, you may be living on borrowed time. Keep your (other?) backups current, in case the main RAID dies.
 
Any visible bad sectors mean the drive is actively failing. Only when that pool is exhausted do bad sectors actually become visible.
Bad sectors? Did you mean bad blocks?
A bad block is a file system concept. Usually it means the block sits on a pending sector:
the OS could not read the block and has marked it as bad/unavailable in the file system.

Bad blocks do not mean all spare sectors for reallocation are exhausted. For that, you have to examine the SMART reallocated sectors values.
If the reallocated sectors' normalized value drops below its threshold, then the spare area is exhausted and the drive is considered failed.
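A minimal sketch of that threshold check, assuming smartmontools 7+ for JSON output (/dev/sda is a placeholder; run as root):

```python
#!/usr/bin/env python3
"""Sketch: compare the NORMALIZED value of attribute 5 against its threshold.
Only when value <= threshold is the spare area exhausted and the drive
formally failed in SMART terms."""
import json
import subprocess

out = subprocess.run(["smartctl", "-j", "-A", "/dev/sda"],  # placeholder path
                     capture_output=True, text=True).stdout
table = json.loads(out)["ata_smart_attributes"]["table"]
a5 = next(a for a in table if a["id"] == 5)  # Reallocated_Sector_Ct

print(f'raw sectors remapped: {a5["raw"]["value"]}')
print(f'normalized: {a5["value"]}  threshold: {a5["thresh"]}')
if a5["value"] <= a5["thresh"]:
    print("spare area exhausted -- SMART considers this drive FAILED")
```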
 
Any visible bad sectors mean the drive is actively failing. Hard drives ship with a hidden pool of spare sectors that the drive uses internally to remap failed sectors invisibly. Only when that pool is exhausted do bad sectors actually become visible. By that time it's usually well past time to replace and discard the drive, as there is nothing you can do to fix it.

Ok, gotcha. So there are a number of extra hidden sectors that the disk uses as sectors start going bad (because I guess that's normal), and once that pool fills up completely, it starts counting off new bad sectors? So basically enough sectors already failed to fill the hidden space, and now it's eating into the "production" space, if you will. That makes sense to me, if that's correct.

So given that, it definitely sounds like as soon as you see even 1, it's time to replace the disk (because a bunch already failed in the hidden area before that).
 
Drives don't heal themselves. They only get worse.
And sometimes, very quickly.

For sure. So, follow-up question: is there a problem with mixing and matching? Specifically:

I've got older Seagate Exos 10 TBs. The "new" Exos have a different look to them; I'm sure that's not a big deal, but how about going to Seagate IronWolf/Pro, for example?

I replaced my bad disk and I want to order two more, but the pricing on the 10 TBs seems crazy. Amazon seems to be sold out of the 10 TB models, and the 10 TB on Newegg costs basically the same as the 14 TB ($226 vs. $223). Seems crazy to me; I guess everyone thinks 10 TB is the ideal amount, so demand is way higher? Idk.

But if they have inventory of the IronWolf Pro or another 10 TB enterprise SATA 7200 RPM HDD, is there any big deal with that? I know it's going to be a "lowest common denominator" type deal, and I think the Exos are a LITTLE BIT higher quality than the IWP? But besides that, there essentially won't be anything noticeable and it should be fine, right?

Or is there another enterprise-grade brand to look at? I saw the WD Gold, I think it was, but they wanted like 30 bucks/TB, whereas you can get about $17.50/TB with the 20 TB IWP. And I know the excess space will be reduced to the lowest common denominator, so ideally, if I can get 2 more 10 TB disks at the same price per TB (or close), that would be the cheapest way to go.

I could replace the array with 4x 20s for the same array size, but that would cost me 1500 bucks AND cut my performance in half (from roughly 8x read / 4x write to 4x / 2x).
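For what it's worth, the back-of-envelope math behind that comparison, using rough prices from this thread (the multipliers are idealized streaming figures, not benchmarks):

```python
# RAID 10 usable capacity is min(drive size) * n / 2; streaming reads scale
# roughly with n, writes with n / 2. Prices are this thread's rough numbers
# ($185 per 10 TB, ~$17.50/TB for 20 TB) -- illustrative only.
def raid10(n, tb_each, price_each):
    usable_tb = tb_each * n // 2  # half the raw space holds mirror copies
    return usable_tb, n, n // 2, n * price_each

for label, n, tb, price in [("8x 10 TB", 8, 10, 185),
                            ("4x 20 TB", 4, 20, 350)]:
    usable, rd, wr, cost = raid10(n, tb, price)
    print(f"{label}: {usable} TB usable, ~{rd}x read / ~{wr}x write, "
          f"${cost} in drives")
```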

Also, they have plenty of recerts, but I'm thinking those drives would be old and shot, and would probably die in a year or two. They don't seem regulated at all, and it feels like playing roulette. I'd like to get value for my money, but I'm good to spend an extra few bucks/TB to get new. The discount seems weak tbh; they're still charging 14 bucks/TB for recerts, so why even buy recert at that point?
 
Mix/match make/model?

Actually, that is recommended for a RAID array, as long as they are the same capacity.

For instance, a 6-drive array:
Buy all 6 at the same time, same make/model, and if one dies....chances are strong that the others may go quite soon.

Get some different ones, and spread the possibility of death around.

As for make/model? For the individual drives you'll own, they're all about the same.
My most recent dead HDD was a 16TB Toshiba Enterprise. Died at 7 months old.
Its warranty replacement is still going strong, 4 years later.

Other drives in my NAS...4x 4TB Seagate Ironwolf. Zero issue.
 
Actually, that is recommended for a RAID array, as long as they are the same capacity.

Get some different ones, and spread the possibility of death around.
Word, sounds good. Turns out the Amazon prices were messed up for the 10 TB for some reason, maybe a stock issue?
I found this on Newegg:

ST10000NM001G

It's $185 each, so pretty much in line with what I was looking for: Seagate Exos X16 ST10000NM001G, 10 TB, 7.2k SATA.
It does NOT match my existing disks (those are probably, idk, 8 years old?). Those are Seagate Exos X10 ST10000NM0016, 10 TB, 7.2k SATA.

They also have the "Seagate Exos X10 10TB 512e SATA 6Gb/s 7200 RPM 3.5-Inch Enterprise HDD (ST10000NM0086)", which LOOKS more "heavy duty" (more metal, a more solid housing, and more like the disks I currently have), but the specs make it look like a slightly worse drive. I dunno. Probably going to go with the ST10000NM001G unless there's a reason to go with the other...
 
Side note: it's weird that the two drives that are failing are the ones with the lowest power-on count and power-on hours. The failing ones are at 66 and 513 power-on counts and 30,105 and 38,245 hours, compared to the high end of 1,601 power-on counts and 53k hours.
 
