Read errors on RAID-1: how are they possible, and how can they be repaired?

Yigals

Prominent
May 29, 2017
7
0
510
I am running RAID-1 on an Asustor NAS (which runs Linux inside).
Recently I got many sector read errors on the second disk, while the first disk is reported 100% free of errors. The errors were reported when I accessed some files; I then ran a full scan of each disk, which showed that disk 1 is free of errors and disk 2 has 200 damaged sectors.

Now, what is really STRANGE to me: when I read some files stored on this RAID, I get errors even though one of the disks has no errors! How come? Isn't RAID supposed, in this case, to read the data from the undamaged sectors of the first disk and then even write them back to the second disk, restoring the health of my system? How is this technically possible?

Thanks, Yigal
 

Yigals

I resolved my problem (partially). The errors were probably caused by a timeout during the long hangs while the bad sectors were being read. The RAID itself probably reported no I/O error, since it could correctly access the other copy of each broken sector.

I am still puzzled, though, why the RAID does not seem to rewrite the bad sectors after fetching the correct data from the other disk.
When I reread the same file, I get errors on the same sectors and the same long delays.

Any help with that?
 

Paperdoc

Polypheme
Ambassador
Self-correction of corrupted files on the disks of a RAID1 array is not necessarily automatic. In fact, I doubt that was ever part of the RAID1 design. The design was just what you observe: if reading a file from one disk fails for whatever reason, the second disk is read to get the other copy. End of story.

Now YOUR task, knowing that you have one disk with significant errors in its files, is to fix that. FIRST, consult the manual for your RAID system to find out what tools it has and how to use them. Many have a tool to "break" the RAID1 array, that is, to separate the two disks so that each is a separate "regular" drive containing all the files and usable as a stand-alone drive. If you do that, pay attention to what the system has told you so you can identify the faulty HDD unit. Then you have to decide: is the HDD repairable in some way, or are you better off simply replacing it?

Once you have a "good" unit mounted in place of the failed HDD, you need another tool. (Obviously, you need to verify that you have this tool and know how to use it BEFORE you start on this journey.) Most RAID1 systems will allow you to Restore a RAID1 array after replacing one HDD unit. This step basically copies all the good info from the old good drive to the new replacement unit, then re-establishes the RAID1 array so it performs as before but without errors.
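
On a Linux md array, the replace-and-restore sequence described above might look roughly like the commands this sketch prints. It is only an illustration under assumptions: it presumes the NAS gives shell access to `mdadm` (the vendor's own web interface may be the supported route instead), and the device names `/dev/md0`, `/dev/sdb1` and `/dev/sdc1` are hypothetical placeholders. The script builds and prints the commands; it does not execute anything.

```python
import shlex


def replacement_plan(array, failing, replacement):
    """Return the mdadm commands for swapping one RAID-1 member.

    All three arguments are hypothetical device paths supplied by the caller.
    Nothing is executed here; the list is meant to be reviewed by a human.
    """
    return [
        ["mdadm", "--detail", array],            # confirm which member is faulty
        ["mdadm", array, "--fail", failing],     # mark the bad member as failed
        ["mdadm", array, "--remove", failing],   # detach it from the array
        ["mdadm", array, "--add", replacement],  # add the new disk; resync starts
        ["cat", "/proc/mdstat"],                 # watch the rebuild progress
    ]


if __name__ == "__main__":
    for cmd in replacement_plan("/dev/md0", "/dev/sdb1", "/dev/sdc1"):
        print(shlex.join(cmd))
```

Keeping this as a dry run is deliberate: failing the wrong member of a two-disk mirror removes the only good copy, so the device names need checking against the array's own report first.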
 

RolandJS

Reputable
Mar 10, 2017
1,230
21
5,715
If possible, boot any backup/restore/clone program from USB or DVD and make a full image backup of your OS and data partitions onto an external USB hard drive. I have never backed up a RAID before, so I do not know how to walk you through that.
 

Yigals

My NAS is running Linux, so I expected the RAID to recover from single-sector errors, as https://linux.die.net/man/4/md says:

"a read-error will instead cause md to attempt a recovery by overwriting the bad block. i.e. it will find the correct data from elsewhere, write it over the block that failed, "

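The behaviour in the quote can be made concrete with a toy model of a two-disk mirror. This is only a simulation: `Disk`, `ReadError` and `mirror_read` are names invented for illustration, and the model assumes the drive remaps a bad sector to a spare when that sector is written. It is not the md driver's code.

```python
class ReadError(Exception):
    """Raised when a simulated sector cannot be read."""


class Disk:
    def __init__(self, data, bad=()):
        self.data = list(data)   # sector contents
        self.bad = set(bad)      # sectors that currently fail on read
        self.remapped = set()    # sectors the simulated firmware moved to spares

    def read(self, n):
        if n in self.bad:
            raise ReadError(n)
        return self.data[n]

    def write(self, n, value):
        # Assumption: writing to a bad sector makes the firmware remap it.
        if n in self.bad:
            self.bad.discard(n)
            self.remapped.add(n)
        self.data[n] = value


def mirror_read(primary, mirror, n):
    """Read sector n; on a read error, fetch the copy and rewrite the bad block."""
    try:
        return primary.read(n)
    except ReadError:
        good = mirror.read(n)        # find the correct data elsewhere
        primary.write(n, good)       # write it over the block that failed
        if primary.read(n) != good:  # read it back to verify the repair
            raise ReadError(n)
        return good


if __name__ == "__main__":
    healthy = Disk(["a", "b", "c"])
    damaged = Disk(["a", "?", "c"], bad={1})
    print(mirror_read(damaged, healthy, 1), damaged.remapped)
```

In this model a second read of the same sector succeeds without touching the mirror, because the write has already replaced the bad block.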

How can it be otherwise? Suppose I buy 4 disks of 10TB each and make a RAID-5 of them; over time, isolated errors will slowly appear on some disk. The probability is high, as the disks are very big and I have many of them. What then? Throw the disk away? If so, I cannot afford to buy another 10TB disk every time one sector goes bad. And if I leave the disk in the RAID, then when some other disk breaks down completely I will not be able to rebuild the RAID, as I have no redundancy anymore and one sector on the first disk is unreadable. Sounds like a lose-lose situation...
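
The rebuild worry above can be put into rough numbers. The sketch below assumes an unrecoverable read error rate of 1 per 10^14 bits and independent errors. Both are assumptions for illustration, not figures from this thread; spec sheets quote such rates as upper bounds, so the result is on the pessimistic side.

```python
import math


def p_read_error(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error in bytes_read.

    ure_per_bit is an assumed per-bit error rate; errors are assumed independent.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p/expm1 to avoid floating-point loss
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    TB = 10**12
    # Rebuilding a 4 x 10TB RAID-5 after one disk dies reads the 3 survivors in full.
    print(f"{p_read_error(3 * 10 * TB):.2f}")
```

Under these assumptions the chance of hitting at least one unreadable sector during such a rebuild comes out at roughly 0.9, which is why repairing isolated bad sectors while redundancy still exists matters so much.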

I am much more concerned with whether buying a NAS was a good idea at all, and whether it is worth buying big disks for it, than with restoring my current data. It looks to me that without bad-sector restoration the thing is not worth anything.
 

USAFRet

Titan
Moderator


If it is a physical fail of the sector on the drive, software can't fix that.
It may attempt to relocate that data, but the actual drive is failing. And will only get worse.
A RAID 5 will (hopefully) survive the loss of one drive.
A 4 x 10TB RAID 5 will also not be 40TB, but rather ~30TB.

If a drive is failing, it is failing. 1TB, 5TB, 10TB...doesn't matter.
Replace.
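
The ~30TB figure above follows from RAID-5 spending one disk's worth of space on parity. A small sketch, assuming equal-sized disks (the function name is made up for illustration):

```python
def usable_tb(level, disks, size_tb):
    """Usable capacity in TB for equal-sized disks at RAID level 1 or 5."""
    if level == 1:
        return size_tb                 # every disk holds a full copy
    if level == 5:
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return (disks - 1) * size_tb   # one disk's worth goes to parity
    raise ValueError(f"level {level} is not covered by this sketch")


if __name__ == "__main__":
    print(usable_tb(5, 4, 10))   # the 4 x 10TB example
    print(usable_tb(1, 2, 10))   # a two-disk mirror
```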
 

Yigals

USAFRet> If it is a physical fail of the sector on the drive, software can't fix that.
It may attempt to relocate that data, but the actual drive is failing. And will only get worse

Well, I have no expertise with hard drives, but let me see exactly what you are claiming.
Is it that after one single unreadable sector is detected, I am supposed to throw away the hard drive because it means for sure that it is failing and things will get worse and worse? Is it applicable to desktop disks as well?
 

USAFRet

A bad sector is not going to get better, and may be the first indication of getting worse.
It is a physical fault on the platter surface.
This is what warranties and actual backups are for.

If it dies within the warranty period, send it back.
Obviously, after running whatever diagnostics the manufacturer points you to, and in consultation with them.
Recover your data from whatever backup routine you do.

If out of the warranty period...well, it lasted for a few years.

For instance, the last drive I had die was a WD Green 3TB. Started acting up at about 5 weeks. Past the Amazon 30 day no question return window.
A little back and forth with WD and diagnostics, and they sent me a new one. As would be expected.

In the short and mid term, drive fails are not really that common.
In the long term...all drives eventually die.
 

RolandJS

Because you're operating a RAID NAS, I cannot recommend GRC's SpinRite, which would simply lock out any discovered bad sectors. On non-RAID computers, one would simply pull or isolate the problem drive and run SpinRite.
 

Yigals

USAFRet> A bad sector is not going to get better, and may be the first indication of getting worse.
It is a physical fault on the platter surface. This is what warranties and actual backups are for.

A bad sector with physical damage may be remapped by the disk firmware to the spare region. Then the disk may continue its normal life with 100% of its sectors healthy. Now, I very much hope that my disk with 200 unreadable sectors will be replaced under warranty, but if I had only one broken sector, I would have no hope that WD's or any other company's warranty department would care. And I would then be left with the same problem: either throw away a disk which may be 99.999% healthy, or leave it inside the RAID without relocation and recovery, which would make my RAID-1 (or RAID-5, if I had one) fail to rebuild if another of its disks fails completely in the future. That is why I am concerned that my RAID does not seem to recover the bad sector.

Now, backups are good, but to turn to them to recover the RAID which failed the rebuild means merely a _failure_ to maintain normal redundancy which RAID is made for.
 

USAFRet

No, 1 bad sector would not trigger a warranty replacement.
However...that drive is unlikely to get 'better'.

Yes, the software and RAID will/may work around that and reallocate. However, that drive is now suspect, and will need replacing before the others.

It does not 'fix' the bad sector. Just ignores it.
 

Yigals

I understand it does not fix the sector physically. But the disk firmware remaps the sector logically onto another healthy spare zone, and then the computer or RAID driver/controller may continue to use the "same" sector normally. This is exactly what I need to see on my RAID, but it does not happen, either because of the RAID driver, or because of faulty disk firmware, or for some other reason. I have no idea yet.
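
Whether the firmware has actually remapped anything can be checked from the drive's SMART counters. The sketch below parses the attribute table printed by `smartctl -A` and pulls out `Reallocated_Sector_Ct` (ID 5) and `Current_Pending_Sector` (ID 197). It assumes smartmontools is available on the NAS, and the sample rows are hypothetical, made up for illustration rather than taken from the disk in this thread.

```python
def smart_raw_values(text, wanted=(5, 197)):
    """Return {attribute_name: raw_value} for the wanted SMART attribute IDs.

    Expects the table layout of `smartctl -A`: ID, name, flag, value, worst,
    thresh, type, updated, when_failed, raw value.
    """
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[0].isdigit() and int(parts[0]) in wanted:
            result[parts[1]] = int(parts[9])
    return result


# Hypothetical sample rows, made up for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
"""

if __name__ == "__main__":
    print(smart_raw_values(SAMPLE))
```

A pending count that stays above zero while the reallocated count stays at zero would suggest the bad sectors have never been rewritten, since drives normally remap or clear a pending sector only when it is written.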
 

Yigals

RolandJS> On nonRAID computers, one would simply pull or isolate the problem drive, run Spinrite.

Well, maybe I could use it just to force my bad sectors to relocate. Not to really fix this bad disk, but in some other case in the future. Who knows, maybe it will even be nice enough not to spoil the data on a disk taken from the RAID. Looks good; I had never heard of this utility before. Thank you for the info.

After thinking a bit more... it might be dangerous to return such a recovered disk to the RAID, as the previously bad sectors will now probably be good but zeroed. Still worth considering for merely repairing the disk...
 
