SataRAID keeps dropping a good drive??

tenlbham

Distinguished
Oct 7, 2011
3
0
18,510
I just upgraded my RAID 1 storage set of HDDs (not the boot drive) from two 250GB drives to two 1TB drives in their hot-swappable enclosures. I hot-swapped one drive to see whether my RAID controller (Sil3512 PCI card) would let me expand the partition, but it did not; it just rebuilt the 250GB partition on the new 1TB drive.

So I powered down, put both TB drives in, booted, reformatted, and recovered data from a backup drive. Then the RAID controller dropped that first drive from the set. I removed/reinstalled, swapped drives between the drive bays, reformatted, but it still kept dropping that drive from the RAID set.

Thinking there was something wrong with the drive, I removed it, connected it directly to the mobo, and booted up, and it came up just fine.

I put the original 250GB drives back in the RAID set and they worked just fine.

I even swapped cables between the SATA enclosures and the RAID PCI card. Still no luck.

What could be causing the controller to keep dropping a drive that appears to be otherwise functional? Did I mess up the drive by first hot-swapping it into the set even though it seems to work properly as a standalone drive?



Windows XP SP3, 2.5GHz AMD Athlon XP, 4GB RAM,
Sil3512 PCI RAID controller, Kingwin SATA hot swap enclosures, 2x Hitachi 1TB SATA/300 7200RPM 32MB buffer HDDs.
 
When you use consumer hard drives in redundant RAID volumes, you can run into problems if a drive has trouble reading data from a particular sector. If the drive takes too long retrying the read over and over again, the RAID controller can decide that it's not responding and declare it failed, even though the drive works fine on its own.
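
To make the timing issue concrete, here's a minimal sketch of the decision the controller is making. The function name and the 8-second timeout are assumptions for illustration only; I don't know the actual timeout the Sil3512 uses.

```python
def controller_verdict(drive_response_seconds, controller_timeout_seconds=8.0):
    """Model how a hypothetical RAID controller judges a single read.

    controller_timeout_seconds is an assumed value, not a Sil3512 spec.
    """
    if drive_response_seconds > controller_timeout_seconds:
        # The controller gives up waiting and marks the drive as failed.
        return "dropped"
    return "ok"


# A healthy sector is read in milliseconds.
print(controller_verdict(0.02))   # ok
# A drive stuck retrying a marginal sector can stay silent far longer.
print(controller_verdict(45.0))   # dropped
```

The drive in the second case isn't dead, it's just busy, which is why it can look perfectly healthy when connected directly to the motherboard.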

Enterprise-class drives are designed with "TLER" (Time-Limited Error Recovery, Western Digital's name for the feature) to avoid this problem. If they can't recover the data from a sector within a few seconds, they give up and report a read error back to the RAID controller. This works well in a redundant RAID set because when the controller receives the error report, it simply reads the same data from one of the other drives and writes it back to the drive that reported the problem. That causes the drive to reallocate the data into a spare sector, and then everything carries on as if nothing had happened.
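
Here's a rough sketch of that read-from-the-mirror-and-rewrite sequence. The `Drive` class and `mirrored_read` function are made up for illustration; a real controller does this in firmware or in its driver, not like this.

```python
class Drive:
    """Toy drive: a dict of sectors plus a set of unreadable ones."""

    def __init__(self, sectors, bad=()):
        self.sectors = dict(enumerate(sectors))
        self.bad = set(bad)
        self.reallocated = 0

    def read(self, lba):
        if lba in self.bad:
            # With TLER the drive reports the error quickly instead of
            # retrying for a long time.
            raise IOError(f"unrecoverable read error at sector {lba}")
        return self.sectors[lba]

    def write(self, lba, data):
        if lba in self.bad:
            # Writing to a bad sector makes the drive remap it to a spare.
            self.bad.discard(lba)
            self.reallocated += 1
        self.sectors[lba] = data


def mirrored_read(primary, mirror, lba):
    """Read from primary; on error, fetch from the mirror and repair primary."""
    try:
        return primary.read(lba)
    except IOError:
        data = mirror.read(lba)
        primary.write(lba, data)
        return data


a = Drive(["x", "y", "z"], bad={1})
b = Drive(["x", "y", "z"])
print(mirrored_read(a, b, 1))  # y
print(a.reallocated)           # 1
```

After the repair, sector 1 on the first drive reads normally again and the set never lost a member.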
 

tenlbham

Is there any way around this short of replacing the "problem" drive in hopes that the replacement doesn't have any bad sectors? For instance, is there a HDD utility that will fix the problem?

Thanks!
 
There were some drives which could have new firmware loaded to enable TLER, but I think that option has been removed from most of the newest ones. If you do a Google search for "TLER" and your hard drive model number you might find some more information on it.
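
If you have smartmontools installed, its `smartctl -l scterc` option can query (and on some drives set) the error recovery timeout, in tenths of a second. Below is a small sketch that only builds the command line so you can see what it looks like. The helper name and the `/dev/sda` device path are placeholders, and I'm assuming the drive is connected somewhere smartctl can reach it; a drive sitting behind the RAID card may not be visible, and many drives don't support the setting at all.

```python
def scterc_command(device, read_ds=None, write_ds=None):
    """Build a smartctl command that queries or sets SCT Error Recovery Control.

    read_ds / write_ds are timeouts in tenths of a second (70 = 7.0 s).
    With both left as None the command only reports the current setting.
    """
    option = "scterc"
    if read_ds is not None and write_ds is not None:
        option += f",{read_ds},{write_ds}"
    return ["smartctl", "-l", option, device]


# Query the current setting (device path is a placeholder).
print(" ".join(scterc_command("/dev/sda")))
# Ask for a 7-second limit on both reads and writes.
print(" ".join(scterc_command("/dev/sda", 70, 70)))
```

You'd run the resulting command from an administrator prompt. If the drive reports that SCT Error Recovery Control isn't supported, this route is closed for that model.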

It may be that this isn't the issue you're having, but it is a possibility.
 

tenlbham

Well I found some new RAID drivers for the controller card, updated them, and everything seems to be kosher now. What a PITA though to have to go through all that trouble because the controller worked perfectly for 250GB drives but barfed on ONE of the 1TB drives.

ALWAYS check for the latest drivers :cry: