Archived from groups: alt.comp.periphs.mainboard.asus
The reason for recommending pulling out the second disc is that the
Adaptec controller, for example, knows whether it has marked a disc as
failed, but that's about it. If you take a working disc out of one system
and use it as a replacement in a RAID 1 array in another system where no
failure has been recorded, the Adaptec controller plays dumb. If the
replacement drive goes into Port 0 (i.e. the first boot device) and is
otherwise healthy but not from this RAID, the controller will happily boot
off it and will not complain loudly that its own configured RAID set is
broken. You have to be observant and notice that yourself. You also have
to know which disc is which and be sure to pull out only the duff discs,
make sure the good disc is in Port 0, and make sure the replacement has
been zotted (i.e. zeroed, not just fdisked) so that the controller does
not see it as a disc holding usable data and will accept it as a hot
spare.
That's the Adaptec. The Intel ICH5R is a lot more sensible. The logic in
the Adaptec is fine for controllers that have more than 2 ports, but it is
lousy where there are only 2 ports and a broken RAID 1 means 1 good disc
and 1 bad. It is not smart enough to realise it should a) SHOUT about the
broken RAID and b) protect the 1 good disc. (It does protect it after a
fashion, by making the whole job more picky: it allows the rebuild only in
Windows, where it can see whether a disc has partitions, active partitions
or Windows partitions, and it protects those by refusing to mark such a
disc as a spare automatically. Hence my words about zeroing a disc.)
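To show why zeroing matters more than a plain fdisk, here is a minimal
Python sketch of the kind of refusal described above. It is a
hypothetical model, not Adaptec's actual firmware logic: it assumes the
controller accepts a disc as a spare only if its first sector (MBR and
partition table) and its last sector (assumed here to be where RAID
metadata might live) are entirely zero, and it works on a disc image held
in memory rather than a real drive.

```python
SECTOR = 512  # assumed sector size in bytes

def eligible_as_hot_spare(image: bytes, sector: int = SECTOR) -> bool:
    """Hypothetical model of the controller's check, not Adaptec's real logic.

    A disc is accepted as a spare only if its first sector (MBR and
    partition table) and its last sector (where some controllers are
    assumed to keep RAID metadata) contain nothing but zeros.
    """
    if len(image) < 2 * sector:
        return False  # too small to be a real disc image
    first, last = image[:sector], image[-sector:]
    return not any(first) and not any(last)


zeroed = bytes(8 * SECTOR)                # a fully zotted disc
fdisked = bytearray(zeroed)
fdisked[510:512] = b"\x55\xaa"            # boot signature a plain fdisk typically leaves behind
print(eligible_as_hot_spare(zeroed))          # True
print(eligible_as_hot_spare(bytes(fdisked)))  # False
```

The point of the model: deleting partitions with fdisk typically still
leaves the boot signature (and any metadata elsewhere) on the disc, so a
cautious controller can still treat it as "somebody's data".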
Add to this the fact that the Adaptec controller interface is web-based,
with a fat Java VM and no Windows-based alerting, and frankly the Intel
appears the better of the two.
As for Spinrite (I'm not too familiar with it): its purpose is to detect
and repair sector-level issues, and it may well be a good tool for RAID 1.
I would not be surprised if it did stuff up a RAID 1 disc, though, as
there is no knowing where the RAID 1 configuration is stored on the disc.
But then it is likely only stuffing up a RAID 1 config by correcting a
disc that is already wrong in itself, and so stuffed anyway.
A: What do I mean by "do anything"? Spinrite doesn't set out to alter
data: it corrects sectors, or allocates alternate sectors if one is found
to be dying, which is fine. You will likely have to rebuild if it detects
and corrects errors, as sector contents could change (I am not sure it
even does this), but Spinrite will not do anything unless it detects a
media or some other error.
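If Spinrite does rewrite a sector on one member, the two halves of the
mirror may no longer match, which is why a rebuild may be needed
afterwards. A minimal Python sketch of spotting that divergence, assuming
you have raw images of both members as byte strings (an assumption; it
ignores any controller metadata area and real-disc I/O):

```python
SECTOR = 512  # assumed sector size in bytes

def differing_sectors(disk_a: bytes, disk_b: bytes, sector: int = SECTOR) -> list[int]:
    """Return indices of sectors whose contents differ between two RAID 1
    member images. Sectors past the end of the shorter image are ignored,
    and no allowance is made for any controller metadata area."""
    count = min(len(disk_a), len(disk_b)) // sector
    return [
        i
        for i in range(count)
        if disk_a[i * sector:(i + 1) * sector] != disk_b[i * sector:(i + 1) * sector]
    ]


a = bytes(4 * SECTOR)
b = bytearray(a)
b[1030] = 0xFF                          # simulate one rewritten sector on member b
print(differing_sectors(a, bytes(b)))   # [2]
```

An empty list means the members still agree sector for sector; anything
else means one of them has changed and the mirror needs resyncing.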
WRT the RAID 1 failures recently reported: yes, it is bad. These were not
working RAID 1 sets that had been broken deliberately (though that
happens too). In one case, the person had a system that failed
repeatedly. The system had a memory issue that was not resolved until it
was obvious that one disc was damaged. The person tried to rebuild
(without addressing the memory issue), using a new disc and the
remaining, supposedly good, disc, and found part way through that the
good disc had issues. In another case, an IBM system with an OEM Adaptec
controller, the outcome was basically the same. How it got there I know
not, but the advice from IBM was to file-copy the data off, reformat and
rebuild the disc, then rebuild the array. In this case the issues were
perhaps due to some hardware fault.
The advice in both situations above was to file-copy the contents off
(i.e. normal backups). So, to reiterate what must be said repeatedly:
RAID 1 is not a substitute for backups; it is there to prevent system
failure in the event of a drive failure. It is absolutely paramount to
monitor RAID volume health.
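Monitoring need not be elaborate; the important part is that a broken
array gets SHOUTED about. A minimal Python sketch, assuming you already
have some way of reading each array's state as a string. The state name
"Optimal" and the dictionary input are hypothetical, not any vendor's
API:

```python
HEALTHY_STATES = {"Optimal"}  # hypothetical name for a healthy array state

def raid_alerts(states: dict[str, str]) -> list[str]:
    """Given a mapping of array name -> reported state, return one loud
    alert line for every array that is not healthy."""
    return [
        f"RAID ALERT: array {name!r} is {state.upper()}"
        for name, state in sorted(states.items())
        if state not in HEALTHY_STATES
    ]


for line in raid_alerts({"boot": "Optimal", "data": "Degraded"}):
    print(line)  # RAID ALERT: array 'data' is DEGRADED
```

Treating anything other than a known-good state as an alert errs on the
side of noise, which is the right way round for a mirror you depend on.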
A few rather obvious statements: SATA controllers do not like
overclocking, and they often fail at the slightest overclock. So if you
run RAID, *never* overclock, especially the PCI bus if the SATA
controller hangs off it.
RAID is there for resilience: the underlying system must be reliable
first and foremost. A system that experiences repeated crashes due to
memory issues is inviting RAID corruption.
Have a good backup regime. If you are using a caching RAID controller,
you really must have a UPS or a battery backup on the controller, disable
write caching, or have a controller that honours file flush requests*.
Many controllers do not, and the result is file corruption in the event
of a power failure.
* Finding out whether a controller honours flush requests prior to
purchase is near impossible. Vendors do not answer questions like that,
nor do they readily reveal whether write caching can be disabled.
Thankfully the Adaptec SCSI controllers used here do. The full-fledged
controllers are stunning: they are hot-plug and do RAID builds, repairs
and conversions at run time without the OS knowing.
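On the application side, a "flush request" is what a program issues when
it asks the OS to push data through to stable storage: fsync on POSIX
systems, FlushFileBuffers on Windows. A minimal Python sketch using
os.fsync follows. Note that it can only ask; a controller that caches
writes without honouring the flush (and has no battery backup) can still
lose the data on a power failure.

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data and ask the OS to flush it to the storage device.
    Whether it actually reaches the disc depends on the controller
    honouring the flush (or having battery-backed cache)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's own buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush the file to the device
```

Usage would be something like `durable_write("backup.dat", data)`, where
the filename is purely illustrative.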
- Tim
"Alex Hunsley" <lard@tardis.ed.ac.molar.uk> wrote in message
news:gxAne.196644$Cq2.16581@fe2.news.blueyonder.co.uk...
> Tim wrote:
>> Just pull out one disc, i.e. unplug its SATA and power cables.
>>
>> I would recommend pulling out the disc on the second port.
>
> Ah, ok. Isn't the point to check both disks, though? If I only ever use
> Spinrite on disk1, and disk2 is quietly failing and then dies without me
> knowing, I will only then be trying to recover from disk2 when disk1
> shows a problem, by which point disk2 is dead! Don't I need to ensure
> the integrity of both disks?
>
> Ah, just realised that you might mean "check both disks, but always plug
> the one you're checking into the first SATA port, and unplug the other
> completely"... is that what you mean?
>
>> If you are not going to do anything at all while running spinrite and
>> don't care which disc is used as the source for rebuilding later then it
>> doesn't matter.
>
> What do you mean by "do anything at all"? Do you mean write data to the
> disk, as in Spinrite's write-then-read sector testing?
>
>> Will you have to recopy?
>> Almost definitely. It depends on the controller. For example the Intel
>> ICH5R seems to keep a time or sequence stamp on the discs so seems to
>> know which has been used most recently. Each controller is different.
>> This is why it is important to do things such as you are to learn how the
>> specific controller behaves.
>
> Righto, thanks for that.
>
>> Please post back with your experience of both the rebuild process and
>> what Spinrite results / benefits there are. Some people have had issues
>> with arrays refusing to rebuild after one drive failed, with the
>> remaining drive stopping the sync process part way through.
>>
>> - Tim
>
> What, so you mean they broke the RAID set deliberately, then during the
> rebuild the source disk failed, and the destination was unusable
> because it was midway through being copied to? Scary biscuits.
>
> Aeeii, the issues of RAID1! I thought it was going to be a simple way to
> protect my data...
>
> I'll note useful info as I find it and come back to this ng with it.
>
> Don't suppose anyone knows any good sites about SiI 3112 issues? I've been
> googling already, of course, but not turned up much...
>
> alex