I am familiar with the basics of RAID, but not with what to do when an advanced array such as this has problems beyond basic troubleshooting.
The system this is happening on was built around 11/20/2008. Specs (pulled from my HW info link for convenience, edited to remove later changes and irrelevant info):
Infinity Rising (A/V & VM Powerhouse):
PSU: Silverstone 800W Modular (DA800)
Mobo: EVGA 780i SLI FTW (132-YW-E178-A1)
CPU: Core 2 Quad Q9650 (BX80569Q9650)
GPU: [strike]BFG Tech GeForce GTX 280 OC2 1GB (BFGEGTX2801024OC2E)[/strike] <- temporarily an Asus 9600 GSO while the 280 is out on RMA (its cooling fan failed)
RAM: 8GB Corsair XMS2, DDR2 1066, (TWIN2X4096-8500C5 x2)
Sound Card: Creative SoundBlaster X-Fi Elite Pro (70SB055A00000)
HDDs: 1x WD Caviar SE16 640GB (WD6400AAKS), 3x WD Caviar Black 1TB (WD1001FALS) in Kingwin Hot-Swap Rack (KF-4000-BK)
Opticals: 2x Lite-On 20x DVD±R/RW & DVD-RAM Drives (DH-20A4P-04)
OS: Vista Ultimate SP2 64-bit [strike]and XP SP2 64-bit[/strike]
Screenshot from nVidia Control Panel:
The degraded array is the 3x WD 1TB drives in RAID 5 on chipset-based NVIDIA Media Shield RAID. There are no hot-spares configured currently.
So far I have not found anything useful in the RAID controller or nvCpl that gives a more specific idea of what the problem is. I have done basic troubleshooting, such as trying different SATA and power cables with the drive and moving it to different SATA ports, with no progress. To rule out failing drive hardware, I ran each drive individually through a gauntlet of nondestructive tests (with RAID disabled, and without booting Windows so as not to destroy the array):
■HDAT2 - Pass
■MHDD - Pass
■WD DLG - Pass
■SpinRite 6 - No errors up to SpinRite's ~540GB limit, at which point it crashed (a known SpinRite 6 issue)
Since then I have disconnected the array's data and power cables to protect it from any further damage until I can get advice. There were no pre-failure signs. I shut down my computer at about 10PM the night before, went to an air show to shoot photos, and when I came back the RAID BIOS screen was flashing "Error" for drive 3. After booting into Vista, I was able to watch video stored on the array without any errors (done only to confirm the parity data was intact, then I stopped), and I then immediately began the troubleshooting steps above.
I believe my next step should be to rebuild the array, but since I don't know what the rebuild process involves, I don't want to lose anything to factors I'm unaware of. My backup drive is also out on RMA, so I can't make a pre-rebuild backup of the data while it is still readable through parity fault tolerance.
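For anyone unsure what a rebuild involves: in general, RAID 5 stores XOR parity for each stripe, which is why a degraded array stays readable, and a rebuild recomputes every block of the replaced member from the surviving drives. The sketch below is a minimal Python illustration with made-up data; it assumes simple single-parity XOR and is not NVIDIA MediaShield's actual on-disk layout. It also shows why the rebuild depends on the two remaining drives reading cleanly the whole way through.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 3-drive RAID 5: two data blocks plus one parity block.
data_drive1 = b"VIDEO-PART-A"
data_drive2 = b"VIDEO-PART-B"
parity = xor_blocks([data_drive1, data_drive2])

# If drive 2 drops out, its block is recomputed from the surviving members.
rebuilt = xor_blocks([data_drive1, parity])
assert rebuilt == data_drive2
```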
My question at this point is whether my knowledge and instincts are pointing to the correct next step, or whether I should do something different, such as waiting for my backup drive to come back. Any thoughts? Also, is there any possibility that this is a controller issue rather than a file system one?
EDIT: Should I connect the error-state drive on its own and delete the array on that drive before rebuilding, or will the rebuild work without this step?
Side notes: Through researching this problem, I now know the perils of southbridge-based RAID, so please DO NOT sidetrack the thread into that issue unnecessarily. I intend to disable RAID in the near future anyway so I can get my board's hot-plug feature to work.
And does anyone know if IDE emulation can be turned off for the SATA ports on this chipset?