RAID 5 Problems

radman63

Reputable
Jan 29, 2015
2
0
4,510
Lenovo ThinkServer TS440
4 GB RAM
Windows Server Essentials 2012 R2
4 x 4TB WD Enterprise drives in RAID 5 configuration using the onboard integrated ThinkServer RAID 100 (0/1/10/5) controller.

About 2 days ago, I began having write problems to the array. Read is good, but writes are down to a crawl. The vendor I bought the server from had me test with a HDD tool (ATTO Disk Benchmark) to confirm this. Vendor thinks it's a bad hard drive. All S.M.A.R.T. status indicators are green (HWiNFO). I have always heard faint clicking and clunking, but the drives don't slow down, and the noise is not consistent. It's only during writes, so maybe it's normal head movement noise? I have heard these drives are a bit noisy.

The vendor also had me check the status in the RAID BIOS. I checked, and all still looked good. All green. After restarting to see that BIOS screen, I can't successfully boot up, although I have done successful reboots since the problem first occurred. Here's what happens now. The system starts by showing the ThinkServer logo screen. Then it shows the RAID BIOS screen and gives you a few seconds to access the settings if you want to. If you do not hit "CTRL + I" to go into the settings, the ThinkServer logo screen displays again. At this point, my system freezes and won't finish booting up.

I'm fairly good with computers, but I've never had my own RAID file server before. Not sure how to pinpoint the problem. Should the array work with one HDD taken out? I'm wondering if I take out 1 HDD at a time, if the system will work, or if the noises I heard will go away. If the noises are normal, all drives should have the same noise. If the noises are not normal, then removing them one at a time should tell me which one the noisy drive is.

Even if one drive is bad, shouldn't the system still operate with 3 good drives? I thought that was the whole point of RAID 5.
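
For anyone wondering why three out of four drives should be enough: the usual textbook explanation is XOR parity. Below is a minimal sketch of that idea, not the ThinkServer RAID 100 controller's actual implementation. The block contents and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the drive holding the second block has failed.
survivors = [data[0], data[2], parity]

# XOR of the three surviving blocks gives back the missing one.
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

This only covers a single missing drive. It says nothing about what happens when the controller or the motherboard itself is the part that has failed.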

My vendor wants me to remove all 4 drives, put in a different, single drive with a regular Windows operating system on it, and move it from bay to bay to troubleshoot the issue. He thinks that will determine if it is the drive(s) or something in the tower. Does that make sense? Would my array still be accessible when I put the drives back in and change the BIOS back to RAID 5, assuming I put them back in the correct order?

How do I troubleshoot this? Thanks!


 
Solution
I would say remove all the present drives, make sure you have the right order noted down (label everything: drives, cables, etc.), then insert another drive and follow the steps recommended by the vendor to troubleshoot the system (that is, to find out whether the drives or the system is malfunctioning). One way or another, you need to perform this step in order to diagnose the problem. It will take time, sorry.
I can only share some of my prior experience with a RAID 5 system (it was a QNAP). It's not worth it. The way RAID 5 distributes data makes recovery very difficult if the controller is down. First thing, you need to make sure you can back up any and all important data from that array, because otherwise your chances of recovering the files (by yourself, at least) are slim. After that, you can afford to do whatever steps the vendor instructs you to do, including reinstalling the entire array if needed. If an HDD goes bad, the RAID will help you recover the files. If the system goes bad (a fried chip, for instance), you'll be left with data that is dispersed across all the disks in the array. IMO, the only way to keep such an array (other than a RAID 1) is to make regular backups of all your data on a separate HDD.
Hope this helps.
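
To picture what "dispersed" means here, this is a rough sketch of how data and parity blocks might be laid out across the drives. The rotation used below is just one common scheme, chosen as an assumption; the layout actually used by the ThinkServer controller is not stated anywhere in this thread. The function name `raid5_layout` is hypothetical.

```python
def raid5_layout(num_drives, num_stripes):
    """Return one row per stripe; each cell is 'P' (parity) or 'D<n>' (data block n).

    Assumption: parity starts on the last drive and moves one drive to the
    left on each following stripe. Real controllers may rotate differently.
    """
    rows = []
    block = 0
    for stripe in range(num_stripes):
        parity_drive = (num_drives - 1 - stripe) % num_drives
        row = []
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows

# Print the first four stripes of a 4-drive array, one column per drive.
for row in raid5_layout(4, 4):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Consecutive blocks of a single file end up on different drives, so no one drive holds a readable copy of anything. Reading the data back without the controller means knowing the stripe size, the drive order and the parity rotation.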
 

radman63

Unfortunately, I can't do anything because I can't even boot up now. Most of my data is still on the various hard drives I copied the data from, so I won't lose much data, but a lot of time if I can't recover from this. I didn't want to lose more space with a RAID 10 configuration, and I thought the only real problems were if you didn't put the drives back in the same order, and that a rebuild time would take days. Once I got everything sorted out on the array, I was then going to figure out a backup strategy. I haven't even had time to consolidate, organize and de-duplicate everything yet. I only got this up and running a month ago!
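
For reference, the space trade-off between the two levels can be worked out with the standard capacity formulas. This is a quick sketch; the function names are made up, and the numbers ignore formatting overhead and the TB vs. TiB difference.

```python
def raid5_usable_tb(num_drives, drive_tb):
    """RAID 5 gives up one drive's worth of space to parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) * drive_tb

def raid10_usable_tb(num_drives, drive_tb):
    """RAID 10 mirrors every drive, so half of the raw space is usable."""
    if num_drives < 4 or num_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return (num_drives // 2) * drive_tb

# The 4 x 4TB configuration described in this thread.
raid5_tb = raid5_usable_tb(4, 4)    # 12 TB usable
raid10_tb = raid10_usable_tb(4, 4)  # 8 TB usable
print(f"RAID 5: {raid5_tb} TB, RAID 10: {raid10_tb} TB")
```

So with these four drives the difference is 4 TB of usable space.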