RAID Guru Help Required....

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
If I were setting your system up from scratch, I would probably do this:

4x Raid 5 for the OS/Programs/db Log Files
6x Raid 5 for the Database
1x hot spare

Then again, I don't like crippling my system, so I usually ditch the hot spare, as the hot-swap backplane makes the online spare a pointless endeavor. If a drive fails, swap out the dead drive with a cold spare and let it rebuild on low priority.

4x Raid 5 for the OS/Programs/db Log Files
7x Raid 5 for the Database

See, I like the hot spare. For my databases, I like to get them rebuilt as fast as possible, and since work is 45 min away, it makes for a longer rebuild. But, to each his own...

It is possible though, that the backplane could be the issue, but I know of no way to test it either. To be honest, the last time I dealt with IBM was about 6 years ago. A ServeRAID card went belly up and IBM was terrible in helping us get it back up and running. That was the last straw and I never went back to them.

Good luck with that RAID, Whizzard9992.
 

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
If you use Windows as OS and MSSQL as RDBMS don't expect nothing more! Windows is very slow on disk access compared to *nix and MSSQL is one of the slowest RDBMS.

That is the most blind statement I have heard (outside of the Intel v AMD or MS v Linux debate). Any proof (besides TPC benchmarks) to back that?

I have Windows boxes running Oracle and managed a 250,000 user web application on MSSQL. Both work great. It's all about the tuning done on the queries and the hardware. You can't just "plug and go", or else you will get the results you stated.
 

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
I copied from RAID 5 to RAID 1 (on different channels) and I was able to copy a single sequential 938k file in ~35 seconds (~35MB/s).

That seems REALLY low to me. What is this typically?

Whizzard,

Just thought of something here. Are your logical drives badly fragmented? I'm wondering if that is causing your massive slowdown. If they are fragmented, then the read/writes won't be so sequential.

On my last assignment running SQL, I found the logical drives to be approx 60% fragmented (it was somewhere > 50, and I think 60 was it). I sped things up greatly just by doing a defrag.

After that, I scheduled a defrag to run periodically.
 

sandmanwn

Distinguished
Dec 1, 2006
915
0
18,990
See, I like the hot spare. For my databases, I like to get them rebuilt as fast as possible, and since work is 45 min away, it makes for a longer rebuild. But, to each his own...

It is possible though, that the backplane could be the issue, but I know of no way to test it either. To be honest, the last time I dealt with IBM was about 6 years ago. A ServeRAID card went belly up and IBM was terrible in helping us get it back up and running. That was the last straw and I never went back to them.

Good luck with that RAID, Whizzard9992.

The online spare makes sense in case designs where you have to shut down the system to change out a disk, or in your case, where you have to drive a long distance to replace the drive. If the server is in a remote location, then I would definitely run an online spare.

Since I am on location every day I would rather utilize the hot swap backplane for the additional performance instead of having it sitting there doing nothing for years at a time.

The only difference between a hot spare and a cold spare is the time to pull the drive, insert the new one, and have it spin up, which is just a few seconds. You end up gaining that back in the end, as you get the added performance of the additional drive.

But it isn't a big deal. I would agree, to each his own.
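For anyone weighing the two layouts, here is a rough sketch of the capacity side of the trade-off. It only counts drives' worth of usable space, using the standard RAID 5 rule that one drive's worth of each array goes to parity. The function name and the layout lists are made up for illustration, and it says nothing about rebuild time or performance.

```python
def raid5_data_drives(drives: int) -> int:
    """Drives' worth of usable capacity in one RAID 5 array.

    One drive's worth of space is consumed by parity.
    """
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return drives - 1


# Array sizes taken from the two layouts discussed in this thread.
with_hot_spare = [4, 6]     # plus 1 idle hot spare
without_hot_spare = [4, 7]  # spare bay folded into the database array

for name, arrays in (("hot spare", with_hot_spare),
                     ("no hot spare", without_hot_spare)):
    usable = sum(raid5_data_drives(n) for n in arrays)
    print(f"{name}: {usable} drives of usable capacity")
```

Both layouts use the same 11 bays. Dropping the hot spare buys one more data drive in the database array, at the cost of needing someone on site to swap a failed disk.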
 

sandmanwn

Distinguished
Dec 1, 2006
915
0
18,990
If you use Windows as OS and MSSQL as RDBMS don't expect nothing more! Windows is very slow on disk access compared to *nix and MSSQL is one of the slowest RDBMS.

I agree with belvdr on this one. You are spreading FUD. Go away!
 

Whizzard9992

Distinguished
Jan 18, 2006
1,076
0
19,280
Thanks for all the help.

My configuration is actually close to what you quoted:

2xRAID 1 - OS/System/SWAP
2xRAID 1 - DB Logs/Temp DB
6xRAID 5 - Database

1xHot Spare assigned to RAID 5 Array. Can be reassigned remotely.

I agree the hot spare is a matter of preference.

I checked the disks and they're HEAVILY fragmented. I'm going to defrag and run the benches again.
 

Whizzard9992

Distinguished
Jan 18, 2006
1,076
0
19,280
Working on it now :)

Hope it's working well. :)

I put in a call to our local help desk. The tech assisting me is skeptical, so he's running IO Meter. (I always thought you could only test with IO Meter on unpartitioned space?)

I'll post up what the results were when I get them.
 

Whizzard9992

Distinguished
Jan 18, 2006
1,076
0
19,280
How do I set up IO Meter myself to test performance?

Is there any way to identify on what bus my card is running without going into the BIOS? (i.e. PCI, PCI-X 100, PCI-X 133, etc)


I have a feeling I'm not going to get much help from my local help desk.

Well, you're running RAID 5, so I wouldn't expect you to get reads faster than your laptop

:(

*sigh*
 

Whizzard9992

Distinguished
Jan 18, 2006
1,076
0
19,280
So I'm running 2 workers on the RAID 5 array, with the "All in one" access spec on both.

I'm watching the specs right now and I see an avg 80MB/s, 6200 IO/sec, and a max response time of 99 (ms?). Average response time is 0.31ms, so I'm assuming that is mostly cache hits?
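A quick sanity check on those numbers, as a sketch rather than a diagnosis. The first function derives the average request size from the reported MB/s and IO/sec. The second applies Little's law (throughput = requests in flight / average latency) under the assumption that each of the 2 workers keeps exactly 1 I/O outstanding; that queue depth is a guess, not something stated in the post. The function names are made up for illustration.

```python
def avg_request_kib(throughput_mb_s: float, iops: float) -> float:
    """Average size of one I/O in KiB, taking 1 MB as 1024 KiB."""
    return throughput_mb_s * 1024 / iops


def iops_from_latency(outstanding_ios: int, avg_latency_ms: float) -> float:
    """Little's law: throughput = requests in flight / average latency."""
    return outstanding_ios / (avg_latency_ms / 1000)


# Figures reported above: 80 MB/s, 6200 IO/sec, 0.31 ms average response.
print(f"average request size: {avg_request_kib(80, 6200):.1f} KiB")

# ASSUMPTION: 2 workers, each with 1 outstanding I/O.
print(f"IOPS implied by latency: {iops_from_latency(2, 0.31):.0f}")
```

Under that assumption the latency predicts roughly 6,450 IO/sec, which lines up with the observed 6,200. A 0.31 ms average is far below what a spinning disk needs for a random access, so the cache-hit reading looks plausible.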
 

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
How do I set up IO Meter myself to test performance?

Is there any way to identify on what bus my card is running without going into the BIOS? (i.e. PCI, PCI-X 100, PCI-X 133, etc)


I have a feeling I'm not going to get much help from my local help desk.

Well, you're running RAID 5, so I wouldn't expect you to get reads faster than your laptop

:(

*sigh*

If that is the response from the local help desk, then they should switch careers ("Hey, who needs multiple spindles, just set up a single 2.5" drive at a low RPM!"). Ugh, anyway, I don't know a thing about IO Meter, but this tool (www.iometer.org) is from 2004. Is this thing still accurate running across arrays?

But from the specs you give (6200 IOs/sec), that seems like a lot.

Is that spec across the entire controller or just one array? I'm guessing you don't have another controller you could move a couple of the arrays to in order to free up some bandwidth, eh?
 

Whizzard9992

Distinguished
Jan 18, 2006
1,076
0
19,280
How do I set up IO Meter myself to test performance?

Is there any way to identify on what bus my card is running without going into the BIOS? (i.e. PCI, PCI-X 100, PCI-X 133, etc)


I have a feeling I'm not going to get much help from my local help desk.

Well, you're running RAID 5, so I wouldn't expect you to get reads faster than your laptop

:(

*sigh*

If that is the response from the local help desk, then they should switch careers ("Hey, who needs multiple spindles, just setup a single 2.5" drive at a low RPM!"). Ugh, anyway, I don't know a thing about IO Meter, but this tool (www.iometer.org) is from 2004. Is this thing still accurate running across arrays?

But from the spec's you give (6200 IOs/sec), that seems like a lot.

Is that spec across the entire controller or just one array? I'm guessing you don't have another controller you could move a couple of the arrays to in order to free up some bandwidth, eh?

Actually, I do have another controller on the server. It's an Adaptec RAID, even. We needed another channel for the tape drive because the tape drive is ultra-wide. I considered using this controller instead, but this is a production server, and we'd have to repartition the array holding production data because the new controller is going to want to restripe the drives. Another problem is that we're using a SCSI mid-plane, and I think that may be the culprit. If I switch the controller and the mid-plane is the problem, we're not going to see a difference in numbers. It is a plan, though.

At this point, however, I'm almost ready to pull the server down for 8 hours or so to go ahead and diagnose this problem. I'm sure there's something going on here....

I managed to run IO Meter and I get ~100 MB/s sequential reads. I actually found an old IO Meter bench I ran in October that showed ~220 MB/s with an OLTP pattern. That still sounds low to me, but it's a lot better than what I'm getting now, and it's indicative of a problem. A smoking gun, one might say :)
 

Whizzard9992

Distinguished
Jan 18, 2006
1,076
0
19,280
Just an update for anyone still following this issue: I've put in a call to our local help desk (Policy says they have to initiate the call to IBM if they can (1) corroborate our hardware issue and (2) cannot handle it internally).

I'm satisfied with the benches at this point to confirm a problem.

I'm assuming the high IOs are a result of the cache doing its job. If that's the case, then the MB/s bench is probably not an accurate reflection of the drive throughput.

I'll keep posting here as things happen. Thanks to everyone so far.
 

steelspy

Distinguished
Jan 2, 2007
15
0
18,510
I also have the x236 server with the 6M controller.

I am running RAID 5EE on the 6M controller with 6 10k u320 drives.

I also have an LSI controller in the same box with a maxtronic JANUSRAID external array (raid 6) connected to it via u320.

I am able to transfer 2 GB of data (Outlook PST files in five chunks) back and forth between the two arrays in approximately 35 seconds (each direction).

My technique for timing this is nothing elaborate. I just opened the system clock and watched the seconds click by while the transfer ran.

I am not a wizard when it comes to this stuff. I mention this to assure you that there hasn't been any high end tweaking.

I am using the LSI controller in addition to the 6M because the 6M wouldn't support the JanusRAID enclosure. Maxtronic said they have trouble with Adaptec-based HBAs.

I'll run some benchmark tools on both arrays in that box and post the results.
 

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
Good to hear the troubleshooting is coming along. I'm wondering how the benchmarks would turn out if the load was removed from the system. Of course, I know you can't just "take it down" (I have the same issues too), but curiosity has me wondering.
 

Whizzard9992

Distinguished
Jan 18, 2006
1,076
0
19,280
I also have the x236 server with the 6M controller.

I am running RAID 5EE on the 6M controller with 6 10k u320 drives.

I also have an LSI controller in the same box with a maxtronic JANUSRAID external array (raid 6) connected to it via u320.

I am able to transfer 2 GB of data (Outlook PST files in five chunks) back and forth between the two arrays in approximately 35 seconds (each direction).

My technique for timing this is nothing elaborate. I just opened the system clock and watched the seconds click by while the transfer ran.

I am not a wizard when it comes to this stuff. I mention this to assure you that there hasn't been any high end tweaking.

I am using the LSI controller in addition to the 6M because the 6M wouldn't support the Janusraid enclosure. Maxtronic said they have trouble with the adaptec based HBA's.

I'll run some benchmark tools on both arrays in that box and post the results.

Awesome :) Thanks :!:

It takes me about 1 min per 1GB going from my RAID 5 to my mirror, both on the 6M. That's about 4 times slower than what you've clocked.

I was able to take the load off (I shut down the SQL server service) when I ran the IO Meter benches.
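To put the two copy timings on the same scale, here is the arithmetic as a small sketch. It takes 1 GB as 1024 MB and uses the stopwatch-level figures quoted in this thread (2 GB in about 35 seconds versus about 1 minute per GB), so treat the results as ballpark values. The function name is made up for illustration.

```python
def copy_rate_mb_s(gigabytes: float, seconds: float) -> float:
    """Average copy rate in MB/s, taking 1 GB as 1024 MB."""
    return gigabytes * 1024 / seconds


fast = copy_rate_mb_s(2, 35)  # 2 GB in ~35 s between the two arrays
slow = copy_rate_mb_s(1, 60)  # ~1 min per GB, RAID 5 to mirror on the 6M

print(f"fast copy: {fast:.1f} MB/s")
print(f"slow copy: {slow:.1f} MB/s")
print(f"ratio:     {fast / slow:.1f}x")
```

That works out to roughly 58 MB/s against 17 MB/s, a gap of about 3.4x, which is in the same ballpark as the "about 4 times" estimate.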
 

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
Awesome :) Thanks :!:

It takes me about 1 min per 1GB going from my RAID 5 to my mirror, both on the 6M. That's about 4 times slower than what you've clocked.

I was able to take the load off (I shut down the SQL server service) when I ran the IO Meter benches.

If you have 6,200 I/Os/second when the database is shut down, something else is chewing up the disk. Is there any antivirus or other similar program on the box?

I'm not sure if there's a way in Windows to see how much I/O a process is using.
 

steelspy

Distinguished
Jan 2, 2007
15
0
18,510
Bad news...

I ran HD Tach and found the bottleneck on my system.

The external array on the LSI card clocked in at about 140 MB/s avg.

The internal array on the 6M came in at under 60 MB/s avg.

I also received an error from HD Tach when running it on the 6M.

HD Tach completed on the array connected to the LSI without issue.

Keep me posted as to what IBM says. I'd like to improve performance on the 6M if possible. (Loaded latest BIOS on the 6M as of a few weeks ago)
 

sandmanwn

Distinguished
Dec 1, 2006
915
0
18,990
Check to make sure the controller is in the PCI-X 133 slot.

Looking at the manufacturer's specs, there are three PCI-X slots: two are 100 MHz and the other is 133 MHz. This probably isn't causing the bottleneck, but you never know.
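For a sense of scale, here is a sketch of the theoretical peak bandwidth of each slot type, computed as bus width in bytes times clock rate. It assumes 64-bit PCI-X slots and uses nominal clock values. These are ceilings shared by every device on the bus segment, and real-world throughput is lower. The function name is made up for illustration.

```python
def bus_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a parallel bus: bytes per transfer * MHz."""
    return width_bits / 8 * clock_mhz


# ASSUMPTION: nominal widths and clocks, not measured values.
slots = {
    "PCI 32-bit / 33 MHz": (32, 33),
    "PCI-X 64-bit / 100 MHz": (64, 100),
    "PCI-X 64-bit / 133 MHz": (64, 133),
}

for name, (width, clock) in slots.items():
    print(f"{name}: {bus_peak_mb_s(width, clock):.0f} MB/s peak")
```

Even the 100 MHz slot tops out around 800 MB/s, well above the 60 to 140 MB/s figures reported in this thread, so the slot speed alone is unlikely to explain the gap. A controller that had fallen back to plain 32-bit/33 MHz PCI would be a different story.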