RAID Guru Help Required....

Whizzard9992

Distinguished
Jan 18, 2006
I'm currently working on a Production SQL server with 6 15k U320 SCSI hard drives on an IBM RAID controller in RAID 5.

We've been generally disappointed with the machine's performance, and historically we've had nothing but problems with it.

I ran HD Tach, which showed a burst rate under 100MB/s, a seek time of 6ms, and a graph averaging a 60MB/s read rate. It was close to how my laptop rated....

I almost cried.

I hoped HD Tach just had a prejudice against RAID 5 or the controller for some reason, so I dug up another bench tool called DiskBench.

Multi-threaded reads rated < 20MB/s. Multi-threaded writes were around 40MB/s. Something seemed really screwy here.

What's the best way to bench my array? What kind of performance should I expect from the array? Is HD Tach generally accurate?

If so, I need to call IBM and cause a ruckus.

Thanks in advance.
 

derek2006

Distinguished
May 19, 2006
I used that same program to bench my RAID 0 array against my Seagate 320GB perpendicular-recording drive, and it found that the Seagate 320 is faster. Kind of weird: I noticed a significant decrease in loading times for programs, startup, and games with the RAID, even though it said the 320GB drive was faster. And the 320 didn't run that fast when I had Windows on it. So I'd doubt the benchmark's accuracy.
 

croc

Distinguished
BANNED
Sep 14, 2005
I'd be calling IBM for a proper performance measurement tool at the least. I assume it's under warranty / support... Tools are support.
 

Whizzard9992

Distinguished
Jan 18, 2006
I'd be calling IBM for a proper performance measurement tool at the least. I assume it's under warranty / support... Tools are support.

Yeah, IBM support may as well be resident in our building. We've had so many problems with this machine.

I want to make sure that (1) I'm not crying wolf, and (2) I can benchmark it after they fix it and say either "Yes, it's good now" or "No, keep working."

I didn't buy the server, so I don't know much about the hardware. I'm more interested at the moment in benchmarking it.

I do know it's U320, IBM ServeRAID, and I'm pretty sure it's PCI-X. All the drives are U320. (I believe the) RAID 5 stripe size is 64k.

From what I know about SCSI, this looks like a bad cable or terminator, because the burst rate was only about 90MB/s.

Any ideas?
 

croc

Distinguished
BANNED
Sep 14, 2005
OS? Card version? # of drives, RAID configuration? Firmware version? IBM GSA has a lot of support, but these questions need to be answered....
 

Whizzard9992

Distinguished
Jan 18, 2006
Maybe I should rephrase the question....

What's a good benchmarking tool for RAID arrays, and what should I expect from a typical single-channel U320 RAID 5 array with six 15k drives?

I know enough to know that HD Tach measuring 90MB/s burst is a problem, and that copying a 1GB file takes a full minute (~17MB/s).

I can research upgrading firmware, drivers, etc. myself. What I need to do is measure my array's performance, because I believe it's slow, and I need to prove it beyond demonstrating a file copy.

We've paid for IBM support, so I can get them to diagnose the problem. What I need to do is prove the problem exists via performance benchmarks and then be able to confirm it's resolved...
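
For a repeatable before/after number, a scripted copy beats a hand-timed one. Here's a minimal Python sketch of the idea (assuming Python is on the box; the paths are placeholders, and the source file should be bigger than RAM so the OS cache doesn't flatter the result):

----------------------------
# Timed-copy benchmark (sketch): reports effective MB/s for one
# large sequential copy. Paths below are placeholders.
import os
import shutil
import time

SRC = r"D:\bench\test.bin"   # e.g. a ~1GB file on the RAID 5 volume
DST = r"E:\bench\test.bin"   # target on the RAID 1 volume

size_mb = os.path.getsize(SRC) / (1024 * 1024)
start = time.time()
shutil.copyfile(SRC, DST)
elapsed = time.time() - start
print(f"{size_mb:.0f} MB in {elapsed:.1f} s -> {size_mb / elapsed:.1f} MB/s")
-----------------------------

Run it before and after IBM touches the box, and I'd have my "Yes, it's good now" number in writing.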
 

Whizzard9992

Distinguished
Jan 18, 2006
OS? Card version? # of drives, RAID configuration? Firmware version? IBM GSA has a lot of support, but these questions need to be answered....

All the info I have at the moment is in the post immediately preceding yours.
 

croc

Distinguished
BANNED
Sep 14, 2005
Two people have asked for the same information...

The 1GB file copy should be a good enough test for before / after comparisons. One would think that would be enough evidence to convince an IBM tech that there is an issue.

A lot of smaller files (~64 KB each) adding up to a GB would also put your stripe handling to the test, since 64K is what you think your stripe size is.
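
If you want to generate that batch quickly, a throwaway Python script along these lines would do it (a sketch, assuming Python is available; the directory is a placeholder, and random data keeps any compression in the stack from skewing things):

----------------------------
# Sketch: 16,384 x 64 KB files of random data = 1 GB total.
import os

OUT_DIR = r"D:\bench\small_files"   # placeholder target directory
os.makedirs(OUT_DIR, exist_ok=True)

CHUNK = 64 * 1024    # 64 KB, matching the assumed stripe size
COUNT = 16 * 1024    # 16,384 files -> 1 GB total

for i in range(COUNT):
    with open(os.path.join(OUT_DIR, f"dummy_{i:05d}.bin"), "wb") as fh:
        fh.write(os.urandom(CHUNK))
-----------------------------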

Typically, what size files are transferred to/from the server on a day-to-day basis? The array might need to be tuned for the typical files. Not knowing the specs of the adapter, I can't say what the tuning options would be, but your 'resident' IBM tech should be able to advise you.
 

Whizzard9992

Distinguished
Jan 18, 2006
Thanks for the help.

I'll get the tech specs as soon as I can. Unfortunately, to get most of them I have to reboot the machine and go into the BIOS :( It's a production machine, so I can't. I have to put in a call and have someone at the data center do it for me.

It's a database server, but we do mostly data warehousing (i.e. frequent reads, few writes), which is why we went with RAID 5 over something like RAID 10.

Most of our reads/writes are sequential, and because we're using SQL server, they're in 8K blocks.

We actually have 11 drives on the card in total: 2 in RAID 1, 2 more in RAID 1, 6 in RAID 5, and one hot spare. I'm assuming they're all on one channel, because I tried copying from a RAID 1 array to the RAID 5 array and got that 17MB/s transfer rate. I'd expect channel-to-channel on the same card to be lightning fast; 17MB/s looks like they're all on the same channel, running at 40MB/s on the line.
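
(Worth spelling out the math: a copy reads and writes at the same time, so ~17MB/s in plus ~17MB/s out is roughly 34MB/s on the wire, suspiciously close to a 40MB/s negotiated SCSI rate.)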

I hope that means I'm just running at 40MB/s on the channel because of a bad cable, terminator, or config somewhere. I hope it isn't something silly like a 33MHz PCI card, a small buffer, or something like that.

I'll create a bunch of 64K dummy files, and see what I can do about getting the specs.

Thanks again.
 

croc

Distinguished
BANNED
Sep 14, 2005
Don't reboot a production machine just for this post... I know how painful that can be, and at this time of year it's probably not a good idea. I'd discuss with your IBM tech possibly tuning the RAID 5 array stripe size down to 8K; that would fit your typical usage.

RAID 1 will usually write slower than RAID 5 because of the mirroring involved in writing, but it should read faster than you seem to think is the case (there's no mirroring penalty on reads). Channel-to-channel would be faster than in-channel, again depending on adapter specs.

What I can tell you is that it's an Adaptec controller, and my experience with most of those is that they're much faster than what you're seeing.
 

Mobius

Distinguished
Jul 8, 2002
I'd be doing the smart thing and installing RAID 1 (only) on the very fastest drives.

Because the fastest drives today are faster than anything in RAID ever was.

RAID is a big wank, nothing more. Only RAID 1 has value for its redundancy, and RAID 5 is too expensive.
 

Whizzard9992

Distinguished
Jan 18, 2006
Don't reboot a production machine just for this post... I know how painful that can be, and at this time of year it's probably not a good idea. I'd discuss with your IBM tech possibly tuning the RAID 5 array stripe size down to 8K; that would fit your typical usage.

RAID 1 will usually write slower than RAID 5 because of the mirroring involved in writing, but it should read faster than you seem to think is the case (there's no mirroring penalty on reads). Channel-to-channel would be faster than in-channel, again depending on adapter specs.

What I can tell you is that it's an Adaptec controller, and my experience with most of those is that they're much faster than what you're seeing.

Thanks for the help. I have the specs now that I'm back at work.

I checked the logs (IBM ServeRAID lacks a UI, so it took some time).

The card is a ServeRAID 6M on PCI-X at 133MHz, with 256MB of installed cache. The two mirrors are on channel 1, and the RAID 5 array is on the second channel. The RAID 5 array is actually on an 8K stripe.

I copied from RAID 5 to RAID 1 (on different channels) and I was able to copy a single sequential 938k file in ~35 seconds (~35MB/s).

That seems REALLY low to me. What is this typically?
 

belvdr

Distinguished
Mar 26, 2006
I'd be doing the smart thing and installing RAID 1 (only) on the very fastest drives.

And the smart thing would be to test both configurations. You can't just immediately declare RAID 1 the solution. The OP has the right approach: test, then change. Even a blind squirrel occasionally finds a nut.

Because the fastest drives today are faster than anything in RAID ever was.

There's no way you can convince me (short of a documented case) that a single hard drive is faster than an array of the same drives on enterprise-level equipment. Consider arrays of Fibre Channel drives; there's no way a single drive could outperform one.

RAID is a big wank, nothing more. Only RAID 1 has value for its redundancy, and RAID 5 is too expensive.

Did you use RAID 1 to fix a problem once? You seem a bit evangelistic about it. You can't blindly discard RAID 5 as a viable solution. It has its place, and so does RAID 1.
 

sandmanwn

Distinguished
Dec 1, 2006
6x 15K Ultra320 SCSI drives in a RAID 5 done correctly should get you 300+ MB/s sustained throughput, although transferring to the RAID 1 will be slower, as its throughput will be much lower.
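
Rough math: a sequential read streams off all six spindles (the parity blocks get skipped, so call it five drives' worth of useful data); even at a conservative 60-80MB/s per 15k drive, that's 300-400MB/s, with the U320 channel's 320MB/s as the practical ceiling.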
 

belvdr

Distinguished
Mar 26, 2006
Thanks for the help. I have the specs now that I'm back at work.

I checked the logs (IBM ServeRAID lacks a UI, so it took some time).

The card is a ServeRAID 6M on PCI-X at 133MHz, with 256MB of installed cache. The two mirrors are on channel 1, and the RAID 5 array is on the second channel. The RAID 5 array is actually on an 8K stripe.

I copied from RAID 5 to RAID 1 (on different channels) and I was able to copy a single sequential 938k file in ~35 seconds (~35MB/s).

That seems REALLY low to me. What is this typically?

Is that 938k or 938M? Copying to RAID 1 is going to get you an effective write speed of 1 disk, so 35MB/sec is not bad. Your RAID 1 is your bottleneck here.

What kind of performance problem are you having? If you need very high read speeds in a database, I wouldn't choose RAID 5. And if the performance has dropped over time, it could be a case where a redesign of the tables is required.
 

Whizzard9992

Distinguished
Jan 18, 2006
I'd be doing the smart thing and installing RAID 1 (only) on the very fastest drives.

And the smart thing would be to test both configurations. You can't just immediately declare RAID 1 the solution. The OP has the right approach: test, then change. Even a blind squirrel occasionally finds a nut.

Because the fastest drives today are faster than anything in RAID ever was.

There's no way you can convince me (short of a documented case) that a single hard drive is faster than an array of the same drives on enterprise-level equipment. Consider arrays of Fibre Channel drives; there's no way a single drive could outperform one.

RAID is a big wank, nothing more. Only RAID 1 has value for its redundancy, and RAID 5 is too expensive.

Did you use RAID 1 to fix a problem once? You seem a bit evangelistic about it. You can't blindly discard RAID 5 as a viable solution. It has its place, and so does RAID 1.

lol. There were so many things wrong with his post that I just chose to ignore it. Same with SupremeLaw's post ;)

You take a risk when posting on forums: people who don't know what they're talking about will try to help :)
 

sandmanwn

Distinguished
Dec 1, 2006
Just a comparison to give you an idea.

I'm running the bare-minimum RAID 5 setup on a Compaq ML370:

4x 15k Ultra320 on a SmartArray 641, single channel, 64k stripe.

HD Tach...
I get 150MB/s sustained throughput. And that number is low, as I didn't shut down a running SQL Server instance or a J2EE front-end client, and there are 12 users using this machine as a file share.
 

Whizzard9992

Distinguished
Jan 18, 2006
6x 15K Ultra320 SCSI drives in a RAID 5 done correctly should get you 300+ MB/s sustained throughput, although transferring to the RAID 1 will be slower, as its throughput will be much lower.

Thanks. I expected about 300MB/sec read throughput and slightly slower writes as a result of parity calculation.

Thanks for the help. I have the specs now that I'm back at work.

I checked the logs (IBM ServeRAID lacks a UI, so it took some time).

The card is a ServeRAID 6M on PCI-X at 133MHz, with 256MB of installed cache. The two mirrors are on channel 1, and the RAID 5 array is on the second channel. The RAID 5 array is actually on an 8K stripe.

I copied from RAID 5 to RAID 1 (on different channels) and I was able to copy a single sequential 938k file in ~35 seconds (~35MB/s).

That seems REALLY low to me. What is this typically?

Is that 938k or 938M? Copying to RAID 1 is going to get you an effective write speed of 1 disk, so 35MB/sec is not bad. Your RAID 1 is your bottleneck here.

What kind of performance problem are you having? If you need very high read speeds in a database, I wouldn't choose RAID 5. And if the performance has dropped over time, it could be a case where a redesign of the tables is required.

Whoa. Sorry. 938M (938,000k and change).

But yes, you've exposed the crux of the problem: I need to benchmark the actual throughput of the RAID 5 array. Granted, I expected a bottleneck at the RAID 1 volume, but I expected more like a 60-80MB/s sustained transfer rate, as opposed to under 40MB/s, mainly because it's a sequential write.

I didn't choose RAID 5, but I probably would have had I been involved at the time. Six physical volumes allow for a hefty amount of parallel reads, which would let us saturate the U320 bus with large sequential reads (or so we thought).

Also, we're running at ~400GB for that volume. We would have needed ten drives as opposed to six to get that same storage in RAID 1.

(I appreciate the help, which is why I'm going into so much detail. Any alternative opinions are welcome.)

Our database is primarily reads, and most writes are batch jobs run offline. We're trying to squeeze the entire database into the 7GB memory footprint we have available. We've physically partitioned the table between the RAID 5 and RAID 1 volumes. The temporary database is on the RAID 1 volume, where most random writes occur. Once the data is processed, it's copied sequentially to the RAID 5 volume.

This provides us with the optimal architecture while still keeping costs down.

Now, for benchmarking, I went back to DiskBench; it's a .NET tool that just creates a file of random data, so there's no source bottleneck. This is what I got:

RAID 5 ARRAY
----------------------------
Create 1GB File (10x100MB blocks): 10 seconds. ~95MB/sec.
Create 2x1GB File (Simultaneous): 28 Seconds ea. ~75MB/sec.

READ 1GB File (32MB Buffer): 34 Seconds. ~30MB/sec.
READ 2x1GB File (32MB Buffer): 118 seconds ea. ~16MB/sec.
-----------------------------

The array actually appears to perform writes faster than reads (which seems backwards to me). I tried the same program on the RAID 1 array:

RAID 1 ARRAY
----------------------------
Create 1GB File (10x100MB blocks): 18 seconds. ~54MB/sec.
Create 2x1GB File (Simultaneous): 40 Seconds ea. ~50MB/sec.

READ 1GB File (32MB Buffer): 18 Seconds. ~53MB/sec.
READ 2x1GB File (32MB Buffer): 50 seconds ea. ~40MB/sec.
-----------------------------


The RAID 1 array is more in line with what I'd expect, though it still seems a little low. Bear in mind that I have 256MB cache with write-backs enabled, and the server is under no load currently.

It looks like we might be negotiating down to Ultra Wide (40MB/s) on both channels, perhaps?
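
For anyone who wants to reproduce the multi-threaded read case without DiskBench, here's a rough Python stand-in (a sketch, not the actual tool; the paths are placeholders, and the files need to be larger than the 256MB controller cache, and ideally RAM, or caching will flatter the numbers):

----------------------------
# Sketch: read two large files concurrently and report MB/s for each,
# roughly mimicking a two-thread sequential read benchmark.
import threading
import time

FILES = [r"D:\bench\test1.bin", r"D:\bench\test2.bin"]  # placeholder ~1GB files
BUF = 32 * 1024 * 1024                                  # 32MB reads, as above

def read_file(path):
    total = 0
    start = time.time()
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(BUF)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.time() - start
    mb = total / (1024 * 1024)
    print(f"{path}: {mb:.0f} MB in {elapsed:.1f} s -> {mb / elapsed:.1f} MB/s")

threads = [threading.Thread(target=read_file, args=(p,)) for p in FILES]
for t in threads:
    t.start()
for t in threads:
    t.join()
-----------------------------

(Python threads release the interpreter lock during file I/O, so the two reads really do hit the array concurrently.)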
 

belvdr

Distinguished
Mar 26, 2006
RAID 5 ARRAY
----------------------------
Create 1GB File (10x100MB blocks): 10 seconds. ~95MB/sec.
Create 2x1GB File (Simultaneous): 28 Seconds ea. ~75MB/sec.

READ 1GB File (32MB Buffer): 34 Seconds. ~30MB/sec.
READ 2x1GB File (32MB Buffer): 118 seconds ea. ~16MB/sec.
-----------------------------

The array actually appears to perform writes faster than reads (which seems backwards to me). I tried the same program on the RAID 1 array:

RAID 1 ARRAY
----------------------------
Create 1GB File (10x100MB blocks): 18 seconds. ~54MB/sec.
Create 2x1GB File (Simultaneous): 40 Seconds ea. ~50MB/sec.

READ 1GB File (32MB Buffer): 18 Seconds. ~53MB/sec.
READ 2x1GB File (32MB Buffer): 50 seconds ea. ~40MB/sec.
-----------------------------


The RAID 1 array is more in line with what I'd expect, though it still seems a little low. Bear in mind that I have 256MB cache with write-backs enabled, and the server is under no load currently.

It looks like we might be negotiating down to Ultra Wide (40MB/s) on both channels, perhaps?

You might check how the RAID cache is allocated. It sounds like it is more write-oriented. If you flip that around (say 75% read / 25% write), you may see a huge improvement.
 

Whizzard9992

Distinguished
Jan 18, 2006
The cache doesn't appear to be configurable :?

Just a comparison to give you an idea.

I'm running the bare-minimum RAID 5 setup on a Compaq ML370:

4x 15k Ultra320 on a SmartArray 641, single channel, 64k stripe.

HD Tach...
I get 150MB/s sustained throughput. And that number is low, as I didn't shut down a running SQL Server instance or a J2EE front-end client, and there are 12 users using this machine as a file share.

Thanks. That's really exactly what I need :)

Something else to note: We're using a SCSI backplane for the RAID 5 array. How plausible is it that the backplane is bad? How could I test that?
 

sandmanwn

Distinguished
Dec 1, 2006
Yep, I'm on a backplane as well. Generally, if the backplane works, it works; it's a really simple circuit board.

Sorry, not much help there, as I don't know any good means of testing the backplane itself.
 

Whizzard9992

Distinguished
Jan 18, 2006
Yep, I'm on a backplane as well. Generally, if the backplane works, it works; it's a really simple circuit board.

Sorry, not much help there, as I don't know any good means of testing the backplane itself.

Thanks everyone for the help. This is enough to get the ball rolling with IBM I think.

I wish there were a more definitive way to test the array, but I'll just deal with what I've got. A measured ~60MB/s sustained throughput for a U320 array is reason enough to worry.

Thanks again.
 

sandmanwn

Distinguished
Dec 1, 2006
If I were setting your system up from scratch, I would probably do this:

4x Raid 5 for the OS/Programs/db Log Files
6x Raid 5 for the Database
1x hot spare

Then again, I don't like crippling my system, so I usually ditch the hot spare, as the hot-swap backplane makes the online spare a pointless endeavor. If a drive fails, swap out the dead drive for a cold spare and let it rebuild at low priority:

4x Raid 5 for the OS/Programs/db Log Files
7x Raid 5 for the Database
 

maury73

Distinguished
Mar 8, 2006
If you use Windows as the OS and MSSQL as the RDBMS, don't expect anything more! Windows is very slow on disk access compared to *nix, and MSSQL is one of the slowest RDBMSs.
Anyway, if you can't obtain high transfer rates with a large sequential file copy, I'd investigate, in this order: the SCSI cables and terminations, the SCSI drivers, and the OS settings.
As a comparison, I can tell you that one of the servers I manage is an IBM with 4x U320 15k IBM/Hitachi drives connected to an IBM ServeRAID controller in RAID 5; with Gentoo x64 (2x Opteron) I get a minimum of 190 MB/s on large sequential file copies.