Access time vs Throughput - What should we look for?

bhendin

Could someone clarify in what situation you would want better access time over throughput?

I ask this question as a clarification of the following statement:

"Hitachi and Seagate still offer better access times, which is why Samsung does not dominate the I/O benchmarks"

from the article on this site (http://www.tomshardware.com/2007/11/21/samsung_overtakes_with_a_bang/page11.html) about the new Samsung F1 drives.

Based on my knowledge, I would say that access time is how quickly the drive can find the data on the disk, whereas throughput is how much data it can move off the drive in a set time period.

My question is in terms of application...

If I had a database server with hundreds of records, or an e-mail server (which is essentially a DB of mail), then better (lower) access times would matter more, since I want to find records quickly and each record is relatively small.

However, if I had many large files (video, music, graphics), then higher throughput would be more important.

Is this a fairly accurate statement?
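
To sanity-check my own understanding with rough numbers, here is a quick back-of-the-envelope Python sketch; both drive figures are made up for illustration, not taken from any spec sheet:

```python
# Back-of-the-envelope: effective throughput when every request pays one full access.
# Both drive figures below are invented, ballpark numbers, not real specs.
access_time_s = 0.013          # ~13 ms average seek + rotational latency
sustained_rate = 80e6          # ~80 MB/s sequential transfer rate, in bytes/s

def effective_mb_per_s(request_bytes):
    """Approximate MB/s when each request starts with one full access."""
    time_per_request = access_time_s + request_bytes / sustained_rate
    return request_bytes / time_per_request / 1e6

print(effective_mb_per_s(8 * 1024))        # 8 KB record: ~0.6 MB/s, access time dominates
print(effective_mb_per_s(100 * 1024**2))   # 100 MB file: ~79 MB/s, transfer rate dominates
```

If that logic holds, small random reads are dominated almost entirely by access time, while big sequential reads are dominated by the transfer rate.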

I am asking to determine what specific characteristics of a hard drive are best for a particular environment.
Specifically, I am building a VMware server with a RAID 5 array that will host a few virtual machines. The server will also be used for file storage (some large ISO files). Although high throughput would be nice for file transfers, I am more concerned with the overall functionality of the system, giving the best performance for normal server operation (web server, Exchange e-mail, etc.).

So, in order to determine which drive is best for a given application, I would like to know what I should be looking at (access time, latency, throughput, others?) and why one drive would theoretically perform better in a RAID environment for a VM server vs. a file server vs. a gaming machine, etc.

Thank you for any information you can provide me!
 

UncleDave

Hi bhendin,



Your description of access time vs. throughput is correct.

As for which one matters for a given application... well, yes and no. In principle your description is a good one; in practice it might not play out that way, for a number of reasons. In terms of application there is no silver bullet. The best advice I can offer is to go for as many drives as possible in a RAID 5 array.



You are still going to bottleneck at the network irrespective of what drives you put in the server. I think, with all due respect, that you are over-analysing the situation. You are building a general-purpose server; stick with a simple disk strategy (KISS).


UD.
 

bhendin

Thanks UncleDave... a few questions about your reply.

When you say that in practice it might not happen for a "number of reasons", are there any additional details you could point me to there?

Also, while I realize that you can increase read performance to a certain degree with additional drives, do you not adversely affect write performance due to the increased striping and parity calculations?
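
My understanding of why I'm worried about writes, with made-up round numbers (please correct me if the model is wrong):

```python
# Rough RAID 5 small-write arithmetic; every figure here is invented for illustration.
# A write that doesn't cover a full stripe typically costs four disk operations:
# read old data, read old parity, write new data, write new parity.
single_drive_iops = 150                                      # guess for one 7200 rpm drive
n_drives = 4

raid5_random_read_iops = n_drives * single_drive_iops        # reads spread across all drives
raid5_random_write_iops = n_drives * single_drive_iops // 4  # four disk ops per logical write

print(raid5_random_read_iops, raid5_random_write_iops)       # 600 vs. 150 per second
```

So on paper, adding drives helps random reads far more than small random writes.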

Also, I suppose you are right about network bottlenecks. The fastest throughput one could probably expect with Gigabit Ethernet is around 80 megabytes/second (and that is being generous). I think even the slowest of these drives meets that, with the fastest at just over 100 MB/s. So for the server usage I described, you are probably correct.
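
For what it's worth, the rough arithmetic behind that 80 MB/s figure (the 20% overhead is just my guess):

```python
# Rough Gigabit Ethernet ceiling; the overhead fraction is a guess, not a measurement.
raw_bits_per_s = 1e9
raw_mb_per_s = raw_bits_per_s / 8 / 1e6              # 125 MB/s on the wire
overhead = 0.2                                       # Ethernet/IP/TCP headers, ACKs, etc. (guess)
practical_mb_per_s = raw_mb_per_s * (1 - overhead)
print(raw_mb_per_s, practical_mb_per_s)              # 125.0 and 100.0; real transfers often land nearer 80
```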
 
Hundreds of I/O requests is hardly a huge demand. Typical server hard drives can handle almost 400 per second; plain old desktop drives, about half that. From what you described, I/O doesn't appear to be a major concern. Large ISO files, on the other hand, are a limiting factor.

Simply put, look at the numbers. What is your limiting factor? If you don't need more than 100 MB/sec, then RAID isn't going to do much for you... if you don't need more than 100 I/Os per second, then 15K Cheetahs aren't really going to help you. If you don't have a gigabit network, then network capacity can make all of the other items moot.

As in most PC questions, the answer depends on where your bottleneck is.
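
If it helps, this is the kind of crude check I mean (every number below is invented):

```python
# Crude bottleneck check: the slowest link sets the pace. All figures are invented.
network_mb_s = 100                        # roughly what a gigabit LAN delivers in practice
single_drive_mb_s = 80                    # sequential rate of one drive
raid5_seq_mb_s = 3 * single_drive_mb_s    # 4-drive RAID 5 reads ~ (n-1) x one drive; parity blocks carry no data
print(min(network_mb_s, raid5_seq_mb_s))  # 100 -> the network, not the array, is the limit
```

For a general file-serving box on gigabit, the network tops out before the array does.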
 

UncleDave

Hi,

Since you used the example of an e-mail or DB server and finding lots of records... I was thinking that most of the time that information is kept in memory on the server where possible. Lots of little records in a DB do not necessarily mean a lot of reads, as the DB server can hold a number of records on one page and will try to cache tables that are frequently changed. Also, reading big files may be interrupted by the drive having to stop reading and write out some memory (paging) or write an output file. Lots of little files will always mean lots of seeks. The point I was trying to make is that there are exceptions to the rule.

The overhead of splitting reads and writes across additional drives will always be "cheaper", since the striping is done in firmware and will always be quicker than the mechanical reads and writes to the drive itself.

Your explanation of the differences between access time and throughput was spot on. In reading your post, I wanted to focus on the fact that there is no single answer I could give you and say "this is the best"!


UD
 

bhendin

Thanks all. This is one of those things where, even though I understand the theory, I never paid much attention to it in practice. When you are working in IT shops you often have much more on your mind, and you typically just let the vendor work out the drive specifics.

Sure, you take into account storage needs, fault tolerance, and performance scaling for the application as a whole, but generally not "Should I buy drive A, which has 20% better access time, or drive B with 17% better throughput?" It's often whatever the vendor supplies.

Since I am buying a server for personal use, I am taking the extra time that you don't always invest at work (or rather trade off for other things), and I just wanted to make sure I was on the right page!