Parallel Processing, Part 2: RAM and Hard Drives


virtualban

Distinguished
Feb 16, 2007
1,232
0
19,280
This is a matter of system setup and personal experience, but I would like your opinion on what works faster with 2+ workloads to be processed: one thread per core, all running in parallel, or each workload at a different priority level using all available cores, so that when one finishes the next begins. I am thinking that request collisions from the HDD and memory, plus the cache pollution from multiple threads, could offset the fact that most software is artificially rather than natively multithreaded. But that depends on the setup. The Pentium D, for example, is built for multitasking, while the Core 2 seems the opposite. Has anyone tried the things in my previous post and has any results to share?
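For anyone who wants to try this themselves, here is a minimal sketch (my own illustration, not from the thread) that times the two strategies with two CPU-bound dummy workloads; the workload function and iteration counts are arbitrary stand-ins, and a real priority-based run would set OS process priorities rather than simply serializing:

```python
import time
from multiprocessing import Process

# Dummy CPU-bound job standing in for one of the "2+ workloads".
def workload(n=20_000_000):
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_parallel():
    # Strategy 1: one process per core, both jobs at once.
    procs = [Process(target=workload) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

def run_sequential():
    # Strategy 2 (approximated): the high-priority job effectively
    # finishes before the low-priority one begins.
    workload()
    workload()

if __name__ == "__main__":
    t = time.time()
    run_parallel()
    print(f"parallel:   {time.time() - t:.2f}s")
    t = time.time()
    run_sequential()
    print(f"sequential: {time.time() - t:.2f}s")
```

Swapping in jobs that hammer the disk or share cache would expose exactly the collision effects described above.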
 

gamebro

Distinguished
Mar 10, 2007
239
0
18,680
I didn't like the article... The RAID info hardly helped me at all...


You boneheads at Toms need to bench the loading times, not FPS... And for Pete's sake, use better games for your benches...

QUAKE IV!??! What the hell, man? Oblivion is the perfect test for this, not Quake!

dumb dumb dumb dumb.
 

virtualban

Distinguished
Feb 16, 2007
1,232
0
19,280
To gamebro:
Have you tried Postal2? Talking about perfection... on the same system, Oblivion load times are much less irritating than Postal2's, defragmentation or not, software running in the background or not (well, Oblivion is more sensitive to software running in the background, but that's because it is more demanding in general).
Anyway, no need to flame the guys at Toms. Maybe they're just trying to make a point: that framerate does not depend on the HDD in a well-written game that loads areas while you're getting there, not once you've arrived.
I still agree with you that a hell of a lot more tests should have been done, but that means people working for peanuts. Advertising should at least pay the electricity bill and the wear on components, even if people truly have fun playing around with different setups and have other ways to earn money.
 

rroy

Distinguished
Oct 18, 2007
1
0
18,510
RAID0 only improves bandwidth performance. It does not reduce latency. The drive arms still need to swing across the platters, which, in a perfect world, yields the same latency. Since you need to wait for all arms to swing and return data, seek performance is realistically going to be slightly worse. Because of this, only large file transfers are going to see a benefit. Small data transfers, non-sequential IO, and metadata work are not going to see much, if any, improvement.

Hence all the calls for load-time testing. Your load times should improve with RAID0, as loading is sequential IO and often long, sustained data transfers.

I can only assume dual-channel RAM will likewise only improve in the bandwidth department. Access times should remain the same or get worse, if I understand the technology correctly.
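To make the bandwidth-versus-latency point concrete, here is a back-of-the-envelope model (a sketch with assumed figures; the 12 ms access time and 60 MB/s per drive are illustrative, not measured):

```python
# Simple disk model: time = access latency + size / bandwidth.
SEEK_S = 0.012      # assumed average seek + rotational latency, seconds
DRIVE_MB_S = 60.0   # assumed sustained transfer rate of one drive, MB/s

def transfer_time(size_mb, drives=1):
    # RAID0 stripes the transfer across drives, multiplying bandwidth,
    # but every drive still has to seek, so latency does not shrink.
    return SEEK_S + size_mb / (DRIVE_MB_S * drives)

for size_mb, label in [(0.004, "4 KB read"), (500, "500 MB read")]:
    single = transfer_time(size_mb, drives=1)
    raid0 = transfer_time(size_mb, drives=2)
    print(f"{label}: single {single:.4f}s, 2-drive RAID0 {raid0:.4f}s "
          f"({single / raid0:.2f}x speedup)")
```

The small read sees essentially no speedup because latency dominates; the large sequential read approaches the ideal 2x.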
 

martin0642

Distinguished
Oct 10, 2007
142
1
18,680
Are the reviews insane? The comparisons are apples to oranges. They are running tests which either cache data from the drive and then execute, or run mathematical problems which have very small data input and require a lot of CPU time to come up with a result. The code to generate a new prime number is very small, but the results can take years.

Running Windows from a dual- or quad-drive RAID 0 array is much faster; it's one of the best ways to improve system performance (though not reliability), because the HDD is often the slowest part. Games don't run faster, they load faster. Games play from RAM; that's why it's there. And quad-core processors are great for anyone who does more than two things at once. I alt-tab between games and browsing, with torrents going and antivirus software running with all the options on, with no slowdown. I remember the days of alt-tabbing and waiting an eternity because my game in windowed mode still needed CPU love.

This article needs to be pulled and redone with macro benchmarks: starting Windows, opening software, copying files from one drive to another, loading a video game, and some more tests that are meaningful. The conclusion of this review should say: apparently, the authors need to take Computers 101 over again.
 

fletch420

Distinguished
Mar 23, 2007
141
0
18,680
I thought the article was good; I was kind of surprised myself that RAID was not much help, but like others have said, HDD performance should only affect load times. The part I also thought was not well thought out was the dual-channel tests: increasing the bandwidth of system memory should improve load times, not FPS. Once it's in RAM, it's all about swap time.
cheers
 

powerbaselx

Distinguished
Feb 23, 2006
327
0
18,780
I was really surprised by the RAID 0 benchmark results.
I don't know about you, but my real experience with RAID 0 is that load times, boot time, shutdown time, and disk access (read+write) are noticeably faster than with a single disk, especially with 7200 rpm hard drives.

Performing benchmarks measured in FPS during a game isn't really appropriate, especially if you have 2 GB+ of fast RAM available. Maybe if they had used only 512 MB it would show some difference, since the system would be reading and writing to disk most of the time.

As for the dual-channel tests, they didn't surprise me that much, since I'm used to seeing comparisons between low- and high-latency modules, and the difference there isn't that big either. Anyway, dual-channel architecture surely provides more stability and some resiliency to the system (even without memory mirroring).

 

russki

Distinguished
Feb 1, 2006
548
0
18,980
While the RAID benchmark selection was, like I said, curious (to put it charitably), what you say is a bunch of unsupported hoo-ha. Really? RAID helps your torrents? I think those would be bottlenecked by network throughput (that is, by the internet connection). How about the benchmarks that show games load 3-5% faster at best with dual RAID0? Seriously, theoretical throughput is not always achievable in practice, and unless you know the precise access patterns of the task, there is no way to tell. And the funny thing is, the access patterns of the vast majority of applications, games included, do not lend themselves to acceleration from RAID0. Sorry. Even multimedia tasks show much less than the theoretical increase in speed in the real world.

Now file copying is great, but unless you do that all day long (and if you do, please do tell us about your job / hobby), the disappointment is well warranted.
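The torrent point is easy to sanity-check with rough numbers (the connection and drive speeds below are illustrative assumptions, not measurements):

```python
# Is a torrent disk-bound or network-bound? Rough arithmetic with
# assumed numbers; substitute your own connection and drive specs.
connection_mbit = 10       # assumed downstream connection, Mbit/s
single_drive_mb_s = 60.0   # assumed sustained write speed of one drive, MB/s

torrent_mb_s = connection_mbit / 8  # bits per second -> bytes per second
print(f"Torrent writes at most {torrent_mb_s:.2f} MB/s; a single drive "
      f"sustains ~{single_drive_mb_s:.0f} MB/s, about "
      f"{single_drive_mb_s / torrent_mb_s:.0f}x the required rate.")
```

Even a single drive has plenty of headroom, so striping two of them does nothing for the download.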
 

General_Disturbance

Distinguished
Aug 6, 2007
88
0
18,630
Yes, that is what I do! I read data off the HDD all day. I expect RAID0 to speed that up, but not to speed up the PROCESSING of my data once it is in RAM. RAID0 will get the data loaded from the HDD into RAM faster, and this helps a lot. I process terabytes of image data looking for individual photons in millions of frames of data.

edit: more recent discussion at http://www.tomshardware.com/forum/244224-32-noob-raid-question-simple#t1735029
 

martin0642

Distinguished
Oct 10, 2007
142
1
18,680
Russki, the last part was in reference to the statement about quad-core processors, not the RAID portion:

And quad-core processors are great for anyone who does more than two things at once. I alt-tab between games and browsing, with torrents going and antivirus software running with all the options on, with no slowdown.

I also do IT work for a Fortune 50 company and work on giant NAS and DAS arrays on a daily basis. I'm fully aware of where throughput and speed apply; my point was that the authors of the article aren't, which is faithfully represented by the legion of comments calling them out. Tom's usually has good testing, which is why this article stands out. In the future, a complete grasp of what was said would be nice before making claims of "unsupported hoo-ha". The framers of the article deserve the rebuttal, not the ones who blow the whistle.

Personally, I am looking forward to testing the FusionIO drives to possibly replace many of the arrays we have here. From the preliminary data, it would seem solid state is the future.
 

russki

Distinguished
Feb 1, 2006
548
0
18,980
Martin, well, I take it back then. Besides, Anandtech just recently did a pretty good analysis of threading in Unreal 3, which was excellent, so although Tom's point was that multicore is still immature (part 1 of this article, if memory serves me right), Unreal just may be a sign of things to come.

Anyway, Tom's quality of articles has plummeted, and there's no arguing with that.
 

VTOLfreak

Distinguished
Jan 23, 2006
77
0
18,630
You can't notice any difference between single and dual channel because in most cases the FSB is the bottleneck.
I'm using DDR2-1066, and on a 1066 FSB I don't notice much difference either, until I crank the FSB way up to 1840. Then I start seeing a large difference between single and dual channel. At that FSB speed you can drop the memory divider to 1:1 and you're still running DDR2-920.

RAID0 scales linearly on long reads/writes. Latency, however, doesn't improve, so for short reads/writes you might as well be using a single drive. What does make a large difference, however, is cache. I'm using an Areca controller with a 256 MB cache in write-back mode, and Windows uses all free memory to cache reads (I have 4 GB total). This made a huge difference compared to the onboard RAID, although benchmarks tell me there is no improvement at all in latency or sequential reads/writes. Now I can write chunks of up to ~250 MB to disk and they finish instantly (I can hear the drives writing it all away for several seconds afterwards). Desktop usage and loading levels in games also go a lot faster after a while, since Windows XP caches all reads in free memory.

Conclusion:
-Crank the FSB way up in dual channel. (At least buy an FSB1333 CPU if you aren't into overclocking.)
-Stop adding more drives, start adding more cache. (Get a **** of RAM, think 4GB, maybe 8GB if using 2GB modules.)
-Use a dedicated RAID controller for write caching. (It doesn't have to be an expensive one with a dedicated XOR unit; any controller with onboard RAM which you can put in write-back mode will do.)
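The FSB-bottleneck claim checks out with simple peak-bandwidth arithmetic (theoretical peaks only; real-world throughput is lower):

```python
# Peak-bandwidth arithmetic for the FSB bottleneck. Intel's quad-pumped
# FSB moves 8 bytes per transfer, and DDR2 moves 8 bytes per transfer
# per channel, so the two peaks line up one-to-one.
def gb_s(mt_per_s, bytes_per_transfer=8):
    return mt_per_s * bytes_per_transfer / 1000  # MT/s -> GB/s

fsb_1066 = gb_s(1066)          # ceiling on everything crossing the FSB
single_channel = gb_s(1066)    # one channel of DDR2-1066
dual_channel = 2 * single_channel

print(f"FSB1066 ceiling:  {fsb_1066:.1f} GB/s")
print(f"Single DDR2-1066: {single_channel:.1f} GB/s (already saturates the FSB)")
print(f"Dual DDR2-1066:   {dual_channel:.1f} GB/s (the FSB can't consume the extra)")
```

A single channel of DDR2-1066 already matches an FSB1066's peak, which is why the second channel only shows up once the FSB is overclocked well past the memory.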
 

brownlove

Distinguished
Oct 29, 2006
41
0
18,530
I appreciate the article for what it helps to illuminate: more doesn't always mean better. Casual computer users won't benefit from all the horsepower these souped-up systems have. Gamers should build systems specifically for what they're trying to do; i.e., if you want a system that plays really smoothly at 1600x1200, then build a system to handle that load. Professional 2D graphics users have different needs, but they don't need a high-powered gaming PC. 3D pros have different needs again; thank goodness 3D apps tend to utilize multicore systems. And I appreciate what that IT guy had to say about the company he works for. Huge processing loads require huge processing power and lots of fast-moving storage. It's really quite peculiar to hear about multicore machines and gaming together. I know that's where a lot of money and attention is focused these days. Even though it's a lot of fun, it seems a waste of the technology.
 

VTOLfreak

Distinguished
Jan 23, 2006
77
0
18,630
It's not just the bus; even PCI-E RAID controllers have internal limits. For example:
The Areca 1220 I'm using is internally PCI-X and is limited to about 800 MB/sec even though it's a PCI-E 8x card. I've placed it in the crossfire slot of a P35 mobo, which is wired up for 4x, and it tops out at about 600 MB/sec. Since this is an 8-port card, that's fine: the drives will always be the bottleneck. (Unless you plug in 8 Raptors, but if you can afford that, you can also afford a bigger controller.)
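The lane math behind those numbers (theoretical peaks for PCI-E 1.x; protocol overhead and the card's internal bridge account for the rest):

```python
# PCI-E 1.x runs 2.5 GT/s per lane with 8b/10b encoding, leaving about
# 250 MB/s of payload bandwidth per lane per direction (theoretical).
PER_LANE_MB_S = 250

for lanes in (1, 4, 8):
    print(f"PCI-E 1.x x{lanes}: {lanes * PER_LANE_MB_S} MB/s theoretical peak")
# An x8 card dropped into an x4 slot is capped at ~1000 MB/s in theory;
# protocol overhead plus the card's PCI-E-to-PCI-X bridge explain the
# ~600 MB/s observed in practice.
```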
 

virtualban

Distinguished
Feb 16, 2007
1,232
0
19,280
Also, something nobody has mentioned (in regard to the original article): if you compress with WinRAR, sometime in the future you have to decompress the data. I have noticed very little CPU activity during that time, with or without a virus scan running, and the most limiting factor is the HDD. It would be interesting to see this tested on THG.
 

VTOLfreak

Distinguished
Jan 23, 2006
77
0
18,630
Yes, I'm using the 1220. http://www.areca.com.tw/products/pcie.htm I suggest you read that a little slower this time 'round. :sleep: I'm running that 1220 on an Asus P5K-E in the crossfire slot. The Areca BIOS itself confirms it's running on 4 lanes.

12xx series are SATA PCI-E models
11xx series are SATA PCI-X models
16xx series are SAS controllers (both PCI-X and PCI-E models)

Now, the funny thing about all of Areca's PCI-E controllers except the 1280/16xx is that the processor and SATA controllers on the card are PCI-X. They have simply slapped on a PCI-E to PCI-X bridge. It's a quick and dirty solution to turn their PCI-X cards into PCI-E cards, but it seems to work well. All their cards using this setup seem to top out at about 1 GB/sec.

BTW: all of Areca's PCI-E cards are 8x PCI-E. I don't know where you get the idea that 8x cards are rare. 8x PCI-E has become the new standard in servers these days. You can find just about anything on 8x PCI-E: RAID controllers, fibre controllers, 10 Gb NICs, Infiniband cards, SSL accelerators, and so on. Mellanox already has 8x PCI-E 2.0 Infiniband cards, and even Nvidia's QuadroPlex (4 video cards in an external box) can be used with an 8x adapter card.

[EDIT] I really want to see you try and squeeze 1 GB/sec out of 8 drives. So in my situation (8-port card on 4x PCI-E), the drives WILL be the bottleneck.
 

matobinder

Distinguished
Aug 22, 2007
43
0
18,530
I really did like the page 2 "Some memory history" section, even though I was kind of annoyed it left some things out (Rambus, for example). Rambus was an interesting solution for a bit. It did get outpaced, but it was better for a short while in 2002. (I think it was the first half of 2002; I can't remember.)

Anyway, it would be cool to see more articles on "the history" of something, like how hard drive tech has changed. We see articles on the future, but it is fun to read about and remember how things worked back when. It'd be really cool to see benchmarks... hehe. How the IDT WinChip 200 MHz Socket 7 chip sitting powered off behind me compares to a Core 2 Duo...
 

pc007

Distinguished
Oct 23, 2007
6
0
18,510
I'm not sure the RAID testing took Windows caching into account. If you do any operations on a file that fits in the Windows memory cache, your timing numbers won't reflect the HDD speed.
2-disk RAID0 is much faster than a single disk.
4-disk RAID0 is much faster than 2-disk RAID0.
Most of the benchmarks didn't test anything that would be affected by HDD speed, so they didn't show this.
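One way to see the cache effect pc007 describes is to time the same read twice (a minimal sketch; TEST_FILE is a placeholder path, and a rigorous test would flush the cache or reboot to guarantee a cold first read):

```python
import time

# Illustration of OS read caching: the second read of the same file is
# usually served from RAM rather than the disk. Point TEST_FILE at any
# large file; the name here is a placeholder.
TEST_FILE = "big_test_file.bin"

def timed_read(path):
    start = time.time()
    with open(path, "rb") as f:
        while f.read(8 * 1024 * 1024):  # stream through in 8 MB chunks
            pass
    return time.time() - start

cold = timed_read(TEST_FILE)  # (hopefully) hits the disk
warm = timed_read(TEST_FILE)  # typically served from the file cache
print(f"first read: {cold:.2f}s, second read: {warm:.2f}s")
```

If the two times come out nearly identical and fast, the benchmark was measuring the cache, not the drive.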
 

minnesotadon

Distinguished
Nov 12, 2007
1
0
18,510
I recently built a new system with an Intel DP35DP motherboard, an E6750 dual-core processor, and two 2 GB 800 MHz memory sticks from G.Skill. It worked fine. Then a reader of my blog pointed out that I had configured the memory as single-channel rather than dual-channel.

I made the change, running the Vista Windows Experience Index test before and after. The overall score didn't change, because it is capped at 5.4 by the RAID 1 hard disks, but the memory sub-score improved from 5.6 to 5.9.

I don't know how much difference that makes in the real world, but apparently it's enough to be measured by that test. I wonder if Microsoft discloses the methodology of their Experience Index tests.

Don
http://buildmyown.blogspot.com/
 

awallac3

Distinguished
Nov 19, 2007
1
0
18,510
This was an interesting article to me, at least the single-channel / dual-channel portion of it. I had been under the impression that a dual-channel memory configuration made a noticeable difference in memory-intensive operations, but according to the THG benchmarks this is not the case. I develop algorithms for processing radar data, both real-time and "offline", and could have sworn that the memory configuration made a significant difference. The article inspired me to run a couple of tests of my own using an algorithm I developed for processing radar data. For the sake of brevity: I found that the dual-channel configuration was only 0.4% faster than single-channel for the section of the algorithm that is very computation-intensive, but 9.1% faster for the more memory-intensive portion. I used DDR400 on an Athlon 3500 (single core) for my test. I should note that the "memory intensive" section of the algorithm manages over 1 GB of data in memory, which is atypical for the average user.

Rgds,
Aaron
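Aaron's compute-bound versus memory-bound split can be mimicked with a small microbenchmark (a sketch, not his radar code; the array sizes and iteration counts are arbitrary assumptions):

```python
import time
import numpy as np

# Compute-bound phase: lots of math over a tiny, cache-resident array.
# Memory-bound phase: a single streaming pass over a large array.
# Memory channel configuration mostly shows up in the second number.
small = np.random.rand(1_000)        # fits in cache
large = np.random.rand(50_000_000)   # ~400 MB, far bigger than any cache

start = time.time()
acc = 0.0
for _ in range(20_000):              # repeated work on the same small data
    acc += float(np.sum(np.sqrt(small)))
compute_s = time.time() - start

start = time.time()
total = float(np.sum(large))         # bandwidth-limited streaming sum
memory_s = time.time() - start

print(f"compute-bound phase: {compute_s:.2f}s, memory-bound pass: {memory_s:.2f}s")
```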
 

konocwa

Distinguished
Nov 24, 2007
1
0
18,510
I was a little disappointed the article did not address the issue of onboard RAID versus independent RAID controller cards. I have read in a number of places that running your RAID on a separate PCI controller yields much better performance. This is because onboard controllers do their work in software and must use CPU time, while separate hardware controllers run independently of the motherboard and CPU. It would have been nice to see a comparison of motherboard-based controllers versus independent hardware controllers; in that case I think the RAID benchmarks would have been much different.
 

nachowarrior

Distinguished
May 28, 2007
885
0
18,980
I know nobody wants it, but I'm going to put my 2 cents in.

This is a serious request for a lot of serious system builders and small businesses. And those of us that have grandmas. :p
I would LOVE to see the dual-channel memory test repeated on an AMD platform with integrated video, where the video depends on system memory. I've built at least one system for a customer that had 1.5 GB of RAM: 1 GB ran in dual channel, and there was an extra 512 MB stick lying around, so I dumped it in; they all matched. My customer doesn't do a LOT of gaming, but I set the frame buffer up to the point where she could get good frame rates in simple 3D games, Google Earth, and whatnot, and still have a quick system when not running 3D applications. This is an area where I have not seen a lot of documentation (I may not be digging hard enough for it), but I think it is heavily dependent on your memory. My hypothesis is that single channel will lose MUCH more ground on a cost-effective AMD platform. But that's a hypothesis, and it can be proven wrong.

Why do I think it's worth running tests on?
It may not be a buzz-worthy topic, but I've noticed that most readers build systems not only for themselves. A good percentage of Tom's readers either have HTPCs which depend on integrated graphics, or build systems for family and friends who don't necessarily need "an enthusiast" system. The topic touches an area of system building that is sometimes overlooked. I would definitely have liked to see the article go in that direction as well as covering gaming and whatnot.

Things in this article I liked that covered business solutions:
the small but adequate notes on how this would be used in server-side situations. In my limited experience I have not had to deal with any kind of server situation, but it is something useful that did not need to be included and yet was.

And finally, trying not to be critical, just inquiring: maybe, if there is time... or space... apply this to some highly memory-dependent workloads. I'd do it myself, but I REALLY don't have the money or resources. :p Thanks for reading my 2 cents.
 

perzy

Distinguished
Aug 10, 2007
116
0
18,680
I second the previous post by nachowarrior.
Dual-channel memory is an issue when one is building a rig for friends and family. Often they want an upgrade, or a new build, and they always want it dirt cheap!
So dual-channel operation on AMD IS a big issue for many people out there.
Thank you THG for doing this article though; no other hardware site has anything like it.