The Ultimate Hardware Guide [Last Update: 4-14-06]

verndewd said:
http://img99.imageshack.us/my.php?image=workstation1kq.jpg

It isn't the best 😛 but that's the setup I have, excluding the 10 used for exhaust.

lol... and you mortgaged which family member to pay the electricity bill?
You're set for summer with all that air flowing; just do your addition once you get the permit and have a seat right next to the CPU, nice and cool with a permanent bad hair day. :?

I shave my head 😛.

The first PSU is a Thermaltake 680W, the 2nd is an Antec TruPower 450W. The fans together drew too much on the +12V and were actually hindering the performance of Windows, which is what made the 2nd PSU necessary.

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 
I can get you a huge case... my boss has some monsters. Seriously big friggin cases... put all your fans in and use it to cool your house and PC. 8)

I'll just turn my apartment into a case, LOL. Or I could steal a rack and Mad Mod my PC into it, 😉.

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 
All sweet, except the cache and fsb (and plz dont take it as a flame or nothing):

The Prescott Pentium 4 also has a smaller Level 1 Cache than the Athlon 64, avg. of 48KB Level 1 vs. 128KB Level 1 in the Athlon 64, which I believe is another reason for why the Pentium 4 is not as good in Games as the Athlon 64. The Pentium 4 Prescott has larger Level 2 Cache, 2MB vs. 512KB or 1MB in the Athlon 64. Which is in place to leverage the bottlenecked Front Side Bus, but I personally believe that larger Level 2 Cache does not mean greater performance, and in some cases, I believe it to decrease performance.

Have a look at the P3 Coppermine: 16K+16K of L1 vs. AMD's 64K+64K. The P3 Coppermine and AMD Thunderbird are clock-for-clock equal (within 10%; games were actually faster most of the time on Intel by only a few percent and slower in other apps than the AMD; no chip was the winnder, but the Intel was colder and the AMD was cheaper). That's 1/4 the cache and equal performance. It's the architecture and cache design; Intel's cache works in a different manner.

Intel seems to think there is an even greater increase in performance, but this is just due to the bottlenecked Front Side Bus in my opinion.

And FSB: the P4 made the QDR FSB seem slow, but it's not too bad (for now, and yeah, it's not as good as AMD's design, scalability, etc.). As we saw with Conroe and the Pentium Ms, it's the NetBurst architecture itself that's to blame, or else P6 is FSB-efficient (again the P3 Coppermine example: the P3 used a 133MHz SDR FSB vs. AMD's 266MHz DDR FSB).

It differs from architecture to architecture.
 
I should hope no chip was a "winnder". This was designed to be factual/my opinion, which is what I have stated several times. I asked people to post about wrong things that are facts, not stating your opinion on performance or design. Re-Post when you find something wrong in there, and I will be happy to fix it.

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 
Maybe something to add to your very fine article,

If one really looks at how data is transferred from main memory to the CPU, it is interesting to compare the two platforms, Intel and AMD. For the sake of simplicity we can put everything in the PCI bus category: PCI-E, AGP, Super I/O, etc.

With Intel, it is true that the IO, GPU, and PCI are closer to the memory, generally Controller (PCI) -> NB -> Memory, which in theory makes it good for transferring information from outside to inside. The bottleneck between processor and memory appears when there is a lot of data to be processed and returned to memory, as in scientific calculation. It is even worse when the data has to go from core to core by way of the FSB. For games, where DMA, bus mastering, and other external controllers can handle themselves, it is actually a better architecture. The CPU path is CPU -> NB -> Memory. It is interesting to note that the Intel processor has quite a good operations-to-memory ratio, close to 15 ops / fetch. In this case one needs a lot more cache, as it is used not only to mask long latencies but also to make up for main memory BW.

AMD: the memory controller is attached to the switch, and in the case of a 2xx or 8xx Opteron this switch also connects the other processors. The BW of this switch is really high, in fact higher than the FPU can handle in a y=N*scalar case. The balance is at about 8 to 12 fetch / operation. The interesting part is that in the 2xx and 8xx case the memory controller scales very well: if the data is on some other processor, it goes CPU1 -> CPU2 switch -> Memory, so basically the 2nd CPU becomes a "North bridge" for memory. In the case of an 8xx CPU, we can have 2 NB with a 128-bit path for both. The bad thing here is that the IO is actually one hop further away from the memory. The crossbar itself can already pass a lot of data to the memory without disturbing the CPU. We might consider that latency for IO is maybe not as bad as latency to memory, unless the video card is the IO, in which case we can consider the local memory of the GPU as a "cache".

In my opinion AMD should get rid of the memory controller on the CPU and just make more HyperTransport interfaces available. For example, let's say the 8xx CPU had 4 HyperTransport links instead of 3 + memory controller. It would be possible to make a motherboard with 25.6GB/s BW total, where 6.4GB/s would go to IO and 19.2GB/s would go to memory. In a very IO-based configuration someone else could design a 12.8GB/s memory and 12.8GB/s IO bus, or again 19.2GB/s IO and a 6.4GB/s memory bus (very nice in some applications I have here)... The memory could then be handled by a dedicated memory controller that connects to HyperTransport.
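The three splits quoted above all follow from dividing four links between IO and memory. Here is a minimal Python sketch of that arithmetic. The 6.4GB/s per-link figure is not stated in the post; it is inferred from 25.6GB/s total over 4 links, and the function name is made up for illustration.

```python
# Per-link bandwidth is an assumption: 25.6 GB/s total / 4 links = 6.4 GB/s each.
LINK_GB_S = 6.4
TOTAL_LINKS = 4


def split_bandwidth(io_links, total_links=TOTAL_LINKS, per_link=LINK_GB_S):
    """Return (io_gb_s, memory_gb_s) when io_links of the links go to IO."""
    if not 0 <= io_links <= total_links:
        raise ValueError("io_links must be between 0 and total_links")
    memory_links = total_links - io_links
    # Round to one decimal so 3 * 6.4 prints as 19.2 rather than 19.200000000000003.
    return round(io_links * per_link, 1), round(memory_links * per_link, 1)


if __name__ == "__main__":
    for io_links in range(1, TOTAL_LINKS):
        io_bw, mem_bw = split_bandwidth(io_links)
        print(f"{io_links} IO link(s): {io_bw} GB/s IO, {mem_bw} GB/s memory")
```

With one, two, or three links given to IO, this reproduces the 6.4/19.2, 12.8/12.8, and 19.2/6.4 configurations from the post.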

I would also bet that AMD could fit even more HyperTransport links if they got rid of the memory controller. They could also change memory technology as fast as Intel while keeping a reasonable number of pins on the processor. Also keep in mind that, from what I have seen, routing HyperTransport signals would be easier than routing DDRx at 128 or 256 bits wide.

Intel tried it with Rambus, having the memory controller separated from the North bridge; the Rambus thing was pretty bad politically. Getting rid of the Northbridge memory controller and incorporating this transport in the CPU itself would make sense.


Feel free to use it, add it, edit it, erase it... whatever!
 
This is geared towards those who have little understanding, and I think that is a bit advanced, but good job and thanks for the suggestions. I'll probably take some snippets out and rework them into layman's terms.

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 
For a comment everyone can follow, we could say that there are 2 main reasons for having cache: bandwidth and latency. Bandwidth is how fast the train goes, as in 300km/h, while latency is how far the train is from its destination.

If you look at the Itanium, which needs huge L3 caches, it is quite obvious that it needs them for both BW and latency. On the P4 and Conroe family the big L2 cache frees up the single FSB as much as possible, while on AMD the cache is mostly there to cover latency in non-streaming cases. That's why more cache is not always better; it really depends on how the information goes from memory to the CPU and IO.
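One rough way to see how the two reasons interact is a simple cost model: the time for a transfer is the latency plus the size divided by the bandwidth. The Python sketch below is only an illustration. The 100ns latency and 6.4GB/s bandwidth are made-up round numbers, not measurements of any chip discussed here.

```python
def transfer_time_ns(size_bytes, latency_ns, bandwidth_gb_s):
    """Simple model: fixed latency plus streaming time.

    1 GB/s is 1 byte per nanosecond, so size / bandwidth is already in ns.
    """
    return latency_ns + size_bytes / bandwidth_gb_s


if __name__ == "__main__":
    # Hypothetical memory system: 100 ns latency, 6.4 GB/s bandwidth.
    for size in (64, 4096, 1024 * 1024):
        total = transfer_time_ns(size, latency_ns=100, bandwidth_gb_s=6.4)
        share = 100 / total
        print(f"{size:>8} bytes: {total:12.1f} ns, latency share {share:.0%}")
```

In this model a single 64-byte fetch is almost all latency, while a 1MB stream is almost all bandwidth, which is why a cache can be helping with one, the other, or both depending on the access pattern.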
 
Exactly, having more L2 cache is not always best. Most of the time with larger L2 caches, a lot of it goes unused in some users' environments; it really depends on what you're doing to warrant more cache.

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 
Well, facts: AMD's bigger L1 cache is NOT quicker for games, but their design benefits more from the size, and it's the same deal with L2.

The P4 FSB is not crap; the P4 is crap. Conroe proves it, and Intel explains that they keep an FSB, with no HT-like link and no IMC, for flexibility.

and btw the MMM^3 on your logo = MMM x MMM x MMM? WTF
 
An interesting way to make a benchmark that exercises and exposes the P4 / AMD differences is actually quite simple. One has to make a workload that grows slowly in size. I use datasets of size 2K, 4K, 8K, ..., 512KB, 1MB, 2MB, 4MB, 8MB, until I have covered all of the data sizes of local memory and, in the case of an AMD NUMA system, enough to reach all memory banks.

Then also increase the number of operations, starting with 2 reads -> 1 write; let's call it A op B -> C.

Then continue to increase the number of ops per fetch+write until you reach the number of ops needed to saturate the FPU / branch unit / ...
For an FPU example I would do:


K and K1 are register constants.
A+B -> C
AK+BK -> C
AK+AK1+BK+BK1 -> C
... until I have reached my peak FP/s

After that you can try to analyse how many streams can be handled, so we go to A op B op C -> D ... and try to balance the write streams and read streams too.

Once all of this is done, one can write better code for each platform using these values. It will show how each of the caches is used, how each operation unit is used, which one starves, and which one is a bottleneck.
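The sweep described above can be sketched roughly as follows. This is a minimal Python sketch of the structure only, not the poster's actual benchmark: the function names, the values of K and K1, and the choice of 8-byte doubles are all assumptions for illustration. In Python the interpreter overhead will swamp any cache effect, so for real numbers the same loops would need to be written in a compiled language such as C or Fortran.

```python
import time
from array import array

K, K1 = 1.5, 0.5  # stand-ins for the "register constants" (values are arbitrary)

# (name, floating-point ops per element, kernel); each reads A and B and writes C
KERNELS = [
    ("A+B", 1, lambda a, b: a + b),
    ("AK+BK", 3, lambda a, b: a * K + b * K),
    ("AK+AK1+BK+BK1", 7, lambda a, b: a * K + a * K1 + b * K + b * K1),
]


def dataset_sizes(start=2 * 1024, stop=8 * 1024 * 1024):
    """Working-set sizes in bytes, doubling from start up to stop inclusive."""
    sizes = []
    size = start
    while size <= stop:
        sizes.append(size)
        size *= 2
    return sizes


def run_kernel(op, n, repeats):
    """Apply op element-wise over n doubles, repeats times; return (seconds, C)."""
    a = array("d", [1.0] * n)
    b = array("d", [2.0] * n)
    c = array("d", [0.0] * n)
    start = time.perf_counter()
    for _ in range(repeats):  # repeated passes so the data gets reused
        for i in range(n):
            c[i] = op(a[i], b[i])
    return time.perf_counter() - start, c


def sweep(sizes, repeats=2):
    """Return (size_bytes, kernel_name, flops_per_second) for every combination."""
    results = []
    for size in sizes:
        n = size // (3 * 8)  # three arrays of 8-byte doubles share the working set
        for name, flops, op in KERNELS:
            elapsed, _ = run_kernel(op, n, repeats)
            rate = flops * n * repeats / elapsed if elapsed > 0 else float("inf")
            results.append((size, name, rate))
    return results


if __name__ == "__main__":
    for size, name, rate in sweep(dataset_sizes(stop=64 * 1024)):
        print(f"{size:>8} B  {name:<14} {rate / 1e6:8.2f} MFLOP/s")
```

In a compiled version, plotting the rate against the working-set size for each kernel is what would show where each cache level runs out and at how many ops per fetch the FPU, rather than memory, becomes the limit.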

I learned very early that without these benchmarks I never get close to the real performance potential. AMD or Intel doesn't matter; this holds true even for Crays / Power5 / SX6 ... Then running code written for one arch, compiled on another arch, just goes wrong. Once I am finished I can normally use -O1 or -O2 without performance differences.

Honestly, the only clock any human cares about is the wall clock. Flops, GPU, CPU, L1, L2, L3, MIPS, SPEC2000: all of that is a real alphabet soup that in the end doesn't matter.
 
People talk hardware and MIPS etc., but am I wrong in thinking that software is less and less hardware friendly? :roll:

I remember playing not-so-bad games on an Atari XT and Amiga; those computers had far less computing power than today's monsters...
 
You're gonna diss me on my Sig now? I spent 10 minutes in Photoshop CS2...I guess you'll always find something to flame me on, least it shows who the bigger man is **hint hint**.

@Marg: That's because nowadays, all developers think about is graphics. I mean, come on, when is the last time a real worthy commercial game came out that lasted more than 5 hours? Excluding MMO's, you got F.E.A.R. that I beat in a few hours and its "Extreme" difficulty was easier than "Normal". CoD2 is awesome, but only a few hours lasting. I know most people say "Multiplayer you idiot!" but there's already tons of good Multiplayer games out there, we need more games like Every Extra Extend and Geometry Wars! :)

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 
Hey MadModMike, that is one sweet explanation there. OK, since I'm having no luck with my AMD and I'm a huge gamer and multitasker, I decided I'm going with Intel (no need to bash; Intel can play games smooth as silk too), and of course I'm overclocking. Do you suggest I get the Intel 530 or the 630? Also, what motherboard do you suggest?

Oh and you should also show what mobos you recommend, or what chipset at least.
 
P.S. Would you think a thread like this for Northbridge's would be good? I think I would be able to get it done, just the fact of whether or not it is necessary, there really aren't a huge variety of chipsets compared to the variety of CPU's.

I certainly do think a NB thread would be excellent. I'm pretty new at assembling modern rigs and have plans to build more. (I've done plenty of work building computers for scientific apps like instrument controllers, etc., but that's a different ballgame altogether.) This forum and others have helped me a ton on my current project. For example, a PC Club employee had me talked into an Intel Mobo/820D foundation. I'd told him I wanted to do image processing, video processing, Internet access and to have decent gaming performance. But after digging around here, I settled on an Asus A8N32SLI/4400+ combo. Yes, I know that's pricier than the Intel parts I'd been recommended, but I got a great deal on the Asus/AMD ($700 for those two items plus the Zalman 9500 and FS-V7 coolers) and have the capabilities I need. So these forums are a good thing. I've been taking notes lately because my son's birthday is approaching and he wants a desktop that will play games well. The budget will be pretty limited and he's even adding some of his fun money to help out. So to get back to your question, everyone wants the same kind of info and it all relates to compatibility, setup, performance, reliability, cost, capabilities, etc. Different people have their own priorities that lead to buying decisions, but knowledge should be the starting point, not bias.

Great job on the CPU summary, by the way, but now I guess we're putting you back into the sweat house, huh? And you know it doesn't stop at the chipset, sorry to say. VGA performance and compatibility is an obvious followup, then maybe other PCI devices and of course a thorough treatment of setup, optimization, overclocking and all that is required to seal the deal. Sounds like a bible to me.

And you can't really choose your chipset like you can choose your CPU.

By that I assume you mean with the degree of independence you get with a CPU, rather than being strapped to the mobo specifics.
 
Sounds interesting...and impossible. :)

Never use that word. The first minicomputer I worked on had 8K of RAM. When we got an additional 8K from US Govt. Surplus, it was a big enough occasion to have a party. 8K took up about 2 cubic feet of rack space back then. Never say "impossible" about computer performance. It will happen.
 
No doubt the 630, I've seen 530's @ stock go up to 60c idle, eeks! For motherboard, I recommend the one below.

Motherboard - You can hit 4GHz+ on this board.

PS: I would look into the Pentium D's, get the 805 for $134, it's cheaper and can easily be OC'd to match the 630 but you'll have 2 cores.

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 
I don't need a Dual-Core laptop, that is just retarded.

You might be surprised. My trusty Vaio laptop got stepped on by our dog (which weighs 105, crunch) and it snapped the mobo like a potato chip. So I replaced it with an Acer (gasp!) that has a Dual Core. It was such a deal I couldn't turn it down. It's got 1GB RAM, 100GB HD and I got a Bluetooth transmitter and mouse thrown in, all for $1250. This is a sweet little rig! It multitasks better than any laptop I've used. The other night, out of boredom, I tried to get it to latch up by burning a DVD, copying a GB of digital images off of a USB reader through the laptop to an external USB HD, reading a big heart rate monitor file off of my Polar watch through its USB/IR reader, playing itunes and surfing the web all at once. No problema, although the fan was running pretty hard. It was like having an information vortex right there at my fingertips!
 
Pay no attention to that remark, I said that when I was pissed @ somebody, I did it to get them to shut up =/. I've used lots of Dual-Core and SMP systems to know more CPU's = Fun Fun Fun! :)

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 
tchiwam,

Your idea is nice; however, you are forcing the procs to NOT use the L-anything cache.

Cache implies that you are placing OFTEN-used data in a quick-response area. Constantly changing the data causes the cache to NOT be used. I guess the fetch of A, B and K would be cache dependent, but those would more than likely be a very quick L1 fetch, so you're really not going to stress the arch in that manner. You have to find what triggers cause the transition from L1 to L2, then to L3 if available, and finally to system memory. Once you know the triggers you could write something that forces those triggers.

The hard part would be making those trigger events "as similar" as possible between procs.

You would also have to TRACK your cache hit ratios to confirm you are indeed stressing the full arch.

And for a complete stress test you would also have to throw a non-cache event into the mix.
 
Yes, the idea here is really simple: we want to process as much data as possible. Most of it will come from the ethernet port, hard disk, ... and will go out to the display, hard disk, ethernet or ...

The idea here is to know how many ops are needed to saturate the processor, and how much data is needed to saturate the memory bandwidth. Once you know that ratio for every memory level (L1, L2, main memory) and for storage (HD), you can start evaluating how much can be reused in the caches. Messaging between processors can also be tested in a similar way, but I usually try to write my code without mutexes, so that reading is from a constant source for everyone and writing does not depend on the previous action. It is not easy and requires quite acrobatic math.

Ultimately everything comes from outside the computer and goes back outside the computer, so the wall clock is the thing we care about. Unless you really like to throw everything to /dev/null.

One thing I forgot to mention: when you grow the dataset, repeat the action as many times as possible so that the data is reused. Sorry, my bad.
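To illustrate why the repeated passes matter, here is a toy Python model of a direct-mapped cache that counts hits for repeated sequential passes over a dataset. It is a sketch under stated assumptions, not a model of any real P4 or Athlon 64 cache: the 64-byte line size, the 64KB cache size in the example, the one-access-per-line simplification, and the function name are all made up for illustration. On real hardware the hit ratios would come from the CPU's performance counters.

```python
LINE_BYTES = 64  # assumed cache line size


def hit_ratio(working_set_bytes, cache_bytes, passes=4, line_bytes=LINE_BYTES):
    """Hit ratio of a toy direct-mapped cache for repeated sequential passes.

    Each pass touches every line of the working set once, in order.
    """
    num_sets = cache_bytes // line_bytes
    num_lines = working_set_bytes // line_bytes
    tags = [None] * num_sets  # which line currently occupies each cache slot
    hits = 0
    accesses = 0
    for _ in range(passes):
        for line in range(num_lines):
            slot = line % num_sets
            if tags[slot] == line:
                hits += 1
            else:
                tags[slot] = line  # miss: evict whatever was in the slot
            accesses += 1
    return hits / accesses


if __name__ == "__main__":
    cache = 64 * 1024  # hypothetical 64KB cache
    for kb in (16, 32, 64, 96, 128, 256):
        ratio = hit_ratio(kb * 1024, cache)
        print(f"{kb:>4} KB working set: hit ratio {ratio:.2f}")
```

With a single pass every access is a miss regardless of size, which is the objection raised earlier in the thread. With repeated passes, a working set that fits in the cache hits on every pass after the first, and the ratio collapses once the working set outgrows the cache; that drop is the "trigger" from one level to the next.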
 
Update. Refer to 1st post for changes.

Look for any errors please, I want to make sure all information I added and that has been there for awhile is correct. Thank you.

~~Mad Mod Mike, pimpin' the world 1 rig at a time