Build a 500 GFLOPS PC

lucaskclai

Feb 13, 2014
I need to build a 500 GFLOPS PC and the following is my idea. It is for scientific calculation and the arithmetic power is important.

1. Two Xeon E5-2690 CPUs, with a theoretical 371 GFLOPS.

2. I do not know which motherboard can take these two CPUs together so that the system has more than 500 GFLOPS.

3. One 256 GB SSD and one 1 TB SATA hard disk.

4. I would like to use Windows 8.1 as the OS, as the program I am using is MATLAB.

Any suggestions on my logic?

Thanks, guys!
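As a rough check on the 371 GFLOPS figure above: theoretical peak is usually estimated as sockets x cores x clock x floating-point operations per cycle. The sketch below assumes the published E5-2690 figures (8 cores, 2.9 GHz base clock) and 8 double-precision operations per cycle per core with AVX. The function name and the per-cycle figure are assumptions for illustration, not something stated in this thread.

```python
def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: every core retires flops_per_cycle each clock."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle

# Assumed E5-2690 figures: 8 cores, 2.9 GHz base clock,
# 8 double-precision FLOPs per cycle per core with AVX.
one_cpu = peak_gflops(1, 8, 2.9, 8)
two_cpus = peak_gflops(2, 8, 2.9, 8)

print(f"one CPU:  {one_cpu:.1f} GFLOPS")
print(f"two CPUs: {two_cpus:.1f} GFLOPS")
```

Under these assumptions, 371 GFLOPS would already be the combined figure for both CPUs, so a dual-socket board by itself would not push the system past 500 GFLOPS.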
 
Dear Jan,

1. My motherboard is an ASRock 41 without USB 3.
2. PSU no
3. My objective is to run big data analysis using MATLAB. The data is in terabytes.
4. It seems my machine is out of date for the task. If purchasing a GeForce card could help, my budget could be more than $300.00, but I doubt it will work. To support a fast GPU card, the CPU must be fast too.
5. In the long run, I would like to build a machine for scientific research. Hence I came up with the idea of building a machine with two of the latest CPUs, giving more than 500 GFLOPS; combined with a GPU, it may reach 2 TFLOPS.

 
if you are adding a heavy GPU, you need to be sure your PSU can handle it. GPU cards can consume up to 300W these days.
also, I'd like to know how many PCIe 2.0 / 3.0 slots your mainboard has. It really matters for my GPU advice.
For floating-point operations, I have no specific preference. AMD is better at OpenCL integer operations compared to nvidia, but it doesn't matter in this case.
I really don't care about your CPU power. look at this:
http://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
if you look at the nvidia geforce gtx titan, it will give you teraflops of floating-point performance.
if you compare the retail price on amazon of the xeon E5-2690 with the GTX titan, you will see it is not worth investing in an expensive CPU. instead, you will need to adapt your pc so it can host a GPU card as powerful (and hungry) as the GTX titan

*edit: why I don't care about CPU performance:
I once wrote a cluster application myself, in OpenCL, NPTL, and glibc. it gave me a calculation power of nearly 8 tflops spread over 5 pcs across my network. often, I had to reserve 1 core just to do the IO communication with other PCs and GPUs and the threaded task distribution. CPUs were always the slowest component, even on an amd E-350 using the built-in graphics card.
 
ok, so I took a look at your current specifications:
- I do not recommend keeping your current CPU / mainboard: you do not even have PCIe 2.0. if you pair it with a high end GPU, the card won't even be able to get enough data through the PCIe bus
- do NOT invest in a dual-cpu system! buy a mainboard with 2 PCIe 3.0 X16 slots, and add 2 high-end GPUs (in your case, I recommend the GTX titan because it has a more acceptable double-precision performance penalty relative to single precision). Buy 1 performance-oriented CPU. there's no need to make it a xeon, a high end hexa-core i7 will do. but if you really want to, it's the same socket. just make sure the mobo supports it.
- invest in a high end PSU and sufficient cooling: if you have 2 high end GPUs and 1 high end CPU, your system will easily generate as much as 800W of heat. your system needs to be able to provide the power and remove the heat to keep it running. but with a setup of an i7-4960X and 2 GTX titans, you'll have a system with a compute capability of roughly 3 TFLOPS (double precision) / 8 TFLOPS (single precision) for a price below the original plan

*edit: and IF you want upgrade options: get a mainboard with 3 PCIe X16 slots and invest in 1 octa-core high-end CPU; you can add the 3rd card later. beware that you might need to invest in liquid cooling, as it is quite difficult to ventilate 1000W+ of heat in a normal ATX case
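The TFLOPS estimate for two GTX Titans can be sanity-checked with simple arithmetic. This is a minimal sketch assuming the publicly listed GTX Titan figures (2688 shader cores, 837 MHz base clock, double precision at one third of the single-precision rate). Those numbers and the function name are assumptions for illustration, not taken from this thread.

```python
def gpu_peak_tflops(shader_cores, clock_mhz, dp_ratio=1.0):
    """Peak TFLOPS: each core does one fused multiply-add (2 FLOPs) per clock."""
    return shader_cores * 2 * clock_mhz * 1e6 * dp_ratio / 1e12

# Assumed GTX Titan figures (not from the thread): 2688 cores, 837 MHz base,
# double precision running at 1/3 of the single-precision rate.
CARDS = 2
single = CARDS * gpu_peak_tflops(2688, 837)
double = CARDS * gpu_peak_tflops(2688, 837, dp_ratio=1 / 3)

print(f"{CARDS} cards, single precision: {single:.1f} TFLOPS")
print(f"{CARDS} cards, double precision: {double:.1f} TFLOPS")
```

These are peak figures at the base clock; sustained throughput in a real program is normally well below peak.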
 
Dear Jan,

Thanks for your advice, which is in line with my thinking. Do you suggest using a 256 GB SSD and keeping my existing 1 TB SATA hard drive? For my big data research, the data size is 1 TB. I am worried about whether the hard disk is fast enough to handle the task. One of my programs, run on a Xeon, took a few hours because the data is so big. Even a MATLAB program running an evolutionary algorithm takes 2 days to finish on an i5-4200U.
 
Dear Jan,

Would you recommend the ASUS P9x79 WS Deluxe motherboard, which is the best in Tom's Hardware review?
The GeForce GTX Titan 6 GB GDDR5 is 9 times more expensive than the GTX Titan 2 GB GDDR5. Do you think the 2 GB card is a smart choice in my case? I can put six 2 GB GTX cards in the above motherboard, which gives 12 GB, the same as two 6 GB GTX cards, but at half the price.

Since I will move to a high end computer, I am still inclined to use 2 CPUs with the EVGA Classified SR-X motherboard (LGA2011, dual Xeon E5), as I can use it as my family server while I am running scientific experiments. However, the review on Tom's website does not give it much credit.


 
here, my answer will largely be based on the type of calculations you are making: if you are multiplying very large matrices (let's say 10^12 * 10^12), using the result once and then never using that matrix again, you'd need the 6GB versions.
But where do you find a 2GB titan for $110 anyway? I've only seen 6GB versions at the price of $1000
if you will be doing a whole procedure of calculations on the same data, you'd better go for the 2GB sizes.
note that you will not be able to put cards in the white slots: the GTX titan is a dual-slot card (like most high-end cards), and cannot be reduced to single-slot. so 4 is all you'll get.
If you are worried about your hard drive speed, I'd recommend switching from a single disk to a RAID-0. you'll need the sata drive, I agree with that, but MATLAB will be performing a high number of I/O operations on the drive, shortening its lifetime. the RAID-0 is a good compromise between speed and the reliability penalty
if you absolutely want to buy a dual-cpu system, why not take the asus Z9PE? it's mainly the same, but the layout is much better.
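To get a feel for whether the disks are the bottleneck, it helps to estimate how long one sequential pass over the data takes. The sketch below assumes about 150 MB/s for a single hard disk and ideal linear scaling across RAID-0 members. Both are illustrative assumptions, not measurements from this thread.

```python
def read_time_hours(data_bytes, mb_per_s_per_disk, disks=1):
    """Hours for one sequential pass, assuming ideal RAID-0 striping."""
    bytes_per_s = mb_per_s_per_disk * 1e6 * disks
    return data_bytes / bytes_per_s / 3600

DATA_BYTES = 1e12      # the 1 TB data set mentioned in the thread
HDD_MB_PER_S = 150     # assumed sequential throughput of one hard disk

for disks in (1, 2, 4):
    hours = read_time_hours(DATA_BYTES, HDD_MB_PER_S, disks)
    print(f"{disks} disk(s): {hours:.2f} h per pass")
```

If the program makes many passes over the data or reads it in small random chunks, real times will be considerably worse than this estimate.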
 
Dear Janpieter,

The GTX Titan 6 GB GDDR5 in Hong Kong is HK$9,900.00/pc, while the GTX Titan 2 GB is only HK$975.00.
I am doing big data research, studying a deep neural network with 100,000 parameters on at least 60,000 images. This is a very tedious job and demands a lot of computational resources. Of course the GTX Titan 6 GB is the best, but it just so happens that the price difference is very large. Yes, the matrix dimensions could be very big, as there are many layers in a deep neural network. Usually this type of job should be done on a supercomputer, not at PC level, but I would like to try it on a small scale. Hence, this is not a one-time short deal, as I will be using the machine for a very long time; research is my career. I do not limit myself to MATLAB, as I can write in C or Java. It is just easy to use MATLAB.

I take it you agree with the ASUS P9x79 WS Deluxe motherboard, and that it can only take 4 GTX cards. Your suggestion is the asus Z9PE if I go for 2 CPUs. How much faster do you think I can get with 2 CPUs on top of your estimate of 8 TFLOPS? I would like to know from an investment point of view. At this stage, this new PC would not be cheap. Although I do not have a fixed budget for this research machine, I would like to get the best value for the money. Your suggestion is really appreciated.

Yes, you are correct to use RAID-0 for I/O, as I can combine four to six terabyte hard drives. I do not know how to handle cooling for the drives in RAID-0. I understand the GTX cards already have their own cooling.



 

can you give me the model number of the 2GB and 6GB version? something here does not feel right


as I already said: if you have 4 geforce GTX titan cards doing your floating-point mathematics, a 2nd CPU is not worth it. you could use it for cases where the GPUs cannot be used, though. the GPU support in MATLAB (which is built on CUDA rather than OpenCL) is not complete. You said you also know how to program in C. If you have a number of routine algorithms, it will be better to take them out of matlab and recreate them in pure OpenCL, as it may be faster in some cases.


you will need to design your own cooling system: if you have
- 2 xeon CPUs (260W)
- 4 GTX titans (1000W)
- 4 HDD (90W)
- mainboard + additional loss (fans, dvd, leds, usb devices, etc) (50W)
you have a system which will produce 1400W of heat at full load. you can not do this with air cooling in any way: there's no way you can push enough air through your case to cool down all these components.
you will have to:
- install waterblocks on the GTX titans
- install waterblocks on the Xeon CPUs
- buy a large external radiator with a large number of fans, and connect all those waterblocks to the radiators
- divide the HDDs so not all 4 of them are mounted next to each other. a 5.25" -> 3.5" mounting case may help here
- search for a power supply which is capable of handling this amount of power. also note that power supplies degrade over time and should not run at their limit, so 1400W will not be enough: 1500W is close, but I'd recommend looking for a 1600W unit.
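The power budget above is simple addition, and a short script makes it easy to redo when a component changes. The wattages below are the full-load estimates from the list above. The variable names are only for illustration.

```python
# Full-load estimates as listed in the post above (watts).
components = {
    "2 Xeon CPUs": 260,
    "4 GTX Titans": 1000,
    "4 HDDs": 90,
    "mainboard + misc": 50,
}

total_w = sum(components.values())
print(f"estimated full load: {total_w} W")

# Load as a fraction of the PSU rating for the sizes discussed above.
for psu_w in (1400, 1500, 1600):
    print(f"{psu_w} W PSU -> {total_w / psu_w:.0%} load")
```

A 1400W unit would sit at its full rating under this estimate, which is why the larger sizes are suggested.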

*edit*: forgot to add: if you are using more than 3 disks in a raid-0, better add an extra disk for parity and move to raid 5. I know it has a performance penalty, but with 6 disks in a raid-0 the chances of losing data are quite high.
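The reliability concern can be made concrete with a little probability. A RAID-0 loses everything if any one disk fails, while RAID 5 survives a single failure. The sketch below assumes independent failures, ignores the rebuild window, and uses a hypothetical 3% per-disk failure probability over the period of interest; none of these figures come from this thread.

```python
def raid0_loss_probability(disks, p_fail):
    """RAID-0 loses data if any single member disk fails."""
    return 1 - (1 - p_fail) ** disks

def raid5_loss_probability(disks, p_fail):
    """RAID 5 survives one failure; data is lost when two or more disks fail."""
    none_fail = (1 - p_fail) ** disks
    one_fails = disks * p_fail * (1 - p_fail) ** (disks - 1)
    return 1 - none_fail - one_fails

P_FAIL = 0.03  # hypothetical per-disk failure probability over the period

for n in (1, 4, 6):
    print(f"RAID-0, {n} disk(s): {raid0_loss_probability(n, P_FAIL):.1%}")
print(f"RAID 5, 7 disks:  {raid5_loss_probability(7, P_FAIL):.1%}")
```

The RAID 5 line uses 7 disks so that it offers the same usable capacity as the 6-disk RAID-0 plus one disk's worth of parity.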

 
Dear all,

GeForce GTX 650 Ti BOOST: Step into GeForce® GTX gaming with GeForce GTX 650 Ti BOOST. Packed with high-performance features like NVIDIA GPU Boost technology, which dynamically maximizes clock speeds to push performance to new levels and bring out the best in every game, plus NVIDIA SLI® technology, and 2 GB of graphics memory.

This is the model I mean when I say GTX 2 GB.
 
Dear Janpieter,

I am a bit lost on "if you have 4 geforce GTX titan cards which render your floating-point mathematics, a 2nd CPU is not worth it." Are you talking about the ASUS P9x79 WS Deluxe motherboard with four 2 GB GeForce GTX cards, not the 6 GB ones? Your estimate of 8 TFLOPS is based on the high end GTX, which I assume is the 6 GB card.


Four GTX 6 GB cards are very expensive even in Hong Kong: the total is already US$5,100.00. I think an 8-core i7 is less than US$1,200.00, which is cheaper than buying 2 GTX 6 GB cards.
 


That is not a "Titan", not even close. That's a mid range GPU at best.
 
I was NOT talking about a geforce GTX 650 TI, I was talking about the GTX titan, which is your "6GB version" :)
Yes indeed, my estimation is based on the high end GTX titan cards. in fact, there is only one gtx titan; the other one is a GTX 650
you don't have to waste time on 650 TI cards. If you have the money to build a dual-cpu Xeon system, it's worth more to drop one CPU, buy a single-cpu mainboard, and invest in a GPU system to perform the necessary calculations for you. this is what I said from the beginning: do not invest in a dual-cpu system. a Xeon is twice as expensive as an i7, but when you drop one and replace it with 2 GTX titan cards, you get 4 times the performance back for +- the same price. if you also drop the 2nd and replace it with an i7, you have a minor performance loss (2 cpu cores), but you can invest in proper additional cooling, HDD, PSU, etc, and often have some overclocking possibilities (which is useful on water-cooled systems)

*edit* and a VERY IMPORTANT REMARK: make sure the theoretical figures quoted by intel correspond to the instructions MATLAB actually uses: if intel is using AVX and AVX2 to benchmark their floating-point performance, but MATLAB is not using them, you might not get the performance you expected: the calculation power of an E5-2690 will be much worse with only the instruction set of the pentium D (for example)
 
Dear Janpieter,

I just built a new computer with the following details.

1. Intel i7-4770K @3.4GHz
2. 32 GB DDR3 1600 RAM
3. GPU card ASUS R9290X-DC20C
4. Motherboard AZ87M6x which can accommodate 3 GPU cards
5. CM 700 W power supply, ready to convert to water cooling

To my disappointment, the above setup only performs at 85 GFLOPS.

Please advise how to increase this to 500 GFLOPS, for example by adding more GPU cards.

Please help guys! It already cost me more than US$2,000.00.
 
Dear logainofhades,

Thanks for your information. I have installed the video card ASUS R9290X-DC20C-4GD5, with both the VGA driver and the utility driver for GPU Tweak. When I run my program in MATLAB, I do not see any GPU usage at all (it is 0%). I checked the GFLOPS using the file qwikmark, which shows only 85 GFLOPS when idle and 75 when running the MATLAB program.

I am a new kid on the block regarding GPUs. I do not play video games; I installed the card only for research purposes, to speed up the calculation power of the PC. It took 24 hours to run one of my programs at 85 GFLOPS with the above setup. I know something is wrong but I could not find it.

Please help guys!
 
Dear logainofhades,

I am confused. Why does the GFLOPS figure not increase when I check with qwikmark.exe? After installing the video card, I was under the impression that the GFLOPS of the whole system should increase.

Your information is very helpful. However, this is very bad news, as I must find a way in each application, such as MATLAB, to use the GPU in order to speed up the program. I had understood that a GPU is for playing video games and automatically takes over calculation work from the CPU. When playing a video game, the GPU does the job automatically.

Please give me more links.