[SOLVED] computer recommendation

grider

Prominent
Jun 25, 2019
21
0
510
Hello. Hoping for some expert advice. I need a computer for running simulations. The most important factors are core count and clock speed, and core count matters most. I want to be able to run 96 simulations overnight, all running simultaneously. I don't really care if "overnight" means 12 or 14 hours, but I do need all 96 runs (for example) to finish, which is why I think clock speed isn't as important as core count for me. I don't care about graphics, tons of memory, flashy cases, or things like that. I also don't mind if the cores are spread across different machines. For example, if it's cheaper to have 2 computers with 24 cores each, that's fine.

So what is the best-value deal on a machine with 48 cores and a decent clock speed, and where can I find it? Would AMD instead of Intel be good enough? Any recommendations for specific configurations or sources would be greatly appreciated.
 

kanewolf

Titan
Moderator
Is your "96 simulations" done by having one program run 96 times or attempting to run 96 simultaneous instances of a program?
What benchmark performance data do you have?
Do you know how well the software scales on number of cores?
Have you attempted to run it on an AWS instance with 48 or more cores to verify that it scales to that large a host?
 

grider

Prominent
Jun 25, 2019
Read through this thread and follow its template with a follow-up post giving the necessary details in this thread. We can move forward from there.
Thanks Lutfig. Here is the info. Probably not a "typical" situation:
Approximate Purchase Date: ASAP

Budget Range: ~$10K . . . but I want to save money anywhere I can . . . maybe the best $ per core? Is that an OK way of looking at it?

System Usage from Most to Least Important: compute intensive simulations (sometimes days to complete)

Are you buying a monitor: No

Parts to Upgrade: All new system

Do you need to buy OS: Yes

Preferred Website(s) for Parts: no pref

Location: South Florida East Coast

Parts Preferences: no preference

Overclocking: Yes

SLI or Crossfire: Maybe

Your Monitor Resolution: not important to me

Additional Comments:

And Most Importantly, Why Are You Upgrading: all new system
 

grider

Prominent
Jun 25, 2019
Is your "96 simulations" done by having one program run 96 times or attempting to run 96 simultaneous instances of a program?
What benchmark performance data do you have?
Do you know how well the software scales on number of cores?
Have you attempted to run it on an AWS instance with 48 or more cores to verify that it scales to that large a host?

Thanks kanewolf. Answers follow.

  1. 96 separate, independent instances of the app running simultaneously (not COM-instantiated or anything like that); just like manually opening the app 96 times.
  2. Small-scale testing on a 6-core machine. I've done this before with a similar app on a couple hundred cores, with good success.
  3. It scales very well; no concerns there.
  4. Cloud computing is not an option for me for various reasons. I am confident it will scale to as many cores as I can afford, assuming the rest of the hardware supports the core count. The nature of what I am doing is that I can divide the simulations across independent machines if I want.
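Point 4 (dividing the runs across independent machines) is pure bookkeeping, since the runs don't interact. A minimal sketch, with hypothetical run IDs just to show the split:

```python
# Sketch: split 96 independent simulation runs evenly across 3 machines.
# Round-robin assignment; any even split works since runs are independent.
def split_runs(total_runs: int, machines: int) -> list[list[int]]:
    run_ids = list(range(total_runs))
    return [run_ids[m::machines] for m in range(machines)]

batches = split_runs(96, 3)  # each machine gets 32 run IDs
```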
 

kanewolf

Titan
Moderator
Thanks kanewolf. Answers follow.

  1. 96 separate, independent instances of the app running simultaneously (not COM-instantiated or anything like that); just like manually opening the app 96 times.
  2. Small-scale testing on a 6-core machine. I've done this before with a similar app on a couple hundred cores, with good success.
  3. It scales very well; no concerns there.
  4. Cloud computing is not an option for me for various reasons. I am confident it will scale to as many cores as I can afford, assuming the rest of the hardware supports the core count. The nature of what I am doing is that I can divide the simulations across independent machines if I want.
I would recommend multiple smaller servers (16 core) because they will be cheaper, have a higher clock speed, and if one dies you aren't 100% out of business.
 

grider

Prominent
Jun 25, 2019
So looking at this machine
https://www.titancomputers.com/Titan-S375-Dual-AMD-EPYC-Rome-7002-Series-p/s375.htm

Here is how the price per core works out:

Cores per machine | Price   | Price per core
16                | $3,575  | $223.44
24                | $4,319  | $179.96
32                | $4,343  | $135.72
48                | $6,163  | $128.40
64                | $7,969  | $124.52
96                | $12,310 | $128.23
128               | $19,366 | $151.30
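As a sanity check, the table above can be recomputed in a few lines (prices are the quotes from the Titan configurator at the time; treat them as a snapshot, not current pricing):

```python
# Price-per-core check for the quoted S375 configurations.
configs = {  # total cores -> quoted price in USD
    16: 3575, 24: 4319, 32: 4343, 48: 6163,
    64: 7969, 96: 12310, 128: 19366,
}

per_core = {cores: price / cores for cores, price in configs.items()}
for cores in sorted(per_core):
    print(f"{cores:>3} cores: ${per_core[cores]:7.2f}/core")

cheapest = min(per_core, key=per_core.get)  # raw minimum: 64-core config
```

Note the raw $/core minimum is the 64-core build at about $124.52/core; picking 32 cores instead trades a slightly higher $/core for lower up-front risk.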

So economically it looks like the sweet spot is the 32-core server. That's $4,343 per machine:
2 machines (64 cores): $8,686
3 machines (96 cores): $13,029

Advantages are:
  1. I'm only $4,343 invested if this doesn't scale up like I think.
  2. As kanewolf said, if I have 3 and one dies, I'm not out of business.

Some concerns/questions:
  1. These are AMD. Will I regret not buying Intel?
  2. Is there any way to connect 3 of these so that they behave as 1?
  3. Are the prices I've found reasonable? Any other, better places to shop?

Thanks for the help!!!!
 

kanewolf

Titan
Moderator
The link you show is actually a dual-socket host, so you would have twice the cores listed in your table above.

Your price/core is one thing, but it doesn't take clock speed per core into account; you need to look at that too. If there is a large drop in clock speed between one bin and the next, you may not want to bump up to the higher cores per socket.
Also, at 32 cores, look at the cache: there is a 32-core CPU with 64MB cache and one with 128MB cache.
Also be sure you populate all 8 channels of RAM for each socket. That may mean you end up with a lot more RAM than you think you "NEED". It looks like they have 8x8GB as a memory option, so you could get 128GB/server with both sockets filled.
 
Solution

grider

Prominent
Jun 25, 2019
Thanks Kanewolf. The CPU config I selected for the 32-core option, for example, is two EPYC 7282 processors. It says 32 cores, but not whether that's per CPU or total. However, if I look up the specs on the EPYC 7282 it shows 16 cores, so I think I got the number of cores right on all of them. Am I missing something?
Thanks again!
 

kanewolf

Titan
Moderator
Thanks Kanewolf. The CPU config I selected for the 32-core option, for example, is two EPYC 7282 processors. It says 32 cores, but not whether that's per CPU or total. However, if I look up the specs on the EPYC 7282 it shows 16 cores, so I think I got the number of cores right on all of them. Am I missing something?
Thanks again!
I didn't know what you were using to calculate cores. Yes, a 7282 would be 16 cores per socket (32 cores across the two sockets), with hyperthreading available on top of that. Just be sure you buy enough RAM -- you want 16 DIMMs to maximize performance. The 7302 CPU would have the same number of cores, but double the cache and a higher base clock speed. I don't know if your simulations would benefit from the increased cache, compared to the cost delta.
 

grider

Prominent
Jun 25, 2019
I didn't know what you were using to calculate cores. Yes, a 7282 would be 16 cores per socket (32 cores across the two sockets), with hyperthreading available on top of that. Just be sure you buy enough RAM -- you want 16 DIMMs to maximize performance. The 7302 CPU would have the same number of cores, but double the cache and a higher base clock speed. I don't know if your simulations would benefit from the increased cache, compared to the cost delta.
Thank you again. I will probably go with the 128MB cache, just out of my own sheer ignorance of how it will impact my run times. I wonder if, after I buy the first machine, I can somehow limit the cache to 64MB to see how it impacts performance in my specific case?
 

grider

Prominent
Jun 25, 2019
For what it's worth, here is a table of how many iterations I can get through by running concurrent simulations on my Coffee Lake 6-core laptop. All tests were run on the same laptop. There is substantial gain from adding concurrent simulations up to the number of physical cores, then modest gains from there up to the number of logical cores minus 1. At that point Process Explorer shows all the CPUs nearly maxed out. I don't know if that gives any clues about the importance of cache size for this case?

Concurrent simulations | Simulations per 12 hours
1                      | 5
6                      | 25
11                     | 30
 

kanewolf

Titan
Moderator
For what it's worth, here is a table of how many iterations I can get through by running concurrent simulations on my Coffee Lake 6-core laptop. All tests were run on the same laptop. There is substantial gain from adding concurrent simulations up to the number of physical cores, then modest gains from there up to the number of logical cores minus 1. At that point Process Explorer shows all the CPUs nearly maxed out. I don't know if that gives any clues about the importance of cache size for this case?

Concurrent simulations | Simulations per 12 hours
1                      | 5
6                      | 25
11                     | 30
Not directly; the memory or storage could also be limiting your performance. Usually more cache is better, as long as the cost delta isn't too great.
 

grider

Prominent
Jun 25, 2019
Not directly; the memory or storage could also be limiting your performance. Usually more cache is better, as long as the cost delta isn't too great.
The memory graph in Process Explorer shows little usage and a steady flat line while running, even with all CPUs maxed out. I suppose cache has more to do with available memory speed as opposed to available memory size, though . . .
Thanks again. You've been very helpful!
 

kanewolf

Titan
Moderator
Cache is the lowest-latency memory available to the CPU (since it is on the CPU chip). Cache can smooth out the performance slowdowns that happen when main memory has to be accessed. The more of your simulation code and data that fits IN the cache, the less performance penalty for memory access. If cost is an issue, then the smaller cache, offset by ensuring you have 8 DIMMs per socket, would be optimum.
If you end up with WAY more RAM than you can use as main memory, then think about a RAM disk to use as a scratch or cache disk. If your simulation has to do a lot of I/O, then a RAM disk (with appropriate snapshots to physical disk) can improve your performance significantly.
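As a concrete illustration of the RAM-disk idea, assuming a Linux host (the thread never says which OS the servers will run): surplus RAM can back a scratch directory via the tmpfs mount at `/dev/shm`, with results copied out to physical disk periodically:

```python
# Sketch: put simulation scratch files on a RAM-backed filesystem when
# one is available. /dev/shm is a tmpfs mount on most Linux systems;
# Windows would need a third-party RAM-disk driver instead.
import os
import shutil
import tempfile

def make_scratch_dir() -> str:
    ram_fs = "/dev/shm"
    base = ram_fs if os.path.isdir(ram_fs) else None  # fall back to normal tmp
    return tempfile.mkdtemp(prefix="sim_scratch_", dir=base)

def snapshot(scratch: str, dest: str) -> None:
    # RAM contents vanish on reboot or power loss, so periodically copy
    # results out to a real disk, per the snapshot caveat above.
    shutil.copytree(scratch, dest, dirs_exist_ok=True)
```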
 

grider

Prominent
Jun 25, 2019
Cache is the lowest-latency memory available to the CPU (since it is on the CPU chip). Cache can smooth out the performance slowdowns that happen when main memory has to be accessed. The more of your simulation code and data that fits IN the cache, the less performance penalty for memory access. If cost is an issue, then the smaller cache, offset by ensuring you have 8 DIMMs per socket, would be optimum.
If you end up with WAY more RAM than you can use as main memory, then think about a RAM disk to use as a scratch or cache disk. If your simulation has to do a lot of I/O, then a RAM disk (with appropriate snapshots to physical disk) can improve your performance significantly.

Very interesting. Is there anything I can watch (while the simulations are running) in Process Monitor or HWMonitor or similar that would let me know if cache size is a limitation?
 

kanewolf

Titan
Moderator
Very interesting. Is there anything I can watch (while the simulations are running) in Process Monitor or HWMonitor or similar that would let me know if cache size is a limitation?
Intel CPUs have performance counters -- https://software.intel.com/en-us/fo...optimization-platform-monitoring/topic/548988 The VTune application might give you insight. This is advanced debugging, and probably not worth the effort unless you can also rewrite the simulation to optimize.