[SOLVED] computer recommendation

grider

Prominent
Jun 25, 2019
21
0
510
Hello. Hoping for some expert advice. I need a computer for running simulations. The most important factors are core count and clock speed, and core count matters most. I want to be able to run 96 simulations overnight, all running simultaneously. I don't really care if "overnight" means 12 or 14 hours, but I do need all 96 runs (for example) to finish, which is why I think clock speed isn't as important as core count for me. I don't care about graphics, tons of memory, flashy cases, or things like that. I also don't mind if the cores are spread across different machines. For example, if it's cheaper to have 2 computers with 24 cores each, that's fine.

So what is the best-value deal on a machine with 48 cores and a decent clock speed, and where can I find it? Would AMD instead of Intel be good enough? Any recommendations for specific configurations or sources would be greatly appreciated.
 

kanewolf

Titan
Moderator
Is your "96 simulations" done by having one program run 96 times or attempting to run 96 simultaneous instances of a program?
What benchmark performance data do you have?
Do you know how well the software scales on number of cores?
Have you attempted to run it on an AWS instance with 48 or more cores to verify that it scales to that large a host?
 

grider

Prominent
Jun 25, 2019
Read through this thread and follow its template with a follow-up post giving the necessary details in this thread. We can move forward from there.
Thanks Lutfig. Here is the info. Probably not a "typical" situation:
Approximate Purchase Date: ASAP

Budget Range: ~$10K . . . but I want to save money anywhere I can . . . maybe the best $ per core? Is that an OK way of looking at it?

System Usage from Most to Least Important: compute intensive simulations (sometimes days to complete)

Are you buying a monitor: No

Parts to Upgrade: All new system

Do you need to buy OS: Yes

Preferred Website(s) for Parts: no pref

Location: South Florida East Coast

Parts Preferences: no preference

Overclocking: Yes

SLI or Crossfire: Maybe

Your Monitor Resolution: not important to me

Additional Comments:

And Most Importantly, Why Are You Upgrading: all new system
 

grider

Prominent
Jun 25, 2019
Is your "96 simulations" done by having one program run 96 times or attempting to run 96 simultaneous instances of a program?
What benchmark performance data do you have?
Do you know how well the software scales on number of cores?
Have you attempted to run it on an AWS instance with 48 or more cores to verify that it scales to that large a host?

Thanks kanewolf. Answers follow.

  1. 96 separate, independent instances of the app running simultaneously (not COM-instantiated or anything like that); just like manually opening the app 96 times.
  2. Small-scale testing on a 6-core machine. I've done this before with a similar app on a couple hundred cores, with good success.
  3. It scales very well; no concerns there.
  4. Cloud computing is not an option for me for various reasons. I am confident it will scale to as many cores as I can afford, assuming the rest of the hardware supports the core count. The nature of what I am doing is that I can divide the simulations across independent machines if I want.
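Point 4 (dividing the runs across independent machines) is pure bookkeeping, since the runs don't interact. A minimal sketch, with hypothetical run IDs just to show the split:

```python
# Sketch: split 96 independent simulation runs evenly across 3 machines.
# Round-robin assignment; any even split works since runs are independent.
def split_runs(total_runs: int, machines: int) -> list[list[int]]:
    run_ids = list(range(total_runs))
    return [run_ids[m::machines] for m in range(machines)]

batches = split_runs(96, 3)  # each machine gets 32 run IDs
```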
 

kanewolf

Titan
Moderator
Thanks kanewolf. Answers follow.

  1. 96 separate, independent instances of the app running simultaneously (not COM-instantiated or anything like that); just like manually opening the app 96 times.
  2. Small-scale testing on a 6-core machine. I've done this before with a similar app on a couple hundred cores, with good success.
  3. It scales very well; no concerns there.
  4. Cloud computing is not an option for me for various reasons. I am confident it will scale to as many cores as I can afford, assuming the rest of the hardware supports the core count. The nature of what I am doing is that I can divide the simulations across independent machines if I want.
I would recommend multiple smaller servers (16 core) because they will be cheaper, have a higher clock speed, and if one dies you aren't 100% out of business.
 

grider

Prominent
Jun 25, 2019
So looking at this machine
https://www.titancomputers.com/Titan-S375-Dual-AMD-EPYC-Rome-7002-Series-p/s375.htm

Here is how the price per core works out:

Cores per machine | Price   | Price per core
16                | $3,575  | $223.44
24                | $4,319  | $179.96
32                | $4,343  | $135.72
48                | $6,163  | $128.40
64                | $7,969  | $124.52
96                | $12,310 | $128.23
128               | $19,366 | $151.30
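As a sanity check, the table above can be recomputed in a few lines (prices are the quotes from the Titan configurator at the time; treat them as a snapshot, not current pricing):

```python
# Price-per-core check for the quoted S375 configurations.
configs = {  # total cores -> quoted price in USD
    16: 3575, 24: 4319, 32: 4343, 48: 6163,
    64: 7969, 96: 12310, 128: 19366,
}

per_core = {cores: price / cores for cores, price in configs.items()}
for cores in sorted(per_core):
    print(f"{cores:>3} cores: ${per_core[cores]:7.2f}/core")

cheapest = min(per_core, key=per_core.get)  # raw minimum: 64-core config
```

Note the raw $/core minimum is the 64-core build at about $124.52/core; picking 32 cores instead trades a slightly higher $/core for lower up-front risk.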

So economically it looks like the sweet spot is the 32-core server. That's $4,343 per machine:
2 machines (64 cores): $8,686
3 machines (96 cores): $13,029

Advantages are:
  1. I'm only $4,343 invested if this doesn't scale up like I think.
  2. As kanewolf said, if I have 3 and one dies, I'm not out of business.

Some concerns/questions:
  1. These are AMD. Will I regret not buying Intel?
  2. Is there any way to connect 3 of these so that they behave as 1?
  3. Are the prices I've found reasonable? Any other, better places to shop?

Thanks for the help!!!!
 

kanewolf

Titan
Moderator
The link you show is actually a dual-socket host, so you would have twice the cores listed in your table above.

Your price/core is one thing, but it doesn't take clock speed per core into account; you need to look at that too. If there is a large drop in clock speed between one bin and the next, you may not want to bump up to the higher cores per socket.
Also, at 32 cores, look at the cache: there is a 32-core CPU with 64MB cache and one with 128MB cache.
Also be sure you populate all 8 channels of RAM for each socket. That may mean you end up with a lot more RAM than you think you "NEED". It looks like they have 8x8GB as a memory option, so you could get 128GB/server with both sockets filled.
 
Solution

grider

Prominent
Jun 25, 2019
Thanks Kanewolf. The CPU config I selected for the 32-core option, for example, is two EPYC 7282 processors. It says 32 cores, but not whether that's per CPU or total. However, if I look up the specs on the EPYC 7282 it shows 16 cores, so I think I got the number of cores right on all of them. Am I missing something?
Thanks again!
 

kanewolf

Titan
Moderator
Thanks Kanewolf. The CPU config I selected for the 32-core option, for example, is two EPYC 7282 processors. It says 32 cores, but not whether that's per CPU or total. However, if I look up the specs on the EPYC 7282 it shows 16 cores, so I think I got the number of cores right on all of them. Am I missing something?
Thanks again!
I didn't know what you were using to calculate cores. Yes, a 7282 would be 16 cores per socket (32 cores across the two sockets), with hyperthreading available on top of that. Just be sure you buy enough RAM -- you want 16 DIMMs to maximize performance. The 7302 CPU would have the same number of cores, but double the cache and a higher base clock speed. I don't know if your simulations would benefit from the increased cache, compared to the cost delta.
 

grider

Prominent
Jun 25, 2019
I didn't know what you were using to calculate cores. Yes, a 7282 would be 16 cores per socket (32 cores across the two sockets), with hyperthreading available on top of that. Just be sure you buy enough RAM -- you want 16 DIMMs to maximize performance. The 7302 CPU would have the same number of cores, but double the cache and a higher base clock speed. I don't know if your simulations would benefit from the increased cache, compared to the cost delta.
Thank you again. I will probably go with the 128MB cache, just out of my own sheer ignorance of how it will impact my run times. I wonder if, after I buy the first machine, I can somehow limit the cache to 64MB to see how it impacts performance in my specific case?
 

grider

Prominent
Jun 25, 2019
For what it's worth, here is a table of how many iterations I can get through by running concurrent simulations on my Coffee Lake 6-core laptop. All tests were run on the same laptop. There is substantial gain from adding concurrent simulations up to the number of physical cores, then modest gains from there up to the number of logical cores minus 1. At that point Process Explorer shows all the CPUs nearly maxed out. I don't know if that gives any clues about the importance of cache size for this case?

Concurrent simulations | Simulations per 12 hours
1                      | 5
6                      | 25
11                     | 30
 

kanewolf

Titan
Moderator
For what it's worth, here is a table of how many iterations I can get through by running concurrent simulations on my Coffee Lake 6-core laptop. All tests were run on the same laptop. There is substantial gain from adding concurrent simulations up to the number of physical cores, then modest gains from there up to the number of logical cores minus 1. At that point Process Explorer shows all the CPUs nearly maxed out. I don't know if that gives any clues about the importance of cache size for this case?

Concurrent simulations | Simulations per 12 hours
1                      | 5
6                      | 25
11                     | 30
Not directly; the memory or storage could also be limiting your performance. Usually more cache is better, as long as the cost delta isn't too great.
 

grider

Prominent
Jun 25, 2019
Not directly; the memory or storage could also be limiting your performance. Usually more cache is better, as long as the cost delta isn't too great.
The memory graph in Process Explorer shows little usage and a steady flat line while running, even with all CPUs maxed out. I suppose cache has more to do with available memory speed as opposed to available memory size, though . . .
Thanks again. You've been very helpful!
 

kanewolf

Titan
Moderator
Cache is the lowest-latency memory available to the CPU (since it is on the CPU chip). Cache can smooth out the performance slowdowns that happen when main memory has to be accessed. The more of your simulation code and data that fits IN the cache, the less performance penalty for memory access. If cost is an issue, then the smaller cache, offset by ensuring you have 8 DIMMs per socket, would be optimum.
If you end up with WAY more RAM than you can use as main memory, then think about a RAM disk to use as a scratch or cache disk. If your simulation has to do a lot of I/O, then a RAM disk (with appropriate snapshots to physical disk) can improve your performance significantly.
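As a concrete illustration of the RAM-disk idea, assuming a Linux host (the thread never says which OS the servers will run): surplus RAM can back a scratch directory via the tmpfs mount at `/dev/shm`, with results copied out to physical disk periodically:

```python
# Sketch: put simulation scratch files on a RAM-backed filesystem when
# one is available. /dev/shm is a tmpfs mount on most Linux systems;
# Windows would need a third-party RAM-disk driver instead.
import os
import shutil
import tempfile

def make_scratch_dir() -> str:
    ram_fs = "/dev/shm"
    base = ram_fs if os.path.isdir(ram_fs) else None  # fall back to normal tmp
    return tempfile.mkdtemp(prefix="sim_scratch_", dir=base)

def snapshot(scratch: str, dest: str) -> None:
    # RAM contents vanish on reboot or power loss, so periodically copy
    # results out to a real disk, per the snapshot caveat above.
    shutil.copytree(scratch, dest, dirs_exist_ok=True)
```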
 

grider

Prominent
Jun 25, 2019
Cache is the lowest-latency memory available to the CPU (since it is on the CPU chip). Cache can smooth out the performance slowdowns that happen when main memory has to be accessed. The more of your simulation code and data that fits IN the cache, the less performance penalty for memory access. If cost is an issue, then the smaller cache, offset by ensuring you have 8 DIMMs per socket, would be optimum.
If you end up with WAY more RAM than you can use as main memory, then think about a RAM disk to use as a scratch or cache disk. If your simulation has to do a lot of I/O, then a RAM disk (with appropriate snapshots to physical disk) can improve your performance significantly.

Very interesting. Is there anything I can watch (while the simulations are running) in Process Monitor or HWMonitor or similar that would let me know if cache size is a limitation?
 

kanewolf

Titan
Moderator
Very interesting. Is there anything I can watch (while the simulations are running) in Process Monitor or HWMonitor or similar that would let me know if cache size is a limitation?
Intel CPUs have performance counters -- https://software.intel.com/en-us/fo...optimization-platform-monitoring/topic/548988 The VTune application might give you insight. This is advanced debugging, and probably not worth the effort unless you can also rewrite the simulation to optimize.