Build for Scientific Computation

lxgliu

Aug 4, 2016
I'm looking at building a dual-Xeon system for mainly integrated circuit simulation using Cadence Virtuoso, Ansys HFSS, and Keysight ADS. My budget is $7000-8000.

Both Virtuoso and HFSS can utilize many cores. My lab also has more than one person who needs to access the system, so it's a good idea to have more cores. On the other hand, these programs rely pretty much on pure CPU power for the computation, so I believe it's also good to have a high clock frequency. I don't think we need a particularly strong GPU because the software can't utilize it for acceleration.

When I looked at the Xeon offerings, I became so bewildered by the clock/cache/core/price/version combinations that I don't really know what to choose.

I've taken a look at Dell outlet. There seems to be some decent offerings, such as this one: http://outlet.us.dell.com/ARBOnlineSales/Online/SecondaryInventorySearch.aspx?c=us&l=en&s=dfb&cs=28&key=WjSVing4ZtSnnxtEEO5TMA%3d%3d&puid=b3ed5270

I'm also fine with second hand systems.

Can anyone give some guidance here? Thanks!
 
Solution
This is not my area of expertise, but I will note that if your calculations are able to utilize the parallel nature of GPU processors, GPUs can be even more powerful than traditional CPUs. The largest supercomputers in existence use GPU processors for most of their calculations and CPUs for management and data routing. As for the CPUs, on the lower end you could get two Xeon E5-2650 v4. Moving up a level, you could get two Xeon E5-2687W v4, which use the same socket and have the same 12 cores but are faster. If you want more cores in exchange for some clock speed, you could also use the Xeon E5-2683 v4. If you don't buy graphics and skimp elsewhere, you could maybe get the Xeon E5-2697 v4. Those would have the most cores but be a bit slower still.
 


Thanks for your info. Unfortunately we are using proprietary software (Cadence ADE) and they don't seem to support GPU acceleration.

What's your opinion on v3 or even v2 processors? I've read some reviews showing that the difference in performance between v3 and v4 is not very significant for some processors. Power consumption is less of a concern for us.

 
V2, V3, and V4 are all basically just generations. Newer models have lower power consumption and lower heat output, and are often faster. Due to how Xeons are marketed, older CPUs are not necessarily much cheaper, and those still on the market are usually the priciest versions. If you are looking for more cores, it is actually cheaper to get them on newer models.
 


Thanks for the clarification. I see that some older generation chips are so much cheaper on eBay. For example, the E5 2670 is ~$70. They certainly don't perform as well as the v4 chips, but the price difference is staggering.
http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2050601.m570.l1312.R1.TR10.TRC2.A0.H0.Xe5+.TRS2&_nkw=e5+2670&_sacat=0
 


Could you also explain how to find out the maximum Turbo speed of a CPU when all the cores are running at full throttle? It seems the Turbo speed in the CPU spec is the maximum speed of one core, and usually the maximum speed is lower when all the cores are running.
 


lxgliu,

This will be a long one,...

I tried replying to your earlier thread on this subject, but the thread was closed, so I started another. That reply follows; continue reading at the bottom for an update based on this new thread.

OPTION 1: Standalone / New Components

What follows is a system based on a pair of Xeon E5-2640 v4 10-core 2.4 / 3.4GHz processors / 256GB of DDR4-2133 ECC RAM / Quadro M2000 4GB GPU / Samsung 950 M.2 512GB NVMe / and 3X Seagate ES.3 4TB HDDs mounted in a Supermicro SuperWorkstation SYS-7038A-I.

The E5-2640 v4 10-core CPU is a good balance of core count and clock speed. The Passmark CPU mark for a pair is 22062 and the single-thread mark is 1860 which would support complex 3D modeling.

The Quadro M2000 (4GB) ($480) is a newly released GPU, with performance near that of the Quadro K4200 (4GB) ($792): the M2000 Passmark average 3D mark is 4262 and the K4200 average is 4428. Again, sufficient for advanced graphic design, e.g. IC block diagrams and quite complex 3D modeling.

The Supermicro SuperWorkstation provides a case, motherboard, CPU coolers, and 900W power supply. This platform allows a simplified system assembly, as it's only necessary to mount processors, CPU coolers, RAM, and drives. It goes together very fast, as so many decisions are already made and there is no substantial assembly or wiring. These systems are rated to be very quiet.

The SYS-7038A-I includes the Supermicro X10DAi motherboard, an extraordinarily good one for this use, with 16X RAM slots and 3X PCIe x16 slots. If future software benefits from GPU acceleration, two Tesla GPU coprocessors could be added.

Drive 1 is a Samsung 950 Pro 512GB NVMe M.2 drive, currently one of the fastest in the world. In this use, it's suggested to forgo a scratch disk and have the OS, programs, and current projects on the principal drive.

Storage / archiving and libraries are kept on 3X Seagate Constellation ES.3 Enterprise drives, which have long endurance and 128MB of cache instead of the usual 64MB. These run off an LSI MegaRAID 9361 RAID controller, which is rated for 12Gb/s.

BambiBoom CalcuCannon SPICErackanomicADSimurific iCWonk TurboSignature Extreme Signature 9900 ®©$$™®£™©™ _ 8.4.16

Case /Motherboard /Power Supply/ CPU Coolers : Supermicro SuperWorkstation SYS-7038A-I Dual LGA2011-3 / Supermicro X10DA / 900W Mid-Tower Workstation Barebone System (Black) > $660

__ http://www.supermicro.com/products/system/tower/7038/SYS-7038A-i.cfm
__ http://www.superbiiz.com/detail.php?name=SY-7038AI

CPU: (2) Intel Xeon E5-2640 v4 Ten-Core Broadwell Processor 2.4 / 3.4GHz 8.0GT/s 25MB LGA 2011-3 CPU, OEM > $1,960 ($980 each)

__ http://ark.intel.com/products/92984/Intel-Xeon-Processor-E5-2640-v4-25M-Cache-2_40-GHz
__ http://www.superbiiz.com/detail.php?name=E5-2640V4

Memory: 256GB (16x 16GB) Samsung DDR4-2133 16GB/2Gx72 ECC/REG CL15 Server Memory > $1,280 ($80 ea.)

__ http://www.superbiiz.com/detail.php?name=D42116G4S

GPU: PNY NVIDIA Quadro M2000 4GB GDDR5 DVI/4DisplayPorts PCI-Express Video Card > $479

__ http://www.superbiiz.com/detail.php?name=PNY-M2000

RAID Controller : Broadcom LSI MegaRAID SAS 9361-4i 4-Port 12Gb/s SAS+SATA PCI-Express 3.0 Low Profile RAID Controller, Single> $401

__ http://www.superbiiz.com/detail.php?name=LSI93614IS

Disk 1: Samsung 950 PRO Series 512GB M.2 PCIe 3.0 x4 Solid State Drive, Retail (V-NAND) > $318.

__ http://www.superbiiz.com/detail.php?name=MZ-V5P512B

M.2 to PCIe Adapter: DT 120 - M.2 PCIe to PCIe 3.0 x4 Adapter (support M.2 PCIe 2280, 2260, 2242) > $20

__ http://www.newegg.com/Product/Product.aspx?Item=9SIA8RU39C6054&cm_re=m.2_to_pcie_adapter-_-9SIA8RU39C6054-_-Product

Disks 2, 3, and 4: Seagate Constellation ES.3 ST4000NM0033 4TB 7200 RPM 128MB Cache SATA 6.0Gb/s 3.5" Enterprise Internal Hard Drive Bare Drive > $732 ($244 ea.) (RAID 5)

__ http://www.newegg.com/Product/Product.aspx?Item=9SIA2W021H6289&cm_re=Seagate_Constellation_ES.3-_-22-178-307-_-Product

Optical Disk: SAMSUNG DVD Burner 24X DVD+R 8X DVD+RW 8X DVD+R DL 24X DVD-R 6X DVD-RW 16X DVD-ROM 48X CD-R 24X CD-RW 48X CD-ROM SATA Model SH-224DB/BEBE - OEM > $18

__ http://www.newegg.com/Product/Product.aspx?Item=N82E16827151266

Operating System: Microsoft Windows 7 Professional SP1 64-bit English (1-Pack), OEM > $139.

__ http://www.superbiiz.com/detail.php?name=MSFQC08289
_______________________________________

TOTAL = $6,007
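
As a quick check of the Option 1 arithmetic, here is a minimal sketch that sums the prices listed above. The dictionary keys are shortened labels chosen only for illustration; the prices are the ones quoted in the parts list.

```python
# Option 1 parts and prices in USD, copied from the parts list above.
# The labels are shortened for readability and are not official product names.
option1_parts = {
    "Supermicro SYS-7038A-I barebone": 660,
    "2x Xeon E5-2640 v4": 2 * 980,
    "16x 16GB DDR4-2133 ECC": 16 * 80,
    "Quadro M2000": 479,
    "LSI MegaRAID 9361-4i": 401,
    "Samsung 950 Pro 512GB": 318,
    "M.2 to PCIe adapter": 20,
    "3x Seagate ES.3 4TB": 3 * 244,
    "DVD burner": 18,
    "Windows 7 Professional": 139,
}


def total_cost(parts):
    """Return the sum of all part prices in a build."""
    return sum(parts.values())


option1_total = total_cost(option1_parts)
print(f"Option 1 total: ${option1_total:,}")  # prints "Option 1 total: $6,007"
```

Swapping a part is then a one-line change to the dictionary, which makes it easy to see how far a different CPU or RAM choice moves the total against the $7,000-8,000 budget.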

The next step up in processors would add about $2,000 to the cost, and I feel that 10 cores at this clock speed is sufficient. It's also possible to consider a couple of E5-2600 v3 CPUs if the software is better optimized for them.

Some Questions:

What software are you using?

What are typical file sizes?

What are typical running times? On what system?

What is your situation with monitors?

___ Performance should be very good. This concept is deliberately under budget, so there is headroom to focus performance where it matters. If you have a special requirement and can identify the parameter and the expectation, it's always possible to optimize in that direction.

As the Xeon E5-2640 v4 is very newly released, there are only 6 systems tested on Passmark. The three systems having dual processors had CPU scores of 25080 on an ASUS Z10PE-D16 WS, 22791 on a Hewlett-Packard 21291, and 21453 on another ASUS Z10PE-D16 WS. The Supermicro X10DAi system with one CPU had a mark of 16014, and the ASUS Z10PE-D16 WS with a single CPU scored 15776. The top system rating is 6042: 2X E5-2640 v4 / ASUS Z10PE-D16 WS / 256GB / Samsung 950 Pro 256GB / GTX 980 Ti. The single CPU system using the Supermicro X10DAi was rated 5177 with single CPU / 128GB / GTX 1070 / SanDisk SSDDXPS480G.

OPTION 2: Standalone / Used Processors


As you are open to used workstations and obsolete processors, the possibilities are considerably extended.

If higher base clock speed is a priority, and the CAE software scales fully across physical processor cores but not GPU cores, consider using dual Xeon E5-2687w v2. The E5-2687w v2 was, in my view, a high point, as it is 8-core @ 3.4 / 4.0GHz, the highest native clock speeds of any Xeon with 8 or more cores. The value of these is that the higher the base clock speed, the better the single-threaded portions of software will run. Many kinds of software are hybrids that include multi-threading but for some processes are purely single-threaded. This is most common in visualization software where, for example, the 3D modeling portion is essentially single-threaded but the rendering scales fully across CPU cores, e.g., Solidworks. Matlab is another that can be both, but the multi-threading has to be customized to the application.

In my view, the best answer to your query will depend on specific information as to which processes are single-threaded and which are multi-threaded.

BambiBoom CalcuCannon CAElectricitalSimulicious iCBlaster TurboSignature Extreme Science Stuffer 9901 ®©$$™®£™©™ _ 8.5.16

Case /Motherboard / CPU Coolers / Power Supply: Supermicro SuperWorkstation SYS-7047A-T Dual LGA2011 1200W 4U Rackmount / Tower Workstation Barebone System (Black) > $1,000

__ http://www.superbiiz.com/detail.php?name=SY-747AT
__ https://www.supermicro.com.tw/products/system/4U/7047/SYS-7047A-T.cfm

CPU: 2X Xeon E5-2687W v2 > used $900-$1,200 each

__ http://ark.intel.com/products/76161/Intel-Xeon-Processor-E5-2687W-v2-25M-Cache-3_40-GHz
__ Passmark: Dual Intel Xeon E5-2687w v2: 24501 / Single threaded rating: 2060

RAM: 256GB (16X 16GB) Samsung DDR3-1866 16GB/2Gx72 ECC/REG CL13 Samsung Chip Server Memory > $1,232 ($77 each)

__ http://ark.intel.com/products/76161/Intel-Xeon-Processor-E5-2687W-v2-25M-Cache-3_40-GHz

GPU: PNY NVIDIA Quadro M2000 4GB GDDR5 DVI/4DisplayPorts PCI-Express Video Card > $479

__ http://www.superbiiz.com/detail.php?name=PNY-M2000

RAID Controller : Broadcom LSI MegaRAID SAS 9361-4i 4-Port 12Gb/s SAS+SATA PCI-Express 3.0 Low Profile RAID Controller, Single> $401

__ http://www.superbiiz.com/detail.php?name=LSI93614IS

Disk 1: Intel SSD 750 Series SSDPE2MW800G4X1 800GB 2.5 inch PCI-Express 3.0 x4 Solid State Drive (MLC) > $654

__ http://www.superbiiz.com/detail.php?name=SSD750P800

Disks 2, 3, and 4: Seagate Constellation ES.3 ST4000NM0033 4TB 7200 RPM 128MB Cache SATA 6.0Gb/s 3.5" Enterprise Internal Hard Drive Bare Drive > $732 ($244 ea.) (RAID 5)

__ http://www.newegg.com/Product/Product.aspx?Item=9SIA2W0...

Optical Disk: SAMSUNG DVD Burner 24X DVD+R 8X DVD+RW 8X DVD+R DL 24X DVD-R 6X DVD-RW 16X DVD-ROM 48X CD-R 24X CD-RW 48X CD-ROM SATA Model SH-224DB/BEBE - OEM > $18

__ http://www.newegg.com/Product/Product.aspx?Item=N82E168...

Operating System: Microsoft Windows 7 Professional SP1 64-bit English (1-Pack), OEM > $139.

___________________________________________

TOTAL: about $6,900 (assuming $1,100 each for each E5-2687w v2)


OPTION 3: Beowulf Cluster

Based on the possibility of the software's ability to utilize all physical cores, and that it can be run in Linux, a Beowulf cluster system is possible and may have significant advantages in processing power:

BambiBoom CalcuCannon BeowulferCAElectricitalSimurama iCBlaster TurboSignature Extreme Science Stuffer 9902 ®©$$™®£™©™ _ 8.5.16

Base System: HP z620 : used with low specification, about $700

2ND CPU Riser: z620 specific, about $180

CPU: 2X Xeon E5-2690 (8- core @ 2.9 /3.8GHz) about $400 (@$200 each)

RAM: 128GB (8X 16GB) Samsung DDR3L-1600 /1Gx4 ECC/REG CL11 Server Memory $716 ($89 each) + 4X

__ http://www.superbiiz.com/detail.php?name=D316GR16G3

GPU: PNY NVIDIA Quadro M2000 4GB GDDR5 DVI/4DisplayPorts PCI-Express Video Card > $479

__ http://www.superbiiz.com/detail.php?name=PNY-M2000

Drive 1: HP Z Turbo Drive G1 N8T12AT M.2 (22x80) 512GB PCI-Express 3.0 x4 Internal Solid State Drive > $327

__ http://www.newegg.com/Product/Product.aspx?Item=9SIA24G3UK3409

RAID Controller : (Head system only) Broadcom LSI MegaRAID SAS 9361-4i 4-Port 12Gb/s SAS+SATA PCI-Express 3.0 Low Profile RAID Controller, Single> $401

__ http://www.superbiiz.com/detail.php?name=LSI93614IS

Drive 2,3,4: (Head system only) 3X Seagate Constellation ES.3 ST4000NM0033 4TB 7200 RPM 128MB Cache SATA 6.0Gb/s 3.5" Enterprise Internal Hard Drive Bare Drive > $732 ($244 ea.) (RAID 5)

__ http://www.newegg.com/Product/Product.aspx?Item=9SIA2W021H6289&cm_re=Seagate_Constellation_ES.3-_-22-178-307-_-Product

Optical Disk: SAMSUNG DVD Burner 24X DVD+R 8X DVD+RW 8X DVD+R DL 24X DVD-R 6X DVD-RW 16X DVD-ROM 48X CD-R 24X CD-RW 48X CD-ROM SATA Model SH-224DB/BEBE - OEM > $18

__ http://www.newegg.com/Product/Product.aspx?Item=N82E16827151266

Operating System: Requires a Linux flavor, to be determined

Notes:

__ May require a quad LAN card

__ If each system is to use 256GB of RAM, base it on HP z820 as the RAM has a balanced slots configuration.

__________________________________________________

TOTAL: about $4,000 head system, $2,800 for the 2nd system
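
The two totals can be reproduced with a short sketch. It assumes that the second node carries everything except the parts marked "Head system only" (the RAID controller and the three hard drives); that split is an assumption made here for illustration, and the labels are shortened.

```python
# Parts every cluster node would carry, prices in USD from the list above.
# Assumption: the second node also gets a GPU and an optical drive.
per_node_parts = {
    "HP z620 base system (used)": 700,
    "2nd CPU riser": 180,
    "2x Xeon E5-2690 (used)": 2 * 200,
    "128GB DDR3L-1600 ECC": 716,
    "Quadro M2000": 479,
    "HP Z Turbo Drive 512GB": 327,
    "DVD burner": 18,
}

# Parts marked "Head system only" in the list above.
head_only_parts = {
    "LSI MegaRAID 9361-4i": 401,
    "3x Seagate ES.3 4TB": 3 * 244,
}

node_total = sum(per_node_parts.values())
head_total = node_total + sum(head_only_parts.values())
cluster_total = head_total + node_total

print(f"Head system: ${head_total:,}")       # prints "Head system: $3,953"
print(f"Second system: ${node_total:,}")     # prints "Second system: $2,820"
print(f"Two-node cluster: ${cluster_total:,}")  # prints "Two-node cluster: $6,773"
```

These land close to the rounded figures of about $4,000 and $2,800 quoted above.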

See the project HP z620 below for an example of some of the tasks involved in upgrading a used workstation.

The idea is that by using both depreciated systems and processors, the cost / performance is significantly enhanced. For about $6,800, which is about $800 more than the E5-2640 v4 standalone and about equal in cost to the standalone with E5-2687w v2, the proposed budget can afford two of these in parallel, having 32 cores / 64 threads compared to 20/40 and 16/32, and a total of 8 drives compared to 4 in Options 1 and 2. A base speed of 2.9GHz is adequate for multi-core processing, while the 3.8GHz turbo speed provides very good single-threaded performance and makes 3D modeling and visualization applications perform well.

By the way, you can see the base and turbo speeds of Xeons, and the sequence of acceleration, by consulting the Wikipedia article on Xeon processors, which also links to specifications at the very useful Intel ARK site. Additional systems could also be added to the cluster later. At my local particle accelerator, they are running particle experiment simulations on eleven parallel dual 14-core Xeon systems, each of which has four Tesla 20X coprocessors.

So, there are a couple of alternatives. Without knowing more about the specifics of the software, it's a difficult calculation as to which solution would be preferable.
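
To put the three options side by side, here is a rough cost-per-physical-core comparison. It uses the approximate totals and core counts quoted in this post and ignores clock speed, interconnect, and software licensing, so it is only one lens on the trade-off.

```python
# (label, approximate total in USD, physical cores), figures from this post.
options = [
    ("Option 1: 2x E5-2640 v4", 6007, 20),
    ("Option 2: 2x E5-2687W v2", 6900, 16),
    ("Option 3: two HP z620, 2x E5-2690 each", 6800, 32),
]


def cost_per_core(total_usd, cores):
    """Return the build cost divided by the number of physical cores."""
    return total_usd / cores


for label, total, cores in options:
    print(f"{label}: ${cost_per_core(total, cores):,.2f} per core")
```

Whether the cheaper cores of the cluster are usable depends on whether the simulators can actually distribute work across two machines, which is the open question raised above.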

A very interesting project. What kind of IC's are you working on?

Cheers,

BambiBoom

__________________________________________________

PS> Cost/ Performance Benefits of Upgrading a Used Workstation

Current Project:

As an example of the general sequence of Option 3, my current project (8.16) is an HP z620 rendering system to replace a Dell Precision T5500:

HP z620: > $270_ 7.7.16

HP z620 (Original) Xeon E5-1620 (4-core @ 3.6 / 3.8GHz) / 8GB (1X 8GB DDR3-1333) / AMD Firepro V5900 (2GB) / Seagate Barracuda 750GB + Samsung 500GB + WD 500GB
[ Passmark System Rating= 2408 / CPU= 8361 / 2D= 846 / 3D = 1613 / Mem =1584 / Disk = 574 ] 7.13.16

Purchased:

2X Xeon E5-2690: $152 and $154 (single thread rating = 1888)
CPU riser board: $150
32GB (4X8 DDR3-1600 ECC) $165
Set complete plastic case parts: $56

The set of case plastic is to replace the parts on the system, damage being the reason the z620 was so inexpensive. The Quadro K2200 (4GB) will come from the T5500, and the Intel 730 480GB and WD Black 1TB from my main system, an HP z420, where they are replaced by a Samsung M.2 SM951 AHCI and a Seagate Constellation ES.3 1TB. Value of the used parts is about $400.

TOTAL = about $1,350

Plus, I can sell the E5-1620 and Firepro V5900 for about $75 each, so the net cost could be about $1,200. I hope to sell the Precision for about $100-200 less than that amount so the upgrade is not terribly expensive overall.

Results so far:

HP z620 (Revision 2) 2X Xeon E5-2690 (8-core @ 2.9 /3.8GHz) / 40GB (4X 8GB +4X 2GB DDR3-1600) / Quadro K2200 (4GB) / Seagate Barracuda 750GB + Samsung 500GB + WD 500GB / 800W > Windows 7 Professional 64-bit >
[ Passmark System Rating= 2468 / CPU= 20083 / 2D= 731 / 3D = 3535/ Mem =2278 / Disk = 541 ] 8.1.16

With updated BIOS:

[ Passmark System Rating= 2589 / CPU= 19671 / 2D= 728 / 3D = 3542/ Mem =2397 / Disk = 587 ] 8.2.16

The last Z620 on Ebahhh US with a pair of E5-2690's, 64GB of RAM, a Quadro 5000, and ordinary mech'l HDD's was sold for $3,699. As my system has a faster GPU than a Quadro 5000, once the new HP Z Turbo arrives the eventual performance should be comparable to or better than the $3,700 system for a bit over 1/3 the cost. This demonstrates the enhanced cost / performance of upgrading, which is impossible to match by building from separate components. There is, of course, more specialized research / shopping and more judgement calls on used components.




Thanks bambiboom for the detailed reply. It will take me some time to digest all the information. I will reply to you soon.
 


bambiboom, once again, thanks for your valuable insight. You provided so much information that I have to reply in several posts.  

I am primarily using two software packages, Cadence Analog Design Environment for integrated circuit simulations (SPICE-based simulation), and Ansys HFSS for full-wave 3D electromagnetic simulations (Finite Element based simulation).

The speed of simulation for both software is very much dependent on the CPU clock speed. I have not done, or seen, any study on the dependence on cache size; I would imagine a large cache is always good.

Cadence has the ability to run the simulation on multiple cores. I think the speed-up is between N/4 and N/2, where N is the number of cores.

HFSS is a mix. Because it's FEM based and FEM is known to be difficult to parallelize, the core computation part of an HFSS simulation can only run on a single core. However, the rest of the simulation, such as frequency sweeps (post-processing based on results generated from the core computation), can be spread across multiple cores with linear scaling. So I think the overall speed-up for HFSS is probably in the N/4 to N/2 range, depending on the problem being solved.
http://ansys.org/staticassets/ANSYS/staticassets/resourcelibrary/article/AA-V4-I2-HPC-Options-for-HFSS.pdf
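
A mix of a single-core part and a part that scales linearly is what Amdahl's law describes, so it gives a rough way to sanity-check the N/4 to N/2 estimate. The sketch below is only illustrative: the 0.9 parallel fraction is a made-up value, not a measurement of HFSS or Cadence.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speed-up predicted by Amdahl's law.

    parallel_fraction: share of single-core run time that scales with cores
    cores: number of cores working on the parallel part
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical example: 90% of the run time parallelizes, 16 cores available.
speedup = amdahl_speedup(0.9, 16)
print(f"Estimated speed-up on 16 cores: {speedup:.1f}x")
```

With these assumed numbers the estimate falls between N/4 = 4 and N/2 = 8 for N = 16. The formula also shows why the speed of the single-core part stays fundamental: as the core count grows, the serial fraction sets the ceiling.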

So it is advantageous to have more cores, but ultimately, the speed of the cores is fundamental.

In our lab, we are currently using i7-based machines. We have one 4960X system with 64GB memory, one 4790K/32GB, and one 5820K/32GB. The i7s are great because their clock speed is high. When I benchmark HFSS simulation with Xeons (from another lab, running at 2.4GHz), the i7 shows quite a bit of performance advantage (single core performance).

But as we deal with more complex circuits, we are increasingly feeling the need for more memory. We have had several models in Cadence and HFSS that require around 20GB+ of memory per model. This means that our 4960X system, with its 64GB of memory, is constrained to running 2 simultaneous simulations even though it has 6 cores (we turn off hyperthreading for higher single-core performance). So the amount of memory has become a bottleneck for large simulations. This is why we are starting to look at machines with more memory capacity, and hence the Xeons.

So to summarize, here is what we are looking for on the CPU/memory front:
* many physical cores to accommodate many users, or many parallel simulations
* high core speed (3GHz+ preferred) when all cores are running. I recently realized that the Turbo speed in the CPU's spec is the maximum speed for a single core, not necessarily the speed of all cores when they are all running at full throttle.
* a memory-to-physical-core ratio of around 20GB+/core.
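
As a rough illustration of the last point, the sketch below computes the memory-per-core ratio for our current 4960X machine and for the two dual-Xeon configurations suggested earlier in the thread, and the memory a given core count would need at 20GB per core. The function names are just for illustration.

```python
TARGET_GB_PER_CORE = 20  # the ratio stated in the list above


def gb_per_core(memory_gb, physical_cores):
    """Return installed memory divided by physical core count."""
    return memory_gb / physical_cores


def memory_needed_gb(physical_cores, target=TARGET_GB_PER_CORE):
    """Return the memory needed to give every physical core the target ratio."""
    return physical_cores * target


# (label, memory in GB, physical cores), figures from this thread.
configs = [
    ("i7-4960X, 64GB", 64, 6),
    ("Option 1: 2x E5-2640 v4, 256GB", 256, 20),
    ("Option 2: 2x E5-2687W v2, 256GB", 256, 16),
]

for label, memory_gb, cores in configs:
    ratio = gb_per_core(memory_gb, cores)
    needed = memory_needed_gb(cores)
    print(f"{label}: {ratio:.1f} GB/core, {needed} GB needed for {TARGET_GB_PER_CORE} GB/core")
```

By this measure none of the three reaches 20GB/core as configured, which is why adding a little more memory comes up again below.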

Both the memory and simulation time scale with the complexity of the problem being solved. The longest simulation we've run on Cadence took 18 hours (on a 2.4GHz Xeon core). The longest in HFSS is around 8 hours. We want to cut down the time as much as possible so that we can have fast design iterations.

This machine will be running RHEL Linux because Cadence ADE only supports RHEL and because our department IT doesn't like anything other than RHEL. So I need to make sure the hardware is compatible with RHEL.

The machine will serve as a remote server. There are two modes of operation:
1) the user logs into the server over SSH (or VNC) and does all the work there
2) the user does the schematic capture / 3D modeling on their own computer and runs the simulation part on the remote server. HFSS has a remote solve capability. Cadence is supposed to have one also, but we haven't tried it yet.

The latter doesn't require much video card capabilities at all. The first mode, I'm not sure.
 
bambiboom

Among the three Options you provided, I like the 2687Wv2 the best for its high clock speed. Do you know its Turbo speed for all cores? It would be nice to add a little more memory also.

I also noticed that the 2667v2 and 2687Wv2 have almost identical specs, with the 2687Wv2's base clock a tad higher. I heard from someone on another forum that the all-core Turbo speed for 2667v2 is 3.6GHz.

 


Probably not. I took a look at the Xeon Phi product family. The largest device is 16GB memory (too low for the typical problems we are running), 1.10GHz cores (also too low for the CPU intensive simulations), and 72 cores (which is nice but we can't utilize them all with the small amount of memory). The Phi seems to be a better fit for massive parallel algorithms where each thread is a relatively simple computation.
 
bambiboom and all

While I was doing my research on Xeons yesterday, I also ran into several articles on the E5-2679v4 processor, which is obviously expensive but seems to be extremely capable. It has 20 physical cores capable of running at 3.2GHz simultaneously. It's not an official SKU and is only available on eBay.
http://www.ebay.com/itm/Intel-Xeon-E5-2679-V4-OEM-2-5Ghz-3-3-Max-20-Core-Faster-Than-E5-2699-V4-/272246720084?hash=item3f632b4654:g:KkYAAOSwTdJXRRkH

And it got me thinking about the possibility of building a system with one 2679v4 and 512GB of memory (about 25GB/core), using a dual-socket motherboard to keep open the possibility of adding a second set in the future. Obviously it will go well beyond my current budget (I guess the CPU+memory alone would cost $7000+, the whole system $10k+?). But maybe it's something that I can aim for when I plan my budget in the future.
 


lxgliu,

I appreciate your clarification of the hardware utilization and file parameters. My familiarity with SPICE is second / third hand, as friends working on aerospace, avionics, instrumentation projects and particle experiments are designing / modeling or specifying hardware.

CPU: In reviewing the conditions of use, a valuable resource is the Passmark CPU Benchmarks Mega Page, which may be sorted in order of the "Single Thread Mark" and alternately searched by CPU name. At the top of the Single Thread Mark list is one familiar to you, the i7-4790K at 2526. As you've noted, as the core count increases, the single thread performance declines. The Xeon E5-2679 v4 20-core @ 2.5 / 3.3GHz is at the top as far as calculation cycles / sec, making a Passmark CPU Mark of 25911 for a single processor. The price is said to be about $2,800, and if so, it's not out of the question for the proposed system. See the equation at the bottom of the post. That CPU is so new there is no result for a pair. I'm not certain, but I suspect that those offered on Ebay are in fact engineering samples, although they are not marked "confidential" in the usual way. Personally, I would always avoid ES Xeons. However, a Xeon E5-2679 v4 system is possible, see below.

However, the Single Thread Mark of the one E5-2679 v4 tested is 1886. Compare that to the Xeon E5-2690 mentioned in Option 3 above, which has a Single Thread Mark of 1888 but costs about $2,500 less. Besides the memory bandwidth and PCIe lane advantages, the value of a dual Xeon system is partially that one may achieve a higher single thread rating as well as the core count by using a pair of lower core count processors having individually better single thread results.

Scanning down the Passmark single-thread chart, the six highest-rated Xeon E5-2600 series are as follows (single thread mark, cores @ base / turbo, > CPU mark, price):

1. E5-2637 v3 ____ 2154  4-core @ 3.5 / 3.7GHz > 10306 $1,057
2. E5-2667 v3 ____ 2065  8-core @ 3.2 / 3.6GHz > 16116 $2,110
3. E5-2687w v2 ___ 2059  8-core @ 3.4 / 4.0GHz > 16666 ~$1,100 <
4. E5-2667 v2 ____ 2047  8-core @ 3.3 / 4.0GHz > 16415 $2,400
5. E5-2673 v2 ____ 2009  8-core @ 3.3 / 4.0GHz > 16415 $2,400
6. E5-2690 v4 ____ 1983 12-core @ 2.6 / 3.6GHz > 19667 ~$2,800

This list is by way of explanation of the choice of the E5-2687w v2. The single thread rating is high because of the 4.0GHz turbo speed, and the base clock speed is still 3.4GHz, higher than any other Xeon with 8 or more cores. On top of that, the cost is less than half that of any other on the list except the E5-2637 v3, which is excluded as a 4-core.
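
The same reasoning can be sketched in a few lines: take the six CPUs from the list, drop anything under 8 cores, and rank the rest by Passmark CPU mark per dollar. The prices are the approximate figures quoted in the list, and the field names are chosen here just for illustration.

```python
from typing import NamedTuple


class Cpu(NamedTuple):
    name: str
    single_thread: int  # Passmark single thread mark
    cores: int
    cpu_mark: int       # Passmark CPU mark for one processor
    price_usd: int      # approximate price quoted in the list


cpus = [
    Cpu("E5-2637 v3", 2154, 4, 10306, 1057),
    Cpu("E5-2667 v3", 2065, 8, 16116, 2110),
    Cpu("E5-2687W v2", 2059, 8, 16666, 1100),
    Cpu("E5-2667 v2", 2047, 8, 16415, 2400),
    Cpu("E5-2673 v2", 2009, 8, 16415, 2400),
    Cpu("E5-2690 v4", 1983, 12, 19667, 2800),
]


def marks_per_dollar(cpu):
    """Return Passmark CPU mark per dollar for one processor."""
    return cpu.cpu_mark / cpu.price_usd


# Keep only CPUs with 8 or more cores, then rank by value.
candidates = [cpu for cpu in cpus if cpu.cores >= 8]
ranked = sorted(candidates, key=marks_per_dollar, reverse=True)
best = ranked[0]

for cpu in ranked:
    print(f"{cpu.name}: {marks_per_dollar(cpu):.2f} marks per dollar")
```

With these figures the E5-2687W v2 comes out first by a wide margin, since its used price is roughly half that of the others with 8 or more cores.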

The E5-2667 v2, as you mention, does have similar performance to the E5-2687w v2, and the 100MHz lower base clock speed would not be noticeable. Checking Ebay listings, it seems there are fewer sales of the E5-2667 v2, but one sold for $791, a number have sold for $1,000, and one for $1,300. This is the pattern for CPUs near the top of their range: the prices go all over. The E5-2670, -80, and -90 were expensive new but are inexpensive used because of the very large number cycled out of large servers at firms such as Google and Facebook.

To answer your query as to the disposition of turbo clock speeds: for the E5-2687w v2 the Turbo operation is indicated as a stepping sequence, 2/2/2/2/3/4/5/6, in which each number is how many 100MHz steps above the 3.4GHz base clock are available for a given number of active cores. The first number applies when all 8 cores are active, the second when 7 cores are active, the third for 6 cores, the fourth for 5 active cores, and so on down to the last number for a single active core. So, with a single core active (2 threads) it runs at 4.0GHz, and with 5 to 8 cores active all of them run at 3.6GHz.
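
A small sketch of how such a stepping sequence can be turned into clock speeds, assuming the usual reading of this notation: each number is a count of 100MHz steps above base clock, listed from all cores active down to one core active, and it applies to every active core. The function name is only for illustration.

```python
def turbo_table_mhz(base_mhz, bins, step_mhz=100):
    """Map active core count to maximum turbo frequency in MHz.

    bins is listed from all cores active down to a single active core,
    e.g. [2, 2, 2, 2, 3, 4, 5, 6] for an 8-core CPU.
    """
    total_cores = len(bins)
    return {
        total_cores - index: base_mhz + steps * step_mhz
        for index, steps in enumerate(bins)
    }


# E5-2687W v2: 3.4GHz base, stepping sequence 2/2/2/2/3/4/5/6.
table = turbo_table_mhz(3400, [2, 2, 2, 2, 3, 4, 5, 6])

for active_cores in sorted(table, reverse=True):
    print(f"{active_cores} active cores: {table[active_cores] / 1000:.1f} GHz")
```

Under that reading, the all-core turbo of the E5-2687W v2 works out to 3.6GHz and the single-core turbo to the advertised 4.0GHz.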

Looking further into the world of v3 and v4 versions is possible, but I think the single-thread performance of the E5-2687w v2 for the cost is impossible to improve on. However, an E5-2679 v4 system is realistic:

Option 4: Supermicro Superworkstation / 1X E5-2679 v4 / 256GB RAM

Consider an E5-2679 v4 system starting with a single CPU and 256GB of RAM in 32GB modules. That would allow the system to be expanded to dual CPUs and 512GB in the future.

My first guess as to cost is: Option 1, minus the 2X E5-2640 v4 and its RAM, plus 1X E5-2679 v4 and 8X 32GB DDR4-2133 ECC reg. (256GB), or:

$6,007 - (1,960 + 1,280) + (2,800 + 1,240) = $6,807. That is assuming the price of the E5-2679 v4 is $2,800 and using the least expensive DDR4-2133 32GB RAM.

So it's not impossible. The next iteration, adding the second CPU and another 256GB, would be 6,807 + (2,800 + 1,240) = $10,847. That would have an astounding calculation density, and as a proprietary system would probably cost in excess of $20,000.

Memory: It is possible on a number of dual Xeon motherboards to use up to 1TB, and on some 2TB, of RAM, depending on the type. Conventional dual processor motherboards, though, have 8 or 16 RAM slots, so using 512GB will require 16 X 32GB. However, 32GB RAM modules are very expensive. Using the least expensive 32GB module on that page, 512GB is 16 X $280 = $4,480. For comparison, 256GB would use 16GB modules, and with DDR3-1600 ECC registered that is 16 X $80 = $1,280. However, DDR4 is less expensive in 32GB modules: 16 X 32GB is 16 X $155 = $2,480, and if RAM capacity is a priority, that might be a consideration in CPU choice, requiring an LGA2011-3 Xeon 2600-series v3 or v4. It would be possible, of course, to start the system with 8 slots filled for 256GB and fill the remainder as the budget permits. In my view, 256GB would support 4 or 5 programs open, and if projects are 20GB each and Cadence can run multiple simultaneous iterations, there would still be plenty of headroom to support the intense CPU to RAM swaps.

Great project and discussion.

Cheers,

BambiBoom
 


Thanks BambiBoom. Your post above explains so many things. I really appreciate it. Let me do some more study on this topic.
 


lxgliu,

Embarrassingly, as I did the arithmetic for the Option 4 cost, I added $1,000 to the price of the 256GB of DDR4-2133. If the least expensive RAM is used, the figure is 8 X $155 = $1,240, not $2,240. That makes the Option 4 system with a single CPU and "only" 256 GB of RAM under $7,000.

Very sorry for that silly error. At least it's in the happier direction.
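
For reference, the Option 4 arithmetic with the corrected RAM figure can be laid out as a short sketch. It assumes the $2,800 price for the E5-2679 v4 and $155 per 32GB DDR4-2133 module, as stated above; the variable names are only for illustration.

```python
# Figures in USD from Option 1 and the Option 4 discussion above.
option1_total = 6007
option1_cpus = 2 * 980       # 2x E5-2640 v4
option1_ram = 16 * 80        # 16x 16GB DDR4-2133

e5_2679_v4 = 2800            # assumed price for one E5-2679 v4
ram_256gb = 8 * 155          # 8x 32GB DDR4-2133 at the least expensive price

# Stage 1: single CPU with 256GB.
upgrade_set = e5_2679_v4 + ram_256gb
option4_single = option1_total - (option1_cpus + option1_ram) + upgrade_set

# Stage 2: add the second CPU and a second 256GB later.
option4_dual = option4_single + upgrade_set

print(f"Option 4, one CPU / 256GB: ${option4_single:,}")   # prints "$6,807"
print(f"Option 4, two CPUs / 512GB: ${option4_dual:,}")     # prints "$10,847"
```

So the single-CPU starting point fits under $7,000, and the fully expanded system follows from adding the same CPU and RAM set a second time.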

Cheers,

BambiBoom

 