Looks like you completely missed my point and the basics behind it.
Yes, I didn't find it very clear, but thanks for the additional clarifications.
Point #1 is that neither Intel nor AMD, the main server CPU producers, makes purely CPU-based motherboards with more than 2 CPUs on them, even though they could put 4, 6, or 8 on them.
Like Stryker said, Intel still maintains up to 8-socket scalability, but at a considerable price premium. I think you'll find that it's generally not cost-effective to scale that high, which is why > 2-socket servers aren't popular.
It's getting seriously difficult to fit more than 2 CPUs and all their RAM on a single board. You can find some pictures of AMD server boards with 2 sockets and 48 DIMM slots. There's definitely not room for another two CPUs and their memory. Just eyeballing it, I couldn't say if there'd be enough room to fit 4 of them with only one DIMM per channel, so you might have to look at some kind of blade-based setup.
At SC22, we saw a Gigabyte GPU server with 48x DDR5 DIMM slots. We also saw an STH video on display and an Ampere Altra Arm server with NVIDIA GPUs.
www.servethehome.com
Keep in mind that Intel's top end CPUs have the same number of UPI channels, regardless of whether they support only dual-socket or 8-socket configurations. Even in the latest generation, it's still not enough for point-to-point connectivity for 8 CPUs. Also, if your data isn't equally distributed across the other CPUs' memory, you'll hit more inter-processor communication bottlenecks than you would in a 2-CPU configuration, where all of the UPI channels are used to link the two CPUs.
Neither Intel nor AMD promotes fast ways for end users to connect existing motherboards into a single parallel system.
Intel had been pushing OmniPath, with versions of Skylake SP that even had it wired into the CPU package itself. However, the market rejected OmniPath, as it was a proprietary format and required OmniPath networking gear.
As for other high-speed networking options, why do AMD and Intel need to be the ones pushing them? They're both members of the CXL consortium, and have both implemented CXL 1.x support in Sapphire Rapids and Genoa, respectively.
Seems they gave in to the notion, often a myth, that GPUs always beat CPUs in compute power and wall-plug efficiency.
They both have AVX-512. Does that count for nothing? Zen 5 even beefed it up quite considerably, not only implementing it at native 512-bit width, but also adding more dispatch ports.
Point #2 is that the typical 8 GPUs on the supercomputer baseboards we tested do not give speedups of more than 5-8x on our tasks, compared to the CPUs on the same board. These tasks are the usual heavy FP64 vector and scalar simulations, not the recent AI fashion.
Well, GPUs don't only offer more raw compute - they also offer roughly an order of magnitude more memory bandwidth. So, I'd say you're in an interesting position to have a workload that's not only so heavily dependent on FP64, but also has such a high compute/memory_bandwidth ratio.
Plus, it matters specifically which GPUs you benchmarked. Compared to the A100, Nvidia's H100 made a huge leap in fp64 performance, in order to better compete with AMD, which went to full 64-bit registers and 1:1 performance vs. fp32 in CDNA.
If I had an app with poor scaling on GPUs, I'd definitely want to understand exactly why. It could just be as you say - that the task didn't use the GPUs very efficiently (branchy code? limited concurrency?), but it could also be that there's over-synchronization, excessive contention, etc. CPUs are designed to optimize those things a lot better than GPUs, with their weak memory models.
And it seems you also missed my joke in point #4: with 4 being an unlucky number in some countries, that might be why we don't see 4-processor motherboards.
Intel made them years ago, but not anymore.
I got the cultural reference about 4 being unlucky, but it wasn't clear to me that you meant it in reference to quad-CPU configurations, rather than counting generations of EPYC CPUs, or something like that.