News Intel Launches 144-core 'Sierra Forest' Xeon 6 CPUs, Granite Rapids Follows in Q3

These high-core-count, E-core-based CPUs seem really good for cloud services, especially in circumstances where SMT is already being disabled. Given how well they already seem to perform, it's going to be really interesting to see what Clearwater Forest ends up looking like (and whether or not Darkmont is much different from Skymont). STH has their initial Intel-sponsored look done, but they also got a retail sample to verify the results: https://www.servethehome.com/intel-xeon-6-6700e-sierra-forest-shatters-xeon-expectations/

edit:
Phoronix testing with the Intel reference system:
https://www.phoronix.com/review/intel-xeon-6780e-6766e

removed fixed bits:

Some incorrect stuff noted in the article:
The P-core Granite Rapids models, which can have up to 86 P-cores, will launch in Q3 of 2024.
This is the Q1 2025 6700P SKU series, the 128 P-core 6900P is the one launching in Q3 2024.
 

jp7189

Distinguished
Feb 21, 2012
387
220
19,060
These aren't the atom cores of 10 years ago. These actually have about the same per clock performance as Broadwell CPUs.
..and still Intel marketing pulled out all the stops. Those charts compare their brand-new 128-core chip against AMD's 2+ year old 64-core chip, and it barely wins on a few hand-picked benchmarks. We'll have to wait for independent tests before we know for sure, but it's looking like AMD is going to dominate this gen...again.
 
..and still Intel marketing pulled out all the stops. Those charts compare their brand-new 128-core chip against AMD's 2+ year old 64-core chip, and it barely wins on a few hand-picked benchmarks. We'll have to wait for independent tests before we know for sure, but it's looking like AMD is going to dominate this gen...again.
Servethehome already did a look at the new E core server.
 
  • Like
Reactions: bit_user

bit_user

Titan
Ambassador
Eek! Whoever downsampled the "original" versions used nearest-neighbor sampling and a scaling ratio that makes some of the text virtually unreadable!

Look at the headings in this chart. In some of them, you can barely make out the model number of the Xeon being used, because some of the horizontal strokes are entirely skipped by the sampling!

eWmobdZvxRXFskyM82RWz.jpg

 

bit_user

Titan
Ambassador
After seeing the benchmarks on Phoronix, I think Sierra Forest effectively answered the threat from Bergamo (Zen 4c), but I doubt it's up to the challenge it's about to face from Zen 5c. At least the matchup shouldn't be as lopsided as when Sapphire Rapids went up against Genoa (Zen 4) and Bergamo.

The real test is one we can't directly run - how well it compares against the latest crop of ARM cloud-based server CPUs, like Graviton 4. This set of benchmarks attempts to put a stake in the ground, by including a 128-core Ampere Altra CPU, which is now circling the drain of obsolescence. Even so, Altra manages to notch a couple wins on efficiency, and places quite well in some other efficiency rankings.

Also, I should point out that the lowly EPYC 8534PN (Siena; small Zen 4c) scores 15% better on efficiency (i.e. Geomean vs. average power) than the best Sierra Forest entrant (Xeon 6766E).
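For anyone wanting to sanity-check that kind of comparison: the efficiency figure of merit here is just the geometric mean of the benchmark results divided by the average power draw. A minimal sketch, using made-up placeholder numbers rather than Phoronix's actual data:

```python
import math

def geomean(values):
    """Geometric mean: the n-th root of the product of n values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def efficiency(scores, avg_power_watts):
    """Perf-per-watt figure of merit: geomean of scores / average power."""
    return geomean(scores) / avg_power_watts

# Hypothetical scores and power draws, NOT the review's actual data.
siena = efficiency([100, 120, 90], 150.0)
sierra_forest = efficiency([140, 160, 130], 250.0)

# A "15% better" claim compares the two perf-per-watt ratios directly.
advantage = (siena / sierra_forest - 1) * 100
print(f"Siena efficiency advantage: {advantage:.1f}%")
```

Since both sides are perf-per-watt ratios, the percentage advantage falls out of a simple division.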
 

peachpuff

Reputable
BANNED
Apr 6, 2021
690
733
5,760
Eek! Whoever downsampled the "original" versions used nearest-neighbor sampling and a scaling ratio that makes some of the text virtually unreadable!

Look at the headings in this chart. In some of them, you can barely make out the model number of the Xeon being used, because some of the horizontal strokes are entirely skipped by the sampling!
eWmobdZvxRXFskyM82RWz.jpg
Looks like cga color palette 😂
 
  • Like
Reactions: bit_user

bit_user

Titan
Ambassador
cloud services, especially in circumstances where SMT is already being disabled.
Not sure about that. First of all, hypervisors provide the option to avoid simultaneously scheduling two different VMs on the same physical core. As if that weren't enough, the Linux kernel now has an option called "core scheduling", which does the same thing at the process level. This allows SMT to be restricted to cases that should already be "safe" (i.e. where the paired threads are already in the same memory space). Google was the main contributor behind the "core scheduling" patch. If the cloud people were really so negative on SMT, then it'd be awfully surprising for Intel to have kept it in the server version of Lion Cove.
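For reference, the core-scheduling interface mentioned above is exposed to userspace via prctl(). A minimal sketch, assuming a Linux system; the constants come from the kernel's prctl.h, and the call simply fails cleanly on kernels built without CONFIG_SCHED_CORE:

```python
import ctypes
import errno

# Constants from the kernel's include/uapi/linux/prctl.h (Linux >= 5.14).
PR_SCHED_CORE = 62
PR_SCHED_CORE_CREATE = 1
PIDTYPE_PID = 0  # scope the cookie to just the current task

_libc = ctypes.CDLL(None, use_errno=True)

def enable_core_scheduling():
    """Request a core-scheduling cookie for the current task, so the
    scheduler never co-runs it on an SMT sibling alongside a task that
    holds a different cookie. Returns False if the kernel lacks support
    (no CONFIG_SCHED_CORE, older kernel, etc.)."""
    ret = _libc.prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0, PIDTYPE_PID, 0)
    if ret != 0:
        err = ctypes.get_errno()
        print("core scheduling unavailable:", errno.errorcode.get(err, err))
        return False
    return True

ok = enable_core_scheduling()
print("core-scheduling cookie created:", ok)
```

Tasks sharing a cookie may share an SMT sibling pair; tasks with different cookies never will.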

STH has their initial Intel-sponsored look done, but they also got a retail sample to verify the results
Nice!
 
  • Like
Reactions: NinoPino
Not sure about that. First of all, hypervisors provide the option to avoid simultaneously scheduling two different VMs on the same physical core. As if that weren't enough, the Linux kernel now has an option called "core scheduling", which does the same thing at the process level. This allows SMT to be restricted to cases that should already be "safe" (i.e. where the paired threads are already in the same memory space). Google was the main contributor behind the "core scheduling" patch. If the cloud people were really so negative on SMT, then it'd be awfully surprising for Intel to have kept it in the server version of Lion Cove.
None of the custom silicon I've seen for cloud providers has SMT (which certainly doesn't mean none exists, though), which would imply the optimizations were done more because that's the hardware they were using than because it's something they were looking for. As for Lion Cove, it has to cover a much broader market and set of use cases. I think STH is also right on the money about per-core software licenses automatically making SMT attractive for a certain chunk of the enterprise market.
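The per-core licensing point is easy to quantify: if a license is priced per physical core, any throughput SMT adds is effectively free. A quick sketch with a hypothetical price and a hypothetical ~25% SMT uplift:

```python
def cost_per_unit_throughput(cores, license_per_core, smt_uplift):
    """License cost divided by total throughput, where each core delivers
    1.0 units of work plus an SMT uplift (0.0 if SMT is off or absent)."""
    throughput = cores * (1.0 + smt_uplift)
    return (cores * license_per_core) / throughput

# Hypothetical: $100/core license, ~25% throughput gain from SMT.
no_smt = cost_per_unit_throughput(64, 100.0, 0.0)
with_smt = cost_per_unit_throughput(64, 100.0, 0.25)
print(f"cost per unit of work: no SMT ${no_smt:.2f}, with SMT ${with_smt:.2f}")
```

The license bill is identical either way; SMT just spreads it over more work.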

It'll be interesting to see if Intel's gamble on more cores at lower per-core performance pays off. Of course, based on what Intel has already said about Skymont, CWF ought to have very impressive performance to go with its core count. If there's a turning point to be had, that will likely be it.
 

NinoPino

Respectable
May 26, 2022
279
177
1,860
..and still Intel marketing pulled out all the stops. Those charts compare their brand-new 128-core chip against AMD's 2+ year old 64-core chip, and it barely wins on a few hand-picked benchmarks. We'll have to wait for independent tests before we know for sure, but it's looking like AMD is going to dominate this gen...again.
As already mentioned, Phoronix has some good benchmarks.
This time Intel did a good job.
 
  • Like
Reactions: jp7189

jp7189

Distinguished
Feb 21, 2012
387
220
19,060
As already mentioned, Phoronix has some good benchmarks.
This time Intel did a good job.
Thanks for that. The Phoronix coverage is what I was looking for. I read the STH review and it was a bit weird. I didn't disagree with their point about traditional benchmarks not matching the use case of these chips, but it also felt like they were trying to convince me that if I look at it just right, at the right angle, while squinting... instead of just giving me raw, apples-to-apples benchmarks against an appropriate range of competing chips. Phoronix did a great job of that.
 
Thanks for that. The Phoronix coverage is what I was looking for. I read the STH review and it was a bit weird. I didn't disagree with their point about traditional benchmarks not matching the use case of these chips, but it also felt like they were trying to convince me that if I look at it just right, at the right angle, while squinting... instead of just giving me raw, apples-to-apples benchmarks against an appropriate range of competing chips. Phoronix did a great job of that.
As a VMware admin for a small cloud hosting provider, I 100% agree with STH that traditional benchmarking isn't right for these massive-core-count chips. Both Phoronix and STH did some traditional testing with these as physical appliances. However, almost no one runs even their biggest DBs on physical appliances anymore. You still virtualize the server for the added reliability virtualization provides and for the shared storage. Overall, MariaDB isn't going to be a good use of 128 physical cores. If your DB is so large that it needs that many cores, it is probably better split into multiple DBs, each doing something different. So you would have 4 MariaDB VMs running at the same time, each with 32 vCPUs.
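That carve-up is straightforward to express as CPU pinning. A minimal sketch of the 4 × 32 split described above (contiguous core ranges are an assumption; a real deployment would also align pinning sets to NUMA nodes):

```python
def vcpu_pinning(total_cores, vms):
    """Split a host's physical cores into contiguous, non-overlapping
    pinning sets, one per VM."""
    per_vm, extra = divmod(total_cores, vms)
    assert extra == 0, "core count must divide evenly for this simple scheme"
    return [list(range(i * per_vm, (i + 1) * per_vm)) for i in range(vms)]

# The scenario above: a 128-core host carved into 4 MariaDB VMs of 32 vCPUs.
plans = vcpu_pinning(128, 4)
for i, cores in enumerate(plans):
    print(f"VM {i}: cores {cores[0]}-{cores[-1]}")
```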
 

NinoPino

Respectable
May 26, 2022
279
177
1,860
As a VMware admin for a small cloud hosting provider, I 100% agree with STH that traditional benchmarking isn't right for these massive-core-count chips. Both Phoronix and STH did some traditional testing with these as physical appliances. However, almost no one runs even their biggest DBs on physical appliances anymore. You still virtualize the server for the added reliability virtualization provides and for the shared storage. Overall, MariaDB isn't going to be a good use of 128 physical cores. If your DB is so large that it needs that many cores, it is probably better split into multiple DBs, each doing something different. So you would have 4 MariaDB VMs running at the same time, each with 32 vCPUs.
In my opinion it is only a matter of time. CPUs with this many cores from AMD and Intel have only just arrived, and reviewers need to set up new benchmarks aimed at the right targets.
Until now, reviewers have focused more on HPC workloads, while the workloads targeted by these CPUs are typically ones that were previously handled by a whole datacenter.
 
In my opinion it is only a matter of time. CPUs with this many cores from AMD and Intel have only just arrived, and reviewers need to set up new benchmarks aimed at the right targets.
Until now, reviewers have focused more on HPC workloads, while the workloads targeted by these CPUs are typically ones that were previously handled by a whole datacenter.
Exactly. You aren't going to use Bergamo for HPC, but it is great for consolidating 4 or 5 2nd Gen Xeon Scalable servers into a single server.
 

jp7189

Distinguished
Feb 21, 2012
387
220
19,060
As a VMware admin for a small cloud hosting provider, I 100% agree with STH that traditional benchmarking isn't right for these massive-core-count chips. Both Phoronix and STH did some traditional testing with these as physical appliances. However, almost no one runs even their biggest DBs on physical appliances anymore. You still virtualize the server for the added reliability virtualization provides and for the shared storage. Overall, MariaDB isn't going to be a good use of 128 physical cores. If your DB is so large that it needs that many cores, it is probably better split into multiple DBs, each doing something different. So you would have 4 MariaDB VMs running at the same time, each with 32 vCPUs.
I don't disagree with the point they and you are making. I just want straight benchmarks. Maybe you wouldn't run MariaDB on all cores, but what better way to show relative performance between chips?

To your specific use case, if I knew a 96-core chip performed better than a 144-core chip, I'd take the 96 cores and oversubscribe the VMs. On servers with 96+ cores you'll have tons of VMs and not all will be cranking at the same time, but fewer, faster cores are cheaper to operate. Have you seen VMware license costs since the Broadcom buyout? Yikes!
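The oversubscription math works out like this; the 8-vCPU VM size and 4:1 ratio are just illustrative assumptions, not recommendations:

```python
def vms_supported(physical_cores, vcpus_per_vm, oversub_ratio):
    """How many VMs of a given size fit on a host at a target
    vCPU:pCPU oversubscription ratio."""
    total_vcpus = physical_cores * oversub_ratio
    return int(total_vcpus // vcpus_per_vm)

# Hypothetical sizing: 8-vCPU VMs at a common 4:1 oversubscription ratio.
print(vms_supported(96, 8, 4))   # 96 cores -> 384 vCPUs -> 48 VMs
print(vms_supported(144, 8, 4))  # 144 cores -> 576 vCPUs -> 72 VMs
```

Whether the 96-core box wins then comes down to per-core performance and per-host licensing, not raw core count.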
 
  • Like
Reactions: bit_user

bit_user

Titan
Ambassador
I don't disagree with the point they and you are making. I just want straight benchmarks. Maybe you wouldn't run MariaDB on all cores, but what better way to show relative performance between chips?
Being a CPU geek, I like to see how well a CPU copes with issues like global communication. So, I'll never want to see people stop running scalable benchmarks that use all the cores.

That said, if a more realistic usage pattern would be to configure the CPU into clusters and schedule VMs that comfortably fit within those clusters, then we really ought to focus on how it performs in such cases. In particular, EPYC should do best when the VMs fit within individual chiplets, since that essentially gives them a private L3 cache.
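A sketch of what chiplet-aligned placement could look like, assuming a Bergamo-style layout of 8 CCDs with 16 Zen 4c cores each (the helper and VM size are illustrative, not a real scheduler):

```python
def ccd_aligned_pinning(total_cores, cores_per_ccd, vm_vcpus):
    """Pin each VM entirely inside one CCD so it gets a private slice
    of that chiplet's L3. Returns one pinning set per VM slot; VMs
    larger than a CCD are rejected."""
    assert vm_vcpus <= cores_per_ccd, "VM would spill across CCDs, losing L3 locality"
    sets = []
    for ccd_start in range(0, total_cores, cores_per_ccd):
        for off in range(0, cores_per_ccd - vm_vcpus + 1, vm_vcpus):
            sets.append(list(range(ccd_start + off, ccd_start + off + vm_vcpus)))
    return sets

# Bergamo-style layout: 128 cores across 8 CCDs of 16 Zen 4c cores.
plans = ccd_aligned_pinning(128, 16, 8)
print(f"{len(plans)} CCD-local 8-vCPU VM slots")  # 2 per CCD x 8 CCDs = 16
```

Benchmarking VMs placed this way, versus VMs striped across chiplets, would show exactly how much that private L3 is worth.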
 
I don't disagree with the point they and you are making. I just want straight benchmarks. Maybe you wouldn't run MariaDB on all cores, but what better way to show relative performance between chips?

To your specific use case, if I knew a 96-core chip performed better than a 144-core chip, I'd take the 96 cores and oversubscribe the VMs. On servers with 96+ cores you'll have tons of VMs and not all will be cranking at the same time, but fewer, faster cores are cheaper to operate. Have you seen VMware license costs since the Broadcom buyout? Yikes!
I know all too well about the license costs now. It is forcing us off of VMware. We were looking at about $250k just for 13 hosts. IIRC that was also per year, as they got rid of the perpetual licenses.
 
  • Like
Reactions: jp7189 and bit_user