News Intel Granite Rapids CPU breaks cover with 120 P-cores in new benchmark — Xeon 6900P wields 120 cores, 240 threads, and 744MB cache

The Xeon 6900P also wields 744MB of cache, distributed in a 240MB L2 cache and 504MB L3 cache. It's a significant upgrade over Intel's previous Emerald Rapids processors. In a dual-socket configuration, we're looking at 240 cores, 480 threads, and close to 1.5TB cache.
I'm guessing the 1.5TB cache is a typo and should be 1.5GB.
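A quick sanity check of that arithmetic, as a minimal Python sketch using the per-socket cache figures quoted in the article:

```python
# Cache figures quoted in the article (per socket), in megabytes.
l2_mb = 240
l3_mb = 504

per_socket_mb = l2_mb + l3_mb          # 744 MB, matching the headline figure
dual_socket_mb = 2 * per_socket_mb     # 1488 MB

print(f"Per socket:  {per_socket_mb} MB")
print(f"Dual socket: {dual_socket_mb} MB = {dual_socket_mb / 1024:.2f} GB")
# ~1.45 GB total, so "close to 1.5TB" is almost certainly meant to be ~1.5GB.
```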
 
  • Like
Reactions: bit_user
So Intel's next gen server CPU has fewer cores than AMD's current offering and is probably going to cost significantly more.
Golden Cove has better IPC than Zen 4. I'm not sure how they compare on instructions per Watt, however.

Anyway, Zen 5 appears to match or surpass Golden Cove (and presumably Redwood Cove) in IPC, which will make it a very interesting matchup between Granite Rapids and Turin (Zen 5).

Granite Rapids also has AMX and some accelerators that EPYC doesn't. Finally, a version of Granite Rapids could also appear with on-package HBM.

Now, I'm not saying AMD won't continue to capture more of Intel's datacenter market share, in this new generation. However, I think these are unquestionably the most competitive server CPUs Intel has had against AMD, since Rome (Zen 2; circa 2019) and its 64 cores bested Cascade Lake and its 28 cores.
 
I agree with your analysis. Previous offerings from Intel were laughably bad. These at least seem to be able to (somewhat) compete, although I would wait for power draw information first. Intel has been throwing punches with AMD's consumer line but draws MUCH more power, so I wouldn't be surprised to see 60, 70, or even 100% more power being drawn compared to Zen 5, for speedups that just aren't worth the electricity and cooling cost for a datacenter.
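To put very rough numbers on that concern, here's a minimal Python sketch of the running-cost side; the 500W per-socket baseline, the extra-power percentages, the PUE, and the electricity price are all illustrative assumptions, not measured figures:

```python
# Hypothetical illustration: extra annual energy cost of one dual-socket node.
# The 500 W per-socket baseline, PUE of 1.5, and $0.10/kWh are assumptions.
BASE_WATTS_PER_SOCKET = 500
SOCKETS = 2
PUE = 1.5                 # total facility power / IT power (cooling overhead, etc.)
PRICE_PER_KWH = 0.10      # USD
HOURS_PER_YEAR = 24 * 365

def annual_cost(extra_power_fraction: float) -> float:
    """Yearly electricity cost for the node at a given fraction of extra power."""
    watts = BASE_WATTS_PER_SOCKET * SOCKETS * (1 + extra_power_fraction)
    kwh = watts * PUE * HOURS_PER_YEAR / 1000
    return kwh * PRICE_PER_KWH

baseline = annual_cost(0.0)
for extra in (0.6, 0.7, 1.0):
    delta = annual_cost(extra) - baseline
    print(f"+{extra:.0%} power: about ${delta:,.0f} extra per node per year")
```

Multiply that by thousands of nodes and it's easy to see why perf-per-watt, not peak performance, tends to decide these purchases.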
 
What process node will be used for these CPUs?
 
Granite Rapids also has AMX and some accelerators that EPYC doesn't
With the previous iteration of server CPUs, Intel only enabled those accelerators if additional licenses were purchased; they were not enabled out of the box. Intel will only be able to compete with AMD on equal terms if it actually delivers Granite Rapids CPUs with all the bells and whistles (accelerators) enabled.
 
These at least seem to be able to (somewhat) compete, although I would wait for power draw information first. Intel has been throwing punches with AMD's consumer line but draws MUCH more power, so I wouldn't be surprised to see 60, 70, or even 100% more power being drawn compared to Zen 5, for speedups that just aren't worth the electricity and cooling cost for a datacenter.
Well, this article quotes 500W, which aligns pretty well with what's been leaked about AMD's Turin.
So, perhaps it will be quite a competitive matchup?

One difference seems to be that Intel will support MCR DRAM, while AMD won't. However, I think AMD is said to be supporting up to 16 channels, while Intel stays at just 12?
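For a rough sense of what those two approaches could mean for peak bandwidth, here's a minimal Python sketch; the channel counts and transfer rates (MCR DIMMs at 8800 MT/s, plain DDR5 at 6400 MT/s) are assumptions based on leaks, not confirmed specs:

```python
# Theoretical peak memory bandwidth for a few hypothetical configurations.
# Transfer rates are illustrative assumptions; shipping parts may differ.
BYTES_PER_TRANSFER = 8  # one 64-bit DDR5 channel moves 8 bytes per transfer

def peak_gbs(channels: int, mtps: int) -> float:
    """Peak bandwidth in GB/s for a given channel count and transfer rate (MT/s)."""
    return channels * mtps * 1e6 * BYTES_PER_TRANSFER / 1e9

print(f"12ch MCR  @ 8800 MT/s: {peak_gbs(12, 8800):6.1f} GB/s")
print(f"12ch DDR5 @ 6400 MT/s: {peak_gbs(12, 6400):6.1f} GB/s")
print(f"16ch DDR5 @ 6400 MT/s: {peak_gbs(16, 6400):6.1f} GB/s")
```

On those assumptions, 12 channels of MCR DRAM and 16 channels of ordinary DDR5 land in roughly the same place, so the two designs could end up closer than the raw channel counts suggest.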
 
  • Like
Reactions: thestryker
What process node will be used for these CPUs?
Don't quote me on that, but I *think* they might be using TSMC's 3nm process. Not that it matters anyway; gate width hasn't been a useful metric for the performance of silicon products for a long, long time.
 
Well, this article quotes 500W, which aligns pretty well with what's been leaked about AMD's Turin.
So, perhaps it will be quite a competitive matchup?

One difference seems to be that Intel will support MCR DRAM, while AMD won't. However, I think AMD is said to be supporting up to 16 channels, while Intel stays at just 12?
If the matchup were competitive, that'd be great. I certainly hope so; it would just be better for customers at the end of the day. Not holding my breath, though.
 
So Intel's next gen server CPU has fewer cores than AMD's current offering and is probably going to cost significantly more.
Intel's P-core line will match Zen 5 at 128 cores, and the E-core line goes up to 288, which should be more of a match for Zen 5c. Costs are going to vary immensely based on who's doing the buying, so I wouldn't count on either company being inherently cheaper.
One difference seems to be that Intel will support MCR DRAM, while AMD won't. However, I think AMD is said to be supporting up to 16 channels, while Intel stays at just 12?
Still 12 channels according to anything out there, but AMD has been pretty quiet overall, so who knows: https://www.anandtech.com/show/2142...-processors-up-to-192-cores-coming-in-h2-2024
What process node will be used for these CPUs?
It's hard to say for sure what will be the case in the end, but N3 has been mentioned in conjunction with Zen 5c (see the link above), and consumer Zen 5 parts are N4.
 
  • Like
Reactions: bit_user
So, in going from a 1,021 single-core score to 7,155 for 120 cores, what is the limiting factor that leads to only a 7x improvement for 120 times more cores?
 
  • Like
Reactions: bit_user
So, in going from a 1,021 single-core score to 7,155 for 120 cores, what is the limiting factor that leads to only a 7x improvement for 120 times more cores?
I think Geekbench MT is known not to scale well. I'm looking at Ryzen 9 9950X scores right now: the ST scores are around 3400, while the MT scores are around 21400. That's a ratio of 6.3x for 32 threads / 16 cores. Obviously, the MT clocks are going to be lower, but you'd expect at least something in the double digits for code that scales reasonably well.

So, just going on pure conjecture, what I'd say is the Geekbench developers decided to make the MT benchmark stress various performance aspects that can be bottlenecks for MT performance. This would include things like inter-core and cross-NUMA communication. If that were true, then you'd be looking at a dual-CPU setup having to do a lot of communication between sockets, which could not only be a bottleneck but also just add a lot of latency to memory accesses.

To check this assumption, I thought I'd look up some other leading dual-CPU configurations. One I checked was the Xeon 8480+, which is a 56-core flagship model of the Sapphire Rapids generation (similar to Alder/Raptor Lake). The ST scores were around 1650, while the MT scores were around 16500. So, that's only 10x scaling for 224 threads / 112 cores.

Next, let's see what happens when we restrict it to a single CPU configuration. Since I didn't find entries for the Xeon 8480 with fewer than 112 cores that didn't have other weirdness, I decided to search for the Xeon W9-3495X, which is essentially the same CPU, but running at a higher clockspeed and limited to a single socket. In this case (also using Linux as the OS), I found one with an ST score of 2542 and an MT score of about 21364. I should note that a bunch of Dell 7960 workstations came in with lower scores, often dropping the MT score by more than half. This leads me to believe that many of these Dell systems are shipping with partially-populated memory channels, which is something I've seen them do on machines of this class. That's an interesting datapoint, since it suggests that the MT score might indeed involve a lot of memory-based communication between threads.
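For reference, here's a minimal Python sketch that turns the approximate scores cited in this post into MT/ST ratios and per-core scaling efficiency:

```python
# Approximate Geekbench scores cited in this thread: (ST, MT, core count).
systems = {
    "Xeon 6900P (120C)":       (1021, 7155, 120),
    "Ryzen 9 9950X (16C)":     (3400, 21400, 16),
    "Xeon 8480+ (2S, 112C)":   (1650, 16500, 112),
    "Xeon W9-3495X (1S, 56C)": (2542, 21364, 56),
}

for name, (st, mt, cores) in systems.items():
    ratio = mt / st                   # how many "ST equivalents" the MT run achieves
    efficiency = ratio / cores * 100  # % of ideal linear scaling (ignores clock droop)
    print(f"{name:26s} MT/ST = {ratio:5.1f}x   scaling efficiency = {efficiency:4.1f}%")
```

Even the single-socket workstation part only reaches about 15% of ideal scaling, which supports the idea that the benchmark is stressing something other than raw core count.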
 
  • Like
Reactions: slightnitpick