News Intel's unreleased Emerald Rapids CPU impresses in leaked benchmarks — 48-core chips deliver big gains over Sapphire Rapids predecessors

Status
Not open for further replies.
It's a pretty safe bet that EMR will clock higher than SPR, given that it uses refreshed cores and the XCC CPUs are two tiles instead of four. Hopefully we'll get full core details on December 14th, when the launch is supposed to happen.

My dream is that the w25xx parts come in enough cheaper than the w24xx ones that I might be able to justify the purchase.
 
> It's a pretty safe bet that EMR will clock higher than SPR, given that it uses refreshed cores and the XCC CPUs are two tiles instead of four. Hopefully we'll get full core details on December 14th, when the launch is supposed to happen.
Faster memory will also be a big win, given that it's only running on 8 channels. We saw how much HBM helped certain workloads, so there are definitely memory bottlenecks at play.
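For a sense of scale, peak bandwidth is just channels × transfer rate × bus width. A rough sketch (the DDR5-4800 figure for SPR and DDR5-5600 for EMR are my assumptions for illustration, not from the article):

```python
# Rough peak memory bandwidth: channels * transfer rate * bytes per transfer.
# The DDR5 speeds below are assumptions for illustration (SPR platforms top
# out around DDR5-4800; EMR is expected to support faster DIMMs).
def peak_bw_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s for a DDR memory subsystem."""
    return channels * mt_per_s * bus_bytes / 1000  # MT/s * 8 B -> GB/s

spr_bw = peak_bw_gbs(8, 4800)  # 307.2 GB/s
emr_bw = peak_bw_gbs(8, 5600)  # 358.4 GB/s
print(spr_bw, emr_bw)
```

Even with the same 8 channels, the faster DIMMs alone buy a mid-teens percentage more theoretical bandwidth, before any caching improvements.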

Regarding the number of tiles, are you aware of analysis of the scaling from the single-tile to the quad-tile version of SPR?
 
> Regarding the number of tiles, are you aware of analysis of the scaling from the single-tile to the quad-tile version of SPR?
I never saw much in the way of anything on the MCC Xeons at all really. Chips and Cheese only did their coverage on XCC as far as I know.

I know all the split controllers and the required 10 EMIB connections were a big manufacturing problem. I'm not sure there will be a performance advantage (beyond the obvious clocks/DRAM bandwidth) to the two tiles vs four, but there will be a notable improvement to the manufacturing side (mostly packaging).

For some EMR details (things could change from here, but I doubt there will be any significant differences), see https://www.semianalysis.com/p/intel-emerald-rapids-backtracks-on
 
Calling both Emerald Rapids and Raptor Lake Refresh a "refresh" is a big stretch, isn't it?

Emerald Rapids changes the tile configuration and even the number of tiles, while Raptor Lake Refresh barely increases clocks.

@thestryker According to SemiAnalysis, the hint is that the new configuration is indeed aimed at better performance. That's why they went with a bigger two-tile setup despite it costing even more to manufacture than the four-tile SPR.

While some applications will only benefit from other aspects, many server applications will benefit from the low-level changes the new configuration brings.
 
> Calling both Emerald Rapids and Raptor Lake Refresh a "refresh" is a big stretch, isn't it?
Yeah, I was on the fence about nitpicking that point, but decided the article went into enough detail that people would hopefully understand the distinction.

> Emerald Rapids changes the tile configuration and even the number of tiles, while Raptor Lake Refresh barely increases clocks.
Based on what I've seen, Raptor Refresh doesn't even deserve to be called a refresh. If it's the same silicon as before, then it's a rebrand. That's all.

As for Emerald Rapids, does it have larger caches? Or are those unchanged from Sapphire Rapids?
 
> As for Emerald Rapids, does it have larger caches? Or are those unchanged from Sapphire Rapids?
Cache varies by SKU, but the top end EMR has much more L3 cache than anything on SPR (6.25MB/core vs 2.625MB/core).
> @thestryker According to SemiAnalysis, the hint is that the new configuration is indeed aimed at better performance. That's why they went with a bigger two-tile setup despite it costing even more to manufacture than the four-tile SPR.
The wafer cost is higher, but packaging cost is much lower.
 
> Cache varies by SKU, but the top end EMR has much more L3 cache than anything on SPR (6.25MB/core vs 2.625MB/core).
Nice!

I'll bet folks at Intel have a little uptick in their blood pressure every time they see claims about AMD's large L3 cache figures, even though it's segmented and not unified like Intel's L3 caches. I think AMD's is probably more like an L2.5 cache, on their multi-CCD CPUs?

> The wafer cost is higher, but packaging cost is much lower.
Relatively speaking, but if it costs Intel that much more to put 4 tiles in a package than 2, it seems like they haven't mastered EMIB packaging quite like they've been portraying.
 
> Uh, 10 for the Xeon Max models, with HBM? Otherwise, I don't see how you get to 10.
Nope, the Max are 14.
> 10x EMIB on Sapphire Rapids
>
> Sapphire Rapids is going to be using four tiles connected with 10 EMIB connections using a 55-micron connection pitch. Normally you might think that a 2x2 array of tiles would need equal EMIBs per tile-to-tile connection, so in this case with 2 EMIBs per connection, that would be eight – why is Intel quoting 10 here? That comes down to the way Sapphire Rapids is designed.
>
> Because Intel wants SPR to look monolithic to every operating system, Intel has essentially cut its inter-core mesh horizontally and vertically. That way each connection through the EMIB is seen purely as the next step on the mesh. But Intel's monolithic designs are not symmetric in either of those dimensions – usually features like the PCIe or QPI are on the edges, and not in the same place in every corner. Intel has told us that in Sapphire Rapids, this is similarly the case, and one dimension is using 3 EMIBs per connection while the other dimension is using 2 EMIBs per connection.
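The quoted explanation boils down to a couple of lines of arithmetic. A sketch of the count (the dimension labels are mine; the 3-vs-2 split per connection is from the quote above):

```python
# Sapphire Rapids: 2x2 tile grid, so 2 tile-to-tile connections per dimension.
# Per the quoted explanation, one dimension uses 3 EMIBs per connection and
# the other uses 2, because the mesh cut is not symmetric in both dimensions.
connections_per_dim = 2
emibs = connections_per_dim * 3 + connections_per_dim * 2
print(emibs)  # 10, matching the figure Intel quotes for SPR
```

With a symmetric 2-EMIBs-per-connection layout the same grid would only need 8, which is why the 10 figure looks odd at first glance.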
 
> Why is connection count the dominant cost factor, rather than the actual number of contact balls/pads or the number of chiplets?
There hasn't been enough information put out publicly (probably because TSMC hasn't fielded their version yet), but from what has been said, the predominant problem is that the EMIB bridge has to be embedded in the substrate. There's also some failure rate when you connect the chips to the bridge, and while it is likely extremely small, that failure rate is multiplied by the number of connections.

You're also absolutely right about the number of tiles (edit: I should use Intel terminology here) and contacts having a part to play in packaging costs as well, and in theory these costs should be lower.
 
"These numbers won't make Emerald Rapids an AMD EPYC Genoa killer (or even tied with Genoa if we're being realistic)"

SPR-HBM is the Genoa killer for AI inference processing.

Similar story for the SPR-EE for vRAN.
 
> Nice!
>
> I'll bet folks at Intel have a little uptick in their blood pressure every time they see claims about AMD's large L3 cache figures, even though it's segmented and not unified like Intel's L3 caches. I think AMD's is probably more like an L2.5 cache, on their multi-CCD CPUs?
Sapphire Rapids is 1.875MB/core, while Emerald Rapids is 5MB/core.
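Taking the per-core figures in this post at face value, the generational jump works out to roughly 2.7x:

```python
# Per-core L3 figures quoted in this post (MB/core), top-end EMR vs SPR.
spr_l3_per_core = 1.875
emr_l3_per_core = 5.0
ratio = emr_l3_per_core / spr_l3_per_core
print(f"{ratio:.2f}x more L3 per core")  # 2.67x more L3 per core
```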


Also, since Skylake the L3 hasn't been inclusive; it's L1/L2 + L3.

For being on the same process it's a very decent improvement. The only downside is that the basis for the improvement is Sapphire Rapids. From what I hear the development time was very short, so the team did quite well. With Sierra Forest and Granite Rapids they will be in a much better position.

In addition to Sapphire Rapids being very complex, Intel fired most of the validation team back in the Krzanich era, hence the troubles.
 
"is pure compute code (number crunching) that only works on x86 cores and can't be ported to arm or gpu..."

Perhaps you need to do a Google search on oneAPI.
oneAPI takes code and compiles it for the most relevant hardware, be it CPU, GPU, FPGA, any other, or all of them. That confirms my point: it exists because most heavily parallel software doesn't run as well on x86 cores as it does on other hardware.
oneAPI is the exact opposite of something that only works on x86.
 