News Intel Details Sierra Forest and Granite Rapids Architecture, Xeon Roadmap


dehjomz

Given Sapphire Rapids' vastly delayed release and its sketchy availability to average mom-and-pop small-business retailers, I find Intel's timeline to be vastly optimistic.
Apparently Intel already has the first stepping of Arrow Lake on Intel 20A working in the labs. I guess we’ll see if they deliver the new Xeons and the new Lakes on time next year.
 
What’s the difference, if any, between the new Xeon Sierra-Glen E-cores and the new Crestmont E-cores in Meteor Lake?
I haven't seen anything that indicates there are any differences this go-around, but hopefully there will be further clarification. The Xeon E-cores are going to be available in a two-core cluster, which I imagine the consumer-chip versions will not be.
 

bit_user

What’s the difference, if any, between the new Xeon Sierra-Glen E-cores and the new Crestmont E-cores in Meteor Lake?
This is just a wild guess, but I wonder if the main difference isn't the process node. Meteor Lake is being made on Intel 4, while Sierra Forest will be made on Intel 3. I can't think of an example where Intel changed the process node and didn't also call the core something different.

The Xeon E-cores are going to be available in a two-core cluster, which I imagine the consumer-chip versions will not be.
Their slide on that suggests they achieve this by simply disabling two of the four. Perhaps the ability to have partially-disabled clusters was done for yield reasons?

[Intel slide: E-core cluster configuration]


It's certainly possible that Arrow Lake could adopt the same technique. That would scale down much better, since you could reduce the E-core count by de-populating cores within clusters, rather than having to switch off entire clusters.
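As an aside, if clusters do ship partially populated, you should be able to see it from software. Recent Linux kernels (5.16+) expose cluster topology in sysfs. Here's a minimal sketch of my own (not from the article; whether a partially-disabled cluster would report only its two live CPUs there is an assumption on my part):

```cpp
// Sketch: list which logical CPUs share a core cluster, via Linux sysfs
// (topology/cluster_cpus_list, exposed on kernel 5.16 and later).
#include <cctype>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    namespace fs = std::filesystem;
    for (const auto& entry : fs::directory_iterator("/sys/devices/system/cpu")) {
        const std::string name = entry.path().filename();
        // Keep cpu0, cpu1, ...; skip entries like cpufreq and cpuidle.
        if (name.rfind("cpu", 0) != 0 || name.size() < 4 ||
            !std::isdigit(static_cast<unsigned char>(name[3])))
            continue;
        std::ifstream f(entry.path() / "topology/cluster_cpus_list");
        if (!f) continue;  // older kernels don't expose cluster topology
        std::string siblings;
        std::getline(f, siblings);
        std::cout << name << " is in a cluster with CPUs " << siblings << '\n';
    }
    return 0;
}
```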
 

DaveLTX

Intel already has memory controllers on the same chiplet as the CPU cores, but unfortunately communication across the mesh isn't good.
Meshes just aren't great at latency, compared to AMD's chiplets, which use ring buses.
And besides, VM workloads stay within the private chiplet caches or go out to DRAM; they don't go looking in other cores' caches.
 
Their slide on that suggests they achieve this by simply disabling two of the four. Perhaps the ability to have partially-disabled clusters was done for yield reasons?
They also spoke of the E-core-based Xeons having a full range of SKUs instead of just high-core-count parts. It's possible this is just how they're doing lower-core-count models while using the same die across every SKU.
 

bit_user

Intel already has memory controllers on the same chiplet as the CPU cores, but unfortunately communication across the mesh isn't good.
Meshes just aren't great at latency, compared to AMD's chiplets, which use ring buses.
The last server CPU AnandTech reviewed was Ice Lake, but it still gives us a chance to compare a fairly recent mesh against Milan (Zen 3)'s interconnect topology.

If you click on these images and look at the numbers written in the cells, you can see that core-to-core communication latency is markedly better in Ice Lake.

Ice Lake SP 8380: [core-to-core latency heatmap, bounce-8380.png]

EPYC Milan 7763: [core-to-core latency heatmap, Bounce-7763.png]
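For context on what those heatmaps measure: the standard approach is a "bounce" test, where two threads pinned to specific cores ping-pong a flag in a shared cache line and you time the round trips. A minimal Linux sketch of the idea (my own illustration, not AnandTech's actual harness; core IDs 0 and 1 are placeholders you'd sweep over every pair):

```cpp
// Minimal core-to-core "bounce" latency sketch (build: g++ -O2 -pthread).
// The main thread on core 0 sets the flag; the responder on core 1 clears
// it. Each loop iteration is one round trip (two cache-line handoffs).
#include <atomic>
#include <chrono>
#include <iostream>
#include <pthread.h>
#include <sched.h>
#include <thread>

static std::atomic<int> flag{0};
constexpr int kIters = 1'000'000;

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    std::thread responder([] {
        pin_to_core(1);  // placeholder core ID
        for (int i = 0; i < kIters; ++i) {
            while (flag.load(std::memory_order_acquire) != 1) {}  // wait for ping
            flag.store(0, std::memory_order_release);             // pong
        }
    });

    pin_to_core(0);  // placeholder core ID
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        flag.store(1, std::memory_order_release);                 // ping
        while (flag.load(std::memory_order_acquire) != 0) {}      // wait for pong
    }
    auto t1 = std::chrono::steady_clock::now();
    responder.join();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    std::cout << "avg round trip: " << ns / kIters << " ns\n";
    return 0;
}
```

Core pairs on the same die (or ring/CCX) hand the line off much faster than pairs that have to cross the mesh or the I/O die, which is exactly the block structure you see in the heatmaps.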


Now, here's how they compare on memory latency:

Ice Lake SP 8380: [memory latency curve, latency-8380-snc1.png]

EPYC Milan 7763: [memory latency curve, latency-7763-nps1.png]


Source: https://www.anandtech.com/show/16594/intel-3rd-gen-xeon-scalable-review/4

DRAM latency is definitely better on Ice Lake. Furthermore, in spite of having much more L3 cache overall, Milan blows through it sooner. That's because a chiplet can only use its own local L3 slice.
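For reference, curves like those typically come from a pointer-chasing test: you walk a randomly shuffled chain so each load depends on the previous one (defeating the prefetchers), then sweep the working-set size from L1-sized up through DRAM-sized. A simplified, single-size sketch of the idea (mine, not the article's tool; 256 MiB is an arbitrary size chosen to land well past any L3):

```cpp
// Pointer-chase latency sketch: every load depends on the previous one,
// so the average time per load approximates the latency of whatever level
// of the hierarchy the working set lands in (here, sized to hit DRAM).
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
    constexpr std::size_t kEntries = (1ull << 28) / sizeof(std::size_t);  // 256 MiB
    std::vector<std::size_t> next(kEntries);

    // Build one big random cycle, so the chase visits every entry once.
    std::vector<std::size_t> order(kEntries);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
    for (std::size_t i = 0; i + 1 < kEntries; ++i) next[order[i]] = order[i + 1];
    next[order[kEntries - 1]] = order[0];

    constexpr std::size_t kLoads = 50'000'000;
    std::size_t p = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < kLoads; ++i) p = next[p];  // serialized loads
    auto t1 = std::chrono::steady_clock::now();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    std::cout << "avg load latency: " << double(ns) / kLoads
              << " ns (checksum " << p << ")\n";  // print p so the loop isn't optimized out
    return 0;
}
```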

And besides, VM workloads stay within the private chiplet caches or go out to DRAM; they don't go looking in other cores' caches.
The point of Intel's shared L3 approach is that you can get more flexible sharing of L3 across the cores. In the best case, that could enable you to get more benefit from the same amount of L3 as in EPYC.
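Incidentally, you can see that topological difference from software. On Linux, each CPU reports which CPUs share its L3 via sysfs, so a monolithic Xeon shows one big socket-wide L3 domain while a chiplet EPYC shows one domain per CCD. A quick sketch (my own; index3 is usually the L3 on x86, but it's worth confirming via the adjacent "level" file on your machine):

```cpp
// Sketch: group logical CPUs by the L3 domain reported in Linux sysfs.
// A monolithic-die Xeon should print a single socket-wide domain; a
// chiplet-based EPYC should print one domain per CCD.
#include <cctype>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <map>
#include <string>

int main() {
    namespace fs = std::filesystem;
    std::map<std::string, int> l3_domains;  // shared_cpu_list -> CPU count
    for (const auto& entry : fs::directory_iterator("/sys/devices/system/cpu")) {
        const std::string name = entry.path().filename();
        // Keep cpu0, cpu1, ...; skip entries like cpufreq and cpuidle.
        if (name.rfind("cpu", 0) != 0 || name.size() < 4 ||
            !std::isdigit(static_cast<unsigned char>(name[3])))
            continue;
        std::ifstream f(entry.path() / "cache/index3/shared_cpu_list");
        std::string sharers;
        if (f && std::getline(f, sharers)) ++l3_domains[sharers];
    }
    for (const auto& [cpus, count] : l3_domains)
        std::cout << count << " CPUs share one L3 domain: " << cpus << '\n';
    return 0;
}
```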

As for the part about "looking in other cores' caches", cache coherency demands that all caches be checked when there's a miss in a given L3 slice.
 