News Intel to Expand Arrow Lake's L2 Cache Capacity

Admin · Aug 13, 2023

Increased L2 size of Arrow Lake may lead to performance increases in memory bandwidth-demanding applications.

Intel to Expand Arrow Lake's L2 Cache Capacity : Read more

bit_user · Aug 13, 2023

Keeping in mind that Arrow Lake CPUs will be made on Intel's 20A (2nm-class) fabrication process, the company might increase the size of all caches as it may not have a significant impact on die size and cost.

Depends on what you're comparing against. Comparing "20A" against Intel 7, there might still be some significant margins for shrinking SRAM cells. However, we know SRAM is no longer scaling down well. So, I wouldn't assume it's "free", in terms of die area.

TerryLaze · Aug 13, 2023

bit_user said:
Depends on what you're comparing against. Comparing "20A" against Intel 7, there might still be some significant margins for shrinking SRAM cells. However, we know SRAM is no longer scaling down well. So, I wouldn't assume it's "free", in terms of die area.

Why did you feel compelled to write this up?!
"As it may not have a significant impact" is not a synonym for free,
so they didn't assume that it would be free and you just said that you would assume it to be free either, so what's your point here!?

bit_user · Aug 13, 2023

TerryLaze said:
Why did you feel compelled to write this up?!
"As it may not have a significant impact" is not a synonym for free,
so they didn't assume that it would be free and you just said that you would assume it to be free either, so what's your point here!?

The wording in the article diminishes the area-impact, which I felt wasn't appropriate, absent any data on SRAM scaling for the 20A node. Of all writers here, Anton should definitely be aware of that.

TSMC's 3nm Node: No SRAM Scaling Implies More Expensive CPUs and GPUs

Big problems from tiny memory cells.

www.tomshardware.com

If you have said scaling data for Intel's 20A node, please provide it. If not, then you've failed to invalidate my calling the article's statement into question. Not helpful.

TerryLaze · Aug 13, 2023

bit_user said:
The wording in the article diminishes the area-impact, which I felt wasn't appropriate, absent any data on SRAM scaling for the 20A node.

If you have such data, please provide it. If not, then you've failed to invalidate my calling the article's statement into question. Not helpful.

Diminishing means to make less not to make non existent, it still has nothing to do with free.

InvalidError · Aug 13, 2023

bit_user said:
Depends on what you're comparing against. Comparing "20A" against Intel 7, there might still be some significant margins for shrinking SRAM cells. However, we know SRAM is no longer scaling down well. So, I wouldn't assume it's "free", in terms of die area.

Since SRAM memory arrays have pretty low power density, that would be one place where CPU and GPU designers should be able to easily stack CMOS ribbons at least two-high to make SRAM cells with a ~3T planar footprint. There should be a one-time rough doubling of SRAM density to be had there, more if they stack more active layers further down the line to the point of being able to stack SRAM bits.

bit_user · Aug 13, 2023

InvalidError said:
Since SRAM memory arrays have pretty low power density, that would be one place where CPU and GPU designers should be able to easily stack CMOS ribbons at least two-high to make SRAM cells with a ~3T planar footprint. There should be a one-time rough doubling of SRAM density to be had there, more if they stack more active layers further down the line to the point of being able to stack SRAM bits.

Could Intel's 20A process enable such techniques without Intel having touted it, in their public announcements about it and 18A? I rather doubt that. These are effectively sales pitches by IFS, to try and lure fab customers. Such an increase in SRAM density would be a big selling-point.

Or, are you simply pointing out that they could do this sort of thing in some future node or iteration?

InvalidError · Aug 13, 2023

bit_user said:
Or, are you simply pointing out that they could do this sort of thing in some future node or iteration?

If you have the ability to do ribbon FETs, you have the ability to dope an arbitrary number of layers an arbitrary distance from the base silicon, which means the ability to stack CMOS too. From there, it is only a matter of being able to achieve sufficient yields doing so and also having the incentive to actually do it.

bit_user · Aug 13, 2023

InvalidError said:
If you have the ability to do ribbon FETs, you have the ability to dope an arbitrary number of layers an arbitrary distance from the base silicon, which means the ability to stack CMOS too. From there, it is only a matter of being able to achieve sufficient yields doing so and also having the incentive to actually do it.

Sounds cool, but I think:

The fab node would need to actually incorporate the extra layers to do it. More layers increases cost and shouldn't help yields, either.
They would probably need to provide these cells in their cell library.

Which is to say I think it won't sneak up as some sort of Easter egg, but rather will probably come along as an iterative refinement on one of their GAA nodes.

ThisIsMe · Aug 14, 2023

Even if there is no significant increase in sram density on the new process node there will likely still be some improvements. There will also be significant improvements in logic density likely freeing up die area, so a mere 50% increase in L2 cache size is plausible. Regardless, the increase would still be enabled by density improvements.

As for the other concerns, the additional performance benefits of the newer node may also allow for an L2 cache performance boost even after the performance costs of the size increase. It’s of course all speculation, but it’s hard to imagine intel backtracking on cache performance.

Nicholas Steel · Aug 14, 2023

Meanwhile, it is unclear whether Intel also plans to expand the size of the L3 cache of Arrow Lake's performance cores.

I believe you meant Efficiency Cores in this sentence.

ToBeGood · Aug 15, 2023

bit_user said:
The wording in the article diminishes the area-impact, which I felt wasn't appropriate, absent any data on SRAM scaling for the 20A node. Of all writers here, Anton should definitely be aware of that.

TSMC's 3nm Node: No SRAM Scaling Implies More Expensive CPUs and GPUs

Big problems from tiny memory cells.

www.tomshardware.com

If you have said scaling data for Intel's 20A node, please provide it. If not, then you've failed to invalidate my calling the article's statement into question. Not helpful.

Yes we do have some estimate data for you sake.

Intel 4 Backside Power Delivery is 95% Cell Utilization Rate.

"The density of these cells is also quite impressive. By moving to backside power delivery, Intel was able to utilize 95% of the space within one of the denser spots within the E-core cell. Unfortunately, Intel didn’t give comparable numbers for E-cores on Intel 4, but in general, the utilization is not quite that high."

Intel Details PowerVia Chipmaking Tech: Backside Power Performing Well, On Schedule For 2024

www.anandtech.com

TerryLaze · Aug 15, 2023

bit_user said:
Could Intel's 20A process enable such techniques without Intel having touted it, in their public announcements about it and 18A? I rather doubt that.

Since their stacking technology is foveros and is node independent there wouldn't be a reason to tout it for every specific node they bring out.
Intel first used foveros with lakefield and as far as desktop CPUs go
meteor lake would have been the first but now it will probably be arrow lake.

Intel Details 3D Chip Packaging Tech for Meteor Lake, Arrow Lake and Lunar Lake

Scaling silicon into the third dimension.

www.tomshardware.com

bit_user · Aug 15, 2023

ToBeGood said:
Yes we do have some estimate data for you sake.

Intel 4 Backside Power Delivery is 95% Cell Utilization Rate.

"The density of these cells is also quite impressive. By moving to backside power delivery, Intel was able to utilize 95% of the space within one of the denser spots within the E-core cell. Unfortunately, Intel didn’t give comparable numbers for E-cores on Intel 4, but in general, the utilization is not quite that high."

Thanks, but that's not useful to me without a point of comparison.
: /

BTW, welcome! Thanks for contributing!
: )

bit_user · Aug 15, 2023

TerryLaze said:
Since their stacking technology is foveros and is node independent there wouldn't be a reason to tout it for every specific node they bring out.

They could certainly die-stack L3 cache, like AMD has shown. If it turns out they can extend L2 cache through die-stacking, that would be an interesting development. Since L2 is higher bandwidth, I think it might be more challenging and/or less efficient to put on another die. You have to consider how many wires would need to cross from die to die.

Instead, what I'd expect is die-stacking to be used for migrating lower-bandwidth stuff (like L3), in order to free up room for more L2 cache on the compute die.

TerryLaze · Aug 15, 2023

bit_user said:
They could certainly die-stack L3 cache, like AMD has shown. If it turns out they can extend L2 cache through die-stacking, that would be an interesting development. Since L2 is higher bandwidth, I think it might be more challenging and/or less efficient to put on another die. You have to consider how many wires would need to cross from die to die.

Instead, what I'd expect is die-stacking to be used for migrating lower-bandwidth stuff (like L3), in order to free up room for more L2 cache on the compute die.

Stacking CMOS ribbons might not even be part of foveros and be an much easier process, I'm just saying that I don't think that it would be a node specific announcement, just a we have this procedure now and will use it everywhere, like they did with foveros.

bit_user · Aug 15, 2023

TerryLaze said:
Stacking CMOS ribbons might not even be part of foveros

It's not. Foveros is die-stacking, but @InvalidError is talking about stacking transistors within the die.

TerryLaze said:
be an much easier process,

Requires more layers in the die, which has implications on fabrication time (= cost) and yield. Suffice to say that's not trivial. It's the kind of technique that I think would qualify as a minor node bump (or part, thereof).

To get enough benefit from it to justify the extra cost and complexity, you'd probably want to look at using it for things besides SRAM. That would have implications for designers and layout algorithms and require support from EDA tools. In other words, a non-trivial enhancement.

InvalidError · Aug 15, 2023

bit_user said:
Requires more layers in the die, which has implications on fabrication time (= cost) and yield. Suffice to say that's not trivial. It's the kind of technique that I think would qualify as a minor node bump (or part, thereof).

FinFET was not trivial, until it was. TSVs were not trivial until they were, now everyone and their dog are planning to use it for everything. The same will most likely happen to GAA and CMOS stacking is nothing more than extra GAA layers.

If you are already going to do four GAA layers to increase drive strength on high-fanout signals, there isn't much stopping you from doing a 2-2 split for stacked-CMOS SRAM cells or even 1-1-1-1 to stack two bits. For general logic, the same layer splits could be used to implement most two-inputs logic gates.

Updating layout and design tools for this should be relatively trivial. The main hurdle with implementing stacked CMOS beyond SRAM is power density: we're already near practical limits with FinFET and stacked GAA CMOS could easily increase that by 2-3X.

bit_user · Aug 15, 2023

InvalidError said:
GAA and CMOS stacking is nothing more than extra GAA layers.

If you are already going to do four GAA layers to increase drive strength on high-fanout signals, there isn't much stopping you from doing a 2-2 split for stacked-CMOS SRAM cells or even 1-1-1-1 to stack two bits. For general logic, the same layer splits could be used to implement most two-inputs logic gates.

What's not obvious to me about this leap you're taking is that the stacked transistors have sufficiently low interference. Wouldn't you need extra dielectric layers between different transistors than you would between layers of a single transistor?

InvalidError · Aug 15, 2023

bit_user said:
What's not obvious to me about this leap you're taking is that the stacked transistors have sufficiently low interference. Wouldn't you need extra dielectric layers between different transistors than you would between layers of a single transistor?

Every ribbon is surrounded by gate oxide and gate metal, the channels should be effectively shielded from each other. If anything besides long parallel traces might require separation for crosstalk reasons, it would be the gates. Though I doubt it would be any worse than fin-FET. For simple yet essential things like inverters (half of a SRAM bit cell) where N-MOS and P-MOS can share a single gate structure, performance should improve in just about every measurable way.

News Intel to Expand Arrow Lake's L2 Cache Capacity

Administrator

Titan

Titan

Titan

Titan

Titan

Titan

Titan

Titan

Distinguished

Distinguished

Prominent

Titan

Titan

Titan

Titan

Titan

Titan

Titan

Titan

Share this page