News: Intel to Expand Arrow Lake's L2 Cache Capacity


bit_user

Titan
Ambassador
Keeping in mind that Arrow Lake CPUs will be made on Intel's 20A (2nm-class) fabrication process, the company might increase the size of all caches as it may not have a significant impact on die size and cost.
Depends on what you're comparing against. Comparing "20A" against Intel 7, there might still be some significant margins for shrinking SRAM cells. However, we know SRAM is no longer scaling down well. So, I wouldn't assume it's "free", in terms of die area.
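To put rough numbers on why that matters, here's a quick Python sketch. Every figure in it is an illustrative assumption (Intel has published no SRAM scaling data for 20A); the point is just how weak SRAM scaling shifts the die-area balance:

```python
# Back-of-envelope: what happens to a die when logic shrinks well but SRAM doesn't.
# All scaling factors and areas are assumptions for illustration,
# NOT published Intel 20A figures.

logic_shrink = 0.55   # assumed logic area scaling factor (new node / old node)
sram_shrink = 0.90    # assumed SRAM bitcell scaling factor (barely shrinks)

old_logic_mm2 = 70.0  # hypothetical die: 70 mm^2 of logic...
old_sram_mm2 = 30.0   # ...plus 30 mm^2 of SRAM arrays

new_logic_mm2 = old_logic_mm2 * logic_shrink
new_sram_mm2 = old_sram_mm2 * sram_shrink

old_share = old_sram_mm2 / (old_logic_mm2 + old_sram_mm2)
new_share = new_sram_mm2 / (new_logic_mm2 + new_sram_mm2)

print(f"logic: {old_logic_mm2:.0f} -> {new_logic_mm2:.1f} mm^2")
print(f"SRAM:  {old_sram_mm2:.0f} -> {new_sram_mm2:.1f} mm^2")
print(f"SRAM share of die: {old_share:.0%} -> {new_share:.0%}")
```

Under those assumed factors, the caches grow from 30% of the die to over 40%, even before adding any capacity.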
 
Depends on what you're comparing against. Comparing "20A" against Intel 7, there might still be some significant margins for shrinking SRAM cells. However, we know SRAM is no longer scaling down well. So, I wouldn't assume it's "free", in terms of die area.
Why did you feel compelled to write this up?!
"As it may not have a significant impact" is not a synonym for free,
so they didn't assume it would be free, and you just said that you wouldn't assume it to be free either, so what's your point here!?
 

bit_user

Titan
Ambassador
Why did you feel compelled to write this up?!
"As it may not have a significant impact" is not a synonym for free,
so they didn't assume it would be free, and you just said that you wouldn't assume it to be free either, so what's your point here!?
The wording in the article diminishes the area-impact, which I felt wasn't appropriate, absent any data on SRAM scaling for the 20A node. Of all writers here, Anton should definitely be aware of that.

If you have said scaling data for Intel's 20A node, please provide it. If not, then you've failed to invalidate my calling the article's statement into question. Not helpful.
 
The wording in the article diminishes the area-impact, which I felt wasn't appropriate, absent any data on SRAM scaling for the 20A node.

If you have such data, please provide it. If not, then you've failed to invalidate my calling the article's statement into question. Not helpful.
Diminishing means to make less, not to make nonexistent; it still has nothing to do with free.
 

InvalidError

Titan
Moderator
Depends on what you're comparing against. Comparing "20A" against Intel 7, there might still be some significant margins for shrinking SRAM cells. However, we know SRAM is no longer scaling down well. So, I wouldn't assume it's "free", in terms of die area.
Since SRAM memory arrays have pretty low power density, that would be one place where CPU and GPU designers should be able to easily stack CMOS ribbons at least two-high to make SRAM cells with a ~3T planar footprint. There should be a one-time rough doubling of SRAM density to be had there, more if they stack more active layers further down the line to the point of being able to stack SRAM bits.
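As a sanity check on that "rough doubling", here's a toy Python calc. It treats bitcell area as proportional to planar transistor count, which is a simplification (wiring, contacts, and cell boundaries also cost area):

```python
# Toy model of stacked-CMOS SRAM density: a 6T bitcell with transistors
# stacked two-high occupies roughly the planar footprint of 3 transistors.
# Assumes area ~ planar transistor count, ignoring wiring/contact overhead.

TRANSISTORS_PER_CELL = 6  # standard 6T SRAM bitcell

for stack_height in (1, 2, 4):  # 4-high ~ the "1-1-1-1, two bits" case
    planar_footprint_t = TRANSISTORS_PER_CELL / stack_height
    density_gain = stack_height  # relative bits per unit of planar area
    print(f"{stack_height}-high stack: ~{planar_footprint_t:.1f}T planar "
          f"footprint per bit, ~{density_gain}x bit density")
```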
 

bit_user

Titan
Ambassador
Since SRAM memory arrays have pretty low power density, that would be one place where CPU and GPU designers should be able to easily stack CMOS ribbons at least two-high to make SRAM cells with a ~3T planar footprint. There should be a one-time rough doubling of SRAM density to be had there, more if they stack more active layers further down the line to the point of being able to stack SRAM bits.
Could Intel's 20A process enable such techniques without Intel having touted it, in their public announcements about it and 18A? I rather doubt that. These are effectively sales pitches by IFS, to try and lure fab customers. Such an increase in SRAM density would be a big selling-point.

Or, are you simply pointing out that they could do this sort of thing in some future node or iteration?
 

InvalidError

Titan
Moderator
Or, are you simply pointing out that they could do this sort of thing in some future node or iteration?
If you have the ability to do ribbon FETs, you have the ability to dope an arbitrary number of layers an arbitrary distance from the base silicon, which means the ability to stack CMOS too. From there, it is only a matter of being able to achieve sufficient yields doing so and also having the incentive to actually do it.
 

bit_user

Titan
Ambassador
If you have the ability to do ribbon FETs, you have the ability to dope an arbitrary number of layers an arbitrary distance from the base silicon, which means the ability to stack CMOS too. From there, it is only a matter of being able to achieve sufficient yields doing so and also having the incentive to actually do it.
Sounds cool, but I think:
  1. The fab node would need to actually incorporate the extra layers to do it. More layers increase cost and shouldn't help yields, either.
  2. They would probably need to provide these cells in their cell library.

Which is to say I think it won't sneak up as some sort of Easter egg, but rather will probably come along as an iterative refinement on one of their GAA nodes.
 

ThisIsMe

Distinguished
May 15, 2009
Even if there is no significant increase in SRAM density on the new process node, there will likely still be some improvement. There will also be significant improvements in logic density, likely freeing up die area, so a mere 50% increase in L2 cache size is plausible. Regardless, the increase would still be enabled by density improvements.

As for the other concerns, the additional performance benefits of the newer node may also allow for an L2 cache performance boost, even after the performance costs of the size increase. It's of course all speculation, but it's hard to imagine Intel backtracking on cache performance.
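For a sense of scale, here's a rough Python estimate of what a 50% L2 bump costs in silicon, taking a hypothetical 2 MB -> 3 MB per P-core as the example. The bitcell size and overhead factor are assumed placeholders, not Intel figures:

```python
# Rough die-area cost of growing per-core L2 by 50% (e.g. 2 MB -> 3 MB).
# Bitcell area and array overhead are assumed; Intel has not published
# SRAM figures for its 20A node.

MIB_BITS = 8 * 1024 * 1024   # bits in one MiB of data array
BITCELL_UM2 = 0.028          # assumed high-density 6T bitcell area, um^2
ARRAY_OVERHEAD = 1.3         # assumed +30% for tags, ECC, sense amps, drivers

def l2_area_mm2(size_mib: int) -> float:
    """Approximate L2 array area in mm^2 for a given capacity."""
    return size_mib * MIB_BITS * BITCELL_UM2 * ARRAY_OVERHEAD / 1e6

for p_cores in (6, 8):
    extra = p_cores * (l2_area_mm2(3) - l2_area_mm2(2))
    print(f"{p_cores} P-cores: ~{extra:.1f} mm^2 extra for +1 MB of L2 each")
```

That works out to a couple of square millimeters on a die measured in the low hundreds of mm^2, which is why the increase looks plausible even if SRAM barely scales.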
 

ToBeGood

Prominent
Aug 15, 2023
The wording in the article diminishes the area-impact, which I felt wasn't appropriate, absent any data on SRAM scaling for the 20A node. Of all writers here, Anton should definitely be aware of that.

If you have said scaling data for Intel's 20A node, please provide it. If not, then you've failed to invalidate my calling the article's statement into question. Not helpful.
Yes, we do have some estimated data for your sake.

Intel 4 with backside power delivery achieves a 95% cell utilization rate.

"The density of these cells is also quite impressive. By moving to backside power delivery, Intel was able to utilize 95% of the space within one of the denser spots within the E-core cell. Unfortunately, Intel didn’t give comparable numbers for E-cores on Intel 4, but in general, the utilization is not quite that high."

 
Could Intel's 20A process enable such techniques without Intel having touted it, in their public announcements about it and 18A? I rather doubt that.
Since their stacking technology is Foveros and is node-independent, there wouldn't be a reason to tout it for every specific node they bring out.
Intel first used Foveros with Lakefield, and as far as desktop CPUs go, Meteor Lake would have been the first, but now it will probably be Arrow Lake.


 

bit_user

Titan
Ambassador
Yes, we do have some estimated data for your sake.

Intel 4 with backside power delivery achieves a 95% cell utilization rate.

"The density of these cells is also quite impressive. By moving to backside power delivery, Intel was able to utilize 95% of the space within one of the denser spots within the E-core cell. Unfortunately, Intel didn’t give comparable numbers for E-cores on Intel 4, but in general, the utilization is not quite that high."
Thanks, but that's not useful to me without a point of comparison. :/

BTW, welcome! Thanks for contributing! :)
 

bit_user

Titan
Ambassador
Since their stacking technology is Foveros and is node-independent, there wouldn't be a reason to tout it for every specific node they bring out.
They could certainly die-stack L3 cache, like AMD has shown. If it turns out they can extend L2 cache through die-stacking, that would be an interesting development. Since L2 is higher bandwidth, I think it might be more challenging and/or less efficient to put on another die. You have to consider how many wires would need to cross from die to die.

Instead, what I'd expect is die-stacking to be used for migrating lower-bandwidth stuff (like L3), in order to free up room for more L2 cache on the compute die.
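To rough out the wire-count concern, here's a Python sketch. Every bandwidth, clock, and link speed in it is an assumption for illustration, not a measured Arrow Lake figure:

```python
# Rough count of die-to-die data wires needed to sustain a cache interface.
# All bandwidths, clocks, and link speeds are illustrative assumptions.

def data_wires(bytes_per_cycle: int, cache_ghz: float, link_ghz: float) -> float:
    """Data bits that must cross the die boundary each link cycle."""
    bits_per_second = bytes_per_cycle * 8 * cache_ghz * 1e9
    return bits_per_second / (link_ghz * 1e9)

LINK_GHZ = 2.0  # assumed die-to-die link clock, one bit per wire per cycle

# Assumed: L2 feeds its core at 64 B/cycle @ 5 GHz; an L3 slice moves
# 32 B/cycle @ 2.5 GHz.
for name, bpc, ghz in (("L2", 64, 5.0), ("L3", 32, 2.5)):
    wires = data_wires(bpc, ghz, LINK_GHZ)
    print(f"{name}: ~{wires:.0f} data wires (plus address, control, ECC)")
```

Under those assumptions, the L2 interface needs roughly 4x the die-to-die crossings of an L3 slice, which is the intuition for moving the lower-bandwidth level off-die.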
 
They could certainly die-stack L3 cache, like AMD has shown. If it turns out they can extend L2 cache through die-stacking, that would be an interesting development. Since L2 is higher bandwidth, I think it might be more challenging and/or less efficient to put on another die. You have to consider how many wires would need to cross from die to die.

Instead, what I'd expect is die-stacking to be used for migrating lower-bandwidth stuff (like L3), in order to free up room for more L2 cache on the compute die.
Stacking CMOS ribbons might not even be part of Foveros and might be a much easier process. I'm just saying that I don't think it would be a node-specific announcement, just a "we have this procedure now and will use it everywhere," like they did with Foveros.
 

bit_user

Titan
Ambassador
Stacking CMOS ribbons might not even be part of Foveros
It's not. Foveros is die-stacking, but @InvalidError is talking about stacking transistors within the die.

might be a much easier process
Requires more layers in the die, which has implications for fabrication time (= cost) and yield. Suffice it to say, that's not trivial. It's the kind of technique that I think would qualify as a minor node bump (or part thereof).

To get enough benefit from it to justify the extra cost and complexity, you'd probably want to look at using it for things besides SRAM. That would have implications for designers and layout algorithms and require support from EDA tools. In other words, a non-trivial enhancement.
 

InvalidError

Titan
Moderator
Requires more layers in the die, which has implications for fabrication time (= cost) and yield. Suffice it to say, that's not trivial. It's the kind of technique that I think would qualify as a minor node bump (or part thereof).
FinFET was not trivial, until it was. TSVs were not trivial until they were; now everyone and their dog is planning to use them for everything. The same will most likely happen to GAA, and CMOS stacking is nothing more than extra GAA layers.

If you are already going to do four GAA layers to increase drive strength on high-fanout signals, there isn't much stopping you from doing a 2-2 split for stacked-CMOS SRAM cells or even 1-1-1-1 to stack two bits. For general logic, the same layer splits could be used to implement most two-input logic gates.

Updating layout and design tools for this should be relatively trivial. The main hurdle with implementing stacked CMOS beyond SRAM is power density: we're already near practical limits with FinFET, and stacked GAA CMOS could easily increase that by 2-3X.
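The 2-3X figure falls out of simple geometry. Here's the arithmetic as a sketch; the baseline is an assumed hot-spot number, not a measurement:

```python
# Why stacked CMOS raises power density: similar switching power per device,
# but each device takes less planar footprint. Baseline is assumed.

BASELINE_W_PER_MM2 = 100.0  # assumed FinFET hot-spot power density

for layers in (1, 2):
    for footprint_factor in (1.0, 0.7):  # extra planar shrink from the node itself
        density = BASELINE_W_PER_MM2 * layers / footprint_factor
        print(f"{layers} layer(s), footprint x{footprint_factor}: "
              f"~{density:.0f} W/mm^2 ({density / BASELINE_W_PER_MM2:.1f}x)")
```

Stacking alone doubles the density, and combined with a per-layer node shrink it lands squarely in that 2-3X range.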
 

bit_user

Titan
Ambassador
GAA and CMOS stacking is nothing more than extra GAA layers.

If you are already going to do four GAA layers to increase drive strength on high-fanout signals, there isn't much stopping you from doing a 2-2 split for stacked-CMOS SRAM cells or even 1-1-1-1 to stack two bits. For general logic, the same layer splits could be used to implement most two-input logic gates.
What's not obvious to me about this leap you're taking is whether the stacked transistors have sufficiently low interference. Wouldn't you need more dielectric between different transistors than you would between the layers of a single transistor?
 

InvalidError

Titan
Moderator
What's not obvious to me about this leap you're taking is whether the stacked transistors have sufficiently low interference. Wouldn't you need more dielectric between different transistors than you would between the layers of a single transistor?
Every ribbon is surrounded by gate oxide and gate metal, so the channels should be effectively shielded from each other. If anything besides long parallel traces might require separation for crosstalk reasons, it would be the gates, though I doubt it would be any worse than FinFET. For simple yet essential things like inverters (half of an SRAM bit cell), where N-MOS and P-MOS can share a single gate structure, performance should improve in just about every measurable way.
 