The increased L2 cache size of Arrow Lake may lead to performance gains in memory-bandwidth-demanding applications.
Intel to Expand Arrow Lake's L2 Cache Capacity : Read more
> Keeping in mind that Arrow Lake CPUs will be made on Intel's 20A (2nm-class) fabrication process, the company might increase the size of all caches as it may not have a significant impact on die size and cost.

Depends on what you're comparing against. Comparing "20A" against Intel 7, there might still be some significant margins for shrinking SRAM cells. However, we know SRAM is no longer scaling down well. So, I wouldn't assume it's "free", in terms of die area.
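To put rough numbers on why I wouldn't call it free (the Intel 7 bit-cell figure below is the published ~0.0312 um^2 high-density cell; the 20A figure is purely my guess, since Intel hasn't disclosed one):

```python
# Back-of-envelope: die-area cost of doubling per-core L2, under one
# published and one guessed SRAM bit-cell size. Intel has not disclosed
# 20A SRAM density, so the 20A input is a placeholder.

MIB = 2**20  # bytes per MiB

def cache_area_mm2(size_mib: float, bitcell_um2: float, overhead: float = 0.3) -> float:
    """Approximate array area for a cache of `size_mib`, given a bit-cell
    size in um^2 and a fractional allowance for tags/ECC/peripherals."""
    bits = size_mib * MIB * 8
    return bits * bitcell_um2 * (1 + overhead) / 1e6  # um^2 -> mm^2

for label, bitcell in [("Intel 7 (published HD cell)", 0.0312),
                       ("20A (my guess)", 0.024)]:
    base = cache_area_mm2(2.0, bitcell)     # 2 MiB L2 per core
    doubled = cache_area_mm2(4.0, bitcell)  # 4 MiB L2 per core
    print(f"{label}: 2 MiB ~ {base:.2f} mm^2, 4 MiB ~ {doubled:.2f} mm^2, "
          f"delta ~ {doubled - base:.2f} mm^2 per core")
```

Multiply that per-core delta by the core count and it's not huge, but it's not nothing, either.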
> Depends on what you're comparing against. Comparing "20A" against Intel 7, there might still be some significant margins for shrinking SRAM cells. However, we know SRAM is no longer scaling down well. So, I wouldn't assume it's "free", in terms of die area.

Why did you feel compelled to write this up?!
> Why did you feel compelled to write this up?!

The wording in the article diminishes the area-impact, which I felt wasn't appropriate, absent any data on SRAM scaling for the 20A node. Of all writers here, Anton should definitely be aware of that.
"As it may not have a significant impact" is not a synonym for free,
so they didn't assume that it would be free and you just said that you would assume it to be free either, so what's your point here!?
Diminishing means to make less not to make non existent, it still has nothing to do with free.The wording in the article diminishes the area-impact, which I felt wasn't appropriate, absent any data on SRAM scaling for the 20A node.
If you have such data, please provide it. If not, then you've failed to invalidate my calling the article's statement into question. Not helpful.
> Depends on what you're comparing against. Comparing "20A" against Intel 7, there might still be some significant margins for shrinking SRAM cells. However, we know SRAM is no longer scaling down well. So, I wouldn't assume it's "free", in terms of die area.

Since SRAM memory arrays have pretty low power density, that would be one place where CPU and GPU designers should be able to easily stack CMOS ribbons at least two-high to make SRAM cells with a ~3T planar footprint. There should be a one-time rough doubling of SRAM density to be had there, more if they stack more active layers further down the line, to the point of being able to stack SRAM bits.
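Spelling out that math (idealized, assuming perfect folding and zero overhead from the stacking itself):

```python
# A standard SRAM bit cell is 6 transistors (6T). Stacking CMOS devices
# N-high shrinks the cell's planar footprint to roughly 6/N transistor
# sites, for an ~Nx bit-density gain.

TRANSISTORS_PER_6T_CELL = 6

for height in (1, 2, 4):
    footprint = TRANSISTORS_PER_6T_CELL / height
    print(f"{height}-high stack: ~{footprint:.1f}T planar footprint, "
          f"~{height}x bit density")
```

The 2-high case is the ~3T footprint and rough doubling mentioned above; 4-high is where stacking whole bits comes in.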
> Since SRAM memory arrays have pretty low power density, that would be one place where CPU and GPU designers should be able to easily stack CMOS ribbons at least two-high to make SRAM cells with a ~3T planar footprint. There should be a one-time rough doubling of SRAM density to be had there, more if they stack more active layers further down the line, to the point of being able to stack SRAM bits.

Could Intel's 20A process enable such techniques without Intel having touted it, in their public announcements about it and 18A? I rather doubt that. These are effectively sales pitches by IFS, to try and lure fab customers. Such an increase in SRAM density would be a big selling-point. Or, are you simply pointing out that they could do this sort of thing in some future node or iteration?
> Or, are you simply pointing out that they could do this sort of thing in some future node or iteration?

If you have the ability to do ribbon FETs, you have the ability to dope an arbitrary number of layers an arbitrary distance from the base silicon, which means the ability to stack CMOS too. From there, it is only a matter of being able to achieve sufficient yields doing so, and also having the incentive to actually do it.
> If you have the ability to do ribbon FETs, you have the ability to dope an arbitrary number of layers an arbitrary distance from the base silicon, which means the ability to stack CMOS too. From there, it is only a matter of being able to achieve sufficient yields doing so, and also having the incentive to actually do it.

Sounds cool, but I think:
> Meanwhile, it is unclear whether Intel also plans to expand the size of the L3 cache of Arrow Lake's performance cores.

I believe you meant Efficiency Cores in this sentence.
> The wording in the article diminishes the area-impact, which I felt wasn't appropriate, absent any data on SRAM scaling for the 20A node. Of all writers here, Anton should definitely be aware of that.

Yes, we do have some estimated data, for your sake:
TSMC's 3nm Node: No SRAM Scaling Implies More Expensive CPUs and GPUs
Big problems from tiny memory cells. (www.tomshardware.com)
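The headline figures from that article, plugged into a quick sketch (they're TSMC's numbers, not Intel 20A data, and the 30% SRAM area split is an assumption for illustration):

```python
# As that article reports, TSMC's N5 -> N3E transition improves logic
# density by ~1.6x while the HD SRAM bit cell stays at ~0.021 um^2,
# i.e. ~1.0x. Here's what that does to a die's SRAM share.

logic_scale = 1.6   # reported logic density improvement, N5 -> N3E
sram_scale = 1.0    # reported SRAM bit-cell scaling, N5 -> N3E (none)

# Suppose a die starts out 30% SRAM by area (assumed split, for illustration).
sram_frac, logic_frac = 0.30, 0.70

new_area = logic_frac / logic_scale + sram_frac / sram_scale  # relative to old
print(f"Shrunk die: {new_area:.0%} of original area")
print(f"SRAM share grows from {sram_frac:.0%} to {sram_frac / new_area:.0%}")
```

So even with zero cache growth, SRAM eats an ever-larger slice of each new die.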
If you have said scaling data for Intel's 20A node, please provide it. If not, then you've failed to invalidate my calling the article's statement into question. Not helpful.
> Could Intel's 20A process enable such techniques without Intel having touted it, in their public announcements about it and 18A? I rather doubt that.

Since their stacking technology is Foveros, and it's node-independent, there wouldn't be a reason to tout it for every specific node they bring out.
> Yes, we do have some estimated data, for your sake:

Thanks, but that's not useful to me without a point of comparison.
Intel 4 with backside power delivery hits a 95% cell utilization rate:

> "The density of these cells is also quite impressive. By moving to backside power delivery, Intel was able to utilize 95% of the space within one of the denser spots within the E-core cell. Unfortunately, Intel didn't give comparable numbers for E-cores on Intel 4, but in general, the utilization is not quite that high."
> Since their stacking technology is Foveros, and it's node-independent, there wouldn't be a reason to tout it for every specific node they bring out.

They could certainly die-stack L3 cache, like AMD has shown. If it turns out they can extend L2 cache through die-stacking, that would be an interesting development. Since L2 is higher bandwidth, I think it might be more challenging and/or less efficient to put on another die. You have to consider how many wires would need to cross from die to die.

Instead, what I'd expect is die-stacking to be used for migrating lower-bandwidth stuff (like L3), in order to free up room for more L2 cache on the compute die.
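To put rough numbers on the wire problem (generic assumptions on my part; none of these bus widths or clocks are Arrow Lake specifics):

```python
# Very rough wire-count sketch for why moving L2 off-die looks harder
# than L3. All bus widths and clocks below are assumed, not disclosed.

def die_crossing_wires(bytes_per_cycle: int, core_ghz: float,
                       link_gts: float, overhead: float = 0.25) -> int:
    """Data wires (one direction) needed to sustain `bytes_per_cycle` at
    core clock `core_ghz` over a die-to-die link signaling at `link_gts`
    GT/s per wire, with an allowance for address/command/ECC signals."""
    gbits_per_s = bytes_per_cycle * 8 * core_ghz
    return round(gbits_per_s / link_gts * (1 + overhead))

# Assumed: L2 fill ~64 B/cycle per core, an L3 slice ~32 B/cycle,
# 5 GHz core clock, 5 GT/s die-to-die signaling.
print("per-core L2:", die_crossing_wires(64, 5.0, 5.0), "wires each way")
print("L3 slice:   ", die_crossing_wires(32, 5.0, 5.0), "wires each way")
```

Scale the L2 figure by the core count and the die-to-die interface gets very wide, very fast.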
> They could certainly die-stack L3 cache, like AMD has shown. If it turns out they can extend L2 cache through die-stacking, that would be an interesting development. Since L2 is higher bandwidth, I think it might be more challenging and/or less efficient to put on another die. You have to consider how many wires would need to cross from die to die.

Stacking CMOS ribbons might not even be part of Foveros, and might be a much easier process. I'm just saying that I don't think it would be a node-specific announcement; just a "we have this procedure now and will use it everywhere", like they did with Foveros.

> Stacking CMOS ribbons might not even be part of Foveros

It's not. Foveros is die-stacking, but @InvalidError is talking about stacking transistors within the die.
> and might be a much easier process

Requires more layers in the die, which has implications for fabrication time (= cost) and yield. Suffice it to say, that's not trivial. It's the kind of technique that I think would qualify as a minor node bump (or part thereof).
> Requires more layers in the die, which has implications for fabrication time (= cost) and yield. Suffice it to say, that's not trivial. It's the kind of technique that I think would qualify as a minor node bump (or part thereof).

FinFET was not trivial, until it was. TSVs were not trivial until they were; now everyone and their dog is planning to use them for everything. The same will most likely happen to GAA, and CMOS stacking is nothing more than extra GAA layers.
> CMOS stacking is nothing more than extra GAA layers.

What's not obvious to me about this leap you're taking is that the stacked transistors have sufficiently low interference. Wouldn't you need thicker dielectric layers between different transistors than between the layers of a single transistor?
If you are already going to do four GAA layers to increase drive strength on high-fanout signals, there isn't much stopping you from doing a 2-2 split for stacked-CMOS SRAM cells, or even 1-1-1-1 to stack two bits. For general logic, the same layer splits could be used to implement most two-input logic gates.
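Counting devices for the common two-input static-CMOS gates shows why (the 4-layer mapping is my speculation, not anything announced):

```python
# Device counts for standard static-CMOS gates, checked against a
# hypothetical 4-high GAA ribbon stack. A two-input NAND/NOR needs
# 2 N-MOS + 2 P-MOS, which is exactly a 2-2 layer split.

GATE_TRANSISTORS = {
    "INV":   (1, 1),  # (N-MOS count, P-MOS count)
    "NAND2": (2, 2),
    "NOR2":  (2, 2),
}

LAYERS = 4  # hypothetical 4-high GAA stack

for gate, (n, p) in GATE_TRANSISTORS.items():
    total = n + p
    verdict = "fits in one 4-layer stack" if total <= LAYERS else "needs multiple stacks"
    print(f"{gate}: {n}N + {p}P = {total} devices -> {verdict}")
```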
> What's not obvious to me about this leap you're taking is that the stacked transistors have sufficiently low interference. Wouldn't you need thicker dielectric layers between different transistors than between the layers of a single transistor?

Every ribbon is surrounded by gate oxide and gate metal, so the channels should be effectively shielded from each other. If anything besides long parallel traces might require separation for crosstalk reasons, it would be the gates, though I doubt it would be any worse than FinFET. For simple yet essential things like inverters (half of an SRAM bit cell), where N-MOS and P-MOS can share a single gate structure, performance should improve in just about every measurable way.