News SRAM scaling isn't dead after all — TSMC's 2nm process tech claims major improvements

If true, this is legitimately awesome news. I really thought SRAM scaling would remain in a bad place, which would mean newer chips had much less room for downsizing, since ~90% of a consumer CPU is SRAM.
 
chips built on N2 are expected to reduce power usage by 25% to 30% (at equivalent transistor count and frequency), boost performance by 10% to 15% (with the same transistor count and power), and achieve a 15% increase in transistor density (maintaining the same speed and power).

These should be OR statements please.
 
Having to scroll horizontally on a table is ridiculous when the table is only using 20% of my display's horizontal real estate.
You know you can optimize a website for mobile without sacrificing desktop usability? You can resize things for different displays.
 
The authors who write the articles aren't the ones that control the layout, sadly. The publisher controls this, and it's the same across virtually all of their sites.

That said, I do think perhaps it would've worked better for the author to transpose the table. It's 6x8 (HxW), so it would've resulted in less scrolling to do that. More importantly, you're usually wanting to see how the nodes compare on a per-metric basis, which you could do without scrolling if it were transposed.
 
Huh. Yeah, 30% sounds more reasonable.
Yeah, the various buffers have gotten so large that they're currently using a much larger proportion of the SRAM on an AMD chip than they were in the Zen 2 era. On the other hand, Intel has grown the caches much more since 2019, but on the "Cove" P-cores the various buffers are also absolutely massive. Then on a third hand, Apple is flat out using more SRAM and way more silicon overall, per P-core, than either x86 company.
 
Then on a third hand, Apple is flat out using more SRAM and way more silicon overall, per P core, than either x86 company.
Well, not per P-core. Lunar Lake is made on the exact same node as the Apple M3, so we have a perfect comparison. Lion Cove, the P-core used in Lunar Lake, is 4.53 mm², while the M3's P-core is only 2.49 mm². You can see this yourself, in the die shots of each.

It's a common myth among the PC Master Race that Apple's superior performance and efficiency is merely from using new nodes and large caches. Yes, their phone SoCs do consistently feature larger caches than their rivals, but that's a small part of their overall performance advantage. As for the M-series, let's continue to compare & contrast the base M3 with Lunar Lake, shall we?

I believe the M3's P-cores each have 128 KiB of L1D, while Lion Cove has 48 + 192 KiB of L0 + L1D. So, Intel has the larger L1 cache, by 87.5%.

Moving to L2, the M3 appears to share 16 MiB among all four P-cores. Lion Cove has either 2.5 MiB or 3.0 MiB, depending on the version. I'm pretty sure the one in Lunar Lake has 2.5, though I'm not certain about Arrow Lake. So, Apple does 60% better, on P-core L2.

Next, Lunar Lake provides each P-core with a 3 MiB slice of L3 cache. The E-cores don't get any, meaning the total is just 12 MiB. The Apple M3 has no L3 cache, so chalk up a 12 MiB win for Intel!

Finally, Apple has what it calls SLC (System Level Cache), which is 8 MiB, from what I'm reading. Lunar Lake has 8 MiB of what Intel calls a "Side Cache", which sounds basically equivalent to Apple's SLC. So, this tier is a draw.

Briefly touching on the E-cores, I believe Apple gives them a shared 4 MiB slice of L2, whereas Intel allocates only 3 MiB for them. So, a win for Apple, by 33%.

I've summarized these in a table:

Structure  | Apple (per-core) | Apple (total) | Intel (per-core) | Intel (total)
P-core L1D | 128              | 512           | 240              | 960
P-core L2  | shared           | 16384         | 2560             | 10240
E-core L2  | shared           | 4096          | shared           | 3072
L3         | n/a              | n/a           | 3072             | 12288
SLC        | shared           | 8192          | shared           | 8192
Total      |                  | 29184         |                  | 34752

(all sizes in KiB)

Excluding whatever L1 cache the respective E-cores have, we see that Lunar Lake has 5.44 MiB (or 19.1%) more cache than the M3! So, whatever truth there might've been to the "cache" myth, it clearly doesn't hold for M3 vs. Lunar Lake. Intel clearly went even bigger. What Lunar Lake lacked at the L2 level, it more than made up for with abundant L3.
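For anyone who wants to double-check the arithmetic, here's a quick Python sketch that reproduces the totals. The per-core and shared figures are the estimates from this post, not official spec-sheet numbers:

```python
# Sanity check of the cache totals above (all sizes in KiB).
# Figures are the poster's estimates for the Apple M3 and Intel Lunar Lake;
# E-core L1 caches are excluded on both sides, as in the post.

apple = {
    "P-core L1D": 128 * 4,    # 128 KiB per P-core, 4 P-cores
    "P-core L2":  16 * 1024,  # 16 MiB shared among the P-cores
    "E-core L2":  4 * 1024,   # 4 MiB shared among the E-cores
    "SLC":        8 * 1024,   # 8 MiB System Level Cache
}

intel = {
    "P-core L1D": (48 + 192) * 4,  # 48 KiB L0 + 192 KiB L1D per P-core
    "P-core L2":  2560 * 4,        # 2.5 MiB per P-core
    "E-core L2":  3 * 1024,        # 3 MiB shared among the E-cores
    "L3":         3 * 1024 * 4,    # 3 MiB slice per P-core
    "Side cache": 8 * 1024,        # 8 MiB, analogous to Apple's SLC
}

a, i = sum(apple.values()), sum(intel.values())
print(f"Apple M3:   {a} KiB ({a / 1024:.2f} MiB)")
print(f"Lunar Lake: {i} KiB ({i / 1024:.2f} MiB)")
print(f"Difference: {(i - a) / 1024:.2f} MiB ({100 * (i - a) / a:.1f}%)")
```

Running this gives the ~5.44 MiB (19.1%) advantage for Lunar Lake cited above.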
 
Considering that Apple Silicon shares an AMX unit across a core cluster, much like AMD's Bulldozer shared resources within a module, and that Apple's software presents the AMX unit as part of a single core, the difference in essential core area is not large.
x86 is like an Apple CPU with an AMX unit per core, and that is the right answer in the server market.
 
I'm a little confused by your response, since the Lunar Lake x86 die used in my analysis has no AMX. If anything, Apple's inclusion of AMX makes Intel look even worse!
 
Can you share your source for the bitcell size? Which bitcell is this, the HD?
There's a link, right in the middle of the article:

a noteworthy aspect of TSMC's N2 is that this production node also shrinks HD SRAM bit cell size to around 0.0175 µm^2 (enabling SRAM density of 38 Mb/mm^2), down from 0.021 µm^2 in the case of N3 and N5, according to a paper that TSMC will present at the upcoming IEDM conference this December.
The authors here tend to be pretty good about including source links. They're usually right at/near the top, but sometimes you need to hunt for them.
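As a rough cross-check on those figures: a bitcell area alone only gives you an ideal, overhead-free density; real arrays also need sense amps, decoders, and other peripheral circuitry, which is presumably why the quoted 38 Mb/mm² is well below the ideal value. A quick sketch (the array-efficiency estimate is my own, not from the article):

```python
# Ideal (peripheral-free) SRAM density implied by a bitcell area.
# Bitcell areas are from the quoted article; the efficiency figure
# this derives is a rough back-of-envelope estimate, not TSMC data.

def ideal_density_mb_per_mm2(bitcell_um2: float) -> float:
    """Mb/mm^2 if the array were 100% bitcells (1 mm^2 = 1e6 um^2)."""
    return 1e6 / bitcell_um2 / 1e6

n2 = ideal_density_mb_per_mm2(0.0175)  # ~57.1 Mb/mm^2
n3 = ideal_density_mb_per_mm2(0.021)   # ~47.6 Mb/mm^2

print(f"N2 ideal: {n2:.1f} Mb/mm^2, N3/N5 ideal: {n3:.1f} Mb/mm^2")
# The article quotes 38 Mb/mm^2 for N2; the gap to the ideal value
# suggests the effective array efficiency is roughly:
print(f"Implied array efficiency on N2: {38 / n2:.0%}")
```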
 
Thank you. The article says there's going to be an improvement in density, but not that the new bitcell is 0.0175 µm², which is not the information I have. Was this just a scale-down of the 0.021, done assuming that the increase in density is due to the bitcell being optimized?
 
I've seen Anton apparently make extrapolations like that. Sadly, he never reads our forum posts and I have no contact info for him, so we'll probably never know how he got that number. I doubt he has an inside track, though. Any sources he has, he'd probably have cited.