News SRAM scaling isn't dead after all — TSMC's 2nm process tech claims major improvements

If true, this is legitimately awesome news. I really thought SRAM scaling would remain in a bad place, which would mean newer chips had much less room for downsizing, since ~90% of a consumer CPU is SRAM.
 
chips built on N2 are expected to reduce power usage by 25% to 30% (at equivalent transistor count and frequency), boost performance by 10% to 15% (with the same transistor count and power), and achieve a 15% increase in transistor density (maintaining the same speed and power).

These should be OR statements please.
 
Having to scroll horizontally on a table is ridiculous when the table is only using 20% of my display's horizontal real estate.
You know you can optimize a website for mobile without sacrificing desktop usability? You can resize things for different displays.
 
The authors who write the articles aren't the ones that control the layout, sadly. The publisher controls this, and it's the same across virtually all of their sites.

That said, I do think perhaps it would've worked better for the author to transpose the table. It's 6x8 (HxW), so it would've resulted in less scrolling to do that. More importantly, you're usually wanting to see how the nodes compare on a per-metric basis, which you could do without scrolling if it were transposed.
 
Huh. Yeah, 30% sounds more reasonable.
Yeah, the various buffers have gotten so large that they're currently using a much larger proportion of the SRAM on an AMD chip than they were in the Zen 2 era. On the other hand, Intel has grown the caches much more since 2019, but on the "Cove" P-cores the various buffers are also absolutely massive. Then on a third hand, Apple is flat out using more SRAM and way more silicon overall, per P-core, than either x86 company.
 
Then on a third hand, Apple is flat out using more SRAM and way more silicon overall, per P core, than either x86 company.
Well, not per P-core. Lunar Lake is made on the exact same node as the Apple M3, so we have a perfect comparison. Lion Cove, the P-core used in Lunar Lake, is 4.53 mm², while the M3's P-core is only 2.49 mm². You can see this yourself, in the die shots of each.

It's a common myth among the PC Master Race that Apple's superior performance and efficiency is merely from using new nodes and large caches. Yes, their phone SoCs do consistently feature larger caches than their rivals, but that's a small part of their overall performance advantage. As for the M-series, let's continue to compare & contrast the base M3 with Lunar Lake, shall we?

I believe the M3's P-cores each have 128 KiB of L1D, while Lion Cove has 48 + 192 KiB of L0 + L1D. So, Intel has the larger L1 cache, by 87.5%.

Moving to L2, the M3 appears to share 16 MiB among all four P-cores. Lion Cove has either 2.5 MiB or 3.0 MiB, depending on the version. I'm pretty sure the one in Lunar Lake has 2.5, though I'm not certain about Arrow Lake. So, Apple does 60% better, on P-core L2.

Next, Lunar Lake provides each P-core with a 3 MiB slice of L3 cache. The E-cores don't get any, meaning the total is just 12 MiB. The Apple M3 has no L3 cache, so chalk up a 12 MiB win for Intel!

Finally, Apple has what it calls SLC (System Level Cache), which is 8 MiB, from what I'm reading. Lunar Lake has 8 MiB of what Intel calls a "Side Cache", which sounds basically equivalent to Apple's SLC. So, this tier is a draw.

Briefly touching on the E-cores, I believe Apple gives them a shared 4 MiB slice of L2, whereas Intel allocates only 3 MiB for them. So, a win for Apple, by 33%.

I've summarized these in a table:

Structure  | Apple (per-core) | Apple (total) | Intel (per-core) | Intel (total)
P-core L1D | 128              | 512           | 240              | 960
P-core L2  | shared           | 16384         | 2560             | 10240
E-core L2  | shared           | 4096          | shared           | 3072
L3         | n/a              | n/a           | 3072             | 12288
SLC        | shared           | 8192          | shared           | 8192
Total      |                  | 29184         |                  | 34752

(all sizes in KiB)

Excluding whatever L1 cache the respective E-cores have, we see that Lunar Lake has 5.44 MiB (or 19.1%) more cache than the M3! So, whatever truth there might've been to the "cache" myth, it clearly doesn't hold for M3 vs. Lunar Lake. Intel clearly went even bigger. What Lunar Lake lacked at the L2 level, it more than made up for with abundant L3.
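For anyone who wants to double-check the arithmetic, here's a quick Python sketch that reproduces the totals. The per-core and shared figures are the estimates from this post, not official spec-sheet numbers:

```python
# Sanity check of the cache totals above (all sizes in KiB).
# Figures are the poster's estimates for the Apple M3 and Intel Lunar Lake;
# E-core L1 caches are excluded on both sides, as in the post.

apple = {
    "P-core L1D": 128 * 4,    # 128 KiB per P-core, 4 P-cores
    "P-core L2":  16 * 1024,  # 16 MiB shared among the P-cores
    "E-core L2":  4 * 1024,   # 4 MiB shared among the E-cores
    "SLC":        8 * 1024,   # 8 MiB System Level Cache
}

intel = {
    "P-core L1D": (48 + 192) * 4,  # 48 KiB L0 + 192 KiB L1D per P-core
    "P-core L2":  2560 * 4,        # 2.5 MiB per P-core
    "E-core L2":  3 * 1024,        # 3 MiB shared among the E-cores
    "L3":         3 * 1024 * 4,    # 3 MiB slice per P-core
    "Side cache": 8 * 1024,        # 8 MiB, analogous to Apple's SLC
}

a, i = sum(apple.values()), sum(intel.values())
print(f"Apple M3:   {a} KiB ({a / 1024:.2f} MiB)")
print(f"Lunar Lake: {i} KiB ({i / 1024:.2f} MiB)")
print(f"Difference: {(i - a) / 1024:.2f} MiB ({100 * (i - a) / a:.1f}%)")
```

Running this gives the ~5.44 MiB (19.1%) advantage for Lunar Lake cited above.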
 
Considering that Apple Silicon shares an AMX unit across a core cluster, much like AMD's Bulldozer shared resources within a module, and that Apple's software presents the AMX unit as part of a single core, the difference in essential core area is not large.
x86 is like an Apple CPU with an AMX unit per core, and that is the right answer in the server market.
 
I'm a little confused by your response, since the Lunar Lake x86 die used in my analysis has no AMX. If anything, Apple's inclusion of AMX makes Intel look even worse!
 
Can you share your source for the bitcell size? Which bitcell is this, the HD?
There's a link, right in the middle of the article:

a noteworthy aspect of TSMC's N2 is that this production node also shrinks HD SRAM bit cell size to around 0.0175 µm^2 (enabling SRAM density of 38 Mb/mm^2), down from 0.021 µm^2 in the case of N3 and N5, according to a paper that TSMC will present at the upcoming IEDM conference this December.
The authors here tend to be pretty good about including source links. They're usually right at/near the top, but sometimes you need to hunt for them.
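As a rough cross-check on those figures: a bitcell area alone only gives you an ideal, overhead-free density; real arrays also need sense amps, decoders, and other peripheral circuitry, which is presumably why the quoted 38 Mb/mm² is well below the ideal value. A quick sketch (the array-efficiency estimate is my own, not from the article):

```python
# Ideal (peripheral-free) SRAM density implied by a bitcell area.
# Bitcell areas are from the quoted article; the efficiency figure
# this derives is a rough back-of-envelope estimate, not TSMC data.

def ideal_density_mb_per_mm2(bitcell_um2: float) -> float:
    """Mb/mm^2 if the array were 100% bitcells (1 mm^2 = 1e6 um^2)."""
    return 1e6 / bitcell_um2 / 1e6

n2 = ideal_density_mb_per_mm2(0.0175)  # ~57.1 Mb/mm^2
n3 = ideal_density_mb_per_mm2(0.021)   # ~47.6 Mb/mm^2

print(f"N2 ideal: {n2:.1f} Mb/mm^2, N3/N5 ideal: {n3:.1f} Mb/mm^2")
# The article quotes 38 Mb/mm^2 for N2; the gap to the ideal value
# suggests the effective array efficiency is roughly:
print(f"Implied array efficiency on N2: {38 / n2:.0%}")
```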
 
Thank you. The article says there's going to be an improvement in density, but not that the new bitcell is 0.0175 µm², which is not the information I have. Was this just a scale-down of the 0.021, done assuming that the increase in density is due to the bitcell being optimized?
 
I've seen Anton apparently make extrapolations like that. Sadly, he never reads our forum posts and I have no contact info for him, so we'll probably never know how he got that number. I doubt he has an inside track, though. Any sources he has, he'd probably have cited.