News AMD Shows New 3D V-Cache Ryzen Chiplets, up to 192MB of L3 Cache Per Chip, 15% Gaming Improvement

Status
Not open for further replies.
What's very curious and IMPORTANT to note here is they locked the processor at 4GHz. We all know the 5900X can run at considerably higher speeds here.

10:1 They are thermal throttling. Note the lack of thermals.

Call me crazy, but eventually sockets will have cooling from below and above working in a compression fashion, where the heat sinks (front and back) pull against each other.
It's also a very early engineering sample, I wouldn't read too much into that.
 
Call me crazy, but eventually sockets will have cooling from below and above working in a compression fashion, where the heat sinks (front and back) pull against each other.
I think this would only happen with very specialized PCBs (and CPUs) where the CPU is soldered to the board. Making a CPU removable necessitates (at least currently) having the CPU too far off the PCB to make cooling it from the underside practical.

...but I still won't call you crazy. 😉
 
10:1 They are thermal throttling. Note the lack of thermals.

Call me crazy, but eventually sockets will have cooling from below and above working in a compression fashion, where the heat sinks (front and back) pull against each other.
With one extra layer of silicon and interconnects between the CCD and IHS, thermals are almost certain to be a little more challenging.

As for cooling the bottom of the socket, I wouldn't expect too much out of that since heat has to go through the bed-of-pins down to the motherboard, through layers and whatever may be on the back. The thermal resistance from the die, through the CPU substrate and everything else to the back of the motherboard will be horrible. The most heatsinking I could imagine making sense there would be upgrading the mounting backplate to a small heatsink mainly to help cool the Vcore power and ground planes so they don't contribute to CPU temperature and maybe the socket just a little bit.
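To put rough numbers on the series-resistance argument, here is a minimal sketch. Every thermal resistance value below is a made-up placeholder for illustration, not a measurement; only the structure (resistances adding up along each path, with the top and bottom paths in parallel) reflects the reasoning above.

```python
# Hypothetical thermal resistances in K/W. Every number is an assumption.
TOP_PATH = {"die_to_ihs": 0.10, "ihs_to_cooler": 0.05, "cooler_to_air": 0.15}
BOTTOM_PATH = {
    "die_to_substrate": 0.50,
    "socket_pins": 3.00,
    "motherboard_layers": 4.00,
    "backplate_heatsink_to_air": 2.00,
}


def series_resistance(path):
    """Resistances along a single heat path add up."""
    return sum(path.values())


def heat_split(power_w, r_top, r_bottom):
    """Split heat between two parallel paths that see the same temperature rise."""
    r_parallel = (r_top * r_bottom) / (r_top + r_bottom)
    delta_t = power_w * r_parallel
    return delta_t / r_top, delta_t / r_bottom, delta_t


r_top = series_resistance(TOP_PATH)
r_bottom = series_resistance(BOTTOM_PATH)
top_w, bottom_w, delta_t = heat_split(100.0, r_top, r_bottom)
print(f"top: {top_w:.1f} W, bottom: {bottom_w:.1f} W, rise: {delta_t:.1f} K")
```

With these placeholder values the underside path carries only about 3 W out of 100 W, which is the point: unless the bottom path's total resistance gets close to the top path's, a heatsink under the socket barely changes anything.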
 
With one extra layer of silicon and interconnects between the CCD and IHS, thermals are almost certain to be a little more challenging.

As for cooling the bottom of the socket, I wouldn't expect too much out of that since heat has to go through the bed-of-pins down to the motherboard, through layers and whatever may be on the back. The thermal resistance from the die, through the CPU substrate and everything else to the back of the motherboard will be horrible. The most heatsinking I could imagine making sense there would be upgrading the mounting backplate to a small heatsink mainly to help cool the Vcore power and ground planes so they don't contribute to CPU temperature and maybe the socket just a little bit.

I imagined a hole in the PCB of the motherboard and the backside of the interposer exposed. The pads would be exposed on an outer ring. But you are correct, there would be a lot more thermal resistance from the die side.

SRAM cache tends to be expensive, power-wise, when it's fired. Associative cache fires off a bunch of transistors at the same time looking for the correct cache element. I'm wondering which would consume more current: the CPU itself or an L1/L2 cache miss. The scheduler in the background is likely already setting up the L3 cache to run in parallel as the instructions go into the pipeline, in case it is tapped.
 
SRAM cache tends to be expensive, power-wise, when it's fired.
The SRAM itself uses almost no power. It is the tagRAM (the bit responsible for keeping track of what cache line is caching which memory chunk) that uses tons of power doing the lookups and if you are going to make the L3$ 6X as large, you can mitigate tagRAM power and latency by making each cache line 2-8X as big. With such a large cache, you should also be able to afford reducing the way-ness and associativeness of each L3$ block to simplify the tagRAM without hurting the hit rate much.
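As a rough illustration of the tagRAM trade-off, here is a minimal sketch that sizes the tag array of a set-associative cache. The 48-bit physical address, the 64-byte baseline line, the 16-way baseline, and the 96 MiB per-CCD total (32 + 64) are all assumptions picked for the example, and state bits (valid/dirty/coherence) are ignored.

```python
MIB = 1024 * 1024


def ceil_log2(n):
    """Smallest number of bits that can index n distinct items."""
    return (n - 1).bit_length()


def tag_ram_stats(cache_bytes, line_bytes, ways, addr_bits=48):
    """Rough tag-array sizing for a set-associative cache."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = ceil_log2(line_bytes)
    index_bits = ceil_log2(sets)  # rounded up if sets is not a power of two
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "lines": lines,
        "tag_bits": tag_bits,
        "tag_ram_kib": lines * tag_bits / 8 / 1024,
        "compares_per_lookup": ways,  # one tag comparison per way
    }


baseline = tag_ram_stats(32 * MIB, line_bytes=64, ways=16)
bigger_same = tag_ram_stats(96 * MIB, line_bytes=64, ways=16)
bigger_wide = tag_ram_stats(96 * MIB, line_bytes=256, ways=8)

for name, stats in [("32 MiB, 64 B, 16-way", baseline),
                    ("96 MiB, 64 B, 16-way", bigger_same),
                    ("96 MiB, 256 B, 8-way", bigger_wide)]:
    print(name, stats)
```

Under these assumptions, tripling the cache with the same line size and ways roughly triples the tag storage (1728 KiB to 4800 KiB), while 4X larger lines and half the ways bring it down to 1152 KiB with half as many comparisons per lookup. Whether the hit rate survives that change is a separate question the sketch does not model.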
 
Since AMD was speaking about gaming performance, this 128MB cache should be a mainstream thing, at least on select gaming SKUs. Lower-end SKUs will likely use cache dies with defects for 64-112MB of extra L3$.

(Edit: with the structural silicon, there may be smaller cache die or even models with none at all. On 7nm, 128MB would be pretty close to a whole CCD-sized die, not just a center sliver shown in illustrations.)
I thought they implied that the full 64MB cache stack on one CCD was less than half the size of a 7nm core chiplet, so if they were aiming for the full 128MB, there would be plenty of room (since it would be a 64MB stack on each chiplet). Lisa herself said the 64MB SRAM used in the demo was 36mm^2, which is just under half the die area of a full Zen 3 chiplet.
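The area arithmetic can be sketched quickly. The 64MB and 36mm^2 figures are the ones quoted above; the 80.7mm^2 Zen 3 CCD size is an assumption (a commonly cited figure, not something stated in this thread), and linear scaling of area with capacity is a simplification.

```python
SRAM_MB = 64
SRAM_AREA_MM2 = 36.0   # figure quoted from the demo
CCD_AREA_MM2 = 80.7    # assumption: commonly cited Zen 3 CCD die size

MM2_PER_MB = SRAM_AREA_MM2 / SRAM_MB


def stacked_area_mm2(capacity_mb):
    """Estimated cache die area, assuming area scales linearly with capacity."""
    return capacity_mb * MM2_PER_MB


for capacity in (64, 128):
    area = stacked_area_mm2(capacity)
    share = area / CCD_AREA_MM2
    print(f"{capacity} MB: {area:.1f} mm^2, {share:.0%} of one CCD")
```

On these numbers a single 128MB die would be about 72mm^2, close to a whole CCD, while a 64MB die per chiplet stays under half of it. That matches both the "CCD-sized die" estimate and the "plenty of room" reading, depending on whether the 128MB sits on one chiplet or is split across two.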
 
Does anyone here remember the IBM Power4?
https://en.wikipedia.org/wiki/POWER4
Two cores and 32MB of (off chip) L3 cache circa 2001 (or two whole decades ago). That makes for 16MB/core (no multi-threading AFAIK at that time). This 192MB L3 cache has 192/16 = 12MB/core = 6MB/thread (or 16MB/core = 8MB/thread for a 12 core CPU).

I always marveled at those cache sizes on the Power4/Power5 at that time.
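For anyone who wants to redo the per-core arithmetic, a small sketch follows. The configurations are the ones mentioned in this post; the assumption of two threads per core on the Ryzen parts and one on the POWER4 follows the post's own numbers.

```python
def l3_per_core_and_thread(l3_mb, cores, threads_per_core):
    """Return (MB of L3 per core, MB of L3 per hardware thread)."""
    per_core = l3_mb / cores
    return per_core, per_core / threads_per_core


CONFIGS = {
    "POWER4, 32MB, 2 cores, 1 thread/core": (32, 2, 1),
    "192MB, 16 cores, 2 threads/core": (192, 16, 2),
    "192MB, 12 cores, 2 threads/core": (192, 12, 2),
}

for name, (l3_mb, cores, tpc) in CONFIGS.items():
    per_core, per_thread = l3_per_core_and_thread(l3_mb, cores, tpc)
    print(f"{name}: {per_core:g} MB/core, {per_thread:g} MB/thread")
```

So a 12-core part with 192MB would match the POWER4's 16MB per core, two decades later and on-package rather than off-chip.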
 
So the original article just got an update, for the people who did not see this:
Update 6/1/2021 10am PT: AMD has confirmed to Tom's Hardware that Zen 3 Ryzen processors with 3D V-Cache will enter production later this year. The technology currently consists of a single layer of stacked L3 cache, but the underlying tech supports stacking multiple dies. The technology also doesn't require any specific software optimizations and should be transparent in terms of latency and thermals (no significant overhead in either). We also obtained further fine-grained details, stay tuned for additional coverage.

This is indeed very interesting. So if I got this right, it means we will have a Zen3+ (Refresh) with 3d v-cache to fight directly with Alder Lake, right? Also no issues with thermals...

I do have some questions, like will it run on existing AM4 motherboards? How about the B450s?

This is great news!
 
While I think AMD is doing very well, I am starting to wonder if this ever-increasing cache size is sustainable. AMD's solution on CPU and GPU seems to be quite similar: slap on an oversized cache to improve performance. Cache takes up a lot of die space, and even if you can stack it, there is still a limit to the height and width of the chip.

I'm not sure, but AMD will have people on their payroll, making tons of money, who have the metrics and the data to see whether the R&D and the millions spent will be a good decision for the business.
 
The 5950X (and 5900X) now appear to be in stock at most places (~$800 at a Micro Center near you) at the moment. Can't wait for this 128MB L3 update (and a review thereof); must start saving money now.
 
While I think AMD is doing very well, I am starting to wonder if this ever-increasing cache size is sustainable. AMD's solution on CPU and GPU seems to be quite similar: slap on an oversized cache to improve performance. Cache takes up a lot of die space, and even if you can stack it, there is still a limit to the height and width of the chip.
The die space cost here is non-existent. Since the cache is a separate die stacked on top, it takes up no area on the processor die itself.
 
Everybody in discussion seems to be focused on Intel and Alder Lake, but is that ignoring the elephant Apple in the living room? To some extent Alder Lake with its big.Little enhancements strikes me as a play more for the mobile market than ultra performance users.

How much of this new AMD/chiplets is a response to M1 and staying ahead of what Apple is up to?
 
To some extent Alder Lake with its big.Little enhancements strikes me as a play more for the mobile market than ultra performance users.
Intel's heterogeneous core arrangement exists because it cannot do 16 high-performance cores on 10nm within a reasonable TDP. AMD is planning to do heterogeneous cores too, with Zen 4D as the power-efficient cores to go along with Zen 5.
 
Everybody in discussion seems to be focused on Intel and Alder Lake, but is that ignoring the elephant Apple in the living room? To some extent Alder Lake with its big.Little enhancements strikes me as a play more for the mobile market than ultra performance users.

How much of this new AMD/chiplets is a response to M1 and staying ahead of what Apple is up to?

Eh, Apple is not really applicable. Most users won't care, or wouldn't be willing to go through all of the downsides of switching platforms for something they'll likely barely notice. People make the M1 out to be a huge thing (and it is, in terms of general mobile market trends and what's possible with Arm), but Apple will never sell that chip to anyone else. You would have to be willing to pay the Apple tax and go through the aforementioned platform shift to be able to use it, and it isn't always faster in everything. I'd say none of this is a response to Apple. The stuff you're looking at now has been in the pipeline for 4 or 5 years. Sometimes it can get fast-tracked and it's only two or three years, but as fast as things can change with silicon, development is slow for the most part.
 
Intel's heterogeneous core arrangement exists because it cannot do 16 high-performance cores on 10nm within a reasonable TDP. AMD is planning to do heterogeneous cores too, with Zen 4D as the power-efficient cores to go along with Zen 5.

I agree. But it's curious why there would be a focus on low-power cores unless it was purely for energy savings.

I would have to see the instruction set support for the low-power cores to figure out how it would affect things like gaming and encoding/decoding-type tasks. I'm guessing AVX and other SIMD support on the low-power cores is non-existent. A low-power core might be useful for something like handling a network stack or system I/O interrupts. But again, this is only relevant for low-power users like laptops.
 
Everybody in discussion seems to be focused on Intel and Alder Lake, but is that ignoring the elephant Apple in the living room? To some extent Alder Lake with its big.Little enhancements strikes me as a play more for the mobile market than ultra performance users.

How much of this new AMD/chiplets is a response to M1 and staying ahead of what Apple is up to?

Apple's M1 is a low-power mobile chip with some major limitations. Impressively efficient for what it is, sure - but not even close to competing in the desktop market. Keep in mind that processor can only support a maximum of 16GB of shared memory for the entire system. It has great bandwidth since the LPDDR4X sits on the package - but AMD already has well-proven APU designs supporting 16GB of GDDR6 in both the PS5 and Xbox Series X.
I'm sure if Apple starts to become a threat to AMD, they're more than capable of pushing back. There really isn't anything stopping AMD from buying their way into process parity with the TSMC 5nm node that Apple is using.
Apple already knows that if they push M1 too hard they're going to lose their "creative" market of people who need a real computer with the capacity to actually render videos or even to load a set of large pictures into Photoshop. Maybe the M2 will be better, but at the end of the day M1 is still just a souped-up phone processor that has been overhyped by Apple's marketing empire.
Or to put it another way, M1 is competing with AMD in the same way that their depressing new washed-out faded-denim iMac is "Vibrant purple" - effective marketing against the 10% of the population who is colorblind, but not doing much to penetrate the other 90% of customers.

Also, the ARM instruction set simply was never designed to be good at a typical high-end workstation/server workload (i.e., tasks that benefit from an expanded instruction set such as AVX), which is where the real money is for AMD and Intel - and those are the customers most likely to need a giant cache.
 
I would have to see the instruction set support for the low-power cores to figure out how it would affect things like gaming and encoding/decoding-type tasks. I'm guessing AVX and other SIMD support on the low-power cores is non-existent. A low-power core might be useful for something like handling a network stack or system I/O interrupts. But again, this is only relevant for low-power users like laptops.
The instruction set is likely the same, just optimized for lower power and smaller die size: you can do AVX512 on a quarter-width ALU, you just need to break it down into four extra steps.
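A minimal sketch of that idea, modeled in plain Python rather than real hardware: a 512-bit lane-wise add carried out as four passes through a 128-bit unit. The quarter-width ALU and the 32-bit lane size are assumptions for illustration, and the function name is made up.

```python
LANE_BITS = 32     # one 32-bit integer per lane
VECTOR_BITS = 512  # full AVX-512 register width
ALU_BITS = 128     # hypothetical quarter-width execution unit


def add_on_narrow_alu(a, b):
    """Lane-wise add of two 512-bit vectors, one ALU-width slice per pass."""
    lanes_total = VECTOR_BITS // LANE_BITS   # 16 lanes in the full vector
    lanes_per_pass = ALU_BITS // LANE_BITS   # 4 lanes fit in the narrow ALU
    if len(a) != lanes_total or len(b) != lanes_total:
        raise ValueError("expected 16 lanes per operand")
    mask = (1 << LANE_BITS) - 1              # wrap around like 32-bit lanes
    result = []
    passes = 0
    for start in range(0, lanes_total, lanes_per_pass):
        stop = start + lanes_per_pass
        result.extend((x + y) & mask for x, y in zip(a[start:stop], b[start:stop]))
        passes += 1
    return result, passes


a = list(range(16))
b = [100] * 16
total, passes = add_on_narrow_alu(a, b)
print(passes, total[:4])  # 4 [100, 101, 102, 103]
```

The software sees the same result either way; the narrow unit just takes four passes where a full-width one would take a single pass, trading throughput for die area and power.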

As for it being "only relevant for laptops", I have 3100+ threads on my desktop and I'm pretty sure 3050+ of them would be perfectly fine running on low-power cores instead of making a core turbo to 4.2GHz for 10 microseconds a couple of times each second for a combined total of 5% core activity. It would probably reduce my PC's baseline power draw measured at the wall by 5-10W. While this may not sound like much, you need to keep in mind that we're also at a point where regulations are forcing the migration to 12VO to save ~10W per system.
 
Intel has been innovating. The problem is they don't have anything to show for it because of the woes trying to get off 14nm.
I was going to argue with you, but Intel's been trying to innovate for 3 years. If they'd been trying to innovate for 6 years (since before Ryzen), though, they probably wouldn't be in half the mess they're in right now.

They had the cash in 2015 to build a 10nm or better fab from scratch. This ineptitude isn't merely misfortune. Great for us though since the CPU industry now has two legitimate companies. I'm very happy with Intel's assist on AMD's resurrection.
 
I love the concept, but I have to say I'm wary of thermals in the way they described how they'll be doing the "ground leveling" of the surface.

Throwing in more memory if the price increase is low... Sure, why not? Will it be cheap, though? Hm... Doubt it, so this may only be a feature for TR-class or Ry-69x0-class. I'd love it if they released a Zen 3 refresh with this just bolted on as a "preview" for enthusiasts. I'm sure that niche market would pay whatever premium they ask for.

I seriously doubt this will make its way to lower SKUs. With the "G" and mobile APUs they made clear they'll be uplifting their monolithic dies and then the chiplets will cover the higher end of the performance spectrum, but will keep them separated. Or that's what I think.

Cheers!

They placed the V-Cache so it sits on top of the L3 cache (i.e., the center of the chiplet), not the logic areas. So I expect thermals to be mostly the same.
 
While I think AMD is doing very well, I am starting to wonder if this ever-increasing cache size is sustainable. AMD's solution on CPU and GPU seems to be quite similar: slap on an oversized cache to improve performance. Cache takes up a lot of die space, and even if you can stack it, there is still a limit to the height and width of the chip.

Cache won't be ever-increasing. There is a trade-off, so I don't expect them to keep just upping the cache sizes, at least not until they are on a new process node like, say, TSMC 3nm.
 
I agree. But it's curious why there would be a focus on low-power cores unless it was purely for energy savings.

I would have to see the instruction set support for the low-power cores to figure out how it would affect things like gaming and encoding/decoding-type tasks. I'm guessing AVX and other SIMD support on the low-power cores is non-existent. A low-power core might be useful for something like handling a network stack or system I/O interrupts. But again, this is only relevant for low-power users like laptops.

Alder Lake's low-power cores (Gracemont) aren't gimped. They'll have full support for AVX2, without needing 2 cycles per AVX2 instruction. Performance per core should be at Skylake level or even better.

The current "Tremont" cores are in the Ivy Bridge/Haswell range. They got full SSE4.1 support two generations ago, in Goldmont.

Especially in the low-power space, the 2+8 configuration means the low-power cores will be there to significantly boost multi-threaded performance. The leaks were talking about 2x, which I think applies to mobile (I don't think desktop will end up being 2x as fast in MT over Rocket Lake).

Sure, they lack HT and will probably clock around 3.5GHz or less, but 8 Skylake-class cores is a big help.
 