News AMD Ryzen 9 7950X3D Tested in Blender, Geekbench 5

They already have high-bandwidth interconnects between the CCD and IOD, because every time a CCD gets a cache miss, it needs to snoop & potentially fetch the cacheline from the other CCD. According to this, Zen 4 beefed up that bandwidth to about 1.5 TB/s.
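
To make that snoop path concrete, here's a toy model (with made-up latencies, not AMD's real figures) of the lookup order on a two-CCD part:

```python
# Toy model of a load on a two-CCD Zen part: local L3 -> snoop the other
# CCD's L3 via the IOD -> fall back to DRAM. All latencies are illustrative.
L3_HIT_NS       = 10   # hit in the local CCD's L3
REMOTE_SNOOP_NS = 25   # probe the other CCD's L3 across the fabric
DRAM_NS         = 80   # trip to main memory

def load(addr, local_l3, remote_l3):
    """Return the latency (ns) of a load, walking local L3 -> remote L3 -> DRAM."""
    if addr in local_l3:
        return L3_HIT_NS
    if addr in remote_l3:  # snoop hit: the line lives on the other CCD
        return L3_HIT_NS + REMOTE_SNOOP_NS
    return L3_HIT_NS + REMOTE_SNOOP_NS + DRAM_NS  # miss everywhere

print(load(0x40, local_l3={0x40}, remote_l3=set()))  # 10
print(load(0x80, local_l3=set(), remote_l3={0x80}))  # 35
print(load(0xC0, local_l3=set(), remote_l3=set()))   # 115
```
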
No, you probably wouldn't move all your L3 there, but I did offer that it could be L4.

Be careful about throwing around terms like HBM, because that means something very specific and it's not at all what I was talking about.
I know exactly what I'm talking about: the point is that it would not be L3 anymore. Whatever you end up calling it is down to the implementation chosen. As for what it could be if it lived in the I/O die, it doesn't really matter, as it would be slower anyway.

Regards.
 
I know exactly what I'm talking about:
Not if you start throwing around terms like HBM, while I was talking about stacking SRAM on the I/O Die. HBM is something very specific, and it's not that.

As for what it could be if it lived in the I/O die, it doesn't really matter, as it would be slower anyway.
But the compute die could run as fast as in the non-3D version and you'd get the benefit of 1.5 TB/s access to that L4 cache (about 20x faster than DDR5) for only a couple ns extra penalty vs. having it on the CCD.
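
Quick sanity check on that "20x" figure, assuming dual-channel DDR5-5200 (stock for Zen 4) at 8 bytes per channel per transfer:

```python
# Ratio of the cited CCD<->IOD bandwidth to theoretical peak DDR5 bandwidth.
fabric_gbps = 1500                  # the ~1.5 TB/s figure cited above
ddr5_gbps = 5200e6 * 8 * 2 / 1e9    # dual-channel DDR5-5200: 83.2 GB/s

print(f"{fabric_gbps / ddr5_gbps:.1f}x")  # ~18x, i.e. roughly 20x
```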

I think the reason they didn't do it isn't because it's not a viable idea, but rather that it doesn't scale for EPYC. And that's the main market of their CCD chiplets - not gaming desktops!
 
Not if you start throwing around terms like HBM, while I was talking about stacking SRAM on the I/O Die. HBM is something very specific, and it's not that.


But the compute die could run as fast as in the non-3D version and you'd get the benefit of 1.5 TB/s access to that L4 cache (about 20x faster than DDR5) for only a couple ns extra penalty vs. having it on the CCD.

I think the reason they didn't do it isn't because it's not a viable idea, but rather that it doesn't scale for EPYC. And that's the main market of their CCD chiplets - not gaming desktops!
Nothing prevents AMD from slapping HBM into the I/O die as part of the design; HBM is already stacked, so you could follow a similar approach to the VCache with it. It would be cool to see, but I doubt it's economically viable. Plus, it would make the die hella big and probably wouldn't pay off as a "cache" anyway. SRAM would follow suit, since it would enlarge the I/O die to the point where it's no longer viable. It would still be faster than main memory access, for sure, but less "convenient" than VCache, with about the same manufacturing risk.

And yes, the second argument is really it, I'd say: they need the L3 cache to be bigger in order to hide latency from their weaker IMC. I can't remember where I read this, but AMD can't make a better IMC because Intel holds most of the key patents to an optimal/fast design, or something. Take that as you will.

I'll stop here; there's not much more to discuss on this topic. Much like the "HBM-equipped APU", I doubt AMD will ever do the cool things until they can fund them as "side" projects to experiment.

Regards.
 
Nothing prevents AMD from slapping HBM into the I/O die as part of the design; HBM is already stacked, so you could follow a similar approach to the VCache with it.
HBM is DRAM, which makes it bad to use as cache. Its only advantage is throughput, but latency will be almost as bad as going to DDR5.

Furthermore, I don't know if you can use less than a full stack of HBM, if you actually want to get decent bandwidth out of it. And stacking 8 dies atop the I/O Die is a very different proposition than stacking 1 atop it.
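
To put rough numbers on the latency point, here's a back-of-envelope average-access-time comparison between an SRAM L4 and an HBM L4 after an L3 miss. The latencies and hit rate are illustrative guesses, not measured figures:

```python
# Average time (ns) to service an L3 miss, given an L4 cache with the stated
# lookup latency and hit rate; DDR5 assumed at ~80 ns.
def after_l3_miss(l4_ns, l4_hit_rate, dram_ns=80):
    return l4_hit_rate * l4_ns + (1 - l4_hit_rate) * (l4_ns + dram_ns)

print(after_l3_miss(l4_ns=25, l4_hit_rate=0.6))  # SRAM L4: ~57 ns
print(after_l3_miss(l4_ns=70, l4_hit_rate=0.6))  # HBM L4: ~102 ns
```

With DRAM-class lookup latency, the L4 can actually lose to just going straight to DDR5 (~80 ns) unless its hit rate is very high.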
 
I'll stop here; there's not much more to discuss on this topic. Much like the "HBM-equipped APU", I doubt AMD will ever do the cool things until they can fund them as "side" projects to experiment.

Stacking a big chunk of cache onto the cpu die all 3D-like is kinda cool...
 
Stacking a big chunk of cache onto the cpu die all 3D-like is kinda cool...
It is, for sure!

I mean other, potentially less economically viable things you can come up with in terms of adding memory subsystems to the package. As I said in my example: the HBM-equipped APU for consumers. Most companies, when they do these "risky" projects, find ways to still make money out of them, or at least break even. I don't think AMD is at the point where they can afford that luxury just yet.

Regards.
 
As I said in my example: the HBM-equipped APU for consumers.
I had long thought it would be HBM, but Apple has shown that LPDDR5 makes a lot more sense. I just wonder who among the x86 set will do it first: Intel or AMD.

I had correctly predicted that Apple would be first to do the in-package DRAM thing. Where I was wrong is that I expected at least one of the x86 makers to have done it years ago. I know Intel used in-package DRAM in Xeon Phi and now Xeon Max, but I mean specifically for consumer laptop CPUs.
 
So now the glaring weakness of the 5800X3D is resolved: you no longer have to give up half your productivity performance, you only sacrifice a little. These results are amazing, and it's weird they're making it sound like it's bad, with words like 'unimpressed'.

Now the question is whether they solved part 2: is the 7950X3D as big an improvement in gaming as the 5800X3D was?
Ah, c'mon. Glaring weakness? Half the multicore/productivity performance? This just isn't true.
 
The (risky) bet AMD is making is that apps like games will either have patches which apply their own thread-affinities to the chiplet with extra cache, or that a Windows 11 patch can monitor cache usage stats by threads and steer them appropriately. I'm skeptical it'll work terribly well, but I guess we'll soon find out.

Yeah, totally agree. My thoughts were the same. If MS and Intel can work together on the thread scheduler, I'm sure AMD can too, and direct certain apps to the right CCX (the one with VCache).
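
On Linux, that steering ultimately boils down to plain CPU affinity. A minimal sketch - the core numbering here is an assumption, so check lstopo or /proc/cpuinfo for which CCD actually carries the VCache:

```python
import os

# Hypothetical layout: hardware threads 0-15 live on the CCD with 3D VCache.
VCACHE_CCD = set(range(16))

def pin_to_vcache(pid=0):
    """Restrict the given process (0 = the calling process) to the VCache CCD."""
    os.sched_setaffinity(pid, VCACHE_CCD)

pin_to_vcache()                 # e.g., called from a game launcher/wrapper
print(os.sched_getaffinity(0))  # confirm the new mask
```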
 
Ah, c'mon. Glaring weakness? Half the multicore/productivity performance? This just isn't true.
The leaked 7950X3D's multicore score is like 86% of the 7950X's. That's lower than I would've expected, but maybe whoever posted those results wasn't using the presumed Win 11 patches for improving the scheduler.

I think the "half performance" was in reference to how the 5000-series only offered 3D VCache on CPUs with half of the max # of cores.

Yeah, totally agree. My thoughts were the same. If MS and Intel can work together on the thread scheduler, I'm sure AMD can too, and direct certain apps to the right CCX (the one with VCache).
In both cases, the problem they face is how a lot of apps are threaded, these days. The classic "work stealing" approach queues up work items for a pool of worker threads to consume. Each worker thread can see a wide variety of different work items, which will throw a wrench into any attempts to pin them to an optimal class of cores.
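
For illustration, here's a stripped-down version of that pattern (a shared queue rather than true per-worker deques with stealing, but the scheduling problem is the same): any worker can end up running any kind of item, so there's no stable per-thread workload to pin.

```python
# Pool of workers draining a shared queue of mixed work items: cache-hungry
# and cache-light tasks land on whichever thread grabs them first.
import queue
import threading

tasks = queue.Queue()
for i in range(20):
    kind = "cache-hungry" if i % 3 == 0 else "cache-light"
    tasks.put((kind, i))

def worker(name):
    while True:
        try:
            kind, i = tasks.get_nowait()
        except queue.Empty:
            return
        print(f"{name} ran {kind} task {i}")  # item type varies per thread

threads = [threading.Thread(target=worker, args=(f"w{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```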

In the case of AMD's X3D processors, the downside of sub-optimal thread assignment should be much less than Intel's.
 
I think the "half performance" was in reference to how the 5000-series only offered 3D VCache on CPUs with half of the max # of cores.

Yes - if you wanted the best productivity performance, you had to get the 5950X, which was often quite a bit slower than the 5800X3D in gaming.
Whereas in the Intel world, you didn't have to make that compromise: the best productivity chip was also the fastest gaming chip, or close enough.

There were a lot of people who disliked the 5800X3D because of this.
 
I know Intel used in-package DRAM in Xeon Phi and now Xeon Max, but I mean specifically for consumer laptop CPUs.
It just makes more sense in a laptop, where people are less likely to be concerned about upgrading. Desktops, though? They seem like they're still the domain of people who want upgradability and more choice when they build the machine.
I AM surprised that it hasn't reached more laptop parts, though. After a bit of digging, it seems that most DDR5 is built on a 12 or 14 nm process - which is likely a lot cheaper to manufacture.
 
It just makes more sense in a laptop, where people are less likely to be concerned about upgrading.
Not only that, but laptops are where it's more common for people to use the iGPU (and doing so makes the thing smaller and lighter). An iGPU is where we see the greatest demand for main memory bandwidth, and putting the DRAM in-package makes it cheaper to add channels (as Apple has done in their M-series Pro and Max).

Intel is rumored to be scaling up their iGPU in Arrow Lake, to over 3x as many cores as the current P-series parts (going from 96 to 320 EUs, I think), which means they're going to have to find a lot more memory bandwidth somewhere.
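
Back-of-envelope, assuming bandwidth demand scales roughly linearly with EU count and that current parts lean on dual-channel LPDDR5-6400 (both are simplifications):

```python
# Crude estimate of the bandwidth a ~3.3x bigger iGPU would want.
eu_now, eu_rumored = 96, 320
lpddr5_gbps = 6400e6 * 8 * 2 / 1e9  # dual-channel LPDDR5-6400: ~102 GB/s

print(f"~{lpddr5_gbps * eu_rumored / eu_now:.0f} GB/s")  # ~341 GB/s
```

That's Apple M-series Max territory, which is hard to reach without more channels - and in-package DRAM is the cheap way to add them.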