News Ryzen 9 7950X3D surfaces with 192MB L3 cache — 3D V-Cache ES CPU has 64MB more than retail CPU

Status
Not open for further replies.
Even a regular 7950X3D is a very impressive, sophisticated processor. If you want "it all", gaming, professional workloads, it delivers like no others. I am looking forward to seeing the 8000 series and its comparative performance. Maybe an 8800X3D?
 
Even a regular 7950X3D is a very impressive, sophisticated processor. If you want "it all", gaming, professional workloads, it delivers like no others. I am looking forward to seeing the 8000 series and its comparative performance. Maybe an 8800X3D?
Expected to be called 9000 series.

If AMD wants to impress, they should release a 9960X3D: 8 Zen 5 cores with 3D V-Cache, and 16 Zen 5c cores. It would have none of the scheduling issues seen with the 7950X3D/7900X3D, AMD would still only have to add 1 cache chiplet, and it would lead in multi-threading.
 
Expected to be called 9000 series.

If AMD wants to impress, they should release a 9960X3D: 8 Zen 5 cores with 3D V-Cache, and 16 Zen 5c cores. It would have none of the scheduling issues seen with the 7950X3D/7900X3D, AMD would still only have to add 1 cache chiplet, and it would lead in multi-threading.

But presumably at a significant clock speed deficit as neither 3D cache nor C cores have shown they can clock as high as regular cores yet. That could be a problem.
 
This seems to be a side effect of Windows' VBS (Core Isolation) feature, where enabling it will cause the system to mis-report L3 cache size as 2 x 96MB.

I tested it on my own 7950X3D and was able to reproduce the issue, as shown here.
View: https://imgur.com/a/54d8cdF


Enabling VBS shows 192MB in Task Manager, or 2 x 96MB in CPU-Z.
Disabling VBS (and rebooting) shows 128MB in Task Manager, and 96 MB + 32 MB in CPU-Z.
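
The two readings above can be reconciled with simple arithmetic. The snippet below is only a sketch of that arithmetic, not a description of how Task Manager or CPU-Z gather their numbers; the variable and function names are made up for illustration.

```python
# Per-CCD L3 sizes in MB as reported by CPU-Z in the two cases described above.
# Names here are hypothetical; only the numbers come from the post.
L3_WITH_VBS = [96, 96]      # VBS enabled: both CCDs reported as 96 MB
L3_WITHOUT_VBS = [96, 32]   # VBS disabled: one V-Cache CCD, one regular CCD


def total_l3_mb(per_ccd_mb):
    """Sum the per-CCD L3 sizes to get the package total."""
    return sum(per_ccd_mb)


reported = total_l3_mb(L3_WITH_VBS)     # 192 MB, the mis-reported figure
actual = total_l3_mb(L3_WITHOUT_VBS)    # 128 MB, the retail figure
print(f"VBS on: {reported} MB, VBS off: {actual} MB, gap: {reported - actual} MB")
```

The gap matches the "64MB more than retail" in the headline, which is what you would expect if the second CCD's 32 MB is simply being reported as 96 MB.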
 
Expected to be called 9000 series.

If AMD wants to impress, they should release a 9960X3D: 8 Zen 5 cores with 3D V-Cache, and 16 Zen 5c cores. It would have none of the scheduling issues seen with the 7950X3D/7900X3D, AMD would still only have to add 1 cache chiplet, and it would lead in multi-threading.
How do you figure that it wouldn't have any scheduling issues?!
Software doesn't understand, on its own, that the c cores are any different from any other cores. Also, I very much doubt they can fit 16 c cores into the space of 8 normal cores: that would need a space saving of at least 50%, and the rumors are for about 35%. So they would have to add one more CCX or change its size, which means changing all of the tooling, and that's very doubtful as well.
 
But presumably at a significant clock speed deficit as neither 3D cache nor C cores have shown they can clock as high as regular cores yet. That could be a problem.
Clock speed deficit, sure. Significant? Nah. The 3D cache cores would be the gaming cores, but still have respectable clocks despite losing a few hundred MHz turbo, and then the 16 C cores would be faster for multithreading than another 8 cores clocked higher.

How do you figure that it wouldn't have any scheduling issues?!
Software doesn't understand, on its own, that the c cores are any different from any other cores. Also, I very much doubt they can fit 16 c cores into the space of 8 normal cores: that would need a space saving of at least 50%, and the rumors are for about 35%. So they would have to add one more CCX or change its size, which means changing all of the tooling, and that's very doubtful as well.
The 7950X3D problem is that half the cores have tripled L3 cache, but the other half clock higher. The decision on whether or not it's beneficial to choose the extra cache or clocks varies by game or program. It's not obvious to a scheduler.

The hypothetical 8X3D+16 chip would have more cache and higher clocks on the X3D side. So when it uses the highest clocking cores, they also end up having the extra cache. No issues.
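
The argument can be sketched in a few lines. This is a toy model only: the `Ccd` class, the function name, and all boost clocks are illustrative assumptions (the clocks are rough placeholders, and the L3 size for the c-core CCD is a guess). It does not reflect how the Windows scheduler actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ccd:
    name: str
    max_boost_mhz: int  # illustrative placeholder, not a spec
    l3_mb: int


def clock_and_cache_agree(ccds):
    """True if the highest-clocking CCD is also the one with the most L3.

    When this holds, "prefer the fastest cores" and "prefer the biggest
    cache" pick the same CCD, so there is no per-game trade-off to guess.
    """
    fastest = max(ccds, key=lambda c: c.max_boost_mhz)
    biggest_cache = max(ccds, key=lambda c: c.l3_mb)
    return fastest.name == biggest_cache.name


# 7950X3D-style layout: the cache CCD clocks lower than the regular CCD.
ryzen_7950x3d = [Ccd("V-Cache CCD", 5250, 96), Ccd("regular CCD", 5700, 32)]

# Hypothetical 8 X3D + 16 c-core layout: the cache CCD is also the fastest.
hypothetical = [Ccd("V-Cache CCD", 5250, 96), Ccd("c-core CCD", 3700, 32)]

print(clock_and_cache_agree(ryzen_7950x3d))  # False: clocks and cache disagree
print(clock_and_cache_agree(hypothetical))   # True: same CCD wins on both
```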

The Zen 4c core shrinks by 35.4% compared to a Zen 4 core. But the L3 cache also shrinks despite having the same 32 MB on the CCD, presumably from using a different cell library. I think this fact is often forgotten. I didn't remember it until I looked it up:

https://www.semianalysis.com/p/zen-4c-amds-response-to-hyperscale

The end result? An 8-core Zen 4 chiplet with 32 MB L3 cache (1 CCX) = 66.3mm^2. A 16-core Zen 4c chiplet with 32 MB L3 cache (2 CCXs) = 72.7mm^2. That's only 9.65% larger.
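
For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic. The die areas are the ones quoted above; the function and variable names are made up for illustration, and "area per core" here means whole-CCD area divided by core count (so it includes L3 and everything else on the die).

```python
# CCD areas in mm^2 as quoted in the post above.
ZEN4_CCD_MM2 = 66.3    # 8 cores, 32 MB L3, 1 CCX
ZEN4C_CCD_MM2 = 72.7   # 16 cores, 32 MB L3, 2 CCXs


def percent_larger(base, other):
    """How much larger `other` is than `base`, in percent."""
    return (other / base - 1) * 100


def area_per_core(ccd_mm2, cores):
    """Whole-CCD area divided by core count (includes L3 and uncore)."""
    return ccd_mm2 / cores


growth = percent_larger(ZEN4_CCD_MM2, ZEN4C_CCD_MM2)
per_core_zen4 = area_per_core(ZEN4_CCD_MM2, 8)
per_core_zen4c = area_per_core(ZEN4C_CCD_MM2, 16)
saving = (1 - per_core_zen4c / per_core_zen4) * 100

print(f"Zen 4c CCD is {growth:.2f}% larger")
print(f"CCD area per core: {per_core_zen4:.2f} vs {per_core_zen4c:.2f} mm^2 "
      f"({saving:.1f}% less per core)")
```

So at the CCD level the per-core area saving is around 45%, noticeably more than the 35.4% figure for the core alone, because the L3 and the rest of the die are shared by twice as many cores.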
 
How do you figure that it wouldn't have any scheduling issues?!
Software doesn't understand, on its own, that the c cores are any different from any other cores. Also, I very much doubt they can fit 16 c cores into the space of 8 normal cores: that would need a space saving of at least 50%, and the rumors are for about 35%. So they would have to add one more CCX or change its size, which means changing all of the tooling, and that's very doubtful as well.
Unlike Intel's hybrid approach, the Zen c cores are the same uArch as Zen, with only clock speed and cache being different. For the Bergamo server chips, AMD already makes those CCDs with 16 cores per CCD. While that is bigger than an 8-core Zen CCD, the advantage of using chiplets is that you can put CCDs of different physical size on the same package.
 
Clock speed deficit, sure. Significant? Nah. The 3D cache cores would be the gaming cores, but still have respectable clocks despite losing a few hundred MHz turbo, and then the 16 C cores would be faster for multithreading than another 8 cores clocked higher.
You have to keep in mind that the platform is already at the max power that it can draw, and the main cores at least will be clocked higher so no saving any power from there, unless AMD thinks that they will sell well without any IPC increase at all...
Software will not automatically run only on the 16c cores.
I assume the person you quote here was talking about the CPU as a whole.
The 7950X3D problem is that half the cores have tripled L3 cache, but the other half clock higher. The decision on whether or not it's beneficial to choose the extra cache or clocks varies by game or program. It's not obvious to a scheduler.
16 cores without any lag in-between them should still be better for many games, or at least some games, even if they are slower.
 
Unlike Intel's hybrid approach, the Zen c cores are the same uArch as Zen, with only clock speed and cache being different. For the Bergamo server chips, AMD already makes those CCDs with 16 cores per CCD. While that is bigger than an 8-core Zen CCD, the advantage of using chiplets is that you can put CCDs of different physical size on the same package.
He was talking about games though, do you know of ANY game that won't run on the e cores?
 
I just checked mine and I've got one, too. Everything from Windows 11 to CPU-Z reports 192MB or 2x96MB. This might be more common than anyone outside of AMD die manufacturing knows.
 
He was talking about games though, do you know of ANY game that won't run on the e cores?
While the games can run on the Intel e cores, you still have scheduling issues as the p and e cores are completely different uArchs. Not to mention that if you just run games on the e cores, even if you got max boost the entire time (for the 14900K that is 4.4 GHz), you will lose performance. In reality you would revert to 2017/18 gaming performance at best with your high-end GPUs.
 
While the games can run on the Intel e cores, you still have scheduling issues as the p and e cores are completely different uArchs.
ARM and x86 would be completely different uArchs; if they run the same code without any changes, they are 'the same enough'.
They are a lot slower in gaming due to the lower cache and clocks, the same things that the c cores would face.
 
ARM and x86 would be completely different uArchs; if they run the same code without any changes, they are 'the same enough'.
They are a lot slower in gaming due to the lower cache and clocks, the same things that the c cores would face.
E cores are the Atom uArch, which doesn't have the same instructions. Therefore scheduling is far more difficult on Intel's hybrid design, because you might not be able to run something on the e cores if they don't support, say, AVX512. There is zero uArch difference between Zen and Zen c. The Zen c cores just aren't as fast as Zen due to core clock and cache. If you just ran a game on the c cores, it wouldn't be as fast as on the Zen cores, but it would be faster than on e cores.
 
How do you figure that it wouldn't have any scheduling issues?!
Software doesn't understand, on its own, that the c cores are any different from any other cores. Also, I very much doubt they can fit 16 c cores into the space of 8 normal cores: that would need a space saving of at least 50%, and the rumors are for about 35%. So they would have to add one more CCX or change its size, which means changing all of the tooling, and that's very doubtful as well.
The AMD "c" cores have exactly the same instruction set as the "big" cores, and they even have the same IPC. The differences are the cache sizes, which the Windows scheduler does not take into account, and the clock speeds, which are the primary factor for the Windows scheduler. The secondary factor is whether to bounce threads between dies, which it shouldn't do anyway because of the lower clock speeds, but would do if whatever is running uses lots of threads. People already see that with AMD chips with 12 or more cores.

This would then mean that the primary cores used would be the cores with 3D V-Cache, and the secondary cores would be the "c" cores, because they will still run at lower clock speeds. Therefore there would be no Windows scheduling issues.

The die size difference between the Zen 4 and Zen 4c chiplets is not huge, and there is the space if they really wanted to, but server CPUs make far higher profits and AMD simply didn't need to do so to compete with Intel. AMD could, if they wanted to, release a Zen 5 and Zen 5c product onto AM5, but it would require reworking the package.

The real question is whether they will want to do this on AM5 with Zen 5 and Zen 5c; obviously putting 3D V-Cache onto the Zen 5 chiplet is the goal. There is the physical space, but a new package would be necessary, and then there is the issue of whether there is enough memory bandwidth to avoid starving so many cores, an issue that would certainly arise with 2x Zen 5c chiplets on AM5.
 
You have to keep in mind that the platform is already at the max power that it can draw
Nope, the AM5 platform was released with a max 170W TDP (not the same as actual wattage) but AMD has not released a chip that goes above 120W TDP, so there is plenty of room still.

EDIT: The 7950X and 7900X are both 170W TDP chips; interestingly, all of the 7000 series X3D chips are 120W TDP. I have linked below the chart that shows all of the 7000 series TDPs.

https://www.amd.com/en/products/processors/desktops/ryzen.html#tabs-0eb49394b2-item-446166865a-tab
 
The whole reason why AMD decided against a dual-CCD 3D V-Cache configuration is that it couldn't provide any performance gains in gaming.

The main point of 3D V-Cache was to have a uniform L3 cache pool where a large chunk of the code can be cached, like storing frequently accessed shader data on-die, thus improving the hit rates and reducing cache/memory-related pipeline stalls as well.

But with two distinct cache pools, the latency benefit is lost.

Assume that most games leverage only up to eight CPU cores. If we now store data related to these threads on the other die, it would do the opposite of what it's supposed to do, i.e. increase the latency.

Also, the cache residency would get even worse, because splitting the cache also splits the data into two distinct pools that are physically apart.

The game needs to know which cache pool exists on which die, which it doesn't. Because of this, the data related to CCD0 could be stored on CCD1 and/or vice versa, either randomly or due to the former CCD being full. Thus the game ends up fetching data from a different die, with the added latency that brings.
 
Nope, the AM5 platform was released with a max 170W TDP (not the same as actual wattage) but AMD has not released a chip that goes above 120W TDP, so there is plenty of room still.
https://www.anandtech.com/show/1741...m5-power-specifications-170w-tdp-and-230w-ppt
The platform has a 230W max and the 7950x already hits that with stock settings.
power-multithread.png
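
The 170W and 230W figures are related by AMD's usual socket power rule, where PPT (the package power limit) is about 1.35 times the TDP. The sketch below assumes that 1.35 ratio as a rule of thumb; the function name is made up for illustration, and actual board limits can differ.

```python
def ppt_watts(tdp_watts):
    """Approximate package power limit (PPT) from TDP.

    Assumes the commonly cited PPT = 1.35 x TDP rule of thumb; written with
    integer math (135 / 100) to avoid float rounding surprises.
    """
    return tdp_watts * 135 / 100


for label, tdp in [("170W TDP parts", 170), ("120W TDP X3D parts", 120)]:
    print(f"{label}: about {ppt_watts(tdp):.1f} W PPT")
```

This gives 229.5 W (rounded to the 230W PPT in the linked article) for the 170W parts, and 162 W for the 120W X3D parts.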
 
Even a regular 7950X3D is a very impressive, sophisticated processor. If you want "it all", gaming, professional workloads, it delivers like no others. I am looking forward to seeing the 8000 series and its comparative performance. Maybe an 8800X3D?
It's great, but it really needs a hw scheduler like Intel has. It would really make a world of difference
 
https://www.anandtech.com/show/1741...m5-power-specifications-170w-tdp-and-230w-ppt
The platform has a 230W max and the 7950x already hits that with stock settings.
power-multithread.png
Thank you for pointing out my mistake. I obviously missed the fact that the Ryzen 7900X and 7950X are 170W TDP; I never looked into them personally, so I probably just forgot.

Interestingly, the 7950X3D, 7900X3D and 7800X3D are all 120W TDP!

Also, thank you for noting that TDP is not the same as wattage (just like with Intel CPUs), and the two are not comparable.

It is also well worth noting that some motherboard manufacturers have been exposed as dumping much higher voltages into CPUs than is needed or specified. This drives up the wattage, so this is still to some degree motherboard dependent, and also cooling dependent to a small degree.
 
The whole reason why AMD decided against a dual-CCD 3D V-Cache configuration is that it couldn't provide any performance gains in gaming.

The main point of 3D V-Cache was to have a uniform L3 cache pool where a large chunk of the code can be cached, like storing frequently accessed shader data on-die, thus improving the hit rates and reducing cache/memory-related pipeline stalls as well.

But with two distinct cache pools, the latency benefit is lost.

Assume that most games leverage only up to eight CPU cores. If we now store data related to these threads on the other die, it would do the opposite of what it's supposed to do, i.e. increase the latency.

Also, the cache residency would get even worse, because splitting the cache also splits the data into two distinct pools that are physically apart.

The game needs to know which cache pool exists on which die, which it doesn't. Because of this, the data related to CCD0 could be stored on CCD1 and/or vice versa, either randomly or due to the former CCD being full. Thus the game ends up fetching data from a different die, with the added latency that brings.
This is false, because this is already handled by the Windows (or Linux/BSD) scheduler. Do not forget that all CCDs on all chips that use 2 or more CCDs have L3 cache; the X3D CCDs simply have more. The data in the L3 cache is not randomly stored on another CCD; it is stored on the CCD where the compute is happening!
 