News Ryzen 9 7950X3D surfaces with 192MB L3 cache — 3D V-Cache ES CPU has 64MB more than retail CPU

Status
Not open for further replies.
The amount of cache may simply line up. Remember that all X3D variants are defective dies that did not pass quality checks coming off the Epyc line to begin with.
That's not how that works. The specs that define the Zen CCDs are exactly the same between Epyc and Ryzen. The cache totals are always the same between the two, per CCD, per number of enabled cores. The thing that disqualifies a die from Epyc use is requiring above-average voltage for the target frequency. They even still use CCDs with disabled cores in Epyc to make up the models with fewer than 64 cores. Part of the reason for that is that memory bandwidth is tied to CCD count, so they'll use the same number of CCDs but with fewer active cores. Ryzen CPUs just allow a wider variance in the V/F curve.
 
This is false because this is already handled by the Windows (or Linux/BSD) scheduler. Do not forget that every CCD on every chip with two or more CCDs has L3 cache; the X3D CCDs simply have more. The data in the L3 cache is not randomly stored on another CCD; it is stored on the CCD where the compute is happening!
No, both of your points are semi-accurate. While Windows does handle the scheduling, somewhere between the scheduler and the behavior of the CPU, it doesn't want to keep any one core fully occupied for too long under light loads. So a game's threads get bumped to different cores, without being limited to the local CCD. Windows behaves as if it's a monolithic CPU, because its only mechanism to behave otherwise would be multiple NUMA nodes. In times when it gets bumped to the other CCD, the data it needs isn't in the cache there and a latency hit occurs. This has been observed as single-CCD CPUs having better and more consistent FPS than the dual-CCD models, even when the dual-CCD models have higher clocks. This was happening between the 5800X and 5950X, before X3D was around. It's also why dual X3D didn't make sense, since it's just more empty cache. The Xbox Game Bar method was pretty much their only user-friendly solution for forcing Windows to keep games on one CCD.

There is a potential future where games needing more than 16 threads would have cause to keep data in both CCDs. I've only seen Star Citizen using 19-20 threads on a 5950X. They had to patch it to ignore Intel's E-cores because the asymmetric performance caused nothing but stutters.
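
For anyone who wants to try the "keep it on one CCD" idea by hand, here is a rough sketch for Linux only (the thread is mostly about Windows, where Game Bar does this for you). It assumes the kernel exposes L3 sharing at `/sys/devices/system/cpu/cpuN/cache/index3/shared_cpu_list` and that `index3` is the L3 entry, which is typical on x86 but worth checking against the `level` file next to it. The function names are made up for illustration; `os.sched_setaffinity` is a real Linux-only call in the Python standard library.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0-7,16-23" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l3_siblings(cpu=0):
    """Return the logical CPUs that share an L3 with `cpu`.

    Assumption: index3 is the L3 cache entry on this machine.
    """
    path = f"/sys/devices/system/cpu/cpu{cpu}/cache/index3/shared_cpu_list"
    with open(path) as f:
        return parse_cpu_list(f.read())


def pin_to_l3_domain(pid, cpu=0):
    """Restrict `pid` to the logical CPUs sharing an L3 with `cpu` (Linux only)."""
    os.sched_setaffinity(pid, l3_siblings(cpu))
```

Calling `pin_to_l3_domain(os.getpid())` would limit the current process to whichever CCD contains logical CPU 0. Which CCD carries the extra cache on an X3D part is not something this sketch detects.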
 
In times when it gets bumped to the other CCD, the data it needs isn't in the cache there and a latency hit occurs.
I cannot say with certainty that you are wrong, but I doubt that this is true. It has been a long time since I looked at things like this in depth, but my understanding was that the data in the L3 HAD to follow the core(s) processing it. If you know otherwise, please drop a link and I will read it and catch up on this detail. The obvious caveat is cases where more threads are using that L3 than there are in the CCD. This was far more likely to be an issue when there were 2x CCXs per CCD; the unified L3 cache per CCD fixed this.
 
That's the point I was making. The data a thread is using is initially isolated to the specific CCD where that thread was spawned. If nothing ever hits the other CCD, there is no reason to move data into it. If a thread is suddenly balanced onto the other CCD, it attempts to pull data from its local L3, only to find that it isn't there. It is then going to check either an adjacent CCD's L3 or system memory and copy the data to local L3. The time spent checking and failing adds latency on top of actually pulling the data from a further source.

I can't recall a link where this is explicitly described, but you can basically discern it from the existence of inter-CCD latency charts, which are really about Epyc/Threadripper. The purpose of those charts is to tell you how much longer it takes to look for data in other CCDs. Desktop Ryzen is literally just those scaled way down. These effects are why in some cases a 5800X will have more consistent performance (1% lows) than a 5950X even though the latter has higher clocks. Many sites have referenced the cache/memory subsystem latency deficiencies inherent to Ryzen's chiplet design. It's the primary thing X3D overcomes for latency-sensitive tasks.
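
To put the lookup sequence described above into something concrete, here is a toy cost model. It only encodes the description in this post (a failed local L3 check, then a fetch from the other CCD or from memory); it is not measured behavior and not how the hardware is documented to work. The function name and the three latency parameters are placeholders, in whatever unit you like.

```python
def access_cost(in_local_l3, in_remote_l3, t_local, t_remote, t_dram):
    """Toy cost of one access after a thread lands on a given CCD.

    Assumption: a local miss still pays the local lookup time, then
    pays for the farther source (other CCD's L3, else system memory).
    """
    if in_local_l3:
        return t_local
    farther = t_remote if in_remote_l3 else t_dram
    return t_local + farther
```

With this model a thread that stays put pays only `t_local` per hit, while a freshly migrated thread pays `t_local + t_remote` (or `t_local + t_dram`) until its working set has been pulled into the new CCD's L3.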
 