To get me to upgrade my 7950X3D they would have needed to add the cache to both CPU dies, removing the requirement for software to pick which cores to use. Hopefully in the future they will be able to add the cache to both chiplets and make all cores equal.
I'm afraid that need to choose CCDs won't go away.
Nor is it actually caused by the presence of the V-cache; it's just exacerbated by it.
Caches exploit locality to cut down on the latency of RAM access, and that locality is significantly reduced as you go off-CCD.
That data can be found and used from the caches of other CCDs is great, and one of the reasons EPYCs are currently doing so well with up to 16 CCDs (with or without V-cache). But code that isn't aware of the CCD topology and doesn't manage it consciously is likely to do much worse than code that does (though most hyperscaler workloads are scaled-in, not HPC scale-out, and thus don't need to exploit locality as much).
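To make that concrete, here is a minimal sketch of what "being aware of the CCD topology" starts with on Linux: asking the kernel which logical CPUs share an L3. The sysfs path is standard, but treating index3 as the L3 is an assumption of this sketch; robust code would check the cache "level" file or use a library like hwloc.

    /* Minimal sketch (Linux sysfs): list the logical CPUs sharing cpu0's
     * last-level cache.  On a dual-CCD Ryzen that is effectively the set of
     * cores (plus SMT siblings) on cpu0's CCD.
     * ASSUMPTION: index3 is the L3 -- common, but not guaranteed. */
    #include <stdio.h>

    int main(void)
    {
        char buf[256];
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list", "r");
        if (!f) {
            perror("fopen");
            return 1;
        }
        if (fgets(buf, sizeof buf, f))
            printf("CPUs sharing cpu0's L3: %s", buf);   /* e.g. "0-7,16-23" */
        fclose(f);
        return 0;
    }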
Now in the HPC arena, where those EPYC 9684X (1152 MB of L3 cache across 12 CCDs on Zen 4) or their Zen 5 successors will be used, developers tune their software very carefully to take the greatest advantage of those dearly paid-for resources. And the presence of V-cache only means they have to tune more to get more, because getting it wrong means shuffling data between CCDs, which can cost more than fetching it from RAM.
Game developers are unlikely to invest similar effort for a niche that is extremely small and not likely to pay extra money for that effort.
So if your game (or OS) were to just randomly choose CPU cores, assuming they are all the same and it doesn't matter, game performance, which is mostly about "fluidity" or consistent and predictable "real-time" reactivity, will suffer.
At any scheduling time slice it could choose a secondary hyperthread over a full CPU core, hit an "efficiency" core, or pick a core that doesn't have the data (nor the code) you're trying to execute in any of its three levels of cache.
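The hyperthread part at least is easy to check for: the sibling relationship is exposed in sysfs, so a topology-aware thread pool can see it before placing work. A minimal sketch (standard Linux paths; the exact output format varies by machine):

    /* Minimal sketch (Linux sysfs): print each logical CPU's SMT siblings so
     * a topology-aware thread pool can avoid parking a latency-critical
     * thread on the sibling of an already-busy core. */
    #include <stdio.h>

    int main(void)
    {
        for (int cpu = 0; ; cpu++) {
            char path[128], buf[64];
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", cpu);
            FILE *f = fopen(path, "r");
            if (!f)
                break;                      /* no such CPU -> done */
            if (fgets(buf, sizeof buf, f))
                printf("cpu%d siblings: %s", cpu, buf);   /* e.g. "0,16" */
            fclose(f);
        }
        return 0;
    }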
And if you thought this was bad, consider the additional complexity of energy-constrained computing like mobile, where putting extra load on a currently unused core might cause all existing cores to drop their clocks to stay within thermal constraints, while the other CCD might still be cool about it...
AMD knows the reality of gaming: they sell tons of 8-core APUs for consoles, still the most important target for game developers, even if they also support PC gaming. And those console CPUs tend to be monolithic 8-core designs, relatively weak compared to their desktop counterparts and without any turbo complexity built in, because that only makes life more difficult. They know that GPUs dictate most of the game performance, but also the sort of game you can actually sell: there may be games that never get written because they would require 128 cores to run.
Now don't get me wrong: I too would likely have gone and bought a dual V-cache-CCD part, but that's because my main focus isn't actually gaming but technical architecture. And the peak clock loss from the V-cache seems to be much smaller in this generation because they placed the cache underneath the CPU die. Perhaps it could even be zero if you could selectively turn off the V-cache for workloads that won't profit from it.
But from my practical experience with my Ryzen 9 7950X3D (vs. the 5950X, 5800X, 5800X3D, 5700U, 7840H, 7945HX and various others, which I also own), I believe their choice is wise and right for the vast majority: chips depend on vast scales to be affordable at all, so bespoke parts like a dual V-cache desktop chip simply won't reach retail. Perhaps some Youtuber will get it done anyway, or AMD might even sell something like that as an EPYC.
The clock loss from the V-cache on the 7950X3D really never comes into play (outside of gaming) on sustained computational workloads: once all cores on the V-cache-less CCD go full throttle, thermal limits drop clocks below what the V-cache CCD could sustain even without the extra cache. So it doesn't really "suffer" from lower peak clocks; nothing will run at 16x 5.7 GHz, except perhaps NOPs. That theoretical disadvantage should be even less noticeable on the 9950X3D, perhaps not even measurable in a synthetic benchmark.
Yet core game workloads should never stray beyond the 8 V-cache cores if they want to optimize for fluidity. Sometimes they need help from the OS in making the right choice, but that help is also needed with any dual- (or more) CCD CPU in your system and workloads that disregard topology.
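For what that help can amount to in practice, here is a hedged sketch of confining work to the first eight cores. The assumption that cores 0-7 are the V-cache CCD is exactly that, an assumption; it has to be verified against the actual topology (e.g. with the sysfs query above) before it does any good.

    /* Minimal sketch (Linux, glibc): confine the calling thread to cores 0-7.
     * ASSUMPTION: cores 0-7 are the V-cache CCD on this machine -- verify
     * against the real topology first, this is not guaranteed anywhere. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    static void pin_to_assumed_vcache_ccd(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int cpu = 0; cpu < 8; cpu++)
            CPU_SET(cpu, &set);

        int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        if (err != 0)
            fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
    }

    int main(void)
    {
        pin_to_assumed_vcache_ccd();
        /* threads created from here inherit the affinity mask ... */
        return 0;
    }

Running the whole process under "taskset -c 0-7" achieves roughly the same thing from the outside, again assuming you have checked which CCD actually carries the V-cache.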
Ironically it's an area where Intel's monolithic CPUs are making a comeback. I got into Haswell and Broadwell Xeons when they became cheap enough to afford. And I noticed that I'm not the only one: there are a lot of really cheap new "gamer" boards out there supporting the LGA 2011 CPUs which have flooded the recycled-parts market in recent years.
Turns out these are becoming quite good at gaming even with their relatively modest peak clocks, as games evolve to take advantage of many cores. I've observed some games really going out and loading all cores on these machines, and a Xeon E5-2696 v4 has 55 MB of L3, which it puts to good use, while the quad-channel DDR4 memory subsystem also isn't that bad compared to dual-channel DDR5. One of these days I hope to get so bored that I'll put my RTX 4090 into that Xeon system to see how much performance actually suffers vs., say, a Ryzen 7 5800X3D, which offers very nearly the same multithreaded performance (and near-identical CPU wattage) with only 8 cores, but much better IPC and higher clocks.
And I bought that Broadwell 22-core Xeon E5-2696 v4 for €160, while my first Haswell 18-core E5-2696 v3 was still €700 (vs. around €5000 MSRP).
Of course these were extremely costly chips to make, and they became attractive for gaming only after the hyperscalers moved on.