I disagree with the sweeping claim of "eliminating one layer of latency within the compute die".
The only case where that holds is a part with a single compute die (CCD): since that die no longer contains multiple CCXes, the inter-CCX communication latency layer is indeed removed.
If you have multiple CCDs, communication between cores on different CCDs still has to hop through the I/O die, just as communication between two CCXes does on Zen 2 (even when they are physically on the same CCD). The higher latency is incurred less often, since each core now shares its CCX with 7 other cores instead of 3, but it is by no means "eliminated".
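To put rough numbers on "less often", here is a minimal Python sketch. It assumes a thread's communication partner is picked uniformly at random from the other cores (real schedulers and workloads are far from uniform), and uses example layouts: a 16-core Zen 2 part with 4-core CCXes, a 16-core Zen 3 part with two 8-core CCDs, and a single-CCD Zen 3 part. The function name is hypothetical, purely for illustration.

```python
from fractions import Fraction


def same_ccx_probability(total_cores: int, cores_per_ccx: int) -> Fraction:
    """Chance that a partner core shares the first core's CCX (hypothetical helper).

    Assumption: the partner is chosen uniformly from the other total_cores - 1 cores.
    """
    if total_cores % cores_per_ccx:
        raise ValueError("total_cores must be a multiple of cores_per_ccx")
    # The first core has (cores_per_ccx - 1) CCX neighbours out of (total_cores - 1) others.
    return Fraction(cores_per_ccx - 1, total_cores - 1)


# Example layouts (assumed): (label, total cores, cores per CCX)
layouts = (
    ("Zen 2, 16 cores", 16, 4),
    ("Zen 3, 16 cores", 16, 8),
    ("Zen 3, 1 CCD", 8, 8),
)

for name, total, per_ccx in layouts:
    p = same_ccx_probability(total, per_ccx)
    print(f"{name}: same CCX {p} ({float(p):.1%}), cross-CCX {1 - p} ({float(1 - p):.1%})")
```

Under these assumptions the cross-CCX share drops from 80% (Zen 2) to about 53% (two-CCD Zen 3), and only reaches zero in the single-CCD case, which is the point made above.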
Also, doubling the L3 cache accessible to each core (which makes a trip to system memory less likely) doesn't remove any latency layers; it only changes how often you hit them.
Where I would expect the biggest gain is from combining the Infinity Fabric links of two CCXes into a single link. This could give a single core up to double the memory and I/O throughput to the I/O die, as long as there is little contention with other cores for that link.
EDIT: I stand corrected. It seems I had misunderstood this (mixing up a couple of terms) when I first read about Ryzen 3000 at launch. The text I remembered was referring to intra-package communication (the lack of direct links between CCDs), not intra-die communication. I shouldn't read technical material so quickly...