Cache can certainly be accessed from outside its own CCD, but the issue is how much of a performance hit a core takes when it has to venture out of its own CCD to use that cached data, as well as how it has to work with it (e.g., is it simply copying it to a more optimal location, or is it working on it directly for an extended period of time?). Usually, once the latency and throughput get bad enough that using the remote cache costs more than simply not using it, it will not be used.
For example, with the X3D parts, even tasks that benefit far more from extra cache than from clock speed will not attempt to use the extra cache on the first CCD if they are set to run on the 2nd CCD. What needs to be considered is whether, even if that latency can be halved, a core would be able to use that cache natively, or whether it would have to treat it as a pseudo L4 cache, simply copying the data to its own local cache before doing any work with it.
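To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch in Python. It is only a toy model, not how any real CPU decides anything: the function names are made up for illustration, and every latency and hit rate in it is a hypothetical placeholder rather than a measurement of an actual part. It assumes a miss in the remote cache still pays for the remote lookup before going to DRAM.

```python
def avg_latency_with_remote_ns(remote_ns, dram_ns, remote_hit_rate):
    """Average cost of an access that checks the other CCD's cache first.

    Assumption: every access pays the remote lookup, and a miss
    then also pays the full DRAM latency on top of it.
    """
    return remote_ns + (1.0 - remote_hit_rate) * dram_ns


def remote_cache_worth_it(remote_ns, dram_ns, remote_hit_rate):
    """True if checking the remote cache beats going straight to DRAM."""
    return avg_latency_with_remote_ns(remote_ns, dram_ns, remote_hit_rate) < dram_ns


def cost_direct_ns(n_accesses, remote_ns):
    """Working on the data in place: every access crosses to the other CCD."""
    return n_accesses * remote_ns


def cost_copy_first_ns(n_accesses, remote_ns, local_ns):
    """Pseudo-L4 style: one remote fetch, then every access hits local cache."""
    return remote_ns + n_accesses * local_ns


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the break-even point.
    dram_ns, hit_rate = 80.0, 0.5
    for remote_ns in (60.0, 30.0):  # second value is the "latency halved" case
        print(remote_ns, remote_cache_worth_it(remote_ns, dram_ns, hit_rate))
    print(cost_direct_ns(10, 30.0), cost_copy_first_ns(10, 30.0, 10.0))
```

Under these assumptions the remote cache only pays off when its latency is below the hit rate times the DRAM latency, which is why halving the latency can flip the answer. The second pair of functions shows why data that is reused many times would favor copying it into local cache first, rather than working on it across CCDs.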