I cannot say with certainty that you are wrong, but I doubt that this is true. It has been a long time since I looked at things like this in depth, but my understanding was that the data in the L3 had to follow the core(s) that were processing that data. If you know otherwise, please drop a link and I will read it and catch up on this detail. The obvious caveat is the case where more threads are using that L3 than there are in the CCD. That was far more likely to be an issue when there were two CCXs per CCD; the unified L3 cache per CCD fixed this.