Nehalem’s 8MB, 16-way set associative L3 cache is inclusive of all lower levels of the cache hierarchy and shared between all four cores. Although Intel has not discussed the physical design of Nehalem at all, it appears that the L3 cache sits on a separate power plane from the cores and operates at an independent frequency. This makes sense from both a power-saving and a reliability perspective, since large caches are more susceptible to soft errors at low voltage. As a result, the load-to-use latency for Nehalem’s L3 varies depending on the relative frequency and phase alignment of the cores and the L3 itself, plus the latency of arbitrating for access to the L3. In the best case, i.e. phase-aligned operation with frequencies that differ by an integer multiple, Nehalem’s L3 load-to-use latency is somewhere in the range of 30-40 cycles, according to Intel architects.

The advantage of an inclusive cache is that it can handle almost all coherency traffic without disturbing the private caches of each individual core. If an access misses in the L3, the line cannot be present in any of the L2 or L1 caches of the cores. For cache hits, Nehalem’s L3 also acts as a snoop filter. Each cache line in the L3 contains four “core valid” bits denoting which cores may have a copy of that line in their private caches. If a “core valid” bit is set to 0, then that core cannot possibly have a copy of the cache line, while a “core valid” bit set to 1 indicates that the core in question may (but is not guaranteed to) have a private copy of the line. Since Nehalem uses the MESIF cache coherency protocol, as discussed previously, if two or more “core valid” bits are set, the cache line is guaranteed to be clean (i.e. not modified). The combination of these two techniques lets the L3 cache insulate each of the cores from as much coherency traffic as possible, leaving more bandwidth available for actual data in the caches.
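To make the filtering logic concrete, the sketch below models how the inclusive L3 and the “core valid” bits together decide which cores need to be snooped. This is a toy illustration under stated assumptions, not Intel’s actual implementation: the names (`L3Line`, `cores_to_snoop`, `core_valid`) and the data layout are invented for clarity.

```python
# Toy model of Nehalem-style L3 snoop filtering.
# All names and structures here are illustrative assumptions,
# not a description of the real hardware.

NUM_CORES = 4

class L3Line:
    def __init__(self, tag, core_valid):
        self.tag = tag
        # One "core valid" bit per core: 1 = that core *may* hold a
        # private copy; 0 = it definitely does not, so no snoop needed.
        self.core_valid = core_valid  # e.g. [0, 1, 0, 1]

def cores_to_snoop(l3, tag, requester):
    """Return the set of cores that must be snooped for this access."""
    line = l3.get(tag)
    if line is None:
        # Inclusive L3: a miss here means no core's L1/L2 can hold
        # the line, so no snoops are required at all.
        return set()
    # On a hit, only cores whose valid bit is set might hold a copy.
    return {c for c in range(NUM_CORES)
            if line.core_valid[c] and c != requester}

def line_must_be_clean(line):
    """Under MESIF, a line held by two or more cores cannot be in
    the Modified state, so it is guaranteed clean."""
    return sum(line.core_valid) >= 2

# Example: core 0 requests a line possibly cached by cores 1 and 3.
l3 = {0xABC: L3Line(0xABC, [0, 1, 0, 1])}
print(cores_to_snoop(l3, 0xABC, requester=0))  # {1, 3}
print(cores_to_snoop(l3, 0xDEF, requester=0))  # set() -- L3 miss, no snoops
print(line_must_be_clean(l3[0xABC]))           # True
```

Note how the two cases in `cores_to_snoop` correspond to the two filtering mechanisms in the text: inclusion handles misses, and the valid bits prune snoops on hits.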