It's probably talking about a unified 128KB figure (split 64KB each): the L1D and L1I caches remain at 64KB each.
I am confused now.... :?
What are the advantages and disadvantages of having a larger L1 cache?
Intel uses a large L2 cache because its CPUs' caching system is inclusive.
And an inclusive cache is faster too.
Then why did AMD choose an exclusive cache relationship?
No, it's the other way around.
AMD uses exclusive caching because it has a relatively large L1 cache.
Intel uses inclusive caching because it has a relatively small L1 cache.
For the same reason, AMD has a much lower L2 <-> L1 bus bandwidth than Intel: Intel needs to move data between the 2 caches more often, so it has to do it faster.
In fact, the first K7 already had a 128KB L1 cache, but used an off-die inclusive L2 cache.
The on-die exclusive L2 cache was introduced with the Thunderbird core.
AMD chose an exclusive cache arrangement back in the days of the K7 Thunderbird.
The reason is that it already sported a very large 128KB L1 cache, which was half the size of Intel's Coppermine L2 cache (256KB).
By going exclusive, AMD got a cache size advantage, because it then had a usable 256 + 128 = 384KB of cache.
With Intel's inclusive approach, the "usable" size of the cache was in fact 256KB, because the 32KB of L1 cache just mirrored part of the content of the inclusive L2.
For this reason (exclusive cache), AMD could also get away with a cheap and relatively slow 64-bit L2 <-> L1 bus and a higher L2 cache latency, whereas Intel had a massive 256-bit bus and very low cache latency.
So, back in those days, it made a lot of sense for AMD to use exclusive caches.
But with today's L2 cache sizes (2-4MB), it doesn't make much sense IMO to use exclusive caches anymore.
Well, maybe it does, with AMD's relatively small 512KB L2.
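To put numbers on the capacity argument, here is a rough sketch in Python. It is only a toy model: the function names are made up for illustration, it counts unique data and ignores latency and bandwidth, and pairing a 128KB L1 with a 2MB L2 is a hypothetical combination used to show the trend, not a real CPU.

```python
def usable_kb(l1_kb, l2_kb, policy):
    """Unique data an L1+L2 pair can hold, in KB (toy model)."""
    if policy == "exclusive":
        # A line lives in L1 or in L2, never both, so the sizes add up.
        return l1_kb + l2_kb
    if policy == "inclusive":
        # Everything in L1 is also in L2, so L1 adds no unique capacity.
        # Assumes L2 is at least as large as L1.
        return l2_kb
    raise ValueError(f"unknown policy: {policy}")


def exclusive_gain(l1_kb, l2_kb):
    """Extra capacity from going exclusive, as a fraction of the L2 size."""
    extra = usable_kb(l1_kb, l2_kb, "exclusive") - usable_kb(l1_kb, l2_kb, "inclusive")
    return extra / l2_kb


print(usable_kb(128, 256, "exclusive"))  # Thunderbird figures from the post
print(usable_kb(32, 256, "inclusive"))   # Coppermine figures from the post
for l2_kb in (256, 512, 2048):           # 2048 is the hypothetical pairing
    print(l2_kb, exclusive_gain(128, l2_kb))
```

With a 128KB L1, the gain from going exclusive is half the L2 size at 256KB, a quarter at 512KB, and only about 6% at 2MB, which is the reason exclusivity matters less as the L2 grows.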
My guess is that the K8L's L3 cache will be inclusive (while the L2 might still be exclusive).
This is because the L3 has to serve 2 cores... if it were exclusive, it would not mirror data that is present in either of the 2 L2 caches.
Then, if one core had some data in its L2 and the other core needed it, the latter would have to fetch that data from the first core's cache or from memory... this is less efficient than just having the data sit comfortably in the shared (and inclusive) L3.
EDIT: I mixed up the 2 terms, sorry.
Hmm, well, for 4 cores, 2MB of inclusive L2 cache is not that much, in fact.
But if it were exclusive, the processor would need fast internal L2 cache buses, so that each core could directly retrieve data from the L2 cache of another.
I guess such an arrangement is also possible, and potentially higher performing than the inclusive one, but definitely more complex to implement.
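The shared-L3 argument above can be sketched as a toy lookup in Python. This is just an illustration under simplifying assumptions: caches are modelled as plain sets of line addresses, the names `lookup` and `build_l3` are made up, and no real coherence protocol is represented.

```python
def build_l3(l2_core0, l2_core1, extra, policy):
    """Contents of the shared L3 under a given inclusion policy (toy model)."""
    if policy == "inclusive":
        # L3 mirrors every line held in either private L2, plus its own lines.
        return l2_core0 | l2_core1 | extra
    if policy == "exclusive":
        # L3 never holds a line that is already in a private L2.
        return extra - l2_core0 - l2_core1
    raise ValueError(f"unknown policy: {policy}")


def lookup(addr, own_l2, other_l2, l3):
    """Return where a core finds a line, searching the nearest level first."""
    if addr in own_l2:
        return "own L2"
    if addr in l3:
        return "shared L3"
    if addr in other_l2:
        return "other core's L2"
    return "memory"


# Hypothetical contents: core 0 holds line 0x100, and core 1 wants it.
l2_core0 = {0x100, 0x140}
l2_core1 = {0x200}
extra = {0x300}

for policy in ("inclusive", "exclusive"):
    l3 = build_l3(l2_core0, l2_core1, extra, policy)
    print(policy, "->", lookup(0x100, l2_core1, l2_core0, l3))
```

In this model the inclusive L3 serves the line directly, while the exclusive L3 forces core 1 to go to core 0's L2, which is where the fast core-to-core buses would be needed.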