The Phenom II's L3 cache is 48-way set-associative.
The i7's L3 cache is 16-way set-associative.
The C2Q has no L3 cache.
http://www.agner.org/optimize/optimizing_cpp.pdf
http://www.agner.org/optimize/#manuals
8.2 Cache organization
It is useful to know how a cache is organized if you are making programs that have big data
structures with non-sequential access and you want to prevent cache contention. You may
skip this section if you are satisfied with more heuristic guidelines.
Most caches are organized into lines and sets. Let me explain this with an example. My
example is a cache of 8 kb size with a line size of 64 bytes. Each line covers 64 consecutive
bytes of memory. One kilobyte is 1024 bytes, so we can calculate that the number of lines is
8*1024/64 = 128. These lines are organized as 32 sets × 4 ways. This means that a
particular memory address cannot be loaded into an arbitrary cache line. Only one of the 32
sets can be used, but any of the 4 lines in the set can be used. We can calculate which set
of cache lines to use for a particular memory address by the formula: (set) = (memory
address) / (line size) % (number of sets). Here, / means integer division with truncation, and %
means modulo. For example, if we want to read from memory address a = 10000, then we
have (set) = (10000 / 64) % 32 = 28. This means that a must be read into one of the four
cache lines in set number 28. The calculation becomes easier if we use hexadecimal
numbers because all the numbers are powers of 2. Using hexadecimal numbers, we have a
= 0x2710 and (set) = (0x2710 / 0x40) % 0x20 = 0x1C. Reading or writing a variable from
address 0x2710 will cause the cache to load the entire 64 or 0x40 bytes from address
0x2700 to 0x273F into one of the four cache lines from set 0x1C. If the program afterwards
reads or writes to any other address in this range then the value is already in the cache so
we don't have to wait for another memory access.
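To make the set calculation concrete, here is a minimal C++ sketch (mine, not the manual's) that hard-codes the example geometry above, 8 kb total, 64-byte lines, 4 ways, and reproduces the 0x2710 -> set 0x1C result:

#include <cstdint>
#include <cstdio>

// Example cache geometry from the text: 8 kb total, 64-byte lines, 4 ways.
constexpr uint64_t kLineSize  = 64;
constexpr uint64_t kCacheSize = 8 * 1024;
constexpr uint64_t kWays      = 4;
constexpr uint64_t kSets      = kCacheSize / (kLineSize * kWays);   // = 32

// (set) = (memory address) / (line size) % (number of sets)
constexpr uint64_t SetIndex(uint64_t address) {
    return (address / kLineSize) % kSets;
}

int main() {
    // Worked example from the text: address 10000 (0x2710) maps to set 28 (0x1C).
    std::printf("address 0x2710 -> set %u (0x%X)\n",
                (unsigned)SetIndex(0x2710), (unsigned)SetIndex(0x2710));
}

The constant names are my own; only the geometry and the formula come from the quoted text.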
Assume that a program reads from address 0x2710 and later reads from addresses
0x2F00, 0x3700, 0x3F00 and 0x4700. These addresses all belong to set number 0x1C.
There are only four cache lines in each set. If the cache always chooses the least recently
used cache line then the line that covered the address range from 0x2700 to 0x273F will be
evicted when we read from 0x4700. Reading again from address 0x2710 will cause a cache
miss. But if the program had read from different addresses with different set values then the
line containing the address range from 0x2700 to 0x273F would still be in the cache. The
problem only occurs because the addresses are spaced a multiple of 0x800 apart. I will call
this distance the critical stride. Variables whose distance in memory is a multiple of the
critical stride will contend for the same cache lines. The critical stride can be calculated as
(critical stride) = (number of sets) × (line size) = (total cache size) / (number of ways).
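A quick way to check the numbers in this example, again as a rough sketch that assumes the same 8 kb / 4-way geometry, is to compute the critical stride and the set index of each address mentioned above:

#include <cstdint>
#include <cstdio>

constexpr uint64_t kLineSize  = 64;
constexpr uint64_t kCacheSize = 8 * 1024;
constexpr uint64_t kWays      = 4;
constexpr uint64_t kSets      = kCacheSize / (kLineSize * kWays);    // = 32

// (critical stride) = (number of sets) * (line size) = (total cache size) / (number of ways)
constexpr uint64_t kCriticalStride = kSets * kLineSize;              // = 0x800 = 2048 bytes

int main() {
    std::printf("critical stride = 0x%X bytes\n", (unsigned)kCriticalStride);
    // The five addresses from the text all land in set 0x1C and therefore
    // compete for the same four cache lines.
    const uint64_t addrs[] = { 0x2710, 0x2F00, 0x3700, 0x3F00, 0x4700 };
    for (uint64_t a : addrs)
        std::printf("address 0x%llX -> set 0x%llX\n",
                    (unsigned long long)a,
                    (unsigned long long)((a / kLineSize) % kSets));
}

For a real CPU you would plug that cache's own total size and number of ways into the same formula instead of the toy 8 kb values.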
If a program contains many variables and objects that are scattered around in memory then
there is a risk that several variables happen to be spaced by a multiple of the critical stride
and cause contentions in the data cache. The same can happen in the code cache if there
are many functions scattered around in program memory. If several functions that are used
in the same part of the program happen to be spaced by a multiple of the critical stride then
this can cause contentions in the code cache. The subsequent sections describe various
ways to avoid these problems.
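As a rough, hypothetical illustration of this kind of contention (my example, not the manual's), consider walking down one column of a matrix whose rows are exactly one critical stride wide, still assuming the small 8 kb example cache; the array sizes, names, and the 16-element padding below are made-up values chosen only so that one row equals 2048 bytes:

#include <cstdint>
#include <cstdio>
#include <vector>

// With 2048-byte rows (one critical stride in the example cache), every element
// in a column maps to the same set, so only 4 of them can be cached at once.
// Padding each row by one extra 64-byte line makes the set index change from
// row to row.
constexpr size_t kRows       = 512;
constexpr size_t kCols       = 512;            // 512 * 4 bytes = 2048-byte rows
constexpr size_t kPaddedCols = kCols + 16;     // + one 64-byte line of padding

int main() {
    std::vector<int32_t> bad(kRows * kCols);          // row size is a multiple of the critical stride
    std::vector<int32_t> good(kRows * kPaddedCols);   // row size is not

    int64_t sum = 0;
    for (size_t r = 0; r < kRows; ++r) sum += bad[r * kCols];         // column walk, same set every row
    for (size_t r = 0; r < kRows; ++r) sum += good[r * kPaddedCols];  // column walk, set index rotates
    std::printf("%lld\n", (long long)sum);            // keep the loops from being optimized away
}

On a real CPU you would size the padding against that cache's own critical stride rather than the 0x800 bytes of this toy example.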
When you play a game and there is a lot of action, with physics, AI, lots of vertices drawing the picture, and more, much data and many functions are used to calculate each frame. Data and functions are scattered around memory in this scenario, so you need a cache that doesn't keep evicting data or code and going out to main memory.