Warning: Long post. This pulls together several pieces I've written previously, so I hope it flows properly.
Anyhow, the FX at 2.8GHz, dual channel DDR2-800 and 4MB L3 cache will give Intel's Conroe a pain in the @ss.
What you are doing is assuming that L3 cache will bring a sizable performance increase. You are correct that AM2 processors are far from bandwidth-starved. The advantage of the OMC (on-die memory controller) implementation has always been its low latency, which keeps the memory a lot closer to the processor. However, for an L3 cache to actually help offload the L2 cache, its latency needs to be very low, as close to the L2 cache's latency as possible. Z-RAM is already known to be unable to maintain L2-type latencies. With the latency such an L3 cache would have, you really aren’t going to see much benefit from it, since going to RAM won’t take much longer. If AMD processors don’t see much benefit going from 512KB to 1MB of L2 cache, L3 cache is even less likely to make a difference.
Even discounting the OMC, the lack of benefit AMD gets from large caches is evident from analysing its cache architecture.
Many people assume that increasing cache size automatically means greater performance. In fact, this is only true in some circumstances. AMD’s architecture in particular doesn’t see large performance increases from larger L2 and L3 caches. AMD uses an exclusive cache architecture, which puts a lot of focus on the size of the L1 cache. The sizes of the other caches are less important, and increasing them doesn’t result in much performance benefit.
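To put rough numbers on the latency argument, here is a minimal sketch of the usual average memory access time calculation. Every hit rate and latency below is a made-up illustrative value, not a measurement of any AMD or Intel part, and the function name amat is my own.

```python
def amat(l1_hit, l1_lat, l2_hit, l2_lat, mem_lat, l3_hit=None, l3_lat=None):
    """Average memory access time in cycles for a simple serial hierarchy.

    Hit rates are local (fraction of the accesses that reach that level).
    Pass l3_hit/l3_lat to model an optional L3 between L2 and memory.
    """
    if l3_hit is None:
        beyond_l2 = mem_lat                            # L2 miss goes straight to RAM
    else:
        beyond_l2 = l3_lat + (1 - l3_hit) * mem_lat    # L2 miss tries L3 first
    beyond_l1 = l2_lat + (1 - l2_hit) * beyond_l2
    return l1_lat + (1 - l1_hit) * beyond_l1


# Hypothetical numbers: memory is fairly close (150 cycles) thanks to the OMC.
base = dict(l1_hit=0.95, l1_lat=3, l2_hit=0.80, l2_lat=12, mem_lat=150)

no_l3 = amat(**base)
slow_l3 = amat(**base, l3_hit=0.5, l3_lat=60)   # L3 far from L2-type latency
fast_l3 = amat(**base, l3_hit=0.5, l3_lat=25)   # L3 close to L2-type latency

print(no_l3, slow_l3, fast_l3)
```

With these assumed values the slow L3 only takes the average from about 5.1 cycles to about 4.95, while the fast L3 gets it to about 4.6. The closer memory already is, the less room a high-latency L3 has to help.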
The makers of CPU-Z have a great section on cache architectures in their K8 analysis.
http://www.cpuid.com/reviews/K8/index.php
AMD made the choice of an exclusive relationship for the first time on the Thunderbird. The CPU architecture fits on this choice, with a big L1 cache and a 8-entries victim buffer.
This choice allowed AMD to build CPUs with a L2 cache size from 64 to 512KB with the same core, and even the Duron that has a 64KB L2 cache provides very good performance. In another hand, the increase of the L2 size does not provide a big jump in performance.
Intel, on the other hand, uses an inclusive cache architecture, which puts the emphasis on L2 cache size. The difference between inclusive and exclusive caches is why Intel always has a smaller L1 cache than AMD, simply because it isn’t as important for them. However, increases in L2 cache size create more noticeable performance benefits on Intel’s inclusive cache.
The summary is here:
The exclusive relationship is the most flexible, as it allows lot of different configurations in keeping a good performance index. The drawback is that the performance does not increase very much with the L2 size. The inclusive relationship can only be chosen for performance purpose, knowing for example that increasing the L2 will create a performance boost.
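A quick way to see part of this is to look at how much unique data each design can hold as the L2 grows. This is only a sketch: capacity is a rough proxy for performance at best, the function names are my own, and the 16KB L1 used for the inclusive case is a hypothetical size picked for illustration.

```python
def unique_capacity_kb(l1_kb, l2_kb, policy):
    """Unique data (KB) held across L1 and L2 under a given policy."""
    if policy == "exclusive":
        return l1_kb + l2_kb          # L1 and L2 hold different lines
    if policy == "inclusive":
        if l2_kb < l1_kb:
            raise ValueError("inclusive L2 must be at least as large as L1")
        return l2_kb                  # L1 contents are duplicated in L2
    raise ValueError(f"unknown policy: {policy}")


def capacity_growth(l1_kb, l2_small_kb, l2_large_kb, policy):
    """Factor by which unique capacity grows when the L2 is enlarged."""
    before = unique_capacity_kb(l1_kb, l2_small_kb, policy)
    after = unique_capacity_kb(l1_kb, l2_large_kb, policy)
    return after / before


# Exclusive with a big 128KB L1: L2 going from 64KB to 512KB.
print(capacity_growth(128, 64, 512, "exclusive"))
# Inclusive with a hypothetical 16KB L1: the same L2 step.
print(capacity_growth(16, 64, 512, "inclusive"))
```

The same 64KB to 512KB step grows the exclusive design's unique capacity by a bit over 3x (192KB to 640KB), but grows the inclusive design's by 8x, which fits with the L2 size mattering more on the inclusive side.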
I am not going to judge which design is better, because they are both valid solutions to the same problem, each with its own advantages and disadvantages. However, this is also why I take offence at people blindly criticizing Intel for using large caches. Of course they use them, because their processors are designed for them and benefit from them. You can hardly make it sound disgraceful for Intel to use large L2 caches when that is the advantage of the inclusive cache design. The ability of large caches to relieve the FSB is just a double benefit. Even if Intel used an OMC, they would still have large caches because they work for them.
The flaw in large caches is of course the increased transistor count. However, Intel has always been able to leverage its manufacturing strengths by making rapid process transitions. If the large caches made Intel processors phenomenally more expensive than their AMD counterparts, then obviously Intel would have a major problem with the inclusive approach. However, Intel is able to keep costs down. Even now, Intel is planning major price cuts of up to 50% on its dual-core processors, making the cost of its approach a non-issue.
http://www.digitimes.com/mobos/a20060213PR218.html
For interest, the two disadvantages of the inclusive cache architecture that CPUID mentions are the need to maintain the right ratio between the L1 and L2 cache sizes, and the smaller total cache size due to the L1 being duplicated in the L2 cache. The first disadvantage was the reason why the Northwood Celerons performed so poorly: they were very constrained by their tiny L2 cache. Given the performance of the Pentium M and the large size of the L2 cache in current processors, Intel has found the correct ratio between the L1 and L2 cache sizes. (The rule was mainly that the bigger the L2 cache, the better, anyway.) Even the 256KB L2 cache on the Prescott Celeron Ds was a huge improvement. The second disadvantage, the reduction in total cache size due to duplication, is no longer a concern, because the L2 cache has grown to such a size that the few kB wasted by duplicating the L1 cache are irrelevant. In any case, both disadvantages of inclusive cache that CPUID mentions are now moot.
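The duplication point is easy to check with simple arithmetic. A minimal sketch follows. The 16KB L1 is a hypothetical size for illustration, and duplicated_fraction is just a name I made up.

```python
def duplicated_fraction(l1_kb, l2_kb):
    """Fraction of an inclusive L2 spent holding a copy of the L1."""
    if l2_kb < l1_kb:
        raise ValueError("inclusive L2 must be at least as large as L1")
    return l1_kb / l2_kb


L1_KB = 16  # hypothetical L1 size, for illustration only

# Walk through a range of L2 sizes, small to large.
for l2_kb in (128, 256, 1024, 2048):
    share = duplicated_fraction(L1_KB, l2_kb)
    print(f"L2 = {l2_kb:>4}KB: {share:.2%} of L2 duplicates L1")
```

With that assumed L1, a 128KB L2 loses an eighth of its space to duplication, while a 2MB L2 loses under 1%. That is why the overhead stops mattering once the L2 gets large.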
As well, the article from The Inquirer doesn’t give any time frame for the addition of L3 cache. The FX-62 won’t have it. I seriously doubt that AMD will be able to integrate 4MB of additional cache economically using the 90nm process, so it’ll have to wait for 65nm. This would put introduction at the end of this year at the earliest or possibly the beginning of 2007.