Reynod :
Anyone like to comment about the L3 cache, which is running a bit slow and has latency issues ... getting choked as a consequence ... or maybe something else interesting?
Something the B3 spin is likely to address?
Anyone with a brain feel free to comment ... or is this another thread gone down in flames?
In my humble opinion, what we're seeing in Barcelona's L3 is a result of cache misses. In theory, the smaller the cache, the higher the chance that a miss will occur. K10's L3 cache is surprisingly small for something shared across four cores. With only 2 MB, each core gets roughly 0.5 MB of L3. Although the L3 allocation can be adjusted dynamically, it is still too small. In the Core architecture, each die gets roughly 0.5 to 6 MB of L2 cache, and it is only shared across two cores. The workload of adjusting dynamically there is a lot lighter than in Barcelona, where one cache needs to be dynamically adjusted to fit four cores.
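To put a rough shape on the "too small for four cores" argument, here is a toy sketch in Python. It is not a model of Barcelona: it assumes a fully associative LRU cache, 64-byte lines, uniformly random accesses, and a made-up 1 MB working set per core. The function names (`lru_miss_rate`, `shared_trace`) are mine, purely for illustration.

```python
import random
from collections import OrderedDict

LINE_BYTES = 64                                # assumed cache line size
L3_LINES = 2 * 1024 * 1024 // LINE_BYTES       # 2 MB shared cache, in lines
WORKING_SET_LINES = 1024 * 1024 // LINE_BYTES  # hypothetical 1 MB per core


def lru_miss_rate(capacity_lines, accesses):
    """Fraction of accesses that miss in a fully associative LRU cache."""
    cache = OrderedDict()
    misses = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            cache[addr] = True             # miss: bring the line in
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(accesses)


def shared_trace(cores, lines_per_core, length, seed=0):
    """Interleaved random accesses; each core touches only its own lines."""
    rng = random.Random(seed)
    return [(rng.randrange(cores), rng.randrange(lines_per_core))
            for _ in range(length)]


if __name__ == "__main__":
    for cores in (2, 4):
        trace = shared_trace(cores, WORKING_SET_LINES, 300_000)
        rate = lru_miss_rate(L3_LINES, trace)
        print(f"{cores} cores sharing 2 MB: miss rate {rate:.2f}")
```

Under these assumptions, two cores fit in the 2 MB entirely and only take warm-up misses, while four cores keep evicting each other's lines. Real workloads have locality and real caches are set associative, so treat the numbers as illustrative only.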
On a cache miss, the core has to evict the affected line and reload the data from further out, ultimately from main memory. This takes time, and it is probably part of the explanation for Barcelona's latency.
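The usual back-of-the-envelope way to express this is average memory access time: hit time plus miss rate times miss penalty. A minimal sketch follows; the cycle counts are placeholders I picked for illustration, not measured Barcelona figures.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical numbers: 40-cycle L3 hit, 200-cycle trip to main memory.
    for miss_rate in (0.05, 0.25, 0.5):
        cycles = amat(40, miss_rate, 200)
        print(f"miss rate {miss_rate:.2f}: {cycles:.0f} cycles on average")
```

The point is only that the miss penalty is multiplied by the miss rate, so a cache that misses more often drags the average up quickly even if its hit latency is unchanged.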
Also, this is AMD's first attempt at a shared cache. Intel's first generation of shared cache was already implemented back in Yonah, so Intel has a lot more experience than AMD here. That is likely why Intel's Core 2 does not suffer from excessive cache misses. By contrast, not only is this AMD's first attempt, AMD also implemented it across four cores. This is probably one of the reasons for the L3's lackluster performance.
I've been saying this for a while. From an "elegance" standpoint, Barcelona is nothing short of a beauty. Compared to Nehalem's die shot, Barcelona is indeed more aesthetically pleasing. But from a design standpoint, Barcelona is nothing short of a disaster.
Now, will B3 fix this? No. The problems I described above are major design issues, and AMD cannot fix them with just a revision or two. The entire cache system may need to be redesigned to alleviate the latency penalties. I've seen David Kanter from RWT talking about 8 MB of L3, but I've yet to see it on AMD's roadmap. Increasing the L3's size might alleviate the problem too.
As for the TLB erratum, it has been rumored that it is more of a symptom than a cause (which I'm inclined to agree with). In other words, this erratum is a manufacturing problem rather than a design problem. If it were a design issue, all K10 parts would be affected. But since only the higher-clocked ones are affected (2.4 GHz and up), it is more likely a physical problem. IMO, this is probably because Barcelona was manufactured on a process that's unfit for it. Just like Intel's Prescott, where the design simply overwhelmed the process technology of the time, AMD simply did not have the right process to manufacture such a big and sophisticated die.
I'll scope out some links, and update this later.