Improved branch prediction, however, doesn’t seem to deliver the performance gains we would’ve liked to see. Instruction fetch is the next area to consider, though we can’t benchmark it directly. Agner’s tests may shed some light on the problem. According to his work, the fetch units on Bulldozer, Piledriver, and Steamroller, despite being theoretically capable of handling up to 32 bytes per clock (16 bytes per core), top out in real-world tests at 21 bytes per clock. This implies that doubling the decode units couldn’t help much, not if the problem is farther up the line. Steamroller does implement some features, like a very small loop buffer, that help take pressure off the decode stages by storing very small previously decoded loops (up to 40 micro-instructions), but the fact that doubling up on decoder stages only modestly improved overall performance implies that significant bottlenecks still exist.
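To put the fetch numbers above in perspective, here is a rough back-of-envelope sketch. The 32 and 21 bytes-per-clock figures come from the quoted text; the average instruction length of 4 bytes is purely an illustrative assumption (real x86 code varies a lot), and the function names are made up for this example.

```python
# Back-of-envelope fetch utilization, using the figures quoted above.
THEORETICAL_FETCH_BYTES_PER_CLOCK = 32  # theoretical peak, from the text
MEASURED_FETCH_BYTES_PER_CLOCK = 21     # real-world ceiling, from the text

# ASSUMPTION: illustrative average x86 instruction length, not a measured value.
ASSUMED_AVG_INSTRUCTION_BYTES = 4.0


def fetch_utilization(measured: float, theoretical: float) -> float:
    """Fraction of the theoretical fetch bandwidth actually achieved."""
    return measured / theoretical


def instructions_per_clock(fetch_bytes: float, avg_instruction_bytes: float) -> float:
    """Upper bound on instructions fetched per clock at a given average length."""
    return fetch_bytes / avg_instruction_bytes


utilization = fetch_utilization(
    MEASURED_FETCH_BYTES_PER_CLOCK, THEORETICAL_FETCH_BYTES_PER_CLOCK
)
peak_supply = instructions_per_clock(
    THEORETICAL_FETCH_BYTES_PER_CLOCK, ASSUMED_AVG_INSTRUCTION_BYTES
)
real_supply = instructions_per_clock(
    MEASURED_FETCH_BYTES_PER_CLOCK, ASSUMED_AVG_INSTRUCTION_BYTES
)

print(f"Fetch utilization: {utilization:.3f} of theoretical peak")
print(f"Instruction supply at peak: {peak_supply:.2f} per clock (assumed length)")
print(f"Instruction supply measured: {real_supply:.2f} per clock (assumed length)")
```

In other words, fetch runs at roughly two thirds of its paper spec, so the front end can starve the decoders no matter how many of them there are. Change the assumed instruction length and the absolute supply numbers move, but the ratio stays the same.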
At the same time, however, it’s also clear that dual decoders weren’t the fix that many AMD enthusiasts were hoping they would be. L1 cache contention remains problematic, as does the low set associativity. Integer throughput is poor partly because only two of Steamroller’s four integer pipelines are practically useful for most work. The long pipeline ensures that branch prediction misses will always hit the chip hard. The chip’s L2 latency remains much higher than that of its Intel counterpart, and its memory controller is much slower.
The question of whether next year’s Carrizo can “fix” the Bulldozer architecture depends entirely on which design attributes are holding the core back. The only thing we know for certain about the core at this point is that Excavator includes support for AVX2. If Steamroller’s low performance is primarily caused by the shared fetch unit, then decoupling that system and adding 256-bit registers for AVX2 could significantly improve the core’s integer performance. If, on the other hand, the chip’s low performance is directly related to its long pipeline and high cache contention in the L1, it’s going to be much harder to solve.
EDIT: When there is an orders-of-magnitude disparity in performance, latency becomes less of an issue, as the sheer computing power overcomes the minute latency differences.
In each case, the use of Mantle did not increase the best playable settings for each card within the multiplayer environment. The more interesting thing to note is that the DirectX 11 performance of the Catalyst 14.1 and 14.2 Beta drivers appears to be lower than the performance offered by the game's launch drivers from last year. When we look at today's Mantle performance in comparison to our launch day performance, there is little to no benefit from a gameplay experience perspective to enable Mantle at this time.
The cache redesign/improvement: is it really that expensive? Will it get in the way of HSA performance?
Some of those people work at AMD, and those same people said that is the main reason a lot of people were canned, including the CEO. AMD will have a competitive CPU when they ditch the module design; until then we can expect i3 performance from a quad-core AMD for quite some time, with more power draw. I do not expect anything more, and like I said in my last comment, I hope they're getting rid of this design. Most people stating otherwise probably aren't even running AMD on their main rig.