bemused_fred
AMD took a gamble on computing moving away from single-threaded workloads, and on floating-point work shifting over to GPUs. Obviously, computing hasn't moved away quickly enough. Sure, Bulldozer was badly flawed: it starved itself of resources with a narrow front end (four instructions per cycle, whether for one core OR the whole module, whereas the new decoupled decoder should theoretically allow for eight), and the caching system was a mess, but some of these issues have been or are being worked out. Steamroller features a larger L1 instruction cache along with the previously mentioned decoupled decoder, but the latter probably sucks down a bit of power, which could explain the reduced clock speeds. As far as I can see, power management is done at the module level, so individual cores can't be turned off to save power. That must make the design use more power than a more traditional one would, but only when an odd number of cores is in use, I suppose.
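
To put rough numbers on the odd-core-count point, here is a minimal sketch. It assumes power gating really is per module (my reading, not confirmed), that each module holds two cores, and that the scheduler packs busy threads into as few modules as possible. The function names are made up for illustration.

```python
import math

# Assumption: two integer cores per module, as in Bulldozer-family designs.
CORES_PER_MODULE = 2


def powered_cores(active_cores: int, cores_per_module: int = CORES_PER_MODULE) -> int:
    """Cores that stay powered if gating is per module (assumed) and
    active threads are packed into as few modules as possible (assumed)."""
    modules_on = math.ceil(active_cores / cores_per_module)
    return modules_on * cores_per_module


def idle_but_powered(active_cores: int) -> int:
    """Cores that are powered but doing no work under the same assumptions."""
    return powered_cores(active_cores) - active_cores


if __name__ == "__main__":
    # Hypothetical four-module, eight-core part.
    for n in range(1, 9):
        print(f"{n} active -> {powered_cores(n)} powered, {idle_but_powered(n)} idle but powered")
```

Under those assumptions there is at most one idle-but-powered core, and only at odd counts. If the scheduler spreads threads across modules instead of packing them, the number would be higher.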