[citation][nom]dyc4ha[/nom]I will admit that I am a noob for saying this, but I still cant see why they are sticking with their psuedo-octocore architecture... I wished they just improved on the PhenomIIs X6 and have real hexcore cpus. That might close the gap with Intel[/citation]
I've already explained some of the reasons for that several times, and so have at least one or two other members in this article's comments section.
Phenom II's micro-architecture, 10h (Stars), was outdated (it was an adaptation of the Athlon 64 CPUs from 2003). It didn't have support for many modern instructions, and in tasks that use at least some of them it could lag behind greatly. AVX and some other floating-point instructions are a great example of this: Phenom II might be up to four times slower than it would be if it supported them, maybe even more in some cases. The modular architecture is superior in many ways, but AMD made several ridiculous mistakes along the way. Bulldozer, although arguably more of a proof-of-concept than a failure, was an incredibly poor implementation of almost every aspect of the CPU. I could go on and on about what was wrong with it, but the modular concept was not one of the problems.
Hardware/configuration-wise, Bulldozer's greatest issues are its design methods (automated design tools simply aren't as good as transistor-by-transistor designs from expert engineers), huge cache latency (especially on the L3, where it might be so high that the L3 cache doesn't actually help performance), insufficient x86 decoding functionality per module, desktop models being configured as server-oriented CPUs, Windows not being optimized to deal with that poor configuration properly, and many other design flaws such as soft-edge flip-flops, crap branch prediction, and much more. I find it a little impressive that Bulldozer did as well as it did despite the huge problems that it has.
For example, simply disabling one core per module to alleviate the x86 decoder deficiency (there isn't enough decoding for two cores, but there is enough for one) would increase the performance of the remaining cores significantly while dropping power consumption significantly, a huge bonus to lightly threaded power efficiency. Doing this with the FX-81xx models turns them into quad-module, quad-core CPUs instead of quad-module, eight-core CPUs and makes them more consumer/desktop-oriented, because it puts a greater focus on lightly threaded performance and lightly threaded power efficiency than on highly threaded work. The result is a quad-core CPU that at stock is a little faster than a Phenom II x4 of the same clock frequency while being far more power-efficient and having much more overclocking headroom.
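If you just want to try the one-core-per-module idea on a single program without disabling anything, here's a rough Python sketch that pins a process to the first core of each module. Big assumptions: it's Linux-only (`os.sched_setaffinity` isn't available on Windows), and it assumes the OS numbers the two cores of a module consecutively (0 and 1 share a module, 2 and 3 share the next, and so on), which you'd need to verify on your own system. The function names are made up for illustration. It's also not the same thing as really disabling cores: the sibling cores stay powered and the OS can still schedule other processes on them, so don't expect the same power savings.

```python
import os


def one_core_per_module(logical_cpus, cores_per_module=2):
    """Pick the lowest-numbered available core from each module.

    Assumes (hypothetically) that cores are numbered so that
    core_id // cores_per_module identifies the module.
    """
    modules = {}
    for cpu in sorted(logical_cpus):
        # Keep only the first core we see for each module.
        modules.setdefault(cpu // cores_per_module, cpu)
    return set(modules.values())


def pin_to_one_core_per_module(pid=0, cores_per_module=2):
    """Restrict a process (0 = the current one) to one core per module. Linux only."""
    available = os.sched_getaffinity(pid)
    keep = one_core_per_module(available, cores_per_module)
    os.sched_setaffinity(pid, keep)
    return keep
```

On an FX-81xx with that numbering, `one_core_per_module(range(8))` picks cores 0, 2, 4 and 6, and calling `pin_to_one_core_per_module()` at the start of a script restricts that script to those four cores.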
Instead of outright disabling cores, you can cut down the P-states of each core and prioritize them properly with software such as PS Check and K10 Stat so that they strike a balance between highly threaded and lightly threaded performance, kinda like where the Phenom II x6s sit compared to the FX-41xx and FX-81xx CPUs. That would've been a good successor to the Phenom II x6.