blazorthon :
FX-6100 or FX-8120. Disable one core per module to get three or four single core modules, three for the 6100 and four for the 8120. You get a roughly 25% boost in performance at the same frequency as you previously had with all modules sharing their resources between two cores instead of a single core per module. That puts them a little ahead of Phenom II in performance per Hz while making them capable of significantly higher frequencies while using about the same or even somewhat lower amounts of power.
It also gives them even higher overclocking headroom, to the point where they can easily compete with non-K edition i5s and i7s in gaming performance and BCLK/Turbo overclocking performance, although not in power efficiency (still, it's better than before). It's not perfect, but it's quite something. Vishera looks like it has a significant enough performance increase to let it challenge the K edition CPUs when used the same way.
Also, for highly threaded workloads this is not necessary and would likely be detrimental to performance. Highly threaded performance is already very high on BD CPUs, so they don't need much of a boost, although a scheduling fix would likely help. Software could be optimized for BD as well, although that's a little less reasonable to expect.
There's another performance enhancement you can do. There is a little-known piece of software called PSCheck; it's AMD's tool to control the P-states and voltages of each core / module on a CPU. Basically K10stat for Bulldozer uArch CPUs.
Now a little bit of info first: everything from the K10 arch onward has multiple multipliers / voltages for each CPU core. The number in the BIOS only represents the initial speed for everything upon system boot; it's the worst place to do overclocking (but the best place for setting voltages / enabling cores). Using special software you can manipulate the exact multipliers / voltages of each of the P-states on a CPU and then force different cores into different P-states.
The above means you can overclock just two cores while under-clocking / under-volting all unused cores to save TDP budget. This gives you more headroom than you'd get trying to overclock everything at once. It also means you can get ridiculously better single- to dual-core performance increases.
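To put rough numbers on that trade-off, here's a back-of-the-envelope sketch. It assumes the usual rule of thumb that dynamic power scales with frequency times voltage squared, and it ignores leakage entirely. The clocks and voltages are made-up illustrative values, not measurements from any chip, and `relative_dynamic_power` is just a hypothetical helper name.

```python
def relative_dynamic_power(cores):
    """Sum f * V^2 over all cores.

    `cores` is a list of (frequency_ghz, volts) pairs. The result is in
    arbitrary units and is only meaningful for comparing configurations.
    """
    return sum(freq_ghz * volts ** 2 for freq_ghz, volts in cores)


# Hypothetical config A: all four cores pushed to the same clock and voltage.
all_cores = [(2.5, 1.2)] * 4

# Hypothetical config B: two cores clocked higher at a higher voltage,
# the other two parked at a low clock and low voltage.
split = [(3.0, 1.3)] * 2 + [(0.8, 0.9)] * 2

power_all = relative_dynamic_power(all_cores)
power_split = relative_dynamic_power(split)

print(f"all cores : {power_all:.2f}")
print(f"split     : {power_split:.2f}")
print("split config draws less" if power_split < power_all else "split config draws more")
```

With these assumed values the split configuration comes out lower even though two cores run faster, which is the whole idea: the parked cores pay for the overclocked ones.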
Things to note:
#1 Windows is schizophrenic about task scheduling. It'll keep moving a thread around the CPU to whatever core it sees as "unused"; in practice this is what prevents turbo boost and other dynamic overclocking from engaging. By bouncing threads around, Windows keeps the other cores from going idle, and idle cores are what free up the headroom for the overclock. You've got to be smarter than Windows: use the processor affinity setting to force your program onto the cores you're going to overclock, which will let the other cores go idle.
#2 Other power management programs will often try to "outsmart" you and mess up your dynamic overclocking efforts. Disable all additional power management programs; you're taking direct control over your CPU's speed.
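On the affinity point in #1: an affinity setting boils down to a bitmask with one bit per logical core, with core 0 as the lowest bit. Here's a minimal sketch of working that mask out; `affinity_mask` and `affinity_hex` are hypothetical helper names, and the core numbers are only examples.

```python
def affinity_mask(cores):
    """Return an integer bitmask with one bit set per core index (core 0 = bit 0)."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core
    return mask


def affinity_hex(cores):
    """Return the same mask as an uppercase hex string without a 0x prefix."""
    return format(affinity_mask(cores), "X")


# Example: pin a program to the two cores you plan to overclock.
print(affinity_hex([0, 1]))
# Example: the other pair on a four-core chip.
print(affinity_hex([2, 3]))
```

For cores 0 and 1 the mask is 3, so from a command prompt something like `start /affinity 3 yourprogram.exe` launches the program pinned to those two cores (`yourprogram.exe` is a placeholder). Task Manager's "Set affinity" option does the same thing by hand for a process that's already running.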
Results:
I've got my 3550MX (2.0 GHz stock, 2.7 GHz boost) to run at 3.0 GHz on up to two cores while running 800 MHz on the other two. I also got it to run 2.2~2.5 GHz on all four simultaneously, though it gets very hot doing so. This resulted in a very large increase in benchmark numbers, beating out many desktop CPUs at single-threaded tasks. Amazing for a 45 W CPU without any L3.