You forgot one important thing here.
The way developers utilize CPUs, and the way AMD designed theirs, could change in the next few years, meaning newer programs would make better use of the FX modules and thus bring in better performance numbers in AMD's favor.
You seem to be counting on software from the future, like when you say this:
Speaking as a developer:
We don't code to cores. We create threads, and let the Windows scheduler figure out the rest.
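To make that concrete, here's a minimal Python sketch (toy code, not actual game code; the subsystem names are made up for illustration). Note that nothing pins a thread to a particular core: you spawn threads, and the OS scheduler decides where they run.

```python
import threading

results = []

def render():
    # Hypothetical rendering work; just records that it ran.
    results.append("render")

def simulate_ai():
    # Hypothetical AI work.
    results.append("ai")

# Create one thread per subsystem. We never say "run on core 3" --
# the OS scheduler picks a core for each thread on its own.
threads = [threading.Thread(target=f) for f in (render, simulate_ai)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The same pattern holds in C++ or C# on Windows: unless you explicitly set thread affinity, placement is entirely the scheduler's call.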
And I AGAIN note that if AMD simply labeled the second core of a BD Module a logical core, it would behave (as far as scheduling goes) EXACTLY like HTT, fixing AMD's performance problem.
Who cares if a CPU is rocking 8 cores or 20 cores? If most games/apps only use 2-4 cores, then it's pointless.
The problem with that statement is that in two years, that will most definitely change.
Not really. Games already use dozens of threads (40+ on average in a few sample games I bothered to check), but the main problem is that the heavy lifting is limited to a few key threads (specifically, the main rendering, AI, and physics threads). There's also a LOT of interplay that goes on, which forces a lot of the code to run in a serial manner.
If your code is serial, you aren't going to scale. Period.
Simple example: Look how AI has to work under the hood. Some event happens, the AI then reacts.
For instance, the AI "sees" a target, then chases after it.
What that means is, you need the results of the rendering engine in order to determine whether the AI has line of sight. You need at least the landscape geometry and a depth buffer to do this. So you can't process this event until AFTER the geometry has been generated.
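That ordering dependency is easy to sketch. In the toy Python below (hypothetical names; real engines use their own job/signal systems, not necessarily `threading.Event`), the AI thread can start whenever it likes, but its line-of-sight check is forced to wait until the renderer has produced the geometry:

```python
import threading

geometry_ready = threading.Event()
frame = {}
out = []

def render_thread():
    # Produce the data the AI needs (stand-ins for the
    # landscape geometry and depth buffer).
    frame["geometry"] = "landscape mesh"
    frame["depth_buffer"] = "depth values"
    geometry_ready.set()  # signal: geometry is now available

def ai_thread():
    # The LOS check CANNOT run until the renderer is done --
    # a serial ordering hiding inside "parallel" threads.
    geometry_ready.wait()
    out.append("LOS check using " + frame["geometry"])

t_ai = threading.Thread(target=ai_thread)
t_render = threading.Thread(target=render_thread)
t_ai.start()      # AI thread starts first...
t_render.start()  # ...but still has to wait for this one
t_ai.join()
t_render.join()
```

Two threads, but the interesting work still happens in a fixed order, so a second core buys you very little here.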
Another example: the AI "hears" a gunshot, and goes to investigate.
Same problem. You have to establish whether the AI can "hear" the sound effect, so you need to establish distance. Next, assuming 3D audio effects, you need to establish whether the sound reaches the AI with enough power to be distinguishable. So again, you can't process audio until AFTER the scenescape has been created. You have a situation where the audio engine needs data from the rendering engine, and then the AI needs that data to process. Notice there's an order of things that has to occur?
Now, let's throw in the physics engine. See how messy things are getting with regard to program flow?
Now, all these threads are going on at the same time. So you have a lot of pre-empting as events start to happen. You fire a gun, now the Audio engine needs to calculate how the sound travels, and the AI engine needs to figure out what effects that has on the active AI objects. Etc. This involves a LOT of inter-thread interaction, which implies a lot of synchronization. This kills performance, and limits performance gains.
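Here's what that synchronization cost looks like in miniature (again a toy Python sketch with made-up names). Eight "AI" threads all want to push events onto a shared queue, so every one of them has to take the same lock, and that section of code runs serially no matter how many cores you have:

```python
import threading

lock = threading.Lock()
events = []  # shared event queue: gunshots, AI reactions, etc.

def fire_gun(shooter):
    # Any subsystem touching shared state must hold the lock.
    # While one thread is inside this block, the other seven
    # just sit and wait -- the lock serializes them.
    with lock:
        events.append(shooter + " fired")

threads = [threading.Thread(target=fire_gun, args=("ai_" + str(i),))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every lock, queue, and signal between the rendering, audio, AI, and physics threads is another spot where "parallel" code quietly becomes serial.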
At the end of the day, your speedup is limited by how much of the program can be made parallel. If 60% of the program is serial, that 60% will NEVER benefit from more cores, and by Amdahl's law your maximum speedup is capped at 1/0.6, about 1.67x, no matter how many cores you put into a system.
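You can see how brutally this cap bites with a few lines of Python implementing Amdahl's law directly:

```python
def amdahl_speedup(serial_fraction, cores):
    """Max speedup of a program where serial_fraction of the
    work cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# With 60% of the program serial, piling on cores stops helping fast:
for cores in (2, 4, 8, 1000):
    print(cores, "cores ->", round(amdahl_speedup(0.6, cores), 2), "x")
```

Going from 8 cores to 1000 cores barely moves the number; the ceiling is set by the serial fraction, not the core count.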