To those trying to espouse some sort of inverse hyper-threading ... it's simply not possible with the way today's software is encoded in binary. The OS schedules a thread onto a CPU, and that thread's instructions are fetched from memory into the CPU's cache, where they sit until the CPU can get around to executing them. Two CPUs can't pull from the same pool of instructions because each and every thread is written as if it's the only one executing on the CPU. x86 has had hardware support for that illusion for a long time; the 80386's v86 (virtual 8086) mode, for example, lets an old real-mode program run as if it owned the whole machine. Binary instructions are nothing but math operations, compares and data movement done on registers inside the CPU. The contents of those registers are the thread's context. This context is unique to each thread and multiple threads cannot share contexts. Every time a thread is switched in / out of the CPU its context must also be switched with it, and that is extra data movement that needs to take place.
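To make the context-switching point concrete, here's a rough toy model in Python. It is only a sketch, not how any real OS or CPU does it: the register names are an illustrative x86 subset, and `ThreadContext`, `Core`, `switch_to` and `run` are names I made up for the example. It shows that one core has one live set of registers, so every switch means saving the outgoing thread's registers and loading the incoming thread's.

```python
from dataclasses import dataclass, field

# Illustrative subset of x86 registers (an assumption for this sketch).
REGISTER_NAMES = ("eax", "ebx", "ecx", "edx", "esp", "eip")


def blank_registers():
    return {name: 0 for name in REGISTER_NAMES}


@dataclass
class ThreadContext:
    """Saved register state of a thread that is not currently on a core."""
    name: str
    registers: dict = field(default_factory=blank_registers)


class Core:
    """Hypothetical core with exactly one live register set."""

    def __init__(self):
        self.registers = blank_registers()
        self.current = None
        self.switches = 0

    def switch_to(self, thread):
        # Save the outgoing thread's context, then load the incoming one.
        if self.current is not None:
            self.current.registers = dict(self.registers)
        self.registers = dict(thread.registers)
        self.current = thread
        self.switches += 1

    def run(self, increment):
        # Stand-in for "executing instructions": touch a couple of registers.
        self.registers["eax"] += increment
        self.registers["eip"] += 1


core = Core()
a = ThreadContext("A")
b = ThreadContext("B")

core.switch_to(a)
core.run(5)
core.switch_to(b)   # A's registers are saved, B's are loaded
core.run(100)
core.switch_to(a)   # B's registers are saved, A resumes where it left off
core.run(5)
core.switch_to(b)

print(a.registers["eax"], core.registers["eax"], core.switches)
```

Thread A never sees thread B's values in `eax`, and the price for that isolation is the save/restore work done on each of the four switches.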
Now let's look at a multi-core CPU, something with 12 cores for example. That is 12 separate sets of registers, meaning a system can maintain 12 different contexts simultaneously without having to swap anything in or out. Take this further and you get Intel's hyper-threading, which is nothing more than giving each core two separate sets of registers (stack pointers included) and sharing the execution engines between them. If you check at any point in time you have a few hundred separate processes running, with each process usually having a few threads. That is around a thousand contexts that must be tracked and swapped in / out for execution. It's how modern-day multi-tasking happens: your CPU is actually processing hundreds of threads every second while constantly swapping the contexts for each thread in and out, which reduces efficiency. So having more cores is never a ~bad~ thing; even if no single core is running full steam, they're still dividing up the task-switching workload.
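The arithmetic behind that can be sketched in a few lines of Python. The specific numbers (250 processes, 4 threads each) are my assumptions, picked only to land on the "thousand contexts" figure, and `contexts_per_hw_context` is a made-up helper name. It also treats every thread as runnable, which overstates things since most threads are asleep at any given moment, so read the result as an upper bound.

```python
def contexts_per_hw_context(total_contexts, cores, register_sets_per_core=1):
    """Software contexts competing for each hardware register set."""
    return total_contexts / (cores * register_sets_per_core)


# Assumed workload: a few hundred processes with a few threads each.
processes = 250
threads_per_process = 4
total_contexts = processes * threads_per_process

single_core = contexts_per_hw_context(total_contexts, cores=1)
twelve_core = contexts_per_hw_context(total_contexts, cores=12)
twelve_core_ht = contexts_per_hw_context(total_contexts, cores=12,
                                         register_sets_per_core=2)

print(f"total contexts:        {total_contexts}")
print(f"1 core:                {single_core:.1f} per register set")
print(f"12 cores:              {twelve_core:.1f} per register set")
print(f"12 cores, 2 sets each: {twelve_core_ht:.1f} per register set")
```

With these assumed numbers, each register set goes from juggling 1000 contexts on one core to roughly 83 on twelve cores, and roughly 42 with two register sets per core.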
Of course none of this is apparent when you're running a single measured benchmark, because you have a small handful of threads occupying all of the CPU's time. But in real-world scenarios where the user has 12~16 tabs open (each tab is at least one thread), an email client, network sharing, virus scanning, malware scanning, and a game running in the foreground, those extra cores definitely get a workout. I'd love to start seeing benchmarks with two, four and eight games running at the same time, just to see how different architectures handle such complicated workloads.
Of course the absolute king of multi-tasking oriented CPUs is the Sun SPARC T2 or T3. A single T2 has eight cores, with each core having two integer execution engines, one floating point execution engine, one memory management unit and eight sets of registers. Each CPU can maintain the context of 64 different threads while executing 16 integer and 8 floating point operations at once. The 5440 has four of these CPUs inside of it along with 256 GB of memory. And the T3 they're working on promises to bring this to a whole new level. Of course the CPU only runs at 1.6~2.0 GHz.
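Those T2 totals fall straight out of the per-core figures above. Here's the multiplication as a small Python sketch; `ChipSpec` and its field names are just labels I invented for the example, and the four-socket total simply assumes four identical chips as described for the 5440.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    """Per-core resources of a chip (field names are illustrative)."""
    cores: int
    int_units_per_core: int
    fp_units_per_core: int
    register_sets_per_core: int

    @property
    def thread_contexts(self):
        return self.cores * self.register_sets_per_core

    @property
    def int_ops_at_once(self):
        return self.cores * self.int_units_per_core

    @property
    def fp_ops_at_once(self):
        return self.cores * self.fp_units_per_core


# Figures taken from the description of the T2 above.
t2 = ChipSpec(cores=8, int_units_per_core=2, fp_units_per_core=1,
              register_sets_per_core=8)

sockets = 4  # the four-CPU box mentioned above
system_contexts = sockets * t2.thread_contexts

print(t2.thread_contexts, t2.int_ops_at_once, t2.fp_ops_at_once)
print(system_contexts)
```

So one chip holds 64 thread contexts in hardware, and a four-CPU box holds 256 without a single software context swap.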