viridiancrystal :
3) The need for more, i.e. the point where the current number of threads in use is simply not good enough. (Apparently not an issue)
Incorrect.
First and foremost, the silly assumption that more threads = more performance, which is generally false. The more threads you introduce, the more arbitrarily you have to break up your computations, and because you CAN NOT GUARANTEE WHEN THE OS IS GOING TO SCHEDULE A THREAD, the more you break up the work, the longer the overall computation can take, by virtue of threads not getting run in a timely manner (see the sketch after this reply).
Secondly, games typically use 40+ threads easily. Heck, most game launchers alone are in the teens. The problem is, as noted above, it's not efficient to make every thread a high-workload thread, so you see uneven performance when you look at the work done per CPU core in Task Manager.
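To make the first point concrete, here is a rough, self-contained C++ sketch (my own illustration, not taken from any real engine): the same fixed amount of work gets split across an increasing number of threads. Past the number of hardware cores, the extra threads buy you nothing; they just add creation, scheduling, and join overhead, and the OS still decides when each one actually gets to run.

```cpp
// Rough sketch: sum the same array with more and more threads.
// The total work is fixed, so beyond the core count the extra
// threads only add creation/scheduling/join overhead.
#include <chrono>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1 << 24, 1);  // fixed amount of work
    for (unsigned threads : {1u, 4u, 16u, 64u, 256u}) {
        std::vector<long long> partial(threads, 0);
        std::vector<std::thread> pool;
        const std::size_t chunk = data.size() / threads;
        auto start = std::chrono::steady_clock::now();
        for (unsigned t = 0; t < threads; ++t) {
            pool.emplace_back([&, t] {
                std::size_t begin = t * chunk;
                std::size_t end = (t == threads - 1) ? data.size() : begin + chunk;
                partial[t] = std::accumulate(data.begin() + begin,
                                             data.begin() + end, 0LL);
            });
        }
        for (auto& th : pool) th.join();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::cout << threads << " threads: " << ms << " ms (sum="
                  << std::accumulate(partial.begin(), partial.end(), 0LL) << ")\n";
    }
}
```

On a typical quad-core box the timings flatten out, or get worse, well before 256 threads, which is the point: more threads does not mean more performance once the cores are saturated.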
You keep saying it's impossible from a programming point of view. Here you are stating the truth: no one wants to bother spending the money. It's not that it simply CAN'T BE DONE, it's simply "not cost effective".
No, what I said was that it's not cost effective to constantly re-develop game engines on a game-by-game basis to squeeze every ounce of power possible out of them. That is independent of the threading issue.
When Intel quits making dual-core CPUs, they will offer vendors some dough to rework the software. That's one way they keep their edge: market software to their hardware. TSX, for example, won't make it on its own; Intel will have to push it with money and make their CPUs look superior so they can sell more. Same with AMD's OpenCL: they need to feed vendors some money for them to use it.
Please, stop it. I've explained, as clearly as I can, using REAL WORLD EXAMPLES [MIT in the 80's], why threading beyond a certain point leads to negative results. And every software engineer knows this.
Understand also what OpenCL does: it is merely a framework that allows co-development using both the CPU and GPU, rather than just the CPU. On its own, it brings no inherent threading benefit. When you look at the tasks that made early use of OpenCL, you see programs that, by their nature, do scale well (encoding, for instance, can be broken up into small chunks). CUDA has similar issues; you need to feed it very large datasets to see any real performance out of it (which makes sense, given the GPU architecture). Without a large enough dataset, performance on CUDA is horrid. But hundreds of GPU cores, given enough work, will outperform four CPU cores, no matter how fast those cores are.
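For contrast, here is a minimal sketch (again just an illustration, using nothing beyond standard C++ threads) of the kind of workload that DOES scale: an encoding-style job where every chunk of the buffer is independent, so no thread ever waits on another thread's result. Most game logic does not decompose this cleanly, which is exactly why the OpenCL/CUDA success stories cluster around large, chunkable datasets.

```cpp
// Sketch of an embarrassingly parallel, encoding-style workload:
// every chunk is independent, so it splits cleanly across threads.
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Stand-in for real per-byte work (e.g., one pass of an encoder).
void encode_chunk(std::vector<std::uint8_t>& buf, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        buf[i] ^= 0x5A;
}

int main() {
    std::vector<std::uint8_t> frame(1 << 20, 0);
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    const std::size_t chunk = frame.size() / n;
    for (unsigned t = 0; t < n; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t == n - 1) ? frame.size() : begin + chunk;
        workers.emplace_back(encode_chunk, std::ref(frame), begin, end);
    }
    for (auto& w : workers) w.join();
}
```

A running total or a dependency chain (this frame's physics depending on last frame's result, for example) cannot be split this way without restructuring the computation, which is where the scaling wall shows up.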
Now, if you care to argue, go find someone who's written software for both commercial and integrated systems, has designed game engines, and has developed internals for a few proprietary OSes. I doubt anyone who meets those criteria is going to seriously disagree with me on this topic. Based on the way computers themselves are currently designed, you are NOT going to scale well in 90%+ of all tasks. And the stuff that does scale well tends to do so because of very large datasets, rather than by virtue of design.