Also, I still want you to answer my question: If parallelism is so difficult, why did we ever leave single core processors? [/quote]
Because with the Pentium 4, it was discovered that you can't push clocks much above 4GHz without running into severe power/heat constraints. Thus, moving to lower clock speeds but multiple cores made more sense.
And for most tasks, you can usually get loading on two cores without issue: one thread to handle the bulk of the work, and one thread to manage everything else. The problem is scaling beyond that, which is very hard to achieve.
Understand that programs typically have dozens of threads that run. Of those, only a handful will be parallel in nature, and of those, maybe one does any reasonable amount of work. So you see Task Manager core loading that looks something like 60%-40%-5%-1% on a quad, or 60%-50% on a dual. And again, that assumes a single task doing a lot of work; run two or more apps that each do a significant amount of work, and the benefits of quads become much more clear.
mr_clean00 :
Explain battlefield 3.
explain Metro 2033
explain Civ V.
Those are all very much multi-core friendly. All you ever do is say "oh, they run on a dual core system, so they're only dual core games and it will NEVER happen."
You want to cry foul, fine. Explain how these games are multi-core friendly today when it will NEVER happen.
http://www.sweclockers.com/image/diagram/2506?k=142b45af179c625ccd8f53fea7385155
This is obviously fake, since no game uses more than 2 cores and the dual core system is unplayable. Single player benchmarks have shown that it's perfect in every way.
If you want me to stop it, explain how games will NEVER be multi-core friendly when they are that way TODAY.
Kinda funny to me, because it seems you're on a crusade to keep programming simple, going against the way hardware is headed. From TSX to lock-free to parallel, nothing is acceptable because it would require re-learning what you already know.
Fight the future, fight innovation, fight advancements in programming, Fight to protect the dual core systems.
BF3 multiplayer shows significant benefits from using a quad, true, especially on 64-player servers. But that makes SENSE, given the extra networking code and having to manage the players in the game. Multiplayer, for that reason, is ALWAYS going to benefit more from extra cores, simply because you are doing more processing.
Your little graph is also a little bit odd in its results: a significant gain via HTT, but at the same time, the 1100T and FX8150 lag behind the 2500K? Doesn't that kinda disprove your argument? Looking at those numbers, couldn't one conclude that HTT is better than adding more processing cores? At a quick glance, it appears clock speed/IPC is far more important to performance in that benchmark (explaining why the 2600K beats the 2500K, and why BD and the 1100T lag behind). And I note, no pictures of how the cores are actually loaded up...
-Fran- :
There's plenty of programming that can go into hard threads.
Just by adding complex physics you render almost any multi-CPU/GPU setup useless. Path calculation is another good example. Hell, I remember a friend of mine who was working out how a simple fracture could be calculated. To actually compute it and not die waiting, you need a really nice amount of FPU units.
Now... you just need a good engine that can spawn hard threads, let you synchronize with some level of ease, and actually use them. Soft threads are easy enough to use, but they have horrible diminishing returns and memory constraints.
Anyway, there is a wall and we all know it, but I truly believe we're still very far from hitting it.
Cheers!
No developer, outside of consoles, is going to use hard threads. Why? Because PCs are non-deterministic: at any point in time, I do not know how much of the system's resources are in use, or the performance hit if I try to use them. That's why assigning threads to cores is left to the OS scheduler.
No, physics and pathfinding are two good examples of loads that DO scale nicely. In fact, they scale so well that the results are generally calculated faster on a GPU-like architecture with hundreds of slower cores, rather than a few fast ones.