It is. And this has been known for about 30 years now. There are simply too many processing steps that have to be done in order, which makes it hard to thread a game in a way that yields a real performance benefit. The parts of a game that are parallel (rendering and advanced physics) are already offloaded to the GPU, leaving the CPU with little more to do except run the main game engine. Hence why most games start to see less scaling after the second core, and almost no scaling after 4: One thread handles the main game engine, one handles rasterization. Any remaining threads carry SIGNIFICANTLY less workload, so they don't register big numbers in Task Manager. [In a 50-thread game, I'd expect 2-3 threads, at most, to account for 99% of the workload].
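To make the architecture concrete, here's a minimal sketch of that split. This isn't any actual engine's code, just the common pattern; the function names and the 16 ms tick are my own placeholders. The simulation is serial by nature (each tick depends on the last), so extra threads only get small side jobs:

```cpp
// Minimal sketch of the "main engine thread + render thread" pattern.
// Not real engine code; names and tick rate are illustrative only.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<bool> running{true};
std::atomic<int>  frame{0};          // stand-in for "latest simulated frame"

void game_main() {                   // heavy: input, AI, game logic
    while (running) {
        // Simulate one tick. Serial by nature: tick N needs tick N-1's results.
        frame.fetch_add(1);
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }
}

void render_main() {                 // heavy: build and submit draw calls
    int last = 0;
    while (running) {
        int f = frame.load();
        if (f != last) { /* hand frame f to the GPU */ last = f; }
        std::this_thread::yield();
    }
}

int main() {
    std::thread sim(game_main), render(render_main);
    // Any helper threads (audio streaming, asset loading, ...) would sit
    // mostly idle -- hence the tiny Task Manager numbers for them.
    std::this_thread::sleep_for(std::chrono::seconds(5));
    running = false;
    sim.join();
    render.join();
    std::printf("simulated %d frames\n", frame.load());
}
```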
Secondly, going out of your way to balance core usage is silly and wastes performance, so to some extent you are always going to see some degree of uneven core usage, even in tasks that do scale well.
Third, Task Manager has a maximum resolution of 1 second per sample, which raises a theoretical question: if you have one thread that runs flat-out for that entire second, but jumps cores four times over that period, you get a Task Manager usage graph that could look like this (assuming no other CPU-intensive threads are running):
Core0: ~40%
Core1: ~35%
Core2: ~20%
Core3: ~5%
Oh wait, I've just described just about every game released in the past 5 years or so... Hence why I am always skeptical of Task Manager numbers: resolution is a significant issue. [Same basic argument I have against FPS as a gaming benchmark]. Unless you know how many threads are actually running, which ones do a significant amount of work, and what cores they are dispatched to, Task Manager usage graphs tell you NOTHING.
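For what it's worth, you don't even need GPUView to answer the "which threads actually do work" part. A rough sketch of my own using the Win32 Toolhelp API; the PID passed on the command line is whichever game process you want to inspect, and the threads with big user/kernel times are the real workers:

```cpp
// Rough sketch: list a process's threads and their accumulated CPU time.
// Pass the target PID on the command line (defaults to this process).
#include <windows.h>
#include <tlhelp32.h>
#include <cstdio>
#include <cstdlib>

// FILETIME is in 100 ns units; convert to milliseconds.
static unsigned long long filetime_ms(const FILETIME& ft) {
    ULARGE_INTEGER u;
    u.LowPart  = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart / 10000;
}

int main(int argc, char** argv) {
    DWORD pid = (argc > 1) ? (DWORD)atoi(argv[1]) : GetCurrentProcessId();
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snap == INVALID_HANDLE_VALUE) return 1;

    THREADENTRY32 te = { sizeof(te) };
    for (BOOL ok = Thread32First(snap, &te); ok; ok = Thread32Next(snap, &te)) {
        if (te.th32OwnerProcessID != pid) continue;
        HANDLE h = OpenThread(THREAD_QUERY_INFORMATION, FALSE, te.th32ThreadID);
        if (!h) continue;
        FILETIME create, exit_t, kernel, user;
        if (GetThreadTimes(h, &create, &exit_t, &kernel, &user))
            std::printf("tid %6lu: user %8llu ms, kernel %8llu ms\n",
                        te.th32ThreadID, filetime_ms(user), filetime_ms(kernel));
        CloseHandle(h);
    }
    CloseHandle(snap);
    return 0;
}
```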
Tools like GPUView show you exactly this information: how many process-heavy threads are being run, how often they run, how much work they do, and what cores they get dispatched to. Very few people have used it in any real way for games yet, though if I can get the SDK to install properly, I might do some investigation in this area in the future... But just a few quick examples (since noob won't let the SC2 argument die a quiet death):
http://graphics.stanford.edu/~mdfisher/GPUView.html
http://graphics.stanford.edu/~mdfisher/Images/GPUViewLargeRenderTime.png
Note SC2 does the majority of its work on two threads (light-workload threads aren't displayed by default). But notice how many times the thread jumps cores (different color = different core), which is to be expected to some extent (the OS can come along and kick out a user thread anytime it wants to). And note the second thread doesn't begin until later in the processing cycle (probably the render thread, given the render entry is d3d9.dll). I would imagine the Task Manager graph would look pretty close to what I described above: two threads loading three cores to some extent. Hence why I don't view Task Manager as a good indication of threading within an application.
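You can actually watch the core-hopping yourself without GPUView, too. A quick sketch of my own (not from the article): one busy-spinning thread that reports every time the scheduler moves it. Run it next to Task Manager and you'll see a single thread's worth of work smeared across several core graphs, much like the numbers above:

```cpp
// Sketch: one 100%-busy thread that reports every core migration.
// GetCurrentProcessorNumber() needs Vista or later.
#include <windows.h>
#include <cstdio>

int main() {
    DWORD last = GetCurrentProcessorNumber();
    std::printf("started on core %lu\n", last);

    volatile long long sink = 0;               // keep the loop genuinely busy
    for (long long i = 0; i < 2000000000LL; ++i) {
        sink += i;
        if ((i & 0xFFFFF) == 0) {              // poll occasionally, stay cheap
            DWORD now = GetCurrentProcessorNumber();
            if (now != last) {
                std::printf("migrated: core %lu -> core %lu\n", last, now);
                last = now;
            }
        }
    }
    return 0;
}
```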
BTW, this is WoW, which is a LOT worse in this regard:
http://graphics.stanford.edu/~mdfisher/Images/GPUViewWoWAllThreads.png
Probably due to a CPU bottleneck, but still, one thread doing 99% of the work... [But again, without knowing the HW, it's kinda hard to know what's REALLY going on].
---------------------------------
My point is basically this: our ways of evaluating games (FPS, Task Manager graphs, etc.) are hopelessly flawed, and I'm glad sites are starting to move in the right direction with their testing (looking at frame-by-frame latencies, GPUView usage statistics, etc.).
And yes, I went overboard on this one. I know. (REALLY slow day at work today.) I'll do some testing if I can get GPUView and the Windows SDK to install over the weekend; anyone have any games they want me to look at? (I'm probably going to do L4D2 as a test, to see how many threads are REALLY doing work. Task Manager would indicate four or more; I'm less convinced...)
If its so parallel threading happy, y u no offload on the GPU ? Will save thread synch headache for the programmer.
Three reasons:
1) The GPU is already burdened with rendering (and in some cases, physics). Giving the GPU more work would slow the entire process via a GPU bottleneck.
2) GPU shaders are relatively weak compared to a CPU core, so for non-scaling tasks, performance is SIGNIFICANTLY weaker.
3) APIs that give visibility into GPU resources are not widely used: CUDA is NVIDIA-specific (even if it IS a powerful API, it won't be used for commercial software if a separate code path has to be maintained for non-NVIDIA GPUs), OpenCL is not widely adopted, etc.