There are a couple of things to consider here that haven't been mentioned yet.
The slight speed increase in single-threaded applications when moving from one core to two is easy to explain. With two cores, the processes running in the background can be handled by the other core, so the benchmark mostly measures the application itself.
The AVG test is also interesting. Most of the time, this will be a process running silently in the _background_. Now ask yourself this: do you _want_ it to occupy all four cores, potentially slowing your entire system down? Or should it stick to a single core and leave the other three free for whatever you're doing in the foreground?
In some cases, multi-threading is a good choice. But in other cases, it seems to undermine the concept of multitasking, especially when the application fails to set a low priority for itself.
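To make the "low priority, single core" idea concrete, here is a rough sketch of what a background tool could do at startup. It is not how AVG or any particular scanner works; the function name `run_politely` and its parameters are made up for illustration. It uses Python's `os.nice` (Unix) and `os.sched_setaffinity` (Linux only), and simply skips a step where the platform lacks the call.

```python
import os


def run_politely(niceness_increment: int = 10, single_core: bool = True) -> dict:
    """Lower this process's priority and optionally pin it to one core.

    Hypothetical helper for illustration; returns what was actually applied.
    """
    applied = {}

    # Raise the niceness value, which lowers scheduling priority (Unix only).
    if hasattr(os, "nice"):
        applied["niceness"] = os.nice(niceness_increment)

    # Restrict the process to a single core it is already allowed to use
    # (Linux only). The other cores stay free for foreground work.
    if single_core and hasattr(os, "sched_setaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
        os.sched_setaffinity(0, {allowed[-1]})
        applied["cores"] = os.sched_getaffinity(0)

    return applied


if __name__ == "__main__":
    print(run_politely())
```

Whether pinning is worth it depends on the workload: lowering the priority alone already lets the scheduler favor foreground work, while pinning also caps how fast the background job can ever finish.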
Some benchmarks, like archivers, will probably always scale badly, simply because compression algorithms are hard to split up without sacrificing compression ratio. Of course, if you compress multiple archives at the same time, they will scale just as well as any other application.
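A small sketch of that trade-off, using Python's `zlib` as a stand-in for a real archiver: the same data is compressed once as a whole and once as independent chunks that can be handed to separate workers. The helper names and the sample data are invented for this example, and the exact sizes will depend on the input. Each chunk has to start over without the history of the previous one, which is where the ratio is lost.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compressed_size(data: bytes) -> int:
    """Size in bytes of the data compressed as one zlib stream."""
    return len(zlib.compress(data, 9))


def split_evenly(data: bytes, parts: int) -> list:
    """Cut the data into at most `parts` consecutive chunks."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunked_compressed_size(data: bytes, parts: int) -> int:
    """Total size when every chunk is compressed independently.

    The chunks share no history, so they can run on separate workers,
    but each one has to rebuild its dictionary from scratch.
    """
    chunks = split_evenly(data, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(compressed_size, chunks))


def make_sample() -> bytes:
    """Repetitive, made-up sample data (an assumption for this demo)."""
    return b"".join(
        b"entry %d: the quick brown fox jumps over the lazy dog\n" % (i % 97)
        for i in range(20000)
    )


if __name__ == "__main__":
    sample = make_sample()
    print("whole:  ", compressed_size(sample))
    print("chunked:", chunked_compressed_size(sample, 8))
```

Compressing several unrelated archives at once is the same `pool.map` call over separate inputs, with no ratio penalty, because each archive was always going to be its own stream anyway.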