Oh, I do know how to automate a benchmark. It's just that most GUI software doesn't provide scripting, so you need a supervisor program to drive it, which adds overhead.
It is, for example, generally accepted that using tools like FRAPS skews results somewhat.
On the other hand, running lame.exe from the command line gets you:
- encoding time (to a thousandth of a second)
- CPU time used (because you don't run lame.exe as real-time very often)
- presets used
All of that is generated on the fly by lame as plain text, which you can redirect to a file (you know, output redirection?) or simply leave on screen (no unneeded disk access).
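To make that concrete, here is a rough sketch of how a script could time one command-line run and keep the tool's own text report in memory. Everything in it is my assumption rather than anything lame provides: the helper name `time_command` is made up, and the child CPU figures come from Python's `os.times()`, which only fills them in on Unix-like systems (on Windows they stay at zero).

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once; return (wall_seconds, child_cpu_seconds, captured_text).

    Hypothetical helper for illustration only.
    """
    before = os.times()
    start = time.perf_counter()
    # Capture stdout and stderr in memory, so the report never touches the disk.
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    wall = time.perf_counter() - start
    after = os.times()
    # CPU time used by waited-for child processes (user + system).
    # Assumption: a Unix-like OS; on Windows these fields are always zero.
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu, result.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in the encoder
    # command line you actually want to measure.
    wall, cpu, text = time_command([sys.executable, "-c", "print('done')"])
    print(f"wall {wall:.3f}s, cpu {cpu:.3f}s")
    print(text, end="")
```

For an actual encode you would pass something like `["lame", "in.wav", "out.mp3"]`, with the file names being placeholders of mine.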
Now, if you'd rather measure processor efficiency with software that WILL introduce hard disk access overhead, graphics subsystem overhead, TCP/IP overhead - because it decided to pop up a wizard or phone home at that very instant - and generally use up system resources other than CPU time (which will impact CPU efficiency anyway), then go ahead.
You may have done 15 years of Unix programming; lucky you, you don't know how screwed up the NT kernel's thread scheduler and memory manager are.
The main advantage of command-line encoders is that they produce REPRODUCIBLE results, while most GUI apps, for the reasons I outlined above, just don't.
Making a test reproducible is one of the points of a benchmark, especially in an environment as complex as a PC; using your 'generic apps' is like having different drivers test-drive the cars at different times, with their cell phones turned on and in changing weather, to decide which car is fastest.
On the other hand, using lame, Vorbis and Xvid encoding means that it's always the same driver testing the cars in a well-controlled environment.
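One way to put a number on "reproducible" is to run the exact same command several times and look at how much the timings scatter. The sketch below is only an illustration under my own assumptions: the function names `wall_times` and `relative_spread` are invented, and standard deviation divided by mean is just one possible measure of spread.

```python
import statistics
import subprocess
import sys
import time


def wall_times(argv, runs=5):
    """Run the same command `runs` times; return each wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the output so printing does not add to the measurement.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        samples.append(time.perf_counter() - start)
    return samples


def relative_spread(samples):
    """Sample standard deviation divided by the mean (0.0 means identical runs)."""
    return statistics.stdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # encoder command line under test.
    samples = wall_times([sys.executable, "-c", "pass"], runs=5)
    print(f"relative spread: {relative_spread(samples):.1%}")
```

The smaller the relative spread across runs, the more you can trust a difference between two machines to be the machines and not the weather.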