@randyshipp:
The Winner/Strong/Weak table provides weight, and a very basic indication of scale. We stopped relying on placing alone a long time ago. When the WBGP started it was very JavaScript-heavy, so with equal weight given to each test, the results could easily skew in favor of a browser that excels in a single category. We implemented the Winner table in order to inject a sense of weight: each type of test is given equal weight, no matter how many tests there are per category. We later expanded the Winner table to include browsers that are weak, and then browsers that are also strong (but not the winner), in order to inject a limited sense of scale.
The 5-for-1st, 4-for-2nd, etc. scoring system based on placing tables has the points too close together and doesn't reflect scale or weight in any way. We actually had people veto that very scoring system in the past. Without scale, lackluster performers can finish ahead of where they should be. The best examples are tests in which 1st and 2nd are far ahead of 3rd, 4th, and 5th. Under that point system, the third-place finisher still earns 3 points, even though the actual scale of the result puts it FAR behind. This potentially allows poor performers to coast into higher final scores. Without weight, we're back to allowing whoever can win all the tests in a single test-heavy category to gain an unfair advantage. All of this was discussed after WBGP1&2, and the general consensus was that the 5-4-3-2-1 point system is too simplistic at best, and completely distorts the outcomes at worst. We had many other suggestions for a scoring system, but most were unfathomably complicated, based on completely subjective weighting standards, or too rigid (they simply wouldn't work out because we are always expanding/tweaking the test suite).
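To make the scale problem concrete, here is a minimal sketch with hypothetical benchmark numbers (the browser names and scores are invented for illustration, not taken from any WBGP results). Under flat placing points, a third-place browser at 40% of the winner's raw score still collects 60% of the available points:

```python
# Hypothetical benchmark scores (higher is better); numbers are illustrative only.
scores = {"A": 100.0, "B": 98.0, "C": 40.0, "D": 38.0, "E": 35.0}

# 5-4-3-2-1 placing points: rank order is all that matters.
ranked = sorted(scores, key=scores.get, reverse=True)
placing_points = {browser: 5 - i for i, browser in enumerate(ranked)}

# A scale-aware alternative: score each browser relative to the best result,
# so a distant third place is penalized in proportion to how far behind it is.
best = max(scores.values())
normalized = {browser: s / best for browser, s in scores.items()}

print(placing_points["C"])        # 3 of 5 points (60%)
print(round(normalized["C"], 2))  # 0.4 (40% of the winner's result)
```

The gap between those two numbers is exactly the "coasting" effect described above: placing points compress a blowout into a one-point difference.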
That said, I would love to have an accurate and holistic numerical scoring system, and I'd be into looking at it again now that the WBGP has grown up a bit. Just remember that the WBGP is always evolving, so any system that hinges on the current test line-up remaining the same will not work. And I don't just mean new tests, I mean new areas of testing beyond performance, efficiency, and conformance. When the WBGP started it was strictly for performance; we added efficiency and conformance later on, and there is more to look at still - clearly there is a demand for other areas to be included, such as security, feature set, UI efficiency, etc. If anyone has a suggestion for how to take all those aspects and benchmarks (which use differing measurements and scoring scales) and formulate a mathematical way of achieving an aggregate numerical score which properly reflects scale and weight and is expandable for the future, you've got my undivided attention. If it can be made to work properly, it would be killer - email me at aovera AT bestofmedia.com if you have an idea.
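One possible shape for such a system, sketched here with entirely hypothetical categories, tests, and scores: normalize each test against its best result (so differing measurement scales don't matter), average within each category (so test-heavy categories don't dominate), and weight every category equally. Tests and categories can then be added or removed without changing the formula. Lower-is-better results (e.g. load times) would need to be inverted before normalizing; that step is omitted here for brevity:

```python
# results[category][test][browser] = raw score, higher is better.
# All names and numbers below are made up for illustration.
results = {
    "performance": {
        "js_bench":  {"A": 900, "B": 600},
        "css_bench": {"A": 50,  "B": 45},
    },
    "conformance": {
        "html5_test": {"A": 300, "B": 450},
    },
}

def aggregate(results):
    """Scale-aware total in [0, 1]: per-test normalization,
    per-category averaging, equal weight per category."""
    totals = {}
    n_categories = len(results)
    for tests in results.values():
        n_tests = len(tests)
        for scores in tests.values():
            best = max(scores.values())
            for browser, score in scores.items():
                share = (score / best) / n_tests / n_categories
                totals[browser] = totals.get(browser, 0.0) + share
    return totals

print(aggregate(results))
```

With these invented numbers, browser B edges out A despite losing two of three tests, because its conformance win is large while its performance losses are close - which is precisely the kind of scale sensitivity the flat point system lacks.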
But so far, the best way to summarize all those 30+ benchmark results is to display an unaltered placing table and juxtapose it against the Winner/Strong/Weak chart. This way does, however, place the emphasis on 1st-place victories and 'Winner' as the key deciding factor, which is where I think you and some others have the issue. But keep in mind that, in the end, we are declaring the winner of a race. Even in a photo finish, whether the difference comes down to 32 centimeters, 32 milliseconds, or 32 FPS, somebody crosses the finish line first, and this way still allows us to see that clearly.
PS - The hypothetical situation you describe does not apply here. To put this in perspective, in order to win WBGP3, IE9 came in first significantly more times than Chrome did to win WBGP1, or Opera to win WBGP2. With the updated benchmark suite, which fleshes out areas such as CSS performance, HTML5 hardware acceleration, and WebGL, IE9's WBGP4 win is now on par with the proportion of Chrome's and Opera's previous wins. Also, in this case, IE9 has the most wins and fewest losses in both the performance and total placing tables.