My counterarguments to this are:
- This example has too small a sample size to be useful. I'm nitpicking here, sure, but if the upward spike is intermittent, it doesn't matter over the long run.
- Consider this: the typical benchmark run is about 60 seconds. If the average performance is 100 FPS, that's a sample size of roughly 6,000 frames. Even if one of those seconds ran at 200 FPS, the overall average would only increase by about 1.67 FPS (see the sketch after this list).
- Unless there's a momentary blip from staring at an empty skybox, most games won't suddenly shoot up in FPS. I also can't imagine a scenario where one CPU would have such a blip and another wouldn't.
- Practically all benchmarks report an average, which is the number most people will use because it's right there. If you have a problem with that, then go tell benchmark developers to stop doing this.
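To make the arithmetic in that second point concrete, here's a minimal sketch using the same illustrative numbers from above (a 60-second run, a 100 FPS baseline, and a single second spiking to 200 FPS):

```python
# Hypothetical 60-second benchmark: 59 seconds at 100 FPS, 1 second spiking to 200 FPS.
baseline_fps = 100
run_seconds = 60
spike_fps = 200

frames_without_spike = baseline_fps * run_seconds                  # 6000 frames
frames_with_spike = baseline_fps * (run_seconds - 1) + spike_fps   # 6100 frames

avg_without_spike = frames_without_spike / run_seconds  # 100.00 FPS
avg_with_spike = frames_with_spike / run_seconds         # ~101.67 FPS

print(f"Average without spike: {avg_without_spike:.2f} FPS")
print(f"Average with spike:    {avg_with_spike:.2f} FPS")
print(f"Difference:            {avg_with_spike - avg_without_spike:.2f} FPS")
```

So even a full second at double the frame rate moves the reported average by less than 2%.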
However, I will say that the data set would be better if they added a frame time graph.
Textures don't reside in CPU cache. Also, calibrating to some arbitrary FPS and seeing what quality settings you can hit isn't really a useful metric when benchmarking the processor. The goal is to see how much performance you can get out of the processor, period, not some combination of performance and image quality.
As an example: say I'm getting 100 FPS, I've identified that my CPU is what's limiting performance, and I want to know which CPU gets me to, say, 240 FPS in a game (because I happen to own a 240 Hz monitor). If everything is "calibrated" to 144, how do I know which CPU to get?
They're using a geometric mean for the specific purpose of lessening the effect of those outliers. From
https://sciencing.com/differences-arithmetic-geometric-mean-6009565.html:
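For a quick feel for why that works, here's a sketch with purely made-up per-game FPS numbers (not taken from the review in question), where one result is an outlier spike:

```python
from math import prod

# Made-up per-game FPS results for one CPU, including one outlier spike (the 300).
fps_results = [95, 102, 110, 98, 300]

arithmetic_mean = sum(fps_results) / len(fps_results)
geometric_mean = prod(fps_results) ** (1 / len(fps_results))

print(f"Arithmetic mean: {arithmetic_mean:.1f} FPS")  # ~141.0, dragged up hard by the outlier
print(f"Geometric mean:  {geometric_mean:.1f} FPS")   # ~125.7, much less affected by it
```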