Thus trusting any OEM submission is a fool's errand, in my opinion. Independent submissions from sites like Anandtech, Toms Hardware, and users are what is important to look at, as they are likely to give more realistic views of performance.
That's not how SPEC works, though.
"SPEC does not perform these measurements itself, rather it allows members and in some cases other licensees to submit results and measurements for review and publication on SPEC's website."
I didn't find a description of their rules governing how the tests are to be run, however. I would agree it'd be better if you had to submit your system to an independent, 3rd-party lab for testing.
I think trusting sites like Toms is not without its own downsides, such as:
- They don't all run the same tests or use the same testing conditions.
- Aside from laptops, they test very few full systems, and most are consumer-oriented.
- They tend to run either opaque commercial benchmark suites or single-application benchmarks. SPECbench strikes a nice sweet spot in between, but I haven't seen anyone but Anandtech use it.
- These sites tend to rely on manufacturers for review samples, which could be cherry-picked golden samples, and which creates a form of dependence.
The thing about SPEC is that while it does rely on industry submissions, it's also industry-funded. If SPEC went away, I really don't see anything remotely comparable that would step in and fill the breach. The closest thing out there is Phoronix Test Suite/OpenBenchmarking.org, but it doesn't have well-defined, narrow suites like SPEC does, which means no composite scores like SPECint/SPECfp. Also, its submission rules are basically nonexistent and Michael can't test very much, by himself. Plus, he's dependent on vendor-supplied samples, just like most of the other sites.
So, if we want SPECbench, then we have to accept that the scores on spec.org are going to come from tests run under favorable conditions. I'm alright with that, especially knowing that they try to hold a firm line on certain things. I only look at those scores to get a rough idea of a system's performance, anyhow. Probably not at the granularity where the kinds of manipulations you're talking about could majorly swing them, but maybe.
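For context, and as far as I understand SPEC's scoring, those composite numbers are just geometric means of per-benchmark ratios against a fixed reference machine. Here's a rough sketch of the shape of the calculation, with completely made-up runtimes:

```c
/* Made-up numbers, just to show how a SPEC-style composite is derived
 * (as I understand it): normalize each benchmark's runtime against a
 * reference machine, then take the geometric mean of the ratios. */
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Hypothetical per-benchmark runtimes, in seconds. */
    const double ref[]      = { 1000.0, 770.0, 1800.0, 960.0 };  /* reference machine */
    const double measured[] = {  212.0, 165.0,  401.0, 198.0 };  /* system under test */
    const int n = sizeof ref / sizeof ref[0];

    double log_sum = 0.0;
    for (int i = 0; i < n; i++) {
        double ratio = ref[i] / measured[i];  /* higher ratio = faster than reference */
        log_sum += log(ratio);
    }
    double composite = exp(log_sum / n);      /* geometric mean of the ratios */
    printf("composite score: %.2f\n", composite);
    return 0;
}
```

Because it's a geometric mean, doubling your speed on just one of those four hypothetical tests only lifts the composite by about 19% (a factor of 2^(1/4)), and across the ten-plus benchmarks in the real suites a narrowly-targeted tweak gets diluted even further. That's part of why I'm not too worried at the rough-idea level of granularity.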
What I care a lot more about is 3rd parties' ability to run their test suites. Again, unless it's one group running multiple tests, I don't put a ton of stock in exact score comparisons, but it's definitely useful to have a consistent benchmark with scores you can compare between different reviewers.
If those independent sites used Intel-specific compilers, I would question why they would do such a thing. Using any OEM tool for a benchmark is likely to sway the results of an independent cross-system/platform test.
Why is it bad to use an Intel compiler on an Intel CPU or an AMD compiler on an AMD CPU? As long as those compilers don't have overly targeted optimizations intended specifically to rig such benchmarks, I think it's not necessarily a problem to use them.
You should consider that these types of benchmarking efforts largely grew out of the HPC industry, where using a vendor-supplied compiler is the norm. Heck, back in the '80s and '90s, even (non-x86) workstations and servers would tend to have their own OS and their own compilers. That's the era SPEC grew out of.
There are plenty of generic x86 compilers these days.
It would be interesting for them to define a benchmark suite that used a standard set of binaries, such as from RHEL or Debian upstream. Whether this better reflects customers' use cases really depends on the type of system and the customer. Since most of the submissions they get are servers and workstations, it's not unreasonable to expect many customers will either be compiling their own software or will be buying hardware from a software vendor's list of supported machines. In the latter case, that enables the software vendor to target a much newer ISA baseline than the 20-year-old x86-64 standard, for instance, or the ARMv8-A that's more than a decade old, and maybe even tune for a specific CPU target or use that CPU vendor's compiler.
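To make the ISA-baseline point a bit more concrete, here's the sort of thing I mean. The build lines are just GCC examples of different targets, not anything SPEC or any vendor actually prescribes:

```c
/* Illustration of the ISA-baseline point: the same C source can end up
 * quite different depending on the -march= target.  Example GCC invocations:
 *
 *   gcc -O3 -march=x86-64     saxpy.c   (the ~2003 baseline, SSE2 only)
 *   gcc -O3 -march=x86-64-v3  saxpy.c   (assumes AVX2/FMA, roughly Haswell and newer)
 *   gcc -O3 -march=native     saxpy.c   (targets whatever the build machine has)
 */
#include <stdio.h>

static void saxpy(float *restrict y, const float *restrict x, float a, int n) {
    /* With a newer -march baseline (or a vendor compiler), this loop can be
     * auto-vectorized with wider registers and FMA; at the plain x86-64
     * baseline the compiler is limited to SSE2. */
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float y[8] = {0};
    saxpy(y, x, 0.5f, 8);
    printf("y[7] = %f\n", y[7]);   /* expect 4.0 */
    return 0;
}
```

A software vendor that only certifies recent machines can ship -march=x86-64-v3 (or even CPU-specific) binaries, while anything built for the plain x86-64 baseline has to stick to SSE2-era instructions unless it does runtime dispatch. That gap is exactly what a "standard distro binaries" suite would capture, and what the current vendor-tuned submissions don't.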