Thanks for the comparison and for putting things into perspective. While modern digital audio equipment may well sound very similar, I would like to add some of my own experiences from selecting audio gear:
1. Some (many?) people won't be able to distinguish or appreciate the fine nuances between different devices, which doesn't mean these differences do not exist.
2. Most of the test tracks you chose are rather problematic for revealing the "true sound". Why? Except perhaps (???) for the classical piece, I bet they are all mixed from multi-track recordings, where each track may have been recorded at a different time, or even on a different day. It's essentially a "synthetic sound" created by the sound engineer and mixer to suit his own taste and that of the audience. For all we know it may have been mixed using $20 tin cans as headphones - and thus tuned to their acoustic properties.
When I choose my audio gear, I always bring along some CDs where I know the acoustic properties of the recording venue and the recorded material. I do listen to live music, and that is my reference.
If you were, for example, located in Boston or Chicago, try out the local symphony orchestras and listen to live concerts. Then buy some good recordings of the respective orchestra and venue, preferably of a performance you have heard in concert. Naturally, if you are not into classical music, forget it.
I will definitely listen to Jazz, as I've been to quite a few concerts and am quite familiar with the unamplified sound of various instruments. Vocal music will also be part of the repertoire to better detect differences, including recordings of singers I've heard live in concert, singing the specific repertoire (both classical material without amplification and amplified Jazz, Rock etc. material).
When testing amplifiers, I always have the "Fauré - Requiem, Rundfunkchor Leipzig, Lucia Popp, soprano; Simon Estes, bass; Sir Colin Davis; on Philips" with me. Most consumer amplifiers won't be able to play this CD at critical listening volume.
3. The first test track you mention - Skyrim - offers "deep vocals and bass-heaviness, intermixed with high treble female vocals". Remember the "Loudness" button on many amps from the 80s and 90s (and perhaps later)? Enhanced bass and treble make almost any system sound better. In my experience, this track will favor the "it all sounds the same" camp, as it's devoid of the subtleties of individual voices or solo instruments (woodwinds, strings, or similar).
4. All the above observations are perhaps irrelevant to many/most listeners who don't share my musical preferences. My main point of critique - where I would totally diverge from the test methodology - is the short listening to a single track, followed by a switch to different equipment. At a certain level of fidelity, the differences between pieces of equipment are not so much in what you hear as in what's missing. You need to listen for a long time to make out what's missing. Often the difference between one piece of equipment and another is that you get tired while listening: listening with model A is exhausting, while with model B it is not. You don't notice that by listening to a 5- or 10-minute track. However, if you listen, for example, to a symphony from start to end (provided you like that genre), you will know whether the experience was uplifting or tiresome.
5. As much as I like listening to headphones, there is no comparison to speakers. With speakers, a high-fidelity audio system in a suitable listening room is capable of reproducing stereo recordings in a 3-dimensional manner, where I can clearly identify each instrument from left to right and front to back WITHOUT EXHAUSTING MYSELF. There is no guesswork involved. I will be able to identify the recording location (if it's one that I am familiar with) with its individual acoustic properties, as well as the nuances of each individual instrument. I do hope you will be able to repeat this test (perhaps incorporating some suggestions) with speakers.
Last but not least, I do think this test is quite valid for the music material you chose, except that with longer listening sessions for each test the results might have changed (higher hit versus miss ratio, or at least higher consistency). It might be interesting to test this theory.
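One rough way to check whether a hit versus miss ratio is better than guessing is a simple binomial calculation. The sketch below is only an illustration: the function name is hypothetical, the hit counts are made-up numbers rather than results from the test, and it assumes each trial is an independent A/B guess with a 50% chance of being right.

```python
from math import comb


def p_value_at_least(hits: int, trials: int, chance: float = 0.5) -> float:
    """Probability of scoring `hits` or more out of `trials` by pure guessing.

    Assumes independent trials, each with probability `chance` of a lucky hit.
    """
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Made-up example numbers, NOT taken from the original test.
short_sessions = p_value_at_least(hits=6, trials=10)
long_sessions = p_value_at_least(hits=9, trials=10)

print(f"short sessions, 6 of 10 hits: p = {short_sessions:.3f}")
print(f"long sessions,  9 of 10 hits: p = {long_sessions:.3f}")
```

A small value means the score would be unlikely from guessing alone. Comparing the value for short sessions with the one for long sessions, using the same listener and the same number of trials, would be one way to see whether longer listening really changes the outcome.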