After looking around, I'm starting to think it's the P4 2.8 GHz's Hyper-Threading that makes its SSE score so high.
This is very doubtful, since even a single SSE instruction ties up the entire FP execution resource on the P4 and Athlon for a clock. I don't see how running multiple threads could possibly speed this up, because the instruction-to-operation ratio is 1:4 (i.e. for each instruction fetched and decoded, four operations have to be carried out). SMT ultimately exists to work around bottlenecks in instruction decoding and things like data dependencies and memory latency. Given how streaming-oriented SIMD code is, I doubt it benefits much, if at all, from SMT.
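To make the 1:4 ratio concrete, here's a rough sketch in C using the SSE intrinsics from `xmmintrin.h`. The function name `add_packed` and the returned counter are my own, just for illustration. The counter only tallies packed add instructions (it ignores loads and stores), so treat it as a simplified model rather than a measurement of anything on a real P4 or Athlon.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/*
 * Hypothetical helper: adds a[i] + b[i] for i in [0, n), four floats at a
 * time, and returns how many packed add instructions were issued.
 * Assumes n is a multiple of 4; any leftover elements are not processed.
 */
size_t add_packed(const float *a, const float *b, float *out, size_t n)
{
    size_t issued = 0;

    for (size_t i = 0; i + 4 <= n; i += 4) {
#if defined(__SSE__)
        __m128 va = _mm_loadu_ps(a + i);   /* load 4 floats (unaligned OK) */
        __m128 vb = _mm_loadu_ps(b + i);
        /* One ADDPS instruction performs four single-precision adds. */
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
#else
        /* Fallback for targets without SSE: same result, but here the
           "one instruction" count is only a model of the packed add. */
        for (size_t k = 0; k < 4; ++k)
            out[i + k] = a[i + k] + b[i + k];
#endif
        ++issued;  /* one fetch/decode for four operations */
    }
    return issued;
}
```

Calling this on two 8-element arrays returns 2, i.e. two add instructions for eight additions. The front end has very little to do per unit of FP work, which is why I'd expect the FP unit itself, not decode, to be the limit here.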
"We are Microsoft, resistance is futile." - Bill Gates, 2015.