Of course, adding new levels doesn’t explain why a Vista system or component that used to score 4.0 or higher is now obtaining a score of 2.9. In most cases, large score drops will be due to the addition of some new disk tests in Windows 7 as that is where we’ve seen both interesting real world learning and substantial changes in the hardware landscape.
With respect to disk scores, as discussed in our recent post on Windows Performance, we’ve been developing a comprehensive performance feedback loop for quite some time. With that loop, we’ve been able to capture thousands of detailed traces covering periods of time where the computer’s current user indicated an application, or Windows, was experiencing severe responsiveness problems. In analyzing these traces we saw a connection to disk I/O and we often found typical 4KB disk reads to take longer than expected, much, much longer in fact (10x to 30x). Instead of taking 10s of milliseconds to complete, we’d often find sequences where individual disk reads took many hundreds of milliseconds to finish. When sequences of these accumulate, higher level application responsiveness can suffer dramatically.
With the problem recognized, we synthesized many of the I/O sequences and undertook a large study on many, many disk drives, including solid state drives. While we did find a good number of drives to be excellent, we unfortunately also found many to have significant challenges under this type of load, which based on telemetry is rather common. In particular, we found the first generation of solid state drives to be broadly challenged when confronted with these commonly seen client I/O sequences.
An example problematic sequence consists of a series of sequential and random I/Os intermixed with one or more flushes. During these sequences, many of the random writes complete in unrealistically short periods of time (say 500 microseconds). Very short I/O completion times indicate caching; the actual work of moving the bits to spinning media, or to flash cells, is postponed. After a period of returning success very quickly, a backlog of deferred work is built up. What happens next is different from drive to drive. Some drives continue to consistently respond to reads as expected, no matter the earlier issued and postponed writes/flushes, which yields good performance and no perceived problems for the person using the PC. Some drives, however, reads are often held off for very lengthy periods as the drives apparently attempt to clear their backlog of work and this results in a perceived “blocking” state or almost a “locked system”. To validate this, on some systems, we replaced poor performing disks with known good disks and observed dramatically improved performance. In a few cases, updating the drive’s firmware was sufficient to very noticeably improve responsiveness.
To reflect this real world learning, in the Windows 7 Beta code, we have capped scores for drives which appear to exhibit the problematic behavior (during the scoring) and are using our feedback system to send back information to us to further evaluate these results. Scores of 1.9, 2.0, 2.9 and 3.0 for the system disk are possible because of our current capping rules