News: DirectStorage Testing Shows PCIe 3 Drives Are Basically as Fast as PCIe 5

A bit of background, because the conclusion reached is not actually accurate in context and is too generalized, though the overall point is somewhat valid.

The reason the results are so similar is not because there is no difference, but because of a bottleneck. The game uses CPU decompression rather than GDeflate's faster GPU decompression. If the CPU isn't changing and your I/O performance is already good enough, your results won't get much better than the CPU, which is your baseline in this case. This is why a high-end NVMe drive gets the same results in this game with DirectStorage on or off: a powerful CPU is the limiter, so the drive can't saturate PCIe 4 or 5 enough to create a meaningful difference. As you said, in effect, with current hardware availability most people will see "unnoticeable results" because the differences are too small to really matter.
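
To illustrate that bottleneck argument, here's a minimal sketch (all throughput figures are hypothetical, not measured) of why a faster drive stops mattering once CPU decompression is the slowest stage:

```python
# Rough model: effective asset-loading throughput is limited by the slowest
# stage in the pipeline (drive read vs. CPU decompression).
# All numbers below are illustrative, not benchmark results.

def effective_load_time(data_gb, drive_gbps, decompress_gbps):
    """Time to load data when reads and decompression are pipelined:
    the slower stage dictates overall throughput."""
    bottleneck = min(drive_gbps, decompress_gbps)
    return data_gb / bottleneck

CPU_DECOMPRESS = 3.0  # GB/s the CPU can decompress (hypothetical)
for name, drive_speed in [("PCIe 3 SSD", 3.5), ("PCIe 4 SSD", 7.0), ("PCIe 5 SSD", 12.0)]:
    t = effective_load_time(20, drive_speed, CPU_DECOMPRESS)
    print(f"{name}: {t:.1f} s to load 20 GB")  # all three print ~6.7 s
```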

If you want to compare, try running the avocado benchmark between different drives; people reported much better results on higher-end equipment in that bench. However, without a proper study one can't say whether the difference is going to be enough to justify jumping from PCIe 4 to 5, because again you would still eventually be bottlenecked by the GPU's decompression performance at some point.
 
Decent improvement just from a new API. Right now, that's what DirectStorage is without GPU decompression. When real software starts using GPU GDeflate, the story will probably change. A game designed from the ground up to use GPU GDeflate will be something.
 
I'm thinking "getting to where the game is running" might not have as much effect as in-game. Also, there's probably a lot of factors as to why it might have a limited effect.

The bottleneck has been I/O. With Deflate, you have to stream the data from the storage device through the CPU and plop it into RAM, and then you probably have to transfer parts of it into GPU RAM. It's like decompressing a file from a USB thumb stick on different PCs: how much does a faster processor improve the decompression rate when there's an I/O limit?

During the initial loading screen, a lot of the CPU and GPU power is available. Streaming assets in-game might have different factors, like shipping data directly to the GPU. So texture pop-in can be reduced with GDeflate; the GPU is waiting for data to arrive anyway.
 
I've also seen comparisons with SATA SSDs. I think if you're using your PC primarily for gaming, you can use an NVMe SSD for boot/Windows, get bigger SATA SSDs for bulk storage, and really not be fazed too much by load times.
 
It's 3 seconds for that game; games 5 years from now might be 30+ seconds. Future-proof your build if the price is justified.
That's not the way it works. Wait time is the inverse of transfer speed, so the faster transfer speed becomes, the smaller the improvement in wait time. For example, consider the time to transfer 1 GB of data:

125 MB/s HDD = 8 sec
250 MB/s SATA 2 = 4 sec (+125 MB/s gives a 4 sec reduction)
500 MB/s SATA 3 = 2 sec (+250 MB/s = 2 sec reduction)
1 GB/s 1st gen PCIe SSD = 1 sec (+500 MB/s = 1 sec reduction)
2 GB/s NVMe SSD = 0.5 sec (+1 GB/s = 0.5 sec reduction)
4 GB/s = 0.25 sec (+2 GB/s = 0.25 sec reduction)
8 GB/s = 0.125 sec (+4 GB/s = 0.125 sec reduction)

See how every time transfer speed doubles, the resulting reduction in wait time is half the previous step?
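
For anyone who wants to play with the numbers, here's a minimal sketch that reproduces the table above (1 GB of data, same speed steps) and prints the shrinking savings:

```python
# Wait time is data size divided by transfer speed, so each doubling of
# speed halves the remaining wait, and the absolute saving shrinks each step.
data_gb = 1.0
speeds_gbps = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]  # HDD up through PCIe-class SSDs

prev = None
for v in speeds_gbps:
    t = data_gb / v
    saving = (prev - t) if prev is not None else 0.0
    print(f"{v:>5.3f} GB/s -> {t:6.3f} s wait (saves {saving:.3f} s vs. previous step)")
    prev = t
```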

For 99% of use cases, SATA 3 is already indistinguishable from the newest NVMe SSDs. Yeah, the newer SSDs are faster, but they're all so fast that the difference is between a 0.1 sec wait and a 0.01 sec wait in the most favorable cases (sequential reads/writes), and practically no difference in the least favorable cases (random small-file reads/writes). The only time it's really worth spending extra money on a faster-than-SATA SSD is if you regularly work with large files, e.g. real-time video editing.

The manufacturers, stores, and reviewers just don't like pointing this out because it's bad for business to say "yeah, the cheap stuff is good enough." They need enthusiasts spending the extra money on the newer, more expensive stuff to drive their respective businesses. Both Tom's Hardware and AnandTech did reviews highlighting this way back in the early 2010s and reached the same conclusion: after you make the jump from HDD to SSD, there isn't really much gained by going to faster SSDs. Linus Tech Tips did a blind test where they set up 3 identical computers with different SSDs and asked people to pick out the fastest one. Half of them picked the slowest one (the one running off a SATA SSD).

The only way for the 3 sec difference in load time to balloon out to 30 sec in 5 years is if the amount of data the game loads increases 10-fold in 5 years, e.g. a 50 GB installation of Elden Ring becomes 500 GB for a hot new game in 2028. And not because there are more zones to explore, but because 10x more data is needed for each zone load.

The more likely way it'll play out is that a 3 sec difference between 3rd and 5th gen PCIe will become a 0.75 sec difference between 5th and 7th gen PCIe. We're already so far down the diminishing returns curve that none of this stuff really matters unless you're doing some super-specialized task which can leverage the enhanced speeds.

Incidentally, the same problem affects MPG with respect to fuel consumption. MPG is the inverse of fuel consumption, so it's the small improvements at the low MPG ranges (trucks and SUVs) which result in the most fuel savings. The improvements at the high MPG ranges (econoboxes, hybrids, and EVs rated in MPGe), despite the very big jump in MPG numbers, save very little fuel/energy.
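
A quick worked example (round, hypothetical vehicle figures) shows the same inverse-curve effect with fuel:

```python
# Fuel used over a fixed distance is distance / MPG, so a big MPG jump at the
# high end saves far less fuel than a small MPG gain at the low end.
# Vehicle figures below are illustrative, not real-world ratings.
miles = 10_000

truck_before, truck_after = 15, 18      # modest +3 MPG on a truck
hybrid_before, hybrid_after = 45, 60    # big +15 MPG on a hybrid

truck_saved = miles / truck_before - miles / truck_after     # ~111 gallons saved
hybrid_saved = miles / hybrid_before - miles / hybrid_after  # ~56 gallons saved

print(f"Truck saves {truck_saved:.0f} gal/yr; hybrid saves {hybrid_saved:.0f} gal/yr")
```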
 
That's not the way it works. Wait time is the inverse of transfer speed. So the faster transfer speed becomes, the smaller the improvements in wait time.


Um... not the way it works? You literally gave an example of exactly what I said. Larger files = more data to load, and more data means longer load times. If a drive is 35% faster, then that's 35% less load time regardless of size. Broken down into time savings, the larger the data loaded, the more time it saves overall.
Also, with texture files, game assets can be massive. A 10x increase in game size in 5 years, while extreme, is definitely possible.
 
What Mark Cerny was talking about with the PS5 is the "ultimate" goal of DirectStorage and GPU decompression: you can cull textures like you cull polygons, because streaming them back in when the camera moves will be fast enough. Then most of your texture budget can be used just for the polygons in view.
 
In 5 years time we'll be using PCIe 7
I doubt that. PCIe 4 provided minimal benefit when it rolled out, and PCIe 5 launched more than a year ahead of any devices supporting it. If anything, consumer platforms have been adopting new PCIe standards too fast, and doing so is adding real costs to motherboards, not to mention burning more power.
 
And of course, that data needs to be put somewhere. If DirectStorage manages to allow these drives to near their maximum transfer rates when loading game assets, then one would expect the graphics memory to be full within seconds, limiting how long initial load times could potentially be.
That's a good point, but the compressed assets can be cached in main memory. Depending on the asset and the granularity of the compression, the game might end up having to read more than it needs in order to extract the pieces it wants at the moment.
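
As a rough sketch of that granularity cost (the 64 KiB chunk size and byte offsets below are made up for illustration), a chunk-compressed asset forces reads to be rounded out to whole chunks even when only a slice is wanted:

```python
# Illustrative only: if assets are compressed in fixed-size chunks, a request
# for a byte range has to be rounded out to whole chunks, so extra compressed
# data gets read (and possibly cached in main memory) beyond what was needed.
CHUNK_SIZE = 64 * 1024  # 64 KiB compression granularity (hypothetical)

def chunks_to_read(offset, length):
    """Return the chunk-aligned byte range covering the requested slice."""
    first = (offset // CHUNK_SIZE) * CHUNK_SIZE
    last = ((offset + length + CHUNK_SIZE - 1) // CHUNK_SIZE) * CHUNK_SIZE
    return first, last

wanted_offset, wanted_len = 100_000, 10_000  # the piece the game wants right now
start, end = chunks_to_read(wanted_offset, wanted_len)
print(f"Wanted {wanted_len} bytes, must read {end - start} bytes of chunk data")
```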
 
I'm thinking "getting to where the game is running" might not have as much effect as in-game.
If I were a game developer, I would ensure the game runs smoothly (i.e. without stutters) with PCIe 3.0 NVMe drives, because most gamers are using those or SATA. And as long as a drive is fast enough to keep up with the realtime loading demands, then any extra throughput is meaningless. The only time you should notice a difference is during initial loading.

That's not to say there won't be badly-written or extremely taxing games that require PCIe 4.0 drives to play smoothly, but the vast majority shouldn't.
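
For a back-of-envelope sense of "fast enough to keep up" (every figure below is hypothetical, not a benchmark), the question is simply whether sustained drive throughput meets the game's worst-case streaming demand; anything beyond that sits idle:

```python
# Hypothetical streaming budget: once the drive's sustained read rate exceeds
# the game's worst-case asset-streaming demand, extra throughput goes unused.
def keeps_up(drive_mbps, demand_mbps):
    return drive_mbps >= demand_mbps

worst_case_demand = 500  # MB/s needed while traversing the world quickly (made up)
for name, drive in [("SATA SSD", 550), ("PCIe 3 NVMe", 3500), ("PCIe 5 NVMe", 12000)]:
    verdict = "keeps up" if keeps_up(drive, worst_case_demand) else "may stutter"
    print(f"{name} at {drive} MB/s: {verdict}")
```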
 
Wait time is the inverse of transfer speed. So the faster transfer speed becomes, the smaller the improvements in wait time. e.g. Consider time to transfer 1 GB of data
Ah, but you're holding the amount of data constant. There are a couple of trends working against that.

Parkinson's Law: "work expands so as to fill the time available for its completion."

Wirth's Law: "software is getting slower more rapidly than hardware is becoming faster."

We should therefore anticipate that game data will continue to bloat, to the point where the improvements offered by faster SSDs become more noticeable.

For 99% of use cases, SATA 3 is already indistinguishable from the newest NVMe SSDs.
This was true at a point in time, but that was back when software was still largely designed around hard disk-scale throughput and latency. The trends cited above would suggest that software will increasingly adapt to SSD-level throughput and latency, with SATA drives serving as the baseline which was formerly established by HDDs. At that point, we should expect to see the order of magnitude better throughputs and latencies of good NVMe drives becoming more noticeable.

The only time it's really worth spending extra money on a faster-than-SATA SSD is if you regularly work with large files. e.g. real-time video editing.
There are several use cases where good sequential speed matters. Copying around huge video files is one, but another is copying containers and VM snapshots.

The manufacturers, stores, and reviewers just don't like pointing this out because it's bad for business to say "yeah the cheap stuff is good enough".
You could say the same thing about CPUs, as well. Most people don't need a cutting-edge or high-end CPU.

The only way for the 3 sec difference in load time to balloon out to 30 sec in 5 years, is if the amount of data the game loads increases 10-fold in 5 years.
As already pointed out, we don't actually know that the full difference is only 3 seconds. It could be that the CPU decompression just becomes the bottleneck, at that point. If so, we wouldn't actually see how much more potential PCIe 4.0 or 5.0 could offer.
 
We should therefore anticipate that game data will continue to bloat, to the point where the improvements offered by faster SSDs become more noticeable.
Doesn't matter to the regular user. Doesn't matter to anyone except those hyped by the tables or charts.
An old M58 ThinkCentre with a Core 2 Duo and 4GB of DDR3 can run Win10, and that's from an old Intel 40GB SATA2 SSD. It's enough for people who use their PC to browse the web. Hell, a Core 2 Quad laptop with a cheap TLC SSD is like a race car. It all comes down to what people need, and most don't need the latest and fastest.
The only problem is Windows Update and Defender, which waste resources for god knows what reason. How, after all these years, Microsoft hasn't been able to fix that bloody updater is beyond me.