DirectStorage Performance Compared: AMD vs Intel vs Nvidia

The GPU has to wait for the CPU to tell it what to render anyway. If the CPU is too busy decompressing assets, you'd get the same stutters.
Yeah, like um... if you're using a single-threaded game engine from 2003 or something.

This API is for sending the compressed data directly to the GPU without passing through RAM.
It's not. The article even had a nice diagram clearly showing it still passes through system memory!

[Image: diagram from the article showing the DirectStorage data path, with assets still staged in system memory before reaching the GPU]


Note: be sure to click the little arrows embedded in the image frames! Many articles stack multiple images per frame. In this case, I think they'd have done better just to have two separate frames.
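For anyone curious what that path looks like from the application side, here's a minimal DirectStorage sketch (mine, not from the article) that enqueues a GDeflate-compressed read straight into a GPU buffer. The file path, sizes, and the device/buffer/fence objects are placeholder assumptions, and error handling is omitted. Even with this setup, the runtime still stages the compressed bytes in system memory before the GPU decompresses them into VRAM, which is exactly what the diagram shows.

```cpp
// Sketch only: enqueue one DirectStorage request that reads a GDeflate-compressed
// asset and has it decompressed into a GPU buffer. Assumes dstorage.h / dstorage.lib
// (DirectStorage 1.1+), plus an existing ID3D12Device, destination buffer, and fence.
#include <dstorage.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void LoadAsset(ID3D12Device* device, ID3D12Resource* gpuBuffer,
               ID3D12Fence* fence, UINT64 fenceValue,
               UINT32 compressedSize, UINT32 uncompressedSize)
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    DSTORAGE_QUEUE_DESC queueDesc = {};
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.Device     = device;

    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"assets\\texture.gdeflate", IID_PPV_ARGS(&file)); // placeholder path

    DSTORAGE_REQUEST request = {};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE; // GPU decompression
    request.Source.File.Source        = file.Get();
    request.Source.File.Offset        = 0;
    request.Source.File.Size          = compressedSize;
    request.UncompressedSize          = uncompressedSize;
    request.Destination.Buffer.Resource = gpuBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->EnqueueSignal(fence, fenceValue); // fence signals once the data is in the buffer
    queue->Submit();
    // The compressed bytes are still read into a staging buffer in system RAM first;
    // the GPU then copies and decompresses them into the destination resource.
}
```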
OK, let's get dirty....

The elephant in the room here is not compression, decompression, or even awful programming on the part of the clueless game developers who have never heard of speed optimizations or preloading in their lives... No. It's not.

The elephant in the room is Windows file access.

Windows doesn't just open a file and start transferring data. No. First it has to lock down the entire directory tree just to get to the file, wade through 18 layers of cached crap just because everyone on the OS development team thinks more cache is always better (hint: not!), and then, eventually, after copying the directory entry around a few times, it grabs a handle to the file. Oh no. Not done yet. Now it has to read the first block and run it through its file-type identifier routines, yes, even though it really uses file extensions anyway... After determining that it might really just be a data file after all, it says hey, this is a jpg. I'll send it to the jpg processor and index it into the thumbnail database, because, why not do this every time... And now, oopsies. Forgot to send it off to the virus scanner, because we've only scanned this file a thousand times already, and who knows, it might have changed while no one was looking. Finally, done with virus scanning, oh crap... Wait. The search indexer... Gotta scan the entire contents and send this thing off to the search indexer just in case the user wants to search their own computer (which has never in the history of Microsoft worked anyway!)... OK. Identified, thumbnailed, cached, scanned, indexed, and sent off to the pre-processor for that particular filetype... Wait... we need an icon. Let's go look up the icon for that file... Phew... Maybe it's time to send some data to the game?

Turn off your search indexer service, disable virus scanning on your game data directory, and test this yourself.
Windows file access is the problem. Windows file access is PAINFUL. Yeah, Linux isn't too much better. Don't get smug.
Decompression of the file is trivial relative to the other silliness going on.
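If you want to put a number on that, a rough timing sketch along these lines (mine, not from the thread, with a placeholder asset path) measures the open-plus-first-read cost of one file; run it with the search indexer and antivirus exclusions toggled on and off and compare.

```cpp
// Sketch only: time the open + first-read path for one game asset, so runs with
// Defender / search-indexer exclusions enabled can be compared against runs without.
// The file path is a placeholder; build with MSVC on Windows.
#include <windows.h>
#include <chrono>
#include <cstdio>

int main()
{
    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();

    HANDLE h = CreateFileW(L"D:\\Games\\assets\\level01.pak", GENERIC_READ,
                           FILE_SHARE_READ, nullptr, OPEN_EXISTING,
                           FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (h == INVALID_HANDLE_VALUE)
    {
        std::printf("open failed: %lu\n", GetLastError());
        return 1;
    }

    char buf[64 * 1024];
    DWORD bytesRead = 0;
    ReadFile(h, buf, sizeof(buf), &bytesRead, nullptr);   // first 64 KiB only

    auto t1 = clock::now();
    std::printf("open + first read: %lld us\n",
                (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count());
    CloseHandle(h);
    return 0;
}
```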

If you really want speed, then give me a raw partition. I'll slap a little UFS filesystem on it and be done.
Get the OS out of the way.
Excellent explanation!
On top of everything: "oh, why don't we involve the Graphics Processing Unit to deal with all this file crap? Because that's what we have a graphics card for, right?"

I think it's time for a change here. We are going back to pre-1990 times. A GPU nowadays is almost a whole computer, and is priced like one. It doesn't make sense.

If they want parallel processing, fine, create something else in charge of that. With ray tracing, things should be different now; why do we still have so many things mixed together in a GPU?

Have a "co-processor" to do the parallel batch stuff, a standalone card that we can buy and add in a PCIe slot or whatever. Release the GPU from that. Lower GPU prices right now.

And yes, Microsoft should clean up their sh*t regarding the horrible process you described. And fix the search, which is useless, as we all end up searching in DOS (a.k.a. the command line) or finding things manually.
 
ROFL shared memory is the answer!

You must be new around here if you think PCs haven't been doing shared memory pools since before cell phones were in use.
Everyone knows PCs with iGPUs use shared memory; you're not telling us anything new here.

But PCs were never designed to use shared memory; that's one of the reasons these iGPUs are so weak: they suffer from major data bottlenecks on PC.

ARM, on the other hand, was designed from the ground up to be efficient. That means sharing as many resources as possible, including shared memory pools.

All these "fixes" on x86 systems, like DirectStorage, are patches on a major design flaw PCs have.

Some people think that DirectStorage is some kind of computing evolution. No, no, no: it's a hack job trying to fix a major flaw found in x86 systems, namely that x86 systems were not designed to have unified memory pools, so instead of the CPU and GPU sharing data, they are competing for it. And DirectStorage cannot simply fix this; all those layers and bottlenecks in microcode where the CPU gets in the way of the GPU when it wants to access data will largely need to remain in place to make x86 work.

ARM systems don't have these problems. Shared memory is the future because the packaging has become the bottleneck.

Separate volatile memory pools also add a lot of cost to PC architectures. PC users are paying twice for volatile memory: once for RAM and once for VRAM. And all games are doing is sending asset data like textures from one pool to the other. If any ARM engineers looked at this, they would call it absurd. Because it is.
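For reference, here is a minimal D3D12 sketch (mine, not from the thread, with placeholder names and sizes) of the double hop being complained about on a discrete GPU: the asset is first written into an upload heap in system RAM, then copied again over PCIe into a default heap in VRAM. On a unified-memory design the second copy wouldn't exist.

```cpp
// Sketch only: the two-hop asset path on a discrete GPU under D3D12.
// The caller must execute cmdList and keep both resources alive until the copy completes.
#include <d3d12.h>
#include <wrl/client.h>
#include <cstring>
using Microsoft::WRL::ComPtr;

void StageAsset(ID3D12Device* device, ID3D12GraphicsCommandList* cmdList,
                const void* assetData, UINT64 assetBytes,
                ComPtr<ID3D12Resource>& stagingOut,   // ends up in system RAM
                ComPtr<ID3D12Resource>& vramOut)      // ends up in VRAM
{
    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = assetBytes;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    // Hop #1: the CPU writes the asset into an upload heap (system RAM).
    D3D12_HEAP_PROPERTIES uploadHeap = { D3D12_HEAP_TYPE_UPLOAD };
    device->CreateCommittedResource(&uploadHeap, D3D12_HEAP_FLAG_NONE, &desc,
                                    D3D12_RESOURCE_STATE_GENERIC_READ, nullptr,
                                    IID_PPV_ARGS(&stagingOut));
    void* mapped = nullptr;
    stagingOut->Map(0, nullptr, &mapped);
    std::memcpy(mapped, assetData, assetBytes);
    stagingOut->Unmap(0, nullptr);

    // Hop #2: the GPU copies the same bytes over PCIe into a default heap (VRAM).
    D3D12_HEAP_PROPERTIES defaultHeap = { D3D12_HEAP_TYPE_DEFAULT };
    device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE, &desc,
                                    D3D12_RESOURCE_STATE_COPY_DEST, nullptr,
                                    IID_PPV_ARGS(&vramOut));
    cmdList->CopyBufferRegion(vramOut.Get(), 0, stagingOut.Get(), 0, assetBytes);
}
```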
 
But the million-dollar question is: is this even useful for people with older and/or lower-end GPUs and CPUs? Can a 3070/6600 XT use it? A 1070/570? If it's only usable on newer GPUs, it's rather uninteresting for the majority of people. A test with such lower-end and older hardware would have been great.
 
I think the real question is how much performance difference it's going to make in the real world, even on midrange gaming desktops, since it would require replacing most existing NVMe drives, which are not optimized for DirectStorage.
 
ARM Apple, on the other hand, was designed from the ground up to be efficient. That means sharing as many resources as possible, including shared memory pools.

...

ARM Apple systems don't have these problems. Shared memory is the future because the packaging has become the bottleneck.

Separate volatile memory pools also add a lot of cost to PC architectures. PC users are paying twice for volatile memory: once for RAM and once for VRAM. And all games are doing is sending asset data like textures from one pool to the other. If any ARM Apple engineers looked at this, they would call it absurd. Because it is.
Fixed that for you.