News AMD Preps Smart Access Storage To Accelerate SSD Performance

  • Like
Reactions: hotaru251
And from the other linked article, regarding DirectStorage:
https://www.tomshardware.com/news/d...forspoken-load-times-to-less-than-two-seconds
"A 71% increase in SSD throughput in DirectStorage compared to Win32 only delivers a 0.2 second improvement (around 10%) in load times, reducing it from 2.1 seconds to 1.9 seconds. "

0.2 sec diff, with vs without. weeee......
exactly.

New tech is generally pretty w/e early on.

However, in future, when games get massive (compared to today's sizes) and the rate of data read/transfer gets high enough, the tech will likely become more beneficial, akin to how we went from HDD to SSD load times.
 
  • Like
Reactions: TinkerTot
Even now though...a decade after SSDs are commonplace...there isn't a lot of difference apart from benchmark numbers.
Going from HDD to SSD, huge difference.
The various flavors of SSD? Not so much.
 
When games get massive? Lol, take a look at Red Dead 2, it's like over 80GB... I hate to burst your bubble, but games are already huge. Now you're probably talking Star Trek stuff, and nobody's gonna have that kind of power or bandwidth anytime soon.
 
A new report claims that AMD is preparing Smart Access Storage (SAS) to accelerate storage performance on its Ryzen processors.

AMD Preps Smart Access Storage To Accelerate SSD Performance : Read more

I always have to laugh when someone mentions DirectStorage...

One of our favorite games in the family is ARK Survival Evolved. It has lots of add-ons and maps available and has grown from just around 100GB initially to something like 400GB currently.

But size isn't even the main issue: those 400GB are spread across 150,000 files in 13,000 folders, and launching the game and a session evidently needs to read/parse quite a few of them, probably fairly randomly.

Since it's rather big, I put it on a Windows Server 2019 share with a RAID0 of SATA SSDs as the backend, and felt confident that the 10Gbit network between the gaming workstations would be enough to carry the SSD performance over the wire: I had tested the link with a couple of VM images of dozens of GB each and reached the expected 1 GByte/s.
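If anyone wants to check the file-count claim on their own install, a quick walk over the game directory is enough. This is just an illustrative sketch, and the path is a made-up example:

```python
# Hypothetical quick check: how many files/folders the install consists of
# and how big it is. Adjust GAME_DIR to your own Steam library location.
import os

GAME_DIR = r"D:\SteamLibrary\steamapps\common\ARK"  # example path, adjust

files = folders = total_bytes = 0
for root, dirnames, filenames in os.walk(GAME_DIR):
    folders += len(dirnames)
    files += len(filenames)
    for name in filenames:
        try:
            total_bytes += os.path.getsize(os.path.join(root, name))
        except OSError:
            pass  # skip files that vanish or can't be stat'ed

print(f"{files:,} files in {folders:,} folders, {total_bytes / 2**30:.1f} GiB total")
```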

But even though the network can fundamentally deliver the bandwidth, loading the game over it still took far longer than from a single local SATA SSD, which itself can easily take some minutes. It was even much slower than the local HDD loading I had wanted to replace with a shared pool instead of buying SSD storage for each PC. Could the SMB network really add such drastic overhead?

During a Linux Steam test session I didn't have an SSD available for storage, so I used a rather ancient 2TB HDD I had lying around. When I launched the game under Linux, I didn't really expect it to come alive in less than 15 minutes or so; I just wanted to have a look at the graphics.

But in fact it loaded way faster than from the local SSD JBOD on Windows!

Again, just in case you missed it: a lowly HDD on Linux beat a RAID0 of SSDs on Windows Server 2019 Datacenter Edition!

Unfortunately I didn't find the time to run the test with Linux as a file share host with Windows clients: that should have been really interesting!

Opening something like 100k files to start a game may be somewhat extreme. Lots of games store maps in large files and perform much better. ARK was evidently built from an Epic (Unreal Engine) template that already involved a lot of small files in the original ShooterGame, but grew "epic" with the wonderfully detailed, large maps they designed.

Whatever Microsoft does when opening a file, the overhead adds up big time when you deal with hundreds of thousands of them. I don't know if they do virus scanning/blacklist checking or just have a really inefficient way of parsing the file system tree. But compared with how Linux performs, the difference is orders of magnitude, and I am shocked that a VMS successor can perform that badly.

Gaming doesn't need a new storage API on Windows: it needs an OS that can actually use the hardware that's already there efficiently. And somebody had better tune file sharing, which delivers far worse performance than a local hard disk even with an SSD backend and a 10Gbit network when lots of small files are involved: there must be dozens of synchronous, latency-intensive dialog packets before the first byte of file data actually goes across the wire. Large files easily saturate the 10Gbit network; small files bring it to a crawl, even when you copy from RAM cache to NVMe.
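For what it's worth, the kind of benchmark I have in mind is trivial to sketch. This is not the exact code I ran, just an illustration; point TARGET at a local copy, at an SMB path, or at the same tree under Linux and compare the per-file numbers:

```python
# Rough sketch of an "open lots of small files" benchmark (illustrative only).
import os, random, time

TARGET = r"\\server\share\ARK"   # hypothetical share; use a local path to compare
SAMPLE = 5000                    # how many files to open per run

paths = []
for root, _, names in os.walk(TARGET):
    paths.extend(os.path.join(root, n) for n in names)

random.seed(42)                  # same sample on every run/host
sample = random.sample(paths, min(SAMPLE, len(paths)))

start = time.perf_counter()
for p in sample:
    try:
        with open(p, "rb") as f:
            f.read(4096)         # open + first 4 KiB, roughly a header parse
    except OSError:
        pass
elapsed = time.perf_counter() - start

print(f"opened {len(sample)} files in {elapsed:.1f} s "
      f"({1000 * elapsed / len(sample):.2f} ms per file)")
```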

I can only recommend Microsoft engineers repeat this easy (and fun!) benchmark using ARK and start profiling their code!

Too bad the "looks" of ARK under the Linux variant of Steam just aren't nearly as good, otherwise we'd have done the switch, just to cut down on those terrible load times.

BTW: restarts of ARK with a warmed-up file system cache do much, much better (locally, not over the network). Perhaps Windows remembers it has already virus-scanned those files, or it simply traverses file system trees in RAM much faster than on disk.
 
I wonder if your setup would be better with an iSCSI target instead of SMB.
 
Well the goal in that case was to centralize the (expensive) SSD storage in one place, so I didn't have to replicate the huge game files on every machine.

iSCSI is block storage, exclusive to one host for each target, so the consolidation/sharing effect would be missing.
 
  • Like
Reactions: fball922
Wow!!! I am amazed at the negative Nancys in this forum.

Realistically it's a positive thing.

Sure, at the moment its real-world use is not exactly exciting, but in time the tech will mature and become something worthwhile.

Then, the tunes here will also change too...
"WOW, this is a great benefit for everyone from server to gamer and workstation."
 
abufrejoval said:

Again, just in case you missed it: a lowly HDD on Linux beat a RAID0 of SSDs on Windows 2019 Data Center Edition!

You really should clarify this statement by expanding on it, adding that it was tested over:
abufrejoval said:

the 10Gbit network between the gaming workstations

I'm also a bit confused by "gaming workstations": are they gaming machines or workstations?

Are you using workstations to mean computers in general
or to mean that the machines are workstations with graphics cards in them?
 
  • Like
Reactions: TJ Hooker

What I can say about gaming machines vs. workstations: it's a 'continental divide' that is becoming bigger and bigger these days.

The PC mainstream is getting dropped completely: instead, "the PC" is served either from notebook designs originally focused at around 15-28 Watts (upsell?) or from server designs focused at 150-280 Watts (downsell?). The core personal-computer sweet spot, which allowed Intel to kick out the traditional Unix vendors by revving up a desktop design to workstation/server workloads, is completely gone today.

The original motive to buy my hardware was technical architecture design work for machine learning, which meant a "balanced" mix of CPU and GPU power and plenty of RAM.

It's what pays for the hardware; gaming is a "test use case" with collateral benefits not the purpose, if you get my drift.

So I went with hardware that would offer relatively high clocks when the active cores were few, but which would also exploit additional cores when workloads were sufficiently parallelized and allow for server class extrapolations.

Case in point: my Xeon E5-2696 v3 (Haswell generation) workstation, which clocks up to 4 GHz with only two active cores (pretty similar to a Haswell i7) but needs to drop to 2.6 GHz when all 18 cores are active.

That setup allowed me to combine both low-core/high-frequency and high-core/CMOS-optimized-frequency testing in a single piece of hardware. As for GPUs, the big market segmentation Nvidia imposed on FP64 workloads or VDI capabilities didn't apply to ML on Pascal and Turing, so consumer hardware was fine for extrapolation (and, ahem, 3D-based testing).

But the 40 PCIe 3.0 lanes of the Haswell CPU also lent quite a bit of flexibility to add things like 10Gbit networking without compromising GPU bandwidth, a compromise that could not be avoided on a "desktop" system during the Haswell generation.

Even today, 10Gbit Ethernet tends to eat far more PCIe lanes than really necessary: the latest Marvell/Aquantia hardware, which realizes 10Gbit Ethernet with a single PCIe 4.0 lane, hasn't become readily available yet.

The Haswell Xeon E5 still filled the "workstation" use case. Today, the Threadripper gap between Ryzen desktops and EPYC servers has become very difficult to buy hardware for. Intel is little better: Alder Lake pushes into what used to be workstation territory, but above that there is a huge workstation gap up to server CPUs that have yet to become available on the open market.

Without that workstation middle ground, extrapolating server performance becomes very difficult, and I'll lose my ability to purchase gaming hardware on a corporate argument: too bad, really!
 
Again, just in case you missed it: a lowly HDD on Linux beat a RAID0 of SSDs on Windows Server 2019 Datacenter Edition!
As TinkerTot already pointed out, latency is your issue: you are taking an order-of-magnitude hit hosting these files on an SMB network compared to old spinning rust connected locally via SATA. On a slow HDD the 4K response time is around 28 ms, while an SSD should be ≤0.2 ms, and both are very repeatable. On your SMB setup, depending on switches and other traffic, average latency is likely 10-40 ms, with peaks of 400-600 ms; those peaks are what is handicapping your setup and causing you to see a 2TB HDD beating RAID0 over SMB.
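A quick back-of-the-envelope on what those per-access latencies would mean at ARK's file count, purely illustrative and assuming every one of the ~150,000 files had to be touched serially:

```python
# Illustrative only: per-file latency multiplied across ~150,000 serial accesses,
# using the rough figures quoted above.
FILES = 150_000

for label, latency_ms in [("SSD (~0.2 ms)", 0.2),
                          ("HDD (~28 ms)", 28),
                          ("SMB (~25 ms avg)", 25)]:
    total_s = FILES * latency_ms / 1000
    print(f"{label:18s} -> {total_s / 60:6.1f} minutes of pure latency")
```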
 
Of course it's latencies; I'm well aware of that. The main question was whether they are mainly physical, in the protocol, or in the implementation.

And the point of my initial post was that Microsoft should just fix its OS instead of inventing a new storage API, because my measurements indicated an implementation issue specific to their OS.

And yes, it's probably all a bit muddled up, because it took me a while to find out that physical latencies weren't the main issue why network loading was so slow.

I eventually traced it to the latency of the OS for opening a local file (Windows vs. Linux) and it's the no-network case that the quote above is referring to.

My original working hypothesis was that network latency would be a small constant overhead, that SATA SSD latencies would be identical between local and remote, and that 10Gbit/s networking delivers almost 2x SATA bandwidth, which would make a centralized SSD pool a better investment than local replicas.
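The rough arithmetic behind that hypothesis, for reference (the ~550 MB/s SATA III figure is a typical assumption, not a measurement of my drives):

```python
# Raw 10GbE link rate, the ~1 GByte/s my VM-image copy test actually delivered,
# and a typical SATA III SSD sequential read (assumed).
link_raw_GBps = 10 / 8      # 10 Gbit/s -> 1.25 GB/s on the wire
link_measured_GBps = 1.0    # what the copy test reached in practice
sata_GBps = 0.55            # typical SATA III sequential read

print(f"raw link: {link_raw_GBps / sata_GBps:.1f}x SATA, "
      f"measured: {link_measured_GBps / sata_GBps:.1f}x SATA")
```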

And then it turned out that the local Windows file-open latency was so far above Linux's that the storage device technology (SSD vs. HDD) no longer mattered: it wasn't the seeks between those thousands of little files that caused the delays, but the cost of opening/reading each file.

And at that point, adding a file-sharing setup based on Windows servers and clients could obviously only make things worse, because the latencies are additive.

Again, the surprise was just how much worse it got, and I wonder once more how much of that is unavoidable physics, bad protocols, or bad implementation.

There are some additional measurement points that are interesting and could help investigate things further, if one had Windows source code and means to profile performance.

  • Moving ARK between a SATA SSD and an NVMe drive has no measurable performance impact for game/map loading. On Windows there is no measurable benefit from the vastly better physical performance and device parallelism that NVMe provides over SATA.
  • Loading ARK again, after it has been loaded and started once, is an order of magnitude faster. Yes, my systems have 128GB of RAM, but no, according to Windows they didn't just cache the entire game: overall cache usage remained below 20GB. So it either caches some of the critical data, or perhaps it skips a lot of security checks by whitelisting files it has opened recently. Just deactivating Windows security didn't make a difference that I could notice.
Of course, it would be interesting to bring the network back into the picture once the mere file-open performance issue is solved. But if average SMB latency overhead really is at HDD levels, and the 400-600ms peaks you report are normal on a single-switch 10GBase-T network without congestion or storage bottlenecks, then the very notion of a file server can be put to rest.

I'm not quite ready to believe that better implementations and protocols can't fix that.

And obviously the game engine/designers are also at fault, because they use tens of thousands of small files and serialize opening them.

But that works pretty ok on Linux...