Discussion: High-speed gaming on a budget

Alan Alan

Aug 9, 2022

Through a complete blunder I managed to stumble on a fairly good setup for playing video games. Since it works well, I thought I would pass it on. My system only has 16 CPU PCIe lanes. It seems these lanes are faster than the extra chipset lanes, but I may be wrong. Assuming they are the fastest, I used 8 of the CPU lanes for the video card and the other 8 for two NVMe drives. The two drives sit in a riser card and are configured as dynamic disks in a RAID 0 (striped) configuration.

The theory is that the RAID setup can nearly double the storage throughput and effectively simulate x16 speeds (maybe)... If so, the game can do what it needs to do with the data and feed it to the graphics card. Graphics cards seem to do some magic with what little data they actually get, so the true need for x16 speed may not be all that necessary. My guess is the processor spends a lot of time gathering data as it processes the game itself, and the video card lanes are idle more than one might think at first glance.

So in a sense, it's possible the storage itself needs to be faster than the video card link, all things considered. Benchmarks have also shown video cards running on x4 lanes are only down about 20 percent compared to x16. So that's about it: I had a pretty choppy video game before adding the two NVMe drives and now it runs much better. About the only time it gets choppy is when it loads a whole new playing field, but those loads are a great deal faster compared to the pair of SATA drives in RAID 0 I had before. So there is something to be gained with PCIe NVMe drives on the processor's lanes. I can imagine how smooth a game would be with a 34-lane processor, a PCIe 4.0 x16 video card, and a PCIe 4.0 x16 riser card with 4 NVMe drives in RAID 0. Nice, but expensive. OK guys, shoot me down, it's all good. It works for me and it's fairly inexpensive, but is there a better way?
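For anyone who wants to sanity-check the lane math, here is a rough back-of-the-envelope sketch in Python. The per-lane figures are the usual theoretical PCIe 3.0/4.0 numbers after encoding overhead, and the 2x RAID 0 scaling is a best case, not a guarantee:

Code:
# Rough, theoretical numbers only -- real-world throughput is lower.
# Approximate usable bandwidth per PCIe lane, in GB/s (after encoding overhead).
PER_LANE_GBPS = {"gen3": 0.985, "gen4": 1.969}

def link_bandwidth(gen, lanes):
    """Theoretical one-way bandwidth of a PCIe link in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

gpu_x8 = link_bandwidth("gen3", 8)       # GPU on 8 CPU lanes
one_nvme_x4 = link_bandwidth("gen3", 4)  # a single x4 NVMe drive
raid0_best_case = 2 * one_nvme_x4        # two striped drives, ideal scaling

print(f"GPU @ x8:         {gpu_x8:.1f} GB/s")
print(f"Single NVMe @ x4: {one_nvme_x4:.1f} GB/s")
print(f"RAID 0 best case: {raid0_best_case:.1f} GB/s")

Even in the best case, two Gen 3 x4 drives top out around the bandwidth of a single x8 link, so the "simulate x16" part is really about feeding data to the game faster, not about widening the GPU link itself.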
 
It depends on the particular GPU, but most don't suffer much using only 8 of 16 lanes; some even work on fewer. Just a few weeks ago I was in the market for a new GPU. I was set on a mid-range AMD RX 6600, but there are two types, the 6600 XT and the non-XT. The 6600 XT uses 16 lanes and the non-XT only 8, but both are Gen 4.
Considering that my first PCIe slot is Gen 3, the XT card would have had little to no advantage. I chose the non-XT, and that gave me 8 "free" lanes from the CPU (which supplies 20 Gen 3 lanes), of which I used 4 for the top M.2 slot so a Samsung 970 Evo Plus can run at full speed.
The other 4 lanes became free to use in the second PCIe x16 (Gen 3) slot, in which I installed an adapter for another NVMe drive (Samsung 869 Evo), also running at full speed. In theory I could put both in RAID 0, but there are a few problems: they are not identical and not the same capacity (500 GB and 250 GB respectively), so the RAID capacity would be reduced by 250 GB with no real speed advantage, plus all the bad things that can happen when one of them has a problem.
I used RAID in several configurations (RAID 0 for the OS) with HDDs for a long time, but with the advent of fast SSDs, and especially fast NVMe drives, the speed gains are practically nil and the potential for data loss offsets any practicality.
Also, in my case, my motherboard has another 16 PCIe lanes supplied by the chipset. They are shared with another PCIe x16 slot and the second M.2 socket, but they are only Gen 2, so any NVMe drive placed in that PCIe slot or M.2 socket runs at half speed, which is still 3-4 times faster than a SATA SSD. I use them for storage only, so high speed is of no consequence.
All together, RAID 0 is impractical for fast disks and has all the usual RAID 0 disadvantages. It's just too inflexible, with no real performance gain. It's also cumbersome for backups, which are essential when reliability is reduced.
There's also a problem with riser cards that go from a PCIe slot to two NVMe SSDs. If the motherboard doesn't support bifurcation, the card would still use only 4 PCIe lanes even in an x16 slot, giving the two NVMe SSDs only 2 lanes each and halving their potential speed.
As for high-speed storage, raw read and write speeds are not as much help for games as reduced system load is, and modern games and some programs tend to do a lot of reading, and even writing, while running.
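To put the bifurcation caveat in numbers, here is a small sketch of how the lanes end up divided in the scenario described above; the function name and slot widths are just illustrative:

Code:
# Assumed scenario: two x4 NVMe drives on a passive (non-switched) riser card.
def lanes_per_drive(slot_lanes, drives, bifurcation):
    """Lanes each drive gets in the scenario described in the post above."""
    if bifurcation:
        # The board splits the slot, e.g. x8 -> x4/x4 or x16 -> x4/x4/x4/x4.
        return min(4, slot_lanes // drives)
    # Without bifurcation, only a single x4 group is usable,
    # shared between the drives (as described above).
    return 4 // drives

print("x8 slot, bifurcation on  :", lanes_per_drive(8, 2, True), "lanes per drive")
print("x16 slot, bifurcation off:", lanes_per_drive(16, 2, False), "lanes per drive")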
 
RAID 0 for M.2 drives is useless and often gives less performance than a single drive.

View: https://www.youtube.com/watch?v=Ffxkvf4KOt0
 
It seems the write speeds benefit far more than the read speeds, which are only negligibly faster. In a way that makes sense, because the data in the test is probably staged in main memory, and the read test is probably a transfer directly into main memory. So sequential reads really shouldn't be any faster, and may in fact be a bit slower because of the extra work software RAID adds. However, if you notice, he didn't specify whether these tests were done with hardware or software RAID. That makes a huge difference, as hardware RAID can read from two different drives on the same clock cycle. This is why I wanted to use Intel's VROC. Unfortunately something went haywire, and Asus discovered a year after production that the Z370-A did not work as advertised; their restitution was to no longer mention VROC on future motherboards. VROC is still going strong in Intel's enterprise servers, though. My guess is this was all done with software RAID or, if not, a poorly designed hardware RAID. Reading two drives at once doubles the data rate, but in my case it probably makes little difference with software RAID. About the only true advantage with software RAID is probably that the heat is spread across two drives. Thanks for the info, decent video on the subject. Maybe I missed something: did he mention whether it was software or hardware RAID?
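As a rough illustration of the striping idea discussed here (not how Windows dynamic disks or VROC actually implement it), the sketch below reads alternating stripe-sized chunks from two files on separate threads; the file paths and stripe size are hypothetical placeholders:

Code:
# Minimal striping illustration: read alternating chunks from two "drives"
# concurrently. Paths are hypothetical placeholders, not a real RAID driver.
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # 1 MiB stripe size (assumed)

def read_chunk(path, index):
    """Read stripe number `index` from one member file."""
    with open(path, "rb") as f:
        f.seek(index * CHUNK)
        return f.read(CHUNK)

def striped_read(paths, total_chunks):
    """Round-robin stripes across member 'drives', fetching them in parallel."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        futures = [pool.submit(read_chunk, paths[i % len(paths)], i // len(paths))
                   for i in range(total_chunks)]
        return b"".join(f.result() for f in futures)

# Hypothetical usage:
# data = striped_read(["drive0.bin", "drive1.bin"], total_chunks=8)

Whether the two reads genuinely overlap depends on the layer doing the work, which is exactly the software-versus-hardware RAID question raised above.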
 
The theory is that the RAID setup can nearly double the storage throughput and effectively simulate x16 speeds (maybe)... If so, the game can do what it needs to do with the data and feed it to the graphics card. Graphics cards seem to do some magic with what little data they actually get, so the true need for x16 speed may not be all that necessary. My guess is the processor spends a lot of time gathering data as it processes the game itself, and the video card lanes are idle more than one might think at first glance.
Yes, only a fraction of the bandwidth available to the video card over the PCIe bus is used once you're actually in game and all the assets are loaded in. It may not even affect loading times much in general, because I'm under the impression modern game engines load the bare minimum to work with, then load the rest in as needed to give the appearance that the game loads faster. After that, the only things left to send to the video card are commands from the CPU and any textures being streamed in, which is minimized to avoid breaking any semblance of seamlessness.

So there is something to be gained with PCIe NVMe drives on the processor's lanes. I can imagine how smooth a game would be with a 34-lane processor, a PCIe 4.0 x16 video card, and a PCIe 4.0 x16 riser card with 4 NVMe drives in RAID 0. Nice, but expensive. OK guys, shoot me down, it's all good. It works for me and it's fairly inexpensive, but is there a better way?
Considering the chipset is often connected with a four-lane PCIe interface, yes. But does it actually matter? No. Why? Because for the most part, the amount of data a game requests from storage while you're playing tends to be really low compared to when you're waiting for the game to load. For instance, if you're playing, say, Battlefield or Call of Duty, once you're in a map practically everything the game needs is in RAM or VRAM. Even if the game makes storage requests, the amount of data is likely not large, say under 50 MB.

You also have to remember that PC game developers don't target high-end system features. No developer is going to design a game that works best with a RAID 0 PCIe storage setup, because that alienates 99.999% of their potential customer base. I wouldn't be surprised if modern games, even AAA open-world games, are still designed with a hard drive in mind.
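A toy sketch of the "load the bare minimum, stream the rest" idea with a per-frame budget; all the names and sizes are made up for illustration, and real engines are far more involved:

Code:
# Toy illustration of in-game asset streaming with a small per-frame budget.
FRAME_BUDGET_BYTES = 2 * 1024 * 1024   # e.g. stream at most 2 MiB per frame (assumed)

def stream_assets(pending, budget=FRAME_BUDGET_BYTES):
    """Pick queued assets to load this frame without exceeding the budget."""
    loaded, used = [], 0
    for name, size in pending:
        if used + size > budget:
            break
        loaded.append(name)
        used += size
    return loaded, used

queue = [("rock_texture", 512 * 1024), ("npc_mesh", 1 * 1024 * 1024),
         ("skybox", 4 * 1024 * 1024)]
print(stream_assets(queue))   # rock_texture and npc_mesh fit; skybox waits a frame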
 
Sounds like you did a good job with what you have. Apparently the main difference between PCIe 3.0 and 4.0 is double the per-lane speed. It seems the only real advantage of software RAID 0 is that the heat is shared between two devices, so for extensive writing that's a plus. Not sure if video games ever really write back to the drive, but they probably do when saving progress or preferences. I think what it boils down to on sequential reads is the CPU's hardware RAID support. I probably didn't accomplish much with software RAID, but hardware RAID can read from two drives during the same clock cycle and store the data in a parallel fashion; it's only logical to think that's how it works. At a low level it's all serial transmission: my guess is the data from each drive is interleaved and prepared for the next serial/parallel transfer over the lanes to wherever it's headed, whether memory, cache, or whatever. Funny, a few years ago someone told me a 32-bit system has the same bandwidth as a 64-bit one. I guess he didn't realize those bits travel in parallel along the lanes.
 
You know, that makes a lot of sense. Right, few have enterprise systems for gaming, so developers probably use every trick in the book. I was playing No Man's Sky in VR; it's a tough one, with several things moving at one time. One thing for sure, RAID or no RAID, the NVMe drives really made a difference over SATA SSDs.
 
A 32-bit data bus that's transferring twice as fast (be it using twice the clock speed or twice the edge triggers) does have the same bandwidth as a 64-bit data bus.
Yeah, you are correct if they didn't double up on the memory lanes. In reality a 64-bit system mainly gains the ability to index 64-bit memory addresses, while the individual serial lane clocking stays the same. When 64-bit systems came out I assumed they would also use two serial lanes with the existing 32-bit-wide memory modules, delivering 64 bits into the new processors for double-precision improvements. I guess the only change was the 64-bit registers in the processors versus the older 32-bit registers; they didn't double up on the lanes like I assumed they would. Doing that might give double-precision processing but really does nothing to speed up the system; in fact, it would cut the speed in half without two lanes. So as usual I guessed wrong. lol
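The bandwidth point is just bus width times transfer rate; a quick sketch with example numbers (the rates are arbitrary, chosen only to make the comparison visible):

Code:
# Bandwidth = bus width (in bytes) * transfers per second.
def bandwidth_gbs(width_bits, transfers_per_sec):
    return width_bits / 8 * transfers_per_sec / 1e9

# Example: a 32-bit bus at twice the transfer rate matches a 64-bit bus.
print(bandwidth_gbs(32, 2_000_000_000))  # 8.0 GB/s
print(bandwidth_gbs(64, 1_000_000_000))  # 8.0 GB/s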
 
Of course it does; most every game loads and saves your progress, especially survival games, where everything you construct has to be saved to storage for later retrieval. Not every game loads into memory once and runs entirely from there.
As shown in those videos, an NVMe drive is faster than a SATA III SSD.
Just not nearly as much as the advertised benchmark numbers would indicate.

At 3,500 MB/s, a PCIe 3.0 x4 drive is, in theory, about 7 times faster than a SATA III SSD.
PCIe 4.0, twice that again.

That difference does not always bear out in direct user-facing experience.

And as shown, in a blind test, self-proclaimed experts often cannot tell the difference.
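For reference, the ratios being quoted come from arithmetic along these lines (the MB/s figures are rough sequential ceilings, not guarantees):

Code:
# Rough sequential-read ceilings in MB/s (approximate, for comparison only).
SATA3_SSD = 550
PCIE3_X4_NVME = 3500
PCIE4_X4_NVME = 7000

print(f"PCIe 3.0 x4 vs SATA: ~{PCIE3_X4_NVME / SATA3_SSD:.1f}x")  # ~6.4x
print(f"PCIe 4.0 x4 vs SATA: ~{PCIE4_X4_NVME / SATA3_SSD:.1f}x")  # ~12.7x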
 
Yeah, in real-world use on a gaming rig you would never be able to tell the difference between SATA and NVMe when playing games stored on either. My Gen 4 drive makes my system feel a little snappier overall than my old Intel 660p did, but games don't really load much differently. The same can be said for my laptop, with its 1 TB MX500 SATA drive versus said 660p.
 
Yeah, you are correct if they didn't double up on the memory lanes. In reality a 64-bit system mainly gains the ability to index 64-bit memory addresses, while the individual serial lane clocking stays the same. When 64-bit systems came out I assumed they would also use two serial lanes with the existing 32-bit-wide memory modules, delivering 64 bits into the new processors for double-precision improvements. I guess the only change was the 64-bit registers in the processors versus the older 32-bit registers; they didn't double up on the lanes like I assumed they would. Doing that might give double-precision processing but really does nothing to speed up the system; in fact, it would cut the speed in half without two lanes. So as usual I guessed wrong. lol
A lot of things described here don't make sense to me, because that's not how things are really done.
  • While address space is a common way to characterize 32-bit versus 64-bit architectures, there's no hardware rule tying the two together. The original 8086 had a 20-bit address space despite being a 16-bit processor. 32-bit x86 processors from the Pentium Pro on had a 36-bit address space, though applications were still limited as if they had a 32-bit one.
    • Heck, most implementations of x86-64 cap the address space to 48 bits because it isn't practical to use all 64 bits that are available.
  • CPUs don't have a "serial lane", unless you mean PCIe. PCIe is an I/O interface and has a standard way in which it works so that other things can talk to it. Besides that, the CPU doesn't act on data directly from it. A higher number of lanes may imply that the CPU can shuffle more data around, but it also depends on the speed they operate at. A CPU with 128 lanes of PCIe 1.x isn't going to beat a CPU with 32 lanes of PCIe 4.0.
  • The data bus width on memory modules has been 64 bits since the introduction of DDR SDRAM back in the late 90s. Heck, the data bus on x86 processors has been 64 bits wide since the Pentium.
  • Crunching more bits at once doesn't do much for performance unless the application primarily uses them. A lot of applications still use 32-bit data types because a range of around -2 billion to 2 billion, or 0 to 4 billion, is still plenty to represent most things, and using 64-bit values doubles the memory consumption and bandwidth requirements (a quick sketch of this follows below).
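A quick sketch of that last bullet, using Python's struct module to show the storage cost and ranges of 32-bit versus 64-bit signed integers (sizes shown are for typical platforms):

Code:
import struct

# Size in bytes of a 32-bit vs 64-bit signed integer (4 and 8 on typical platforms).
print(struct.calcsize("i"), struct.calcsize("q"))

# Representable ranges: plenty of headroom at 32 bits for most counters.
print(-2**31, 2**31 - 1)   # -2147483648 to 2147483647
print(-2**63, 2**63 - 1)   # far larger range, but twice the memory per value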
 
And as shown, in a blind test, self-proclaimed experts often cannot tell the difference.
Lol. Sounds like a stereo salesman: put a blindfold on them and suddenly they can't tell the difference between one amp and another. In reality, I use the NVMe drives to pull audio samples of individual acoustic piano notes. Each note is sampled at several key velocities, so when I play my MIDI keyboard, which is just like a real piano but without sound, the NVMe drives deliver a sample of the note I just played. Sometimes I step on the sustain pedal and the NVMe drives have to play 400 or more notes at one time. That's when they really do their thing; SATA drives can't keep up and notes start disappearing. It's the real reason I actually bought them, but they sure can run a video game. Unbelievable that a 1 TB drive is under 80 bucks; three years ago they were four times that much. Instead of 4X it should be /4, lol.
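Out of curiosity, the sustained data rate for that worst case is easy to estimate; the sample format below (44.1 kHz, 16-bit, stereo) is an assumption:

Code:
# Rough estimate of streaming bandwidth for many simultaneous piano voices.
SAMPLE_RATE = 44_100        # Hz (assumed)
BYTES_PER_SAMPLE = 2        # 16-bit (assumed)
CHANNELS = 2                # stereo (assumed)

def voices_bandwidth_mbs(voices):
    """Sustained MB/s needed to stream this many voices from disk."""
    return voices * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS / 1e6

print(f"{voices_bandwidth_mbs(400):.0f} MB/s sustained for 400 voices")  # ~71 MB/s

That lands around 70 MB/s, well within a SATA SSD's sequential ceiling, so the NVMe advantage here is presumably the much lower latency on lots of small scattered reads rather than raw throughput.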
 
A lot of things described here don't make sense to me, because that's not how things are really done.
Yeah, overkill for home computers, nobody ever uses all of that. Mad magazine had an issue on computers back in the '80s; they called the Commodore the commode, lol, and a "bit" was: yes, my son's computer cost quite a bit. lol That was a good laugh. But yeah, a PCIe lane is a single trace of copper on a PCB. True that it's an I/O interface, but it's all serial communication, bit after bit. What I was trying to say is that when they went to a 64-bit processor, I thought they would have added another lane to handle 64 bits at the same speed as 32 bits. But in reality it seems the address bus was widened to 64 bits so you can access more memory, and the processor's registers can use 64-bit numbers. So they sort of doubled up both the memory allowed and the math precision of the processors on the x64 platform. Just why they called the 32-bit systems x86 is beyond me. Pretty crazy; x86 was probably some addition of everything to make a big deal of it. lol
 
A PCIe lane is a single trace of copper on a PCB. True that it's an I/O interface, but it's all serial communication, bit after bit.
It's actually more than one trace per lane: PCIe uses differential signaling, with a transmit pair and a receive pair, so each lane needs four signal traces plus grounds.

What I was trying to say is that when they went to a 64-bit processor, I thought they would have added another lane to handle 64 bits at the same speed as 32 bits.
As for that, you just add however many PCIe lanes you need or can handle; PCIe doesn't care how many bits the CPU architecture is.

So they sort of doubled up both the memory allowed and the math precision of the processors on the x64 platform. Just why they called the 32-bit systems x86 is beyond me. Pretty crazy; x86 was probably some addition of everything to make a big deal of it. lol
It's called x86 because of its lineage from the 8086 processor.

This all started from Intel's 8008, which was their first 8-bit processor. The successor was the 8080, then the 8085 (because it's the 8080, but a 5V version), then the 8086 (16-bit version) and its 8-bit external variant, the 8088. The rest after that just followed the format 80x86, hence x86.
 
I see, makes sense. Speaking of 8000, I have the old Chessmaster 8000 that was designed to run on Windows XP. Not sure if Windows 11's compatibility mode can run it; I can try. But I was going to ask the forum if anyone knows how to get a thumb drive working on my router. The instructions are pretty hard for me, and I don't see a category for routers. Any idea what part of the forum would be best for this type of help?
 