Seeming GPU memory issue

The_Staplergun

Jan 30, 2017
When I put my system together over the weekend, I started tweaking settings, including the clocks on the RAM, CPU, and GPU. I put the RAM and CPU through a stress test and memtest, testing individual sticks at overclocked rates. I ran one stick all the way through, then ran that same stick in each slot to verify that the slot pins and the board are good. I then tried each of my four RAM sticks in the slots, and after that tested them in pairs. (I bought two separate pairs but ended up with four sequentially manufactured sticks.) I then switched the pairs between DIMM slots 1 and 2, so I did four pair tests. Finally, I ran all four together at OC'd settings overnight and got zero errors. I have also run the CPU-Z stress test on the CPU for an hour, with Windows up and running, with no errors.

So, to my point:
When I am doing things in Windows, I get strange graphical problems. When I run RealBench and it gets to the "encoding" portion, where you don't actually see anything on screen but only in the log, it BSODs with a page fault. I had been tweaking the GPU clock, and it ran just fine at 2088, even in a game, with no artifacts. I got really frustrated, did some research, and found out that even GPU RAM can be bad. I decided to run a GPU-based MEMTEST (not the CL version) at base clock settings. It instantly threw 1505 errors, which carried through all 50 iterations.

Could it still be the CPU and memory after running memtest86 overnight with no errors? It made it through 4 passes in about 6 hours (64 GB of RAM is apparently a long test).
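For a rough sense of scale, here is the arithmetic on those numbers as a quick sketch. It assumes the "about 6 hours" and "4 passes" figures are exact and that every pass covers the full 64 GB; the variable names are just for illustration.

```python
# Figures taken from the post; treated as exact for this estimate.
total_hours = 6.0      # "about 6 hours"
passes = 4             # completed memtest86 passes
ram_gb = 64            # installed RAM

# Time spent per full pass over all installed memory.
hours_per_pass = total_hours / passes

# Amount of RAM covered per hour of testing (one pass = all 64 GB).
gb_per_hour = ram_gb / hours_per_pass

print(f"{hours_per_pass} hours per pass")
print(f"{gb_per_hour:.1f} GB covered per hour")
```

Each pass runs several different test patterns over the memory, so this is a coverage rate rather than a measure of raw memory bandwidth.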

I called Amazon to RMA the gfx card, since I figure that's the most likely problem, given that Windows would throw weird errors and BSOD when stressing the card.

My system specs are listed in my signature.
Running a 5.0 GHz OC on the core, 3200 MHz on the RAM (XMP profile), and stock clocks on the GPU (1607 base clock).
 
Attached is the picture just before it finished the fourth iteration. I had to run out of the house and my wife let me know it didn't error at the end of the fourth pass.

I don't have a picture of the GPU memtest handy. I had to read it as it flew by; I couldn't stop the window from closing after the test.

Sorry, it's huge.
[Attached image: memtest86 screenshot taken just before the fourth pass finished]
 
Yes, I also did that in a panic. I'm sorry I forgot to highlight that point. I freaked out because it just started BSOD'ing. I went into the BIOS and reset all settings to stock: I removed the XMP profile and reset the CPU to stock settings. I also pulled the CMOS battery and discharged the capacitors by holding the power button for 30 seconds with the PSU unplugged, to make sure the settings were all stock. I ran memtest, CPU-Z, and RealBench on those settings with the display plugged into the motherboard HDMI port, completely bypassing the graphics card, and it ran just fine.
 
I did all that testing. The TL;DR of the first paragraph, since I did mention this, is that I ran a crap ton of memory tests across all four DIMM slots: individually, in pairs, in alternating pairs in alternating sets of DIMM slots (side 1 or side 2), and in sets of 3. Then I just ran all four overnight.

I did stick 1 in slot A2, A1, B2, B1.
Then I did sticks 2-3-4 in slot A1.
Then I did sticks 1 and 2 in A2/A1, A2/B2, A2/B1, A1/B2, A1/B1, B2/B1.
I did the same with 3 and 4 with the same results (zero errors).
I then added in an extra stick and scrambled the sticks around to test the different sticks in pairs and trios and in different sets of single and dual channel RAM. I got pretty stupid with it. I spent hours yesterday doing all this.
I just decided to then throw in all four sticks and test them at stock for one pass, then threw all of them at the OC settings overnight with no errors.
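With four slots there are only six distinct two-slot placements, so it is easy to list them up front and tick them off so none gets skipped or repeated. A minimal sketch, assuming the slot names from this board (A2, A1, B2, B1); the variable names are just for illustration:

```python
from itertools import combinations

# Slot names in physical order, as labelled on this board.
slots = ["A2", "A1", "B2", "B1"]

# Every unordered pair of slots that two sticks can occupy.
slot_pairs = list(combinations(slots, 2))

for first, second in slot_pairs:
    print(f"{first}/{second}")

print(len(slot_pairs), "pair placements to test")
```

The same call with `combinations(slots, 3)` would list the four possible three-slot placements for the trio tests.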

I did them as 1/2 and 3/4 pairs because I got two sets of pairs. I got four sequentially manufactured sticks.

My slot layout is A2/A1/B2/B1 like most standard boards.
So, my stick layout is as such and is stable (without the GPU):
3/1/4/2
 
So it seems there's a correlation between low temperature and partial stability. There are still lots of strange graphical issues in Windows... I can still use it if I choke the memory and GPU clock speeds down to the lowest values they can go without advanced options. I also have to turn the fans on to keep it way below "standard" operating temperatures. It just looks like faulty memory and possibly a faulty onboard system between the RAM and the chip... maybe even the chip. It actually got more stable and less glitchy when I brought the clocks down.
 
Yeah it's pretty vague. I'm glad I figured it out. Thanks for the responses.

I received page fault errors that weren't RAM related. It also caused instability when moving large amounts of data (benchmarking encoding processes). Normal stressing through CPU-Z did nothing to affect it, since it was just attempting to tax the processor. When data was actually being moved from the hard drive to the RAM, or during other forms of data transfer, it became highly unstable, usually resulting in a long string of BSODs.