PC crashing into a random solid color when under graphical load.

Status
Not open for further replies.

Ex Nihilo

Reputable
Oct 13, 2015
I'm aware there are similar problems around this forum such as this, but none of them helped me. Also my current priority is identifying the cause of the problem and I think the info I'll provide will make it easy.

Copying this from a post I've made on reddit:

Operating System
Windows 10

Computer Specs (PSU, GPU, CPU, RAM, Motherboard)
PSU: Corsair RM 850
GPU: Gigabyte NV98TG1 (980Ti)
CPU: i5-4690K OC'd to 4.5 GHz (I've since disabled the OC)
RAM: 2x 8 GB HyperX
Motherboard: Gigabyte z97 hd3

Speccy Link
http://speccy.piriform.com/results/niph08ZOsMCGBNrq4HEA0i3

Description of problem
Hi. My computer has been freezing to a random solid color while gaming. The reset button restarts the computer, but I don't get the display back until I power it off and on again. It has been like this for the past week. At first it only happened about once every two days, and I could keep gaming for the rest of the day afterwards. I ran a rough stress test by leaving DCS 2.0 open at the highest settings (99% GPU usage) for 60 minutes. No crash, so I thought it was something temporary.

Yesterday I started EVE Online (which is not very GPU-heavy) and it crashed immediately upon graphics initialization. I reset my computer, booted into Arch Linux, downloaded Unigine and began an OpenGL test. It crashed immediately. I booted back into Windows and started DCS 2.0. It crashed immediately.

Right now, I'm only continuing the tests on Unigine Heaven. What I've tried so far and the consequences:




  • Replaced the power cables from the PSU to the gfx card. No avail.
  • Tried moving the gfx card one PCI slot lower, but it didn't fit. Reseated it in the original slot. No avail.
  • Lowered the power limit to around 50% and set the memory and core clocks to the lowest possible values. Unigine didn't crash.
  • Used DDU to remove the graphics drivers and did a clean reinstall. No avail.
  • Ran hashcat on some MD5 tables. It uses up to 99% GPU and 96% of the GPU's TDP, with no crash. (hashcat uses almost no CPU.)
  • Used OCCT to stress test the GPU. No crash.

So, I would normally think this is definitely a GPU problem, since it happens under graphical load and the error itself is graphical in nature. But the GPU is also the only component that can draw that much power from the PSU, so the power supply could be what's causing the problem (stressing the GPU without the CPU, or vice versa, didn't result in any crashes).

When this issue began
1-2 weeks ago

Recurring issue
Yes

Date of purchase
2 years ago

Under Warranty
Yes (I should be, but I'm not actually sure)

Cause/Steps to recreate the issue
Start the computer, start a game, experience the crash.

What I've tried so far to resolve the issue

  • Reverted to non-OC settings
  • Replaced the PSU cables
  • Reseated the gfx card
  • Tried a lower voltage
  • Tried various methods of loading the GPU/PSU
 
Your problem is complex. You need to do isolation testing of the main parts of your computer, most likely the motherboard, RAM, and video card. If you have no spare parts for this, start by testing the GPU. I hope your GPU has good cooling; overheating can cause this kind of problem. Try another GPU, and try running less graphically demanding programs, or use the motherboard's built-in VGA port. Also consider cleaning the processor and motherboard and checking their temperatures. Have you also checked for driver updates?
 
You mention the GPU use is maxed, but you don't mention anything about the memory usage on the GPU, or temps the card is reaching.
Try blowing out the fans on the 980 to increase airflow and monitor temps and voltage draw for the card (I use hwmonitor) to look for anomalies like spikes or high rpms at idle, or the fans not working at all.
Might have a bad memory module on the GPU which is only being accessed during certain tests and not others.

Your theory of the PSU starting to fail is also very possible, although I wouldn't expect an 850 W Corsair to fail under a "normal" PC load.
Make sure all the power cords are secure into wall sockets/power strips/etc. also. A loose cord or outlet under load could cause power issues as well.
 


The driver update has been done, as I stated above. Temperatures are well within acceptable limits. I once used the CPU for graphics rendering and tried stress testing that way. It didn't crash, but I highly doubt that counts as proper isolation testing, since the computer barely draws any power from the PSU without my 980 Ti plugged in. As I also stated above, I tried running hashcat (an MD5 cracker that uses only the GPU) as a test. It didn't crash.

I'm not sure what you mean by "the other GPU". I only have one, and unfortunately I don't have a spare PSU or RAM either :/

UPDATE: I ran windows' memtest. No errors there.
 


What I mean is to use another GPU, RAM, or PSU with the same specs. Because the cause of the problem is possibly in one of these parts, you need to remove the suspected part and replace it with another one you have. If you don't have spare parts, it's better to bring your PC to a service center that has the parts needed to resolve your problem. Remember, if you don't have the resources to fix the problem, you may cause more damage to your PC.

 


The temps look normal. The fans are working as they should (idle below the threshold temperature, rising smoothly under load as expected).

You're right that I've supplied no solid data about the GPU. The thing is, Afterburner fails to save any logs when there's an abrupt shutdown or freeze. I've tried eyeballing the values, but the crash happens immediately upon running Unigine Heaven or any game, so I can't make any sense of them.
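
One way around the missing logs might be to poll the card from a script and force every reading to disk before taking the next one, so the last rows survive a hard freeze. Below is a minimal sketch, assuming an NVIDIA driver with `nvidia-smi` on the PATH. The half-second interval and the helper names `read_gpu_sample` and `log_samples` are illustrative choices, not anything taken from Afterburner.

```python
import csv
import os
import subprocess
import time

# Fields passed to nvidia-smi's --query-gpu option.
FIELDS = ["timestamp", "temperature.gpu", "power.draw",
          "clocks.gr", "clocks.mem", "memory.used", "utilization.gpu"]


def read_gpu_sample():
    """Ask nvidia-smi for one CSV line of readings and split it into fields."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Only the first line is used, i.e. the first GPU in the system.
    return [value.strip() for value in out.strip().splitlines()[0].split(",")]


def log_samples(path, sampler=read_gpu_sample, interval_s=0.5, max_samples=None):
    """Append one row per sample and force it to disk before taking the next."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    written = 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(FIELDS)
        while max_samples is None or written < max_samples:
            writer.writerow(sampler())
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to commit the data to disk
            written += 1
            time.sleep(interval_s)
    return written
```

Calling `log_samples("gpu_log.csv")` (a hypothetical file name) in one window before launching Unigine Heaven in another should leave the last readings before the freeze in the file. With a crash this immediate, there may only be a row or two, so a shorter interval could be worth trying.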

I've also reseated every cable that goes from the PSU to the motherboard and other peripherals.

I'd be more than happy if you could recommend a way to test the memory modules. I've tried memtestCL without any errors, if that's of any significance.
 


I might bring my gfx card and PSU to a friend's house, if he's available, and test them in his case. As I've said, I have no spare parts here. I'm just looking for in-house ways to identify the problem, if that's possible.
 


OK, try that, but remember to install it in a system with the same specs. If it displays, run the program that mainly causes the problem. If you get the same problem, identify which part is at fault and replace it with a new one.

I'm logging out now because it's bedtime here in the Philippines. Update me tomorrow with the results, or perhaps others here can help with your problem.
 
memtestCL will do the trick, but not if you run it at default settings. The default only tests the first 128 MB of GPU RAM, so you need to pass the size on the command line to cover your total memory, something like:
memtestcl 4096 5
which would test 4 GB of VRAM with 5 test loops.

Just add 1024 for each GB of RAM on the GPU.
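
The size arithmetic can be sketched in a few lines. This assumes the argument order shown in the command above (megabytes first, then loop count), which comes from this post rather than from the tool's documentation. `memtest_args` is a hypothetical helper name.

```python
def memtest_args(vram_gb, loops=5):
    """Build the memtestcl command for a card with vram_gb of memory.

    Assumes the 'memtestcl <megabytes> <loops>' form quoted in this thread.
    """
    megabytes = int(vram_gb * 1024)  # 1024 MB for each GB on the card
    return ["memtestcl", str(megabytes), str(loops)]


print(" ".join(memtest_args(4)))  # memtestcl 4096 5
print(" ".join(memtest_args(6)))  # memtestcl 6144 5
```

Asking for the card's full capacity may fail to allocate, since the desktop itself usually holds some VRAM, so a slightly smaller figure could be a reasonable fallback.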

I was referring more to the actual power cables from the wall to the PC, not the interior connections. As your PSU tries to draw more power from the wall, loose outlets or plugs can result in arcing, which causes unstable power surges. Make sure these are all tight. You might also want to try a new PSU-to-wall power cord, if you have a spare.
 


No extra cord at home, but I've tried changing the socket (plugged directly into the wall instead of an extension cord). No avail.

I also ran memtestG80 (memtestCL refused to test anything higher than 128 MB for some reason) and I got some errors! At 6144 MB I had 4 million or so errors, and the same at 5000 MB. So I started from 500 MB and increased in 500 MB increments. It froze again with a white screen at 3000 MB.

I'll test a few times and then report again.

Update: I've found out that the 4 billion errors might be a bug in memtestG80. I'll try some other options.
 
Update again: memtestG80 definitely freezes the system, just like Unigine and gaming do, when a 2000 MB test is run (the card has 6 GB). I'm still not sure whether memtest suddenly introduces a high load or whether it's actually a bad memory block.

Yet another update: I've just noticed a loud, coil-whine-like sound coming from the GPU area. Unlike the usual coil whine, it happens while the card is idling, and it sounds like a tiny drill rather than a whine.
 

Hi Ex Nihilo, did you repair your PC? From what you report here, the GPU is still the suspect. If there is an unusual sound, check your GPU's cooling system for anything abnormal. Try cleaning it or, as I said, use another compatible GPU.
 


Hi. I still couldn't find a friend with an adequate PSU to plug in and stress test my graphics card. The cooling system is not rattling; it's obviously an electrically induced buzz (it happens even when the fan is not running, and it changes frequency depending on the number of pixels drawn). Also, I highly doubt cleaning will help, since the temperatures are well within acceptable limits.

Just to clarify, two weeks ago the crash used to happen after a long gaming session. Now it crashes IMMEDIATELY when I start a 3D game or Unigine Heaven.

I'm suspecting these:
A gfx card memory block went bad
A gfx voltage controller or something like that went bad
Motherboard
PSU

I'm looking for a way to fill my 6 GB of VRAM without putting heavy load on the card (that is, without running a full-fledged memory stress test). If that fails, I'll know for sure it's a memory problem.
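
The idea could look roughly like the sketch below: allocate the memory in chunks, write a known byte pattern into each, hold on to all of them, then read every chunk back and compare. Note the big assumption: `HostBuffer` is a plain host-memory stand-in so the logic runs anywhere, and it does not touch VRAM. A real test would swap in a class that wraps device buffers from whatever GPU library is at hand, and no such library is named in this thread. All names here are hypothetical.

```python
class HostBuffer:
    """Stand-in for a GPU allocation; a real test would wrap a device buffer."""

    def __init__(self, size):
        self.data = bytearray(size)

    def write(self, payload):
        self.data[:] = payload

    def read(self):
        return bytes(self.data)


def chunk_pattern(index, size):
    """One repeated byte per chunk, varied by index so chunks differ."""
    return bytes([(index * 37 + 0x5A) % 256]) * size


def fill_and_verify(total_mb, chunk_mb, alloc=HostBuffer):
    """Fill total_mb in chunk_mb pieces, then re-read every piece.

    Returns the indices of chunks whose contents no longer match.
    """
    chunk_bytes = chunk_mb * 1024 * 1024
    held = []
    for index in range(total_mb // chunk_mb):
        buf = alloc(chunk_bytes)
        buf.write(chunk_pattern(index, chunk_bytes))
        held.append(buf)  # keep the reference so the memory stays occupied
    return [index for index, buf in enumerate(held)
            if buf.read() != chunk_pattern(index, chunk_bytes)]


print(fill_and_verify(total_mb=8, chunk_mb=2))  # [] when nothing is corrupted
```

Because this only copies data in and out, it keeps the shader cores mostly idle, which is the point: a mismatch or freeze here would point at memory rather than at load.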
 
Hi guys. I brought my gfx card to my friend's house and tried it in his case. It crashed his computer too, 2 seconds into Unigine Heaven.
Guess I'll need to find my card's invoice and then RMA it. I hope they don't claim it's a user error or something like that.
 


Thanks, man. From what I've read on the internet, I think I'll need that. Gigabyte hasn't even bothered to reply to my e-mail so far.
 
It's good that you have now verified the problem. The mobo is the problem; maybe there is also capacitor damage.




 


Can you explain? I said the card came back from RMA with repairs on two busted resistors. How did you come to the conclusion that the mobo is the culprit? Or did you mean that busted resistors on peripherals are usually signs of a bad mobo?
 