[SOLVED] Is my GPU faulty?

Delythien

Honorable
Sep 1, 2015
78
1
10,545
Hi there.

I've been getting some errors in the games I've been trying to play over the past couple of days. Since I'm not familiar with what is actually going on or what the errors mean, I was hoping you guys could shed some light on the issue.

My specs:
Motherboard: Gigabyte Z390 Aorus PRO WiFi
CPU: Intel Core i7-9700K 3.60GHz (8 cores)
RAM: 2x 8GB Vengeance LPX DDR4 3200MHz
PSU: Corsair RM550x v2 - 550 Watt
GPU: MSI GeForce RTX 2070 AERO 8G

My issues:
1.) I've been playing StarCraft II, and after about an hour into the game my GPU fans ramp up to max speed and Windows becomes almost completely unresponsive. By Windows I mean File Explorer, the desktop, Task Manager, etc., while other applications are just a bit slower. After 4-5 minutes of the fans running at max speed, everything else slows down drastically and the only way out is to physically restart my computer. On rare occasions, after about 10 minutes the fans spin down and the system sloooowly starts to function again.
Sometimes, instead of the fans going to full speed, the game stops responding (or a black screen appears), but the other symptoms still occur.
2.) I've been playing Tomb Raider (2013), and after about 15-20 minutes into the game it crashes, leaving this log:
18:31:22:078 (14832) > PCDX11BufferPool::AllocateBuffer: ID3D11Device::CreateBuffer failed with HRESULT 0x887a0005 (BindFlags=4, ByteWidth=16)
18:31:22:080 (14832) > Memory statistics:
18:31:22:080 (14832) > Total RAM = 15.9 GB (16315 MB)
18:31:22:081 (14832) > Avail RAM = 10.7 GB (10979 MB)
18:31:22:081 (14832) > Total virtual memory = 4.0 GB (4095 MB)
18:31:22:083 (14832) > Avail virtual memory = 2.5 GB (2521 MB)
18:31:24:574 (14832) > Exception at: 01/01/2020 at 18:31:24
Exception code: Access violation (c0000005): (Read of address: 0x00000004)
Exception address: 0x00e0aa4c
3.) I've been playing Shadow of War and every now and then, the game crashes and throws this error: "The GPU device instance has been suspended."
4.) I've been playing Star Wars Jedi: Fallen Order and have received the following error: "LowLevelFatalError Unreal Engine is exiting due to D3D device being lost. Error: 0x887A005"
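For reference, the HRESULT in the Tomb Raider log, 0x887a0005, is the DXGI error code DXGI_ERROR_DEVICE_REMOVED, and the standard Windows message text for that code begins "The GPU device instance has been suspended.", which matches the Shadow of War error. The Jedi: Fallen Order message likewise reports a lost D3D device. So these games appear to be hitting the same underlying condition: Direct3D losing its GPU device. The Python sketch below only illustrates how such a code breaks down. It splits an HRESULT into severity, facility and code the way the HRESULT_SEVERITY, HRESULT_FACILITY and HRESULT_CODE macros in winerror.h do, and looks the value up in a small hand-picked table of DXGI codes; the function name decode_hresult and the table are hypothetical, made up for this example.

```python
# Hypothetical helper for illustration: decode a Windows HRESULT into the
# fields exposed by the winerror.h macros, plus a name from a tiny lookup table.

# Small, hand-picked subset of DXGI error codes (not exhaustive).
DXGI_ERRORS = {
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
    0x887A0006: "DXGI_ERROR_DEVICE_HUNG",
    0x887A0007: "DXGI_ERROR_DEVICE_RESET",
}


def decode_hresult(hr: int) -> dict:
    hr &= 0xFFFFFFFF  # treat the value as an unsigned 32-bit integer
    return {
        "failed": ((hr >> 31) & 0x1) == 1,  # HRESULT_SEVERITY: 1 means failure
        "facility": (hr >> 16) & 0x1FFF,    # HRESULT_FACILITY (0x87A is DXGI)
        "code": hr & 0xFFFF,                # HRESULT_CODE
        "name": DXGI_ERRORS.get(hr, "unknown"),
    }


if __name__ == "__main__":
    # Value taken from the Tomb Raider log above.
    info = decode_hresult(0x887A0005)
    print(f"{info['name']}: failed={info['failed']}, "
          f"facility=0x{info['facility']:X}, code={info['code']}")
```

Note that the code only says the device was lost, not why. Overheating is one plausible trigger; applications that want more detail can call ID3D11Device::GetDeviceRemovedReason, which the Windows message text itself points to.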

As seen above, my specs should be able to handle any of these games without much strain, and the games themselves recommend Ultra settings for my system (which I use).
This is how my MSI Afterburner is set up:
msi.png


My GPU reaches the 81°C temperature limit and hovers there: it drops, climbs back up, and so on. Nothing out of the ordinary, right?

Now, like I said, I'm not much of a tech guy, and I don't know how to troubleshoot my GPU (or other components, if they're the real cause rather than the GPU itself). I also don't know what the errors above mean.
I've obviously Googled these problems and tried following the instructions I found, to no avail. I've also updated my GPU drivers, but the problem persists.
My NVIDIA Control Panel 3D settings:
nvidia.png


Any help?
 
I think it's still possible you may have a temperature problem.
I think by setting the temperature limit.....the GPU might throttle itself so as not to go above the limit.
What happens if you raise the limit to, say....100C?
Also, I would run with the case open and something like a desk fan blowing in there and see what happens. (I would do this separately from raising the limit.)
Also.....can you set the fan speed in Afterburner to "AUTO"....or is it in "AUTO"?
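To illustrate the throttling idea above, here is a deliberately simplified Python toy model of a temperature limit: when the reading hits the limit the clock steps down, and once the card cools a few degrees below the limit the clock steps back up. This is not how Nvidia's GPU Boost actually works; the function name next_clock, the clock values, the step size and the hysteresis are all hypothetical, chosen only for the example. It does show how a limit produces the pattern described in the first post, where the temperature sits at the limit, drops, and climbs back.

```python
def next_clock(clock_mhz: int, temp_c: float, limit_c: float,
               min_mhz: int, max_mhz: int,
               step_mhz: int = 15, hysteresis_c: float = 3.0) -> int:
    """Toy temperature-limit controller (hypothetical, for illustration only).

    At or above the limit, lower the clock one step; well below it, raise the
    clock one step; inside the hysteresis band, hold the current clock.
    """
    if temp_c >= limit_c:
        return max(min_mhz, clock_mhz - step_mhz)
    if temp_c < limit_c - hysteresis_c:
        return min(max_mhz, clock_mhz + step_mhz)
    return clock_mhz


if __name__ == "__main__":
    clock = 1800
    # Hypothetical temperature readings around an 81 C limit.
    for temp in [76, 79, 81, 82, 80, 77, 78, 81]:
        clock = next_clock(clock, temp, limit_c=81, min_mhz=1400, max_mhz=1900)
        print(f"{temp} C -> {clock} MHz")
```

The hysteresis band is what keeps a controller like this from flipping the clock up and down on every single reading.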
 

Delythien

The highest temperature limit MSI Afterburner lets me set is 88°C.
As for the fan speed, it's set to Auto.
This is what it currently shows while playing. No change on the error front, though; it still occurs.
current.png
 
I've never seen 84C on any of my Nvidia cards.....and that includes a GTX 1080, GTX 1060, GTX 470 and an RTX 2080 Ti. They always stay under 80C, even at 100% usage.

My RTX 2080 Ti tops out at 76C.

Have you tried running with the case open and a fan blowing in there?
 

Delythien

So I ran the benchmark with the suggested software and got the following result:
benchmark.png


All during the test, the GPU temperature was stable at 78°C.

I also opened the side of my case to let air in, but I don't have any spare fans to add extra cooling. I should note that I only have one front fan and one rear fan in my case.

inside.jpg


I haven't had any temperature problems with my previous PC builds with the same number of fans. Do you think adding more would help?

That said, how is my GPU's temperature connected to the errors I described above?
 

Delythien

While writing this, I also remembered that my case has an adjustable fan speed. It was set to 1 out of 3. I've now set it to Auto and reran the benchmark with tessellation set to Extreme (the previous test had it disabled), and got this result:
benchmark2.png


During the test, the temperature stayed mostly around 77°C, dipping to 75°C, climbing back up, and reaching 78°C a few times.
 
I think there is a fine line between "OK" and "too hot", and I think that line may lie right around 80C or in the low 80s.

So you might be fine with 78C all day.....but 84C causes crashes.
 
Solution

Delythien

I see, that's good to know. So what does the "The GPU device instance has been suspended." error actually mean? Does it happen when the GPU overheats and then resets itself, or gets disconnected to prevent damage?
 

Did you try increasing GPU fan speed? On some cards, the default fan curve is not enough to properly cool the GPU.

Are you overclocking your CPU or RAM?
Do you have the latest drivers and latest version of the game?
 
I'm not really sure regarding that specific error.....but it doesn't surprise me you would get an error like that with the card being too hot.
 

Delythien

No, I haven't tried increasing the fan speed; it's set to Auto.
I'm also not overclocking anything.
And the GPU drivers are the latest, and all the games are up to date, with their file integrity verified on Steam.
 

Delythien

I see. But how come the card would get that hot? Is the most probable reason my lack of case fans or does it have something to do with the GPU - as in its lack of fan power or something?
 

Try increasing the GPU fan speed or make your own fan curve.
I had a Vega 64 and it would throttle with the fan at Auto.

I really miss AMD's fan curve: it just works, and the interface is much simpler and nicer. Now I'm having issues applying a fan curve on my RTX 2080 Ti STRIX OC.
Nvidia's control panel is prehistoric; it looks like Win XP.
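A custom fan curve, as suggested above, is just a list of (temperature, fan speed) points, and fan-control tools such as MSI Afterburner generally fill in the speeds between points with straight lines. The Python sketch below shows that mapping as linear interpolation. The point values are hypothetical and are not a recommended curve for the RTX 2070 or any other card, and fan_speed is a made-up name for illustration.

```python
from bisect import bisect_right

# Hypothetical (temperature in C, fan speed in %) points, sorted by temperature.
# Illustration only; not a recommended curve for any particular card.
CURVE = [(30, 30), (50, 40), (65, 60), (75, 80), (80, 100)]


def fan_speed(temp_c: float, curve=CURVE) -> float:
    """Linearly interpolate the fan speed for a temperature on a fan curve."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])   # below the first point: hold the minimum
    if temp_c >= temps[-1]:
        return float(curve[-1][1])  # above the last point: hold the maximum
    i = bisect_right(temps, temp_c)  # index of the first point above temp_c
    (t0, s0), (t1, s1) = curve[i - 1], curve[i]
    return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (40, 70, 77, 85):
        print(f"{t} C -> {fan_speed(t):.0f}% fan")
```

Making the curve steeper in the 70s, for example, spins the fans up before the card reaches its temperature limit instead of after.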
 

Delythien

And what would a proper custom curve look like?

curve.png
 

Delythien

Funnily enough, since I switched the case fans to Auto, I haven't gone above 75°C in two different games. Apparently that's what was missing. Hopefully I won't run into the error again now that the GPU isn't overheating.

So I guess that's problem solved. I'll keep a close eye on the temperature. Thanks to you both. If anything changes, I'll be sure to post again.