[SOLVED] GPU crashing under load

NotBatman

Reputable
Jul 18, 2015
Hi guys

I’m turning to this forum because I’m all out of ideas.
I built a new watercooled PC 2 months ago, and for roughly the last 2 weeks my GPU has been crashing under heavy load: sudden black screen with the PC still on. From there the only thing I can do is a manual shutdown.
Specs:
Gigabyte Windforce Gaming OC RTX 2080ti
i9-9900K (not OC’ed)
Asus Maximus XI Formula
4x8GB 3600MHz Corsair Dominator RGB (XMP 1)
Seasonic 1000W Prime Ultra Platinum
Corsair Commander Pro
6x Corsair LL120 fans
3x Corsair ML120 premium fans
Darkside 30cm UV led
Lian Li Strimer CPU and GPU RGB cables
D5 pump Alphacool
2x 360mm EKWB radiators
EKWB Velocity ARGB CPU block
EKWB Vector RGB GPU block

I tested all components on air first and they all worked perfectly.
At the beginning my GPU worked perfectly under high load. I could maintain decent overclocks with temperatures maxing out just below 60 degrees Celsius. I also flashed the BIOS with a ROM for the same card but with a higher power limit (140% vs 111%, found on the TechPowerUp website), with zero problems. Even hours of heavy stress testing caused no problems.
Then after a few weeks I started noticing the crashes (sudden black screen) during gaming (more specifically when I started playing Kingdom Come Deliverance). It also happens when I benchmark in Heaven Unigine.
The only thing that has changed (I think) in the meantime is that I did a clean driver/GeForce installation, which normally doesn’t cause problems. I did that because of constant GeForce crashing. I also installed the DCH Nvidia driver instead of the standard one, because it said it was otherwise incompatible.
When monitoring the GPU in MSI Afterburner, I noticed slightly higher average temperatures than normal, and also (but I could be wrong) less “stable” temperatures (in Heaven Unigine). The crash always happened when the GPU was around 58 degrees Celsius. I also noticed that the voltage limit was hit a lot more often than usual.

Things I have already done:
  • Checked all cables inside/outside the computer.
  • Changed power outlet
  • Disabled overclocks
  • Installed original BIOS again
  • Re-installed drivers a few times
  • Underclocked core clock by 100MHz

Nothing worked: games/Heaven Unigine keep crashing after only 5 minutes under load.

I would be forever grateful if you guys could assist me.

Thanks in advance
 
Did you reseat all cards and RAM?

Anything new and unexpected being launched at startup (check Task Manager > Startup tab)?

Boot up as normal but do not do anything that will take the system into "heavy load/crash".

Look in Reliability History and Event Viewer for error codes and warnings that correspond with the crashes.

Use either Task Manager or Resource Monitor to observe system performance. First just boot and watch. Then do some light load work and continue watching. Lastly, go into heavy load but do so as slowly as possible as you watch.

Hopefully you will note some resource or overload condition that builds until the crash. E.g., memory being consumed and not released.

Or some background app trying to update, phone home, or do some backup for itself.
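
If watching by eye is not practical because the screen goes black, the same idea can be sketched as a small logger. This is only a rough sketch in Python, assuming the Nvidia driver's nvidia-smi tool is on the PATH. The function names, the log file name gpu_log.csv, and the window/tolerance values are made up for illustration. Each sample is flushed to disk right away so the last reading before a crash survives.

```python
import csv
import os
import subprocess
import time
from datetime import datetime

# Query fields understood by nvidia-smi (see `nvidia-smi --help-query-gpu`).
FIELDS = ["temperature.gpu", "power.draw", "clocks.gr", "utilization.gpu"]


def parse_sample(line):
    """Turn one CSV line from nvidia-smi into a dict of floats (None if unreadable)."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    sample = {}
    for name, value in zip(FIELDS, values):
        try:
            sample[name] = float(value)
        except ValueError:
            # nvidia-smi prints placeholders such as "[N/A]" for unsupported fields.
            sample[name] = None
    return sample


def still_rising(temps, window=10, tolerance=1.0):
    """True if the temperature climbed more than `tolerance` degrees over the last `window` samples."""
    if len(temps) < window:
        return False
    recent = temps[-window:]
    return recent[-1] - recent[0] > tolerance


def read_sample():
    """Ask nvidia-smi for one reading from the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])


def log_forever(path="gpu_log.csv", interval=1.0):
    """Append one timestamped row per interval; stop with Ctrl+C."""
    temps = []
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            sample = read_sample()
            stamp = datetime.now().isoformat(timespec="seconds")
            writer.writerow([stamp] + [sample[name] for name in FIELDS])
            f.flush()
            os.fsync(f.fileno())  # push the row to disk before a possible crash
            if sample["temperature.gpu"] is not None:
                temps.append(sample["temperature.gpu"])
                if still_rising(temps):
                    print(stamp, "temperature still climbing:", temps[-1])
            time.sleep(interval)
```

Calling log_forever() in a console before starting the load should leave a CSV whose last rows show what the card was doing right before the black screen.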
 
Thank you for your response.

I have done as you said. The memory works normally (tested with the memory sticks reseated and XMP on/off).
I haven't noticed any weird changes in Task Manager or Resource Monitor.
The only weird thing I noticed is that the GPU gets a lot warmer when I run Heaven Unigine. A few weeks before I had this issue, the temperatures were stable in the high 40s; now they climb into the high 50s and show no sign of stabilizing. Even more bizarre is that the crash always occurs when the temperature reaches 57-58 degrees Celsius.
I also noticed the voltage fluctuates perhaps a bit more: it goes up to 1043mV.
I tested the card completely at stock settings and with the original BIOS.
I didn't touch the GPU physically, so the waterblock should be just as good as it was in the beginning.
Could it be a problem with the VRMs?
In the logs, these are the errors that keep recurring during the crash:
  • Reliability History: APPCRASH: C:\Windows\System32\dwm.exe
  • Event Viewer: Kernel Power 41

Thank you for the help
 
dwm.exe

Desktop Window Manager

Link for more information about DWM:

https://windowsreport.com/desktop-window-manager/

You can and should google for similar links as you believe is necessary and relevant.

Kernel Power 41

Multiple causes. For example:

https://ugetfix.com/ask/how-to-fix-kernel-power-41-error-on-windows-10/

Use the link as a reference to check out your system. I am not recommending or endorsing any of "download now" type fixes.
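
To line the Kernel Power 41 entries up against the crash times without clicking through Event Viewer, something like the following might help. It is a minimal sketch in Python that shells out to wevtutil, the event log tool that ships with Windows. The function names are invented for this example, and the provider name Microsoft-Windows-Kernel-Power is the one these events are normally logged under.

```python
import subprocess


def kernel_power_query(event_id=41):
    """XPath filter for Kernel-Power events with the given ID in the System log."""
    return ("*[System[Provider[@Name='Microsoft-Windows-Kernel-Power']"
            f" and (EventID={event_id})]]")


def build_command(count=10, event_id=41):
    """wevtutil arguments: newest events first, plain-text output."""
    return [
        "wevtutil", "qe", "System",
        "/q:" + kernel_power_query(event_id),
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # reverse direction, so the most recent come first
        "/f:text",       # human-readable text instead of XML
    ]


def recent_kernel_power_events(count=10):
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(build_command(count),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Printing recent_kernel_power_events(5) after a crash and comparing the timestamps with the times of the black screens shows whether each crash ended in an unexpected shutdown entry.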

As for VRMs, yes, they could be a problem.

Tom's Hardware has published a Reference article regarding VRMs.

https://www.tomshardware.com/reviews/vrm-voltage-regulator-module-definition,5771.html

Take a close look at the required specifications for your GPU.

From the VRM link:

"Note that VRMs that are too small for their GPU can break if the current the VRMs are sending to the GPU are too high for it."
 
Solution