[SOLVED] The temperature difference between GPU and GPU HotSpot is huge

Sep 19, 2022
Recently I reinstalled my OS. Afterwards I noticed that the GPU HotSpot temperature had risen sharply, although the overall GPU temperature stayed in the same familiar range as before.
The difference has never been this large: 24 degrees!

ip0sWjn.png


To be clear, I have a Palit RTX 3070 Ti Gaming Pro. I've also undervolted and overclocked it, but I've been using those curves for almost a year and everything was okay.

wh6.png
kwe5FXn.png


There is one more thing I need to mention. Three months ago I replaced the thermal pads with a coolmygpu copper plate.
I used Noctua NT-H2 thermal paste with it. I've never mined on my GPU, but I was very worried when the memory temps were higher than the GPU hotspot.
That's why I decided to install the copper plate. After installation everything seemed perfect: nice memory temps, nice GPU temps. But after three months and one OS reinstallation I am worried once again.

I can't believe that a 24-degree difference is normal. Is something wrong? Is my GPU dying right now?
 
The elephant in the room to me is you didn't undervolt the video card. Assuming the solid line in the VF curve is the normal VF curve (minus that spike), what you did was overvolt the GPU. The VF curve is what it sounds like: for a given voltage, what frequency should the GPU be at?

For instance, this is what your curve is doing (I have an RTX 2070 Super, but the behavior is the same)
6Q1XugC.png

Notice that for about 1700MHz, the voltage is 1.038V and the power consumption is about 160W

Here's what an undervolt is supposed to look like:
WVm7fgF.png

Now notice that the GPU is running at 1800MHz, but using 0.825V and 123.5W.
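To make the idea concrete, here's a minimal sketch of a VF curve as a list of (voltage, frequency) points and of what an undervolt does to it. The curve values, the `undervolt` helper and the `peak_voltage` helper are all made up for illustration. They aren't read from any real card or tool, and the sketch assumes the GPU settles at the lowest voltage that reaches the top frequency.

```python
# Hypothetical stock curve: (voltage in V, frequency in MHz), lowest voltage first.
# These numbers are illustrative only, not measured from a real card.
STOCK_CURVE = [(0.750, 1500), (0.825, 1650), (0.900, 1785), (0.975, 1875), (1.038, 1935)]

def undervolt(curve, cap_voltage, target_mhz):
    """Return a new curve that reaches target_mhz at cap_voltage and is flat above it."""
    new_curve = []
    for voltage, mhz in curve:
        if voltage < cap_voltage:
            # Below the cap: keep the stock point, but never exceed the target.
            new_curve.append((voltage, min(mhz, target_mhz)))
        else:
            # At or above the cap: flat line, so extra voltage buys no extra clock.
            new_curve.append((voltage, target_mhz))
    return new_curve

def peak_voltage(curve):
    """Lowest voltage at which the curve reaches its maximum frequency."""
    top = max(mhz for _, mhz in curve)
    return min(voltage for voltage, mhz in curve if mhz == top)

tuned = undervolt(STOCK_CURVE, cap_voltage=0.825, target_mhz=1800)
print("stock peak voltage:", peak_voltage(STOCK_CURVE))
print("tuned peak voltage:", peak_voltage(tuned))
```

With a flat top, the curve gives the GPU no reason to go past the cap voltage, which is why the power draw drops even though the clock is higher at that point.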

EDIT: As a point of comparison, this is the default curve:
ODetGI9.png


The clock speed is up to 1935MHz, but the voltage is at 1.038V with a power consumption of 173W. So similar to the overvolted case.
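Putting the readings above side by side, a rough back-of-the-envelope comparison might look like this. The figures are single readings from the screenshots described in this post, so treat the results as ballpark only. The helper names are hypothetical.

```python
def mhz_per_watt(mhz, watts):
    """Crude efficiency figure: clock speed delivered per watt drawn."""
    return mhz / watts

def power_saving(baseline_watts, tuned_watts):
    """Fraction of power saved relative to the baseline."""
    return (baseline_watts - tuned_watts) / baseline_watts

# Readings quoted in the post (RTX 2070 Super, one sample each).
overvolt_like = mhz_per_watt(1700, 160.0)    # about 1700MHz at 1.038V, about 160W
undervolted = mhz_per_watt(1800, 123.5)      # 1800MHz at 0.825V, 123.5W
saving_vs_default = power_saving(173.0, 123.5)  # default curve drew 173W

print(f"overvolt-like curve: {overvolt_like:.2f} MHz/W")
print(f"undervolted curve:   {undervolted:.2f} MHz/W")
print(f"power saved vs default: {saving_vs_default:.1%}")
```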

EDIT 2: Also I'm aware HWiNFO is reporting one of my drives as failed, it's a false positive.
 
Where are you reading an overvolt from? Oo
TC's card is locked to a maximum of 925mV and 1965MHz. The curve is shaped slightly differently, but it is literally the same thing as in your screenshot. This card can go up to 1.065V or something in that ballpark. It is definitely undervolted. There is an OC as well, but that doesn't really matter in this context.
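The distinction can be sketched as two separate comparisons: the voltage cap against the card's stock maximum voltage, and the clock at the cap against the stock clock at that same voltage. The 925mV and 1965MHz values come from the post, 1065mV is the ballpark stock maximum mentioned above, and the 1860MHz stock clock at 925mV is a made-up placeholder. The `classify` helper is hypothetical too.

```python
def classify(cap_mv, cap_mhz, stock_max_mv, stock_mhz_at_cap):
    """Label a curve's voltage cap and clock offset relative to stock."""
    if cap_mv < stock_max_mv:
        voltage = "undervolt"
    elif cap_mv > stock_max_mv:
        voltage = "overvolt"
    else:
        voltage = "stock voltage"

    if cap_mhz > stock_mhz_at_cap:
        clock = "overclock"
    elif cap_mhz < stock_mhz_at_cap:
        clock = "underclock"
    else:
        clock = "stock clock"
    return voltage, clock

# 1860 is an assumed stock clock at 925mV, used only for illustration.
print(classify(cap_mv=925, cap_mhz=1965, stock_max_mv=1065, stock_mhz_at_cap=1860))
```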
 
The GPU isn't dying. The cooler's cold plate isn't as flush with the GPU die as it was before the modifications you made.
I've reinstalled the GPU cooler and adjusted the copper plate a little, and now the difference between the GPU temp and the GPU hotspot temp has decreased significantly. That was such a relief xD