CrestHD

Reputable
Feb 27, 2016
41
3
4,545
Hello,
I've been having problems with my GPU since yesterday. I have to admit that over the last few days I used AMD's Wattman to experiment with setting the power limit to +50%. Before that, I used the Turbo button for a nice and simple overclock for about 1-2 months. I was checking temperatures constantly, and at worst the core hit 81°C, but I forgot to monitor the HBM memory at all, so I suspect the memory may have run too hot.

I reinstalled the drivers using DDU and stress tested everything: the CPU with Prime95, system RAM with MemTest86, GPU VRAM with OCCT, and the GPU as a whole with OCCT's GPU stress test. At first I was sure it was the VRAM, but the VRAM test actually doesn't crash. As soon as I started the normal GPU stress test, though, it crashed immediately. After that the computer restarts with no picture, and I have to turn it off completely and back on again to get an image on screen.
I can play 4K videos, run applications like Excel with huge tables and do all the basic stuff, but as soon as I start a game it crashes. I believe I'll have to RMA the card and try my luck on whether they swap it, despite it having been overclocked. Still, I'm curious what caused it so I don't make the same mistake again. Was it Wattman?
Picture before crashing: https://imgur.com/a/SMfJXBn

Video before crashing: https://imgur.com/a/GkZCa5w
 

CrestHD

I actually used the silent BIOS for a long time with "Power Saver" in Wattman, and it consumed around 158 W like that. Then I started experimenting with the power limit and saw it go above 300 W, with core clocks in the 1600s and, very occasionally, the 1700s. I never changed voltages or core clocks manually, though. I always used the Turbo/Power Saver buttons and the newly found power limit slider, the last one only for a few days, and the card went bad soon after. What I failed to mention is that I have a 750 W PSU. Could it have been pushed too hard? Could the PSU actually be what failed, so that it can no longer sustain any system load? I also overclocked the DDR4 memory and the Ryzen 7 1700. Of course, everything is now at default settings. I tried setting the GPU power limit to -50% in Wattman; it drew around 100 W and actually ran games for a while, then the same issue came up again.
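One rough way to frame the "is 750 W enough?" question is a simple power budget. This is a minimal sketch, assuming made-up draws for everything except the GPU, whose 300 W figure is the Wattman reading mentioned above; the CPU and "rest of system" numbers are hypothetical placeholders, not measurements.

```python
def psu_load(draws_w: dict, psu_rating_w: float):
    """Return (total draw in watts, fraction of the PSU's rating in use)."""
    total = sum(draws_w.values())
    return total, total / psu_rating_w


# Only the GPU figure comes from the thread (Wattman showed it above 300 W).
# The other entries are hypothetical placeholders for an overclocked system.
draws = {
    "GPU, +50% power limit": 300,
    "Ryzen 7 1700, overclocked": 150,  # hypothetical; stock TDP is 65 W
    "board, RAM, drives, fans": 75,    # hypothetical
}
total, load = psu_load(draws, 750)
print(f"{total} W of 750 W ({load:.0%} load)")  # prints "525 W of 750 W (70% load)"
```

These are averages only; short power spikes can go well above them, so real measurements (or a wall meter) would be needed before drawing conclusions.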
 

xollextor

Commendable
Jul 28, 2019
62
0
1,540
From what I've heard, those kinds of artifacts are usually caused by VRAM, sometimes because it's unstable and other times because it's faulty. My bet is that your VRAM isn't stable. Try increasing the VRAM voltage in small increments until you reach stability, OR try underclocking your VRAM.
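The "raise it in small increments until it's stable" approach is a simple stepwise search. Here is a minimal sketch of it: `is_stable` is a hypothetical callback standing in for a manual stress-test run (for example with OCCT), and the start, ceiling and step values are assumptions you would pick yourself.

```python
from typing import Callable, Optional


def find_stable_voltage(start_mv: int, max_mv: int, step_mv: int,
                        is_stable: Callable[[int], bool]) -> Optional[int]:
    """Raise the VRAM voltage step by step; return the first stable value, or None."""
    mv = start_mv
    while mv <= max_mv:
        if is_stable(mv):  # hypothetical: run a stress test at this setting
            return mv
        mv += step_mv
    return None  # nothing stable up to the ceiling: try underclocking the VRAM instead


# Made-up stability rule, purely for illustration.
result = find_stable_voltage(1100, 1200, 25, lambda mv: mv >= 1150)  # returns 1150
```

Returning `None` rather than the ceiling makes the "give up and underclock instead" branch explicit.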
 

CrestHD

I got excited for a few minutes there. I managed to run the stress test that used to crash right away. Then I got even more excited and started up Red Dead 2, but sadly, after loading into the game, it crashed right away with the same artifacts. I guess it is the VRAM then; it must have overheated because of the things I did in Wattman. :[
 

xollextor

Commendable
Jul 28, 2019
62
0
1,540
I got excited for a few minutes there. I managed to run the stress test I couldn't before because it would crash right away. Then I got even more excited and started up Red Dead 2, but sadly after loading in to the game, it crashes right away with the same artifacts. I guess it is the vram then, must have overheated, because of the things I did in wattman. :[
Oh well, at least you now know where the problem is coming from. 81°C doesn't seem hot enough to start damaging components, though.

Can you tell me by how much you increased the VRAM voltage this time?
 

CrestHD

The maximum I can apply in Wattman is 1200 mV. I also tried undervolting to 1000 mV (1100 mV is the default) and decreasing the memory clock, with the same outcome.

And by the way, the 81°C was on the GPU core; I wasn't paying any attention to HBM memory temperatures at the time.
 

OllympianGamer

Honorable
Dec 22, 2016
317
50
10,890
You can't apply regular GPU logic to Vega cards; that's not how they work. Try this, starting with everything at default:
Set fans to 100%
Set power limit +50%
In the newest stock BIOS, P-state 5 is 1400 MHz. Set P-states 6 and 7 to the same clock, and set the voltages for P-states 5, 6 and 7 all to 1000 mV.
Vega has a dynamic boost clock linked to temperature, which has a 70°C limit at stock. You also have to realise that the core voltage can't go lower than the VRAM voltage, which I think is the same as P-state 4 by default. Open something like Superposition, use the 4K or 1080p Extreme setting and run it in cinematic mode, see what boost clock you get, and monitor temperature with HWiNFO in a separate window. I don't bother with Afterburner when undervolting, as I used to have conflicts with it.
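The P-state recipe above, plus the "core can't go below VRAM voltage" rule, can be sketched as a small table transformation. This is only an illustration, not Wattman's actual data model: `PState`, `apply_recipe` and `effective_core_voltage` are hypothetical names, and the stock table is made up except for P5 = 1400 MHz, which comes from the post. The 1100 mV memory voltage in the last line is the default CrestHD mentions earlier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PState:
    clock_mhz: int
    voltage_mv: int


def apply_recipe(states: List[PState], target_mv: int = 1000) -> List[PState]:
    """Give P6 and P7 the P5 clock and set P5-P7 to target_mv; other states are copied unchanged."""
    tuned = [PState(s.clock_mhz, s.voltage_mv) for s in states]
    p5_clock = tuned[5].clock_mhz
    for i in (5, 6, 7):
        tuned[i] = PState(p5_clock, target_mv)
    return tuned


def effective_core_voltage(core_mv: int, memory_mv: int) -> int:
    """The core can't run below the VRAM voltage, so the higher of the two applies."""
    return max(core_mv, memory_mv)


# Hypothetical stock table: only P5 = 1400 MHz comes from the post above.
stock = [PState(c, v) for c, v in [
    (852, 800), (991, 900), (1084, 950), (1138, 1000),
    (1200, 1050), (1400, 1100), (1536, 1150), (1630, 1200),
]]
tuned = apply_recipe(stock)
for i, s in enumerate(tuned):
    print(f"P{i}: {s.clock_mhz} MHz @ {s.voltage_mv} mV")

# With memory at 1100 mV, a 1000 mV core setting effectively becomes 1100 mV.
print(effective_core_voltage(1000, 1100))
```

The last line shows why the memory voltage matters: if it sits above the core target, the undervolt has no effect.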
 

CrestHD

Alright, in the meantime I've got my hands on another graphics card, a 5700 XT, and the same issue is still present, though not as bad. Instead of the crazy, epileptic levels of artifacting in the video above, the screen just goes black with a pink line in the middle. I'm starting to think it's something with the motherboard?
 

CrestHD

Uhm... okay. So far it's been smooth sailing with the 5700 XT I borrowed. My friend told me to take a look at the power supply, and it turns out it melted! The PCI-E cable connected to the PSU melted! I was very surprised. I looked up the recommended wattage for the card, which is 850 W, and I have a 750 W Cooler Master 750V. Not only that, but as I said, I increased the GPU power target to +50% and overclocked the CPU and RAM as well. I guess I'm lucky that all I needed to do was change a cable and RMA a card rather than have it catch fire!
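The melted cable makes more sense once the wattage is converted to current (I = P / V). This is a rough sketch under stated assumptions: it reuses the 300 W figure from earlier in the thread, assumes the motherboard slot supplies its full 66 W 12 V allowance, and does not know whether the card was fed by one daisy-chained cable or two separate ones. For reference, the PCIe spec rates an 8-pin connector at 150 W, which is 12.5 A at 12 V.

```python
RAIL_V = 12.0  # PCIe auxiliary power connectors carry 12 V


def amps(watts: float, volts: float = RAIL_V) -> float:
    """Current needed to deliver `watts` at `volts` (I = P / V)."""
    return watts / volts


gpu_w = 300   # from the thread: Wattman showed the card above 300 W
slot_w = 66   # assumption: the slot supplies its full 12 V allowance (5.5 A x 12 V)
cable_w = gpu_w - slot_w

print(f"{cable_w} W through the PCIe cables = {amps(cable_w):.1f} A")
# Assumption: one daisy-chained cable feeding both 8-pin plugs carries all of it;
# two separate cables would each carry roughly half.
print(f"per cable if split over two: {amps(cable_w / 2):.2f} A")
```

The arithmetic only shows the scale of the current involved; connector quality and cable gauge also decide whether a cable can carry it.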