Oct 24, 2021
Hello there.

This post might be a bit lengthy, but I will try to include as much info as I can.

I have an Asus Dual RX 580 8GB OC AMD GPU, which worked fine until some time ago. I do not know exactly how long the issue has been going on, but it did not start just a day or two ago; it could be 1, 2 or even 3 months.

What happens is that the GPU randomly stops outputting a picture to the monitor, just as if the monitor cable had been pulled out of the GPU connector, and there is no way of bringing it back other than rebooting the PC.

I suspected the PSU for some time, but I gave up on that idea since the GPU behaved the same way on a second PSU as well.

The second option is that the drivers are causing the issue. To be honest, I do not really subscribe to that theory, since I am on GNU/Linux distributions and the AMD drivers are part of the Linux kernel itself, so they get updated regularly with each kernel update.

The issue was present on 2 different PSUs, 2 different motherboards and 2 different configurations in general, as well as on 2 different Linux kernels, or in other words 2 different Linux distributions.

While monitoring the GPU, I noticed that the core clock would drop below my preset overclock value. I assume that some controller on the GPU automatically tried to keep the card stable by lowering the core clock. When the clock was instead fixed and forced to stay at the set value, the card would crash. The overclock values I used always worked before, and they are not even maxed out to what the manufacturer specifies; my clock values were rather reasonable. (For example, say the core was allowed to go to 1150 or 1200, I would set it to 1100, so I didn't really push it to the limit.)
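For anyone who wants to watch for the same clock drops, here is a minimal sketch of how the active core clock could be logged. It assumes the amdgpu driver exposes its core clock states in `/sys/class/drm/card0/device/pp_dpm_sclk`, with the active state marked by `*`; the card index may differ on your system, and the helper names `active_clock_mhz` and `log_clock` are made up for illustration. It is not the tool used for the observations above.

```python
import re
import time
from pathlib import Path

# Assumed location of the amdgpu core clock table; the card index may differ.
SCLK_PATH = Path("/sys/class/drm/card0/device/pp_dpm_sclk")


def active_clock_mhz(text):
    """Return the clock in MHz of the state marked with '*', or None if no state is marked."""
    for line in text.splitlines():
        # Expected line shape (assumption): "2: 1100Mhz *"
        match = re.match(r"\s*\d+:\s*(\d+)\s*mhz\s*\*", line, re.IGNORECASE)
        if match:
            return int(match.group(1))
    return None


def log_clock(path=SCLK_PATH, interval_s=5.0, samples=10):
    """Print a timestamped reading of the active core clock every interval_s seconds."""
    for _ in range(samples):
        mhz = active_clock_mhz(path.read_text())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} sclk={mhz} MHz", flush=True)
        time.sleep(interval_s)
```

Calling `log_clock()` with the output redirected to a file on disk would leave a record of the last clock readings before a crash, which might show whether the clock dropped right before the picture was lost.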

I am completely lost and I no longer know what to suspect, except that the GPU is faulty, but I have no idea which exact part of the GPU it could be. On the other hand, I slightly suspect the VBIOS, since the GPU has a modded VBIOS on it. Again, I am not fully subscribed to that theory, but I leave it as a possibility, because who knows.

I had a few more things to say, but as I was writing this, I forgot what they were. If something isn't fully clear, or further information is needed, please let me know and I will add whatever info I have.

EDIT#1: I remembered one of the things I wanted to say. The crash doesn't happen too often, nor is it specifically tied to load (although it might be in some way). It occurs about once a day, or once every 2 or 3 days or more; there is no fixed interval, it is more or less random. Speaking of load, my GPU is under load most of the time, be it gaming or some other program doing calculations and whatnot. But the GPU does not crash as soon as load is applied, nor after 10 or 30 minutes. It works well under load for a day or several days, until it just decides to crash.

EDIT#2: Speaking of overclocking, I forgot to say that I noticed the GPU sometimes crashing with the stock clock values as well. I even use clock values lower than the stock ones, and I think it crashes with them as well, about once every 3 to 5 days, so if someone was thinking of suggesting that I remove the overclock, I don't think it really helps.
