Holy crap, it appears you might actually be right.
How did I verify it? I started Afterburner 4 and let it log all data from the GPU (including voltage), waited for the black screen to happen, then put all of this data into an Excel spreadsheet and added some charts to it. You can download the document here (please open it with LibreOffice): https://www.dropbox.com/s/sz08ddgzlwh3d3k/HardwareMonit...
I have split the data into a green and a red section. The green background marks the period where everything is still going well, whereas the red section marks the time span of the malfunction. I have also added a dotted red line across all charts below, marking the exact point where the malfunction starts. The charts show a clear break in the consistency that existed before. Note that all of this happened during total idle!
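For anyone who would rather find that break point numerically than by eye, here is a minimal Python sketch. It assumes the first part of the log is a clean idle baseline and flags the first later sample that leaves a band around the baseline mean. The function name `first_break`, its parameters, and the sample numbers are all hypothetical and not taken from the actual log.

```python
from statistics import mean, pstdev

def first_break(values, baseline_len, n_sigma=3.0, min_band=1.0):
    """Return the index of the first sample after the baseline that
    deviates from the baseline mean by more than the allowed band,
    or None if no sample does."""
    baseline = values[:baseline_len]
    mu = mean(baseline)
    # Band is n_sigma standard deviations, but never narrower than min_band,
    # so a perfectly flat baseline does not flag tiny fluctuations.
    band = max(n_sigma * pstdev(baseline), min_band)
    for i in range(baseline_len, len(values)):
        if abs(values[i] - mu) > band:
            return i
    return None

# Made-up core clock samples in MHz, for illustration only.
clock_mhz = [300, 301, 299, 300, 300, 300, 450, 120]
print(first_break(clock_mhz, baseline_len=4))
```

The threshold values are a judgment call; a noisier column such as GPU usage would probably need a wider band.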
My conclusions after considering the data and general observation:
Unlike what I thought before, this process apparently goes through a period of "instability" (~10 min in this case) and then finally crashes if there is no user input/activity. If you happen to catch the graphics card during this unstable time span, you will probably prevent the crash.
Well, as you can see, the behaviour is comparable to what I have in my Excel document, and the card successfully recovered.
Also note that some of the data is obviously distorted: there's no way for the graphics card to constantly switch between ~30 and 0 degrees Celsius. It may be that there are simply some gaps in the logging or whatever.
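If those zero readings really are logging gaps (that is an assumption, not something the log confirms), they could be dropped before charting. A minimal Python sketch, with a hypothetical helper name and made-up sample values:

```python
def drop_gap_samples(temps_c, gap_value=0.0):
    """Return (index, temperature) pairs, skipping samples equal to gap_value.

    Keeping the original index preserves each sample's position in the log.
    """
    return [(i, t) for i, t in enumerate(temps_c) if t != gap_value]

# Hypothetical temperature samples in degrees Celsius, not real log data.
samples = [31.0, 0.0, 30.0, 0.0, 32.0]
print(drop_gap_samples(samples))
```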
However, other data may indeed be accurate: for example, you may notice that the logs report the fan speed as being constantly at 0%. Well, I opened my PC and holy crap, this is indeed the case!
However, I noticed there is also a bug which causes exactly the opposite: during normal usage (e.g. browsing the web) the Win7 Aero effect suddenly turns off, so apparently something is not right with the driver/graphics card. I looked into Afterburner and saw that my graphics card was constantly in a very high power state. Since it did not stop, I manually reclocked my card using Afterburner and suddenly everything went back to normal.
Apparently the card's reclocking somehow hangs. I don't know if this bug has the same cause as the first one, but there are some clear similarities.
What do you guys think of it? Any thoughts or feedback? And don't forget to take a look at my document, since I really made an effort to put extensive information into it.
So now I would like to try modifying my voltage and see if it fixes those issues. Does anyone know how much I need to raise it?
Thanks in advance and have fun with the data! 😛