My HD 7870 might have overheated and fried - Advice?

Killerbanana

Honorable
Apr 11, 2013
Hi Everyone,

Your help with this would be really appreciated. I know it's a long post, but I'm trying to give as much detail as possible. There's a tldr at the bottom.

I recently RMA'd a faulty HD7870 (the factory confirmed that it was faulty). I was sent a different model of HD7870 in return, and I noticed it ran quite hot (a fair amount hotter than the other GPU, which I highly doubt overheated).

BUILD:
OS: Windows 8.1
CPU: Intel i7 3770k
RAM: 2 x Crucial Sport 4GB, Ballistix 240-pin DIMM, DDR3 PC3-12800 Memory Module
GPU: Radeon HD 7870 (VTX 3D Black Edition)
Sound Card: Asus Xonar D2X
PSU: Corsair Gaming Series 2013 Edition GS 600W ATX/EPS 80 PLUS Bronze
Motherboard: Gigabyte Socket-1155 Z77-DS3H Motherboard

THE SYMPTOMS:
1. I was playing Tomb Raider (the new-ish one) and my computer crashed. It had been fine with about 5-6 hours of gameplay on Tomb Raider before this, and probably about 4 hours of gameplay on the horror game 'Anna'. Just before the crash I noticed a slight drop in fps, the kind I had seen when the GPU throttled due to heat, and chose to keep playing to see if the GPU would sort itself out. Perhaps this was a mistake: the screen froze with vertical brown stripes on a white background, and the sound stuttered while this screen was showing. The computer was unresponsive, so I had to do a hard restart.

2. When I restarted my PC, I got the initial motherboard screen followed by the Windows logo, but the login screen that normally follows never appeared. After several restarts, the Windows 8 startup problems dialogue came up, and I did a system restore to a point before the problem. This fixed nothing; I had the same issues. After a successful start-up I noticed a few odd things: Avast needed reinstalling (the hard restart may have caused a problem) and I couldn't find my crash logs. BlueScreenView helped me diagnose a problem before, but for some reason it can't find any minidumps anymore, and there is no 'Minidump' folder in C:\Windows. Any help in simply finding what caused the crash would be really useful.

3. It seems that the computer thinks the GPU still gives out a signal. The GPU definitely gets power (the fan whirs), and my motherboard doesn't let internal graphics kick in when the GPU is plugged in. I can only get a screen when my GPU is unplugged.

EXTRA INFORMATION:
-As soon as I received the card I tested it in Furmark a bit. It ran quite hot, and seemed to stabilise around 85 degrees Celsius. Furmark is demanding software, so I presumed the GPU wouldn't normally be under this load. Also, I noticed that the GPU started throttling above about 65 degrees; my fps would drop from about 44 to 33 between 60 and 75 degrees.

-I have not OC'd the card.

-My case has a lot of airflow: a decent Zalman CPU cooler, along with 5 fans from good brands. I even taped another 80mm fan (again from a good brand) to the side of the case to blow directly onto the side of the GPU when I noticed it ran hot. A sound card sat about two inches below the GPU fan and may have blocked some airflow.

MY TAKE
Well, it seems likely that the GPU was fried, but shouldn't my computer have shut down before permanent heat damage was done? I had been gaming with the GPU for at least 8 hours before this occurred, including at least two longer gaming sessions that were absolutely fine. If overheating is what happened, I'd be very unimpressed that a new GPU in a case with decent airflow would have such terrible cooling (though I have mentioned the sound card perhaps reducing airflow around the GPU). The GPU definitely isn't completely dead: when it is plugged in the fan still runs, and the card still gets hot despite giving me no picture.

TLDR

My GPU might have fried. Does anyone have some advice? It would be especially useful if someone could tell me how to find the crash dump that might have been created when my computer crashed, or recommend some way of fixing or diagnosing the card. Also, would I be able to get away with a second RMA? Or was this really stupid of me and not covered by a warranty?

Thank you so much for any answers, these GPU problems are driving me mad.


I should add that I tested my RAM and CPU fairly thoroughly because of the previous faulty GPU. RAM: memtest, trying one stick at a time. CPU: some CPU testing software whose name I can't remember.
 
Solution
If you haven't reloaded Windows then the Event Viewer will have the crash information and the probable cause. The Bluescreen viewer should have picked up the info from the Event Viewer.
By your description of the events it doesn't appear that you did anything unusual with the video card and you will have to RMA it again.
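On the question of where the crash dump went: by default Windows writes minidumps to a 'Minidump' folder under the Windows directory and a full dump to MEMORY.DMP in the same directory, but only when it actually bluescreens. A hard freeze followed by a forced restart normally leaves no dump at all, which would explain the missing folder. Below is a minimal Python sketch for checking those two default locations. The function name `find_crash_dumps` is made up for illustration, and it assumes the dump paths have not been changed in the system's Startup and Recovery settings.

```python
import os
from pathlib import Path


def find_crash_dumps(windows_dir):
    """Return dump files found in the default Windows dump locations.

    Assumption: the dump locations are still the Windows defaults
    (<windows_dir>\\Minidump\\*.dmp and <windows_dir>\\MEMORY.DMP).
    """
    root = Path(windows_dir)
    found = []

    # Small per-crash dumps, the files BlueScreenView normally reads.
    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        found.extend(sorted(minidump_dir.glob("*.dmp")))

    # Single kernel/complete dump, overwritten on each bluescreen.
    full_dump = root / "MEMORY.DMP"
    if full_dump.is_file():
        found.append(full_dump)

    return found


if __name__ == "__main__":
    # SystemRoot is the standard environment variable for the Windows folder.
    windows_dir = os.environ.get("SystemRoot", r"C:\Windows")
    dumps = find_crash_dumps(windows_dir)
    if dumps:
        for path in dumps:
            print(path)
    else:
        print("No dump files found in the default locations.")
```

If this finds nothing, the crash most likely never got as far as a bluescreen, and the Event Viewer entry is the only record left.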


Thanks, Event Viewer does have some details of the crash. It is: Event ID: 41 ("The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly."), Source: Kernel-Power.

It sounds like it was waiting for a component to respond.

Full details:

Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 21/12/2013 01:12:38
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (2)
User: SYSTEM
Computer: Not sure if posting computer name is a risk
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331C3B3A-2005-44C2-AC5E-77220C37D6B4}" />
<EventID>41</EventID>
<Version>3</Version>
<Level>1</Level>
<Task>63</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000002</Keywords>
<TimeCreated SystemTime="2013-12-21T01:12:38.280046900Z" />
<EventRecordID>8260</EventRecordID>
<Correlation />
<Execution ProcessID="4" ThreadID="8" />
<Channel>System</Channel>
<Computer></Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="BugcheckCode">0</Data>
<Data Name="BugcheckParameter1">0x0</Data>
<Data Name="BugcheckParameter2">0x0</Data>
<Data Name="BugcheckParameter3">0x0</Data>
<Data Name="BugcheckParameter4">0x0</Data>
<Data Name="SleepInProgress">0</Data>
<Data Name="PowerButtonTimestamp">0</Data>
<Data Name="BootAppStatus">0</Data>
</EventData>
</Event>
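
For anyone reading a record like this later: the useful part is the EventData block. A BugcheckCode of 0 in a Kernel-Power 41 event means Windows did not record a stop error before the restart, which fits a hard freeze or a loss of power rather than a bluescreen, and is also why there would be no minidump. Here is a rough Python sketch that pulls those fields out of the XML. The function name `summarize_kernel_power_41` and the keys of the returned dictionary are made up for illustration, and the sample is a trimmed copy of the event above.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML, as seen in the <Event> element above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event posted above (only the fields read below).
SAMPLE = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<EventID>41</EventID>
</System>
<EventData>
<Data Name="BugcheckCode">0</Data>
<Data Name="BugcheckParameter1">0x0</Data>
<Data Name="SleepInProgress">0</Data>
<Data Name="PowerButtonTimestamp">0</Data>
</EventData>
</Event>
"""


def summarize_kernel_power_41(event_xml):
    """Extract the fields of a Kernel-Power 41 event that hint at the cause."""
    root = ET.fromstring(event_xml.strip())

    # Collect every <Data Name="...">value</Data> pair into a dict.
    data = {
        item.get("Name"): (item.text or "0")
        for item in root.findall("e:EventData/e:Data", NS)
    }

    # int(x, 0) accepts both plain decimal ("0") and hex ("0x0") strings.
    bugcheck = int(data.get("BugcheckCode", "0"), 0)
    power_button = int(data.get("PowerButtonTimestamp", "0"), 0)

    return {
        "event_id": int(root.findtext("e:System/e:EventID", "0", NS)),
        "bugcheck_code": bugcheck,
        # Non-zero means Windows logged a stop error (bluescreen) first.
        "stop_error_recorded": bugcheck != 0,
        # Non-zero means a long press of the power button was logged.
        "power_button_logged": power_button != 0,
    }


if __name__ == "__main__":
    print(summarize_kernel_power_41(SAMPLE))
```

With the values in this post, no stop error and no power-button press were logged, so the event only confirms an unclean restart; it does not point at a specific component.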
 
Your video card should not have overheated simply from playing a non-demanding game, and it should have been able to get through a stress test without overheating.
A stress test is designed to reveal any defects so that they can be fixed. Aftermarket cooling is good, but it shouldn't be a requirement. When you receive your new card, hopefully it will run a bit cooler from the start.