Dear all, I've been having a fairly major issue for the past few weeks.
I purchased a prebuilt with the following specs; it arrived in January:
- Gigabyte B450M DS3H-CF
- AMD Ryzen 5 2600
- Adata XPG Spectrix D60G 16GB (2x 8GB) 3200MHz RAM
- Kingston A2000 500GB M.2-2280 NVMe PCIe SSD
- XFX Radeon RX 580 GTS XXX 8GB Graphics Card
- Be Quiet! System Power 9 600W 80+ Bronze PSU (Not modular)
- 1TB Seagate Barracuda HDD
- 240GB WD Green SSD
My main OS is Windows, but I have a Linux Mint installation on the WD SSD.
The issues started appearing just over a month ago. The computer would crash when opening any kind of graphically intensive program (ETS2, synthetic benchmarks) and would occasionally crash in normal operation as well. A few times the screen froze and the last sound kept playing from the speakers (usually with buzzing as well), then the screen went black and the computer restarted. Most of the time, though, the screen just goes black, and shortly afterwards the CPU cooler makes a short sound indicating that the machine is restarting.
In Event Viewer, a critical "Kernel-Power" error (41) is logged; shortly before it there is a Kernel-Power information event (172) with bugcheck 0x116 (VIDEO_TDR_FAILURE). I have opened a MEMORY.DMP with the Windows debugger, and it shows atikmpag.sys to be the cause of the crash.
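For anyone comparing their own logs: a minimal sketch for decoding a bugcheck code, assuming the event details show it as a decimal number (Event 41's BugcheckCode field is commonly displayed that way). The helper name `describe_bugcheck` and the lookup table are hypothetical, and the table contains only the one code mentioned in this post.

```python
# Hypothetical helper for reading Kernel-Power event details.
# Assumption: the event lists its bugcheck code as a decimal integer.

# Only the code named in this post is listed; extend as needed.
KNOWN_BUGCHECKS = {
    0x116: "VIDEO_TDR_FAILURE",
}


def describe_bugcheck(decimal_code: int) -> str:
    """Return the hex form of a decimal bugcheck code plus a name if known."""
    if decimal_code == 0:
        # A zero code is assumed to mean no stop error was recorded.
        return "0x0 (no bugcheck recorded)"
    name = KNOWN_BUGCHECKS.get(decimal_code, "unknown")
    return f"0x{decimal_code:X} ({name})"


if __name__ == "__main__":
    # 278 decimal is 0x116 hexadecimal.
    print(describe_bugcheck(278))
```

A decimal value of 278 in the event details would therefore correspond to the 0x116 bugcheck described above.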
Importantly, once these crashes start happening, they also happen in Linux Mint and even in MemTest86 booted from a flash drive.
The temporary solution is uninstalling the drivers in Windows using DDU and then reinstalling them with factory reset enabled. Reinstalling without factory reset does not fix it. After this, I usually have a day or so before the issues start to appear again, on all systems. Over time, the interval between reinstalling and the crashes returning gets shorter. So far, I think I've worked out that the issues only start to appear after a shutdown or sleep.
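The shrinking interval could be documented for the warranty company with a small log. This is only a sketch: the function names are hypothetical and the timestamps in the example are placeholders, not real data from this machine.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"


def hours_until_crash(reinstall: str, first_crash: str) -> float:
    """Hours between a driver reinstall and the first crash after it."""
    start = datetime.strptime(reinstall, TIME_FORMAT)
    end = datetime.strptime(first_crash, TIME_FORMAT)
    return (end - start).total_seconds() / 3600


def is_shrinking(intervals: list[float]) -> bool:
    """True if every interval is shorter than the one before it."""
    return all(later < earlier for earlier, later in zip(intervals, intervals[1:]))


if __name__ == "__main__":
    # Placeholder (reinstall, first crash) pairs, NOT real measurements.
    log = [
        ("2000-01-01 10:00", "2000-01-02 12:00"),
        ("2000-01-03 09:00", "2000-01-04 05:00"),
        ("2000-01-05 18:00", "2000-01-06 03:30"),
    ]
    intervals = [hours_until_crash(r, c) for r, c in log]
    print(intervals, is_shrinking(intervals))
```

Noting whether a shutdown or sleep happened inside each interval would also help confirm or rule out that pattern.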
I've run MemTest86 for ~45 minutes while the computer was not crashing, with no issues. I've also run BurnInTest, FurMark, OCCT and several games while the computer was not crashing, with no apparent loss in performance. I have also tried each RAM stick individually; the crashes continue to happen. Reseating the graphics card and enabling or disabling XMP profile 1 in the BIOS seem to have no effect.
As I have warranty support for this PC, I contacted the provider and described the issues. They sent out a new GPU, which worked correctly for about a week before the crashes returned with increasing frequency.
I have read in several forum posts that XFX factory-overclocks its cards but does not raise the max power limit to match; those posts suggest increasing the max power limit and changing the clock speeds. I have tried decreasing the memory clock speed and main clock speed and increasing the power limit in AMD Adrenalin, but the crashes keep happening. Additionally, the settings do not seem to persist after a restart. However, a quick browse on the web shows that this is not uncommon.
Something else to note, which may be unconnected: input is lost briefly when I plug or unplug certain devices from the mains. These tend to be speakers. I've read that cheap HDMI cables that aren't properly shielded are quite likely to suffer from interference like this.
My thoughts:
- The drivers seem to be poking the hardware somehow. I initially thought that the reinstall reflashed the vBIOS, but that is not correct. That would mean that the hardware is somehow resetting or corrupting whatever was poked by the driver reinstall.
- A malfunctioning component of the system is damaging the GPU; in my opinion this would be the power supply.
Any thoughts on further diagnostics would be welcome. I would like to get all bases covered and work out what the hell is going on before the whole unit is sent to the warranty company.
Many thanks,
GoldSloth