Question Unknown problem

H0PEFU11Y

Prominent
Apr 30, 2019
120
7
615
Specs:
CPU Ryzen 2600
GPU Vega 56 6GB
Memory Samsung 860 EVO SSD
PSU Corsair TX750M
Motherboard B450M Gaming Plus

Hi there! I'm new to this forum; it was suggested that I post my problem here. I am (ironically) a computer science student.

So basically, my computer has been having a problem where the screen suddenly goes black, the fans spin up to full speed, and I have to turn the computer off with the power button. Originally I thought the PSU was underpowered, but I upgraded to one with more wattage and a higher efficiency rating, and it still had the issue. I then suspected the GPU, but I have RMAed it (sent it for testing) twice and was told both times that it is fine.

From what I have been able to gather so far, it is some sort of voltage/power issue, as I can trigger it by plugging anything into the same extension lead, or even into a different lead on the same wall socket. Because of this, I've tried plugging the computer straight into the wall outlet instead of the extension lead, but it still happens. Other things I've tried: different power sockets, a different monitor, different drivers, different WattMan settings, not daisy-chaining the PCIe cables (though this did stop the fans randomly speeding up), a different power supply and different system memory. I also ran a different GPU in my PC and never had any issues, and my GPU was fine when it was tried in another computer.

An observation I've made is that it tends to happen after around 4 hours of being on the PC, regardless of the activity. No particular program or type of program seems to cause it either, as it can be pretty random (one time it happened within 5 minutes of booting while I was browsing the Internet, another time while using Discord, another time while gaming, etc.).

When it happened today, I went straight into Event Viewer afterwards (once I'd done the whole shut-down-then-turn-on palaver) and noticed a warning with Event ID 4101: "Display driver amdkmdap stopped responding and has successfully recovered." Looking back through the log, the same event seems to have occurred about as many times as the problem itself, so I think the two are very likely related. I'm not sure what to make of this though.
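One rough way to check that hunch is to export the System log from Event Viewer as CSV and count the 4101 warnings per day, then compare that against the days the black screens happened. The sketch below is only illustrative: the column names "Date and Time" and "Event ID" are assumed to match an English-language export, `count_events` is a made-up helper name, and the sample rows are invented.

```python
import csv
import io
from collections import Counter


def count_events(csv_text: str, event_id: str = "4101") -> Counter:
    """Count rows with the given Event ID, grouped by the date part of the timestamp."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [h.strip() for h in next(reader)]
    # Assumed column names; adjust them if your export is labelled differently.
    id_col = header.index("Event ID")
    time_col = header.index("Date and Time")
    per_day = Counter()
    for row in reader:
        if len(row) <= max(id_col, time_col):
            continue  # skip blank or truncated rows
        if row[id_col].strip() == event_id:
            day = row[time_col].split()[0]  # keep the date, drop the time
            per_day[day] += 1
    return per_day


# Invented sample rows, only to show the expected shape of the export.
SAMPLE = """Level,Date and Time,Source,Event ID,Task Category
Warning,30/04/2019 18:02:11,Display,4101,None,"Display driver amdkmdap stopped responding and has successfully recovered."
Information,30/04/2019 18:05:40,Example-Source,12,None,"(some unrelated event)"
Warning,28/04/2019 21:14:03,Display,4101,None,"Display driver amdkmdap stopped responding and has successfully recovered."
"""

if __name__ == "__main__":
    for day, count in sorted(count_events(SAMPLE).items()):
        print(day, count)
```

If the per-day counts line up with the days of the crashes, that supports the idea that the driver timeout and the black screens are the same event.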

Is anyone able to offer any advice on what I could do, or what could be causing it?

Thank you :)
 
Sounds like a video card issue, or maybe a power supply or motherboard issue. It seems pretty simple: if you try a different video card in the system, it works. If your card works in another system, the card is not bad, so that still leaves something around the card's connection: the PCIe slot or the power supply.

Was the video card you tried in your system in the same class as the Vega? Same power draw, at least? Did you test the card in another system yourself? You are also describing some non-standard setups, such as using extension leads, which can easily cause issues or damage things, even after you stop using them.
 

H0PEFU11Y

Hi, the other GPU I used was a GTX 650 Ti, I believe, so it has a different power draw; it also didn't use PCIe cables and didn't have a DisplayPort output.

I've only been using an extension lead because I don't have enough sockets for everything.

Are there any ways of testing the mobo/PSU without RMAing them or using different ones?

Thanks :)
 
May 2, 2019
34
8
35
+ Switch off dynamic overclocking or dynamic stepping for the CPU, whichever is available to you.

+ Switch off dynamic overclocking or dynamic stepping for the GPU, whichever is available to you.

If those options are not available or don't exist at all, ignore them and go on ...

+ Limit the CPU clock to 60% or 40% via the power/energy settings. Watch the immediate effect on the clock frequency reported in your system monitor. If it has no effect, the CPU is using the wrong driver, needs a microcode update, or is not supported by the motherboard.

+ Limit the GPU clock, according to the levels stated by the manufacturer.

+ If you cannot or do not want to lower the GPU clock, monitor the voltages requested by and applied to the graphics card. You mentioned a PCIe cable: can we assume it is an extender to a separately mounted graphics card, rather than a standard slot installation? If so, take a close look at the contacts and at the cable; it may have been punctured or otherwise damaged. Such damage, as well as the diameter of each individual wire the cable extends, affects the voltage, which in turn affects the stability of the core clock when it rises. Use the monitoring software available in overclocking/underclocking tools and in motherboard, CPU and GPU monitoring utilities, as well as card-specific driver companion apps that report the same values, and compare the readings against the reference values.

+ Check whether the contacts of the pins, cards and extenders are clean, or perhaps oxidized, due to poor-quality material or to excessive humidity during a near-limit heating phase caused by dynamic overclocking.

+ Check your motherboard: can you connect additional power lines to the PCIe bus through the motherboard? If so, use them, as long as the graphics card manufacturer doesn't state otherwise. This stabilizes clock and voltage before the additional power lines connected directly to the graphics card become active. Heavy interference can arise for simple reasons, such as the placement of the card relative to cables, fans and cooling units, which in those positions can induce directed EM emissions that affect the core clocks of the PCIe bus and the CPU, which in turn collapse when they reach resonance. An old-school example of guaranteed happiness is a microwave or a WLAN router running 54 Mbit/s 802.11g within 3 meters of various Dell, IBM and HP machines with Pentium 4 and early Core i7 CPUs, where the ribs/fins of the heatsink effectively resonated at 2.4+ GHz, killing the machines immediately or with only very brief warning signs. Not often, yes, but it happens.

+ Check the RAM's specifications and compare them to the CAS latency and bus clock shown in the BIOS and in monitoring software such as CPU-Z, AIDA or similar.
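On Windows, the CPU cap from the list above corresponds to the "Maximum processor state" power option, which can also be set from the command line with `powercfg`. Here is a minimal sketch, assuming the currently active power plan is the one to change and using the built-in aliases `SCHEME_CURRENT`, `SUB_PROCESSOR` and `PROCTHROTTLEMAX`. The helper names `cap_commands` and `apply_cap` are hypothetical, and by default the script only prints the commands rather than running them.

```python
import subprocess
import sys
from typing import List


def cap_commands(max_percent: int) -> List[List[str]]:
    """Build the powercfg calls that cap 'Maximum processor state' on AC power."""
    if not 1 <= max_percent <= 100:
        raise ValueError("max_percent must be between 1 and 100")
    return [
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMAX", str(max_percent)],
        # Re-activate the current plan so the new value takes effect.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


def apply_cap(max_percent: int, dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False on Windows."""
    for cmd in cap_commands(max_percent):
        if dry_run or sys.platform != "win32":
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply_cap(60)  # dry run: shows what would be executed
```

After applying a cap, the reported clock in Task Manager or CPU-Z should drop within a few seconds; running the same commands with 100 restores the default.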

... just some thoughts in addition.

---

Hope it helps. Anyhow. Thanks for reading.

---

POST SCRIPTUM:

I just skimmed over your description once more.

+ Check whether SERR/DMI reporting is available and possibly enabled, as it is not supported by a wide range of AMD/ATI Radeon, FirePro and FireGL cards and leads to exceptions that result in a shutdown of the GPU core.

+ Some C-states, or more precisely the rate and latency of switching between the CPU's various C-states, can raise exceptions that make the GPU core attempt a soft reset. Unsupported switching times prevent the reset, so the GPU core goes into suspend instead, which leads to an ACPI suspend-to-RAM. Windows cannot handle this, because it has no control over the already knocked-out GPU, so the system immediately leaves suspend-to-RAM and the CPU returns to full-throttle nominal C1, which contradicts every assumption the participating components have made so far. Ergo: forced shutdown, with no possibility of a kernel dump at all.

+ There are many, many more AMD GPU vs. CPU vs. APIC vs. ACPI vs. everything stories that sound very similar. All in all, they result either from differing or non-existent interpretations, mostly triggered by dynamic clocking and ACPI state handling in general, or from the (comparatively heavy) power consumption and spontaneous above-limit draw of the graphics card, which destabilizes the rest of the system because it has had no chance to raise the core voltages beforehand.
 

H0PEFU11Y

Hi there, many thanks for the detailed reply!

+ Switch off dynamic overclocking in CPU or dynamic stepping, whichever is available to you.

+ Switch off dynamic overclocking in GPU or dynamic stepping, whichever is available to you.
Sorry, I'm not quite sure how to do this! Is it in the BIOS? I haven't messed around with settings that much on my PC.

You named a PCI-e cable; Can we assume it is an extender to a separately mounted graphics-card versus a standard slot-installation? If so, take a close look at the contacts and at the cable, maybe it has been punctured or otherwise damaged. The latter, as also the diameter of each and any single by cable lengthened pin have effects on the voltage, which in turn affects the stability of the core-clock when risen. Use monitoring-software as available in Over-|Under- -Clocking-Tools, Motherboard-, CPU- and GPU- -monitoring-software. As well as card-specific driver-extension-apps that deliver the same values to compare them against reference.
Well, I'm not entirely sure about this either (and am contemplating my level of stupidity). The PCIe cable runs from the motherboard/PSU (not sure which?) to the GPU, I believe. The GPU has a port that it plugs directly into, and the connectors themselves are 3x2 and 2x1 (they have to be joined together to make 4x2).

Do you know of any software that monitors and automatically saves information such as core clock and temps? I do have monitoring software, but it doesn't save automatically, so I can't look at the stats from when it happens.

I'll reply to the rest tomorrow as I'm getting a little tired :)
 
May 2, 2019
34
8
35
The first place to look, and to get a minimal but complete toolset for the core details, is TechPowerUp. For example, your AMD Radeon RX Vega 56 is described there in a neat overview: the tech specs, the supported formats, general advice on the workloads the card is best suited to, and which vendors released which versions of the card with what feature changes.

In the main menu bar you can see a Download link, which leads to the download page listing the most popular tools.

Go straight for the GPU-Z tool, and also grab at least one benchmark to simulate example loads on the GPU's shader pipelines, texture buffers, ray-tracing capabilities as a whole, etc.

A classic example of a benchmark for Windows is Futuremark's 3DMark, which has been on the market for at least two decades, if my memory doesn't mislead me.

So far, all the embedded links point to the TechPowerUp site and its subpages, simply because it is, as already mentioned, a reliable starting point for diving into the innards of your graphics card. You get everything you need to monitor and benchmark and, if you haven't done so already, the latest drivers for NVIDIA and AMD graphics cards; in your case, the AMD Radeon Adrenalin suite. They are also listed at the top of the right pane.

Take the time to get a feel for the values your environment is built around.
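GPU-Z can also write its sensor readings to a file (the "Log to file" checkbox on the Sensors tab), which leaves a record of the seconds before a black screen. The sketch below shows one way to summarize such a log. It assumes a comma-separated file with a header row and a first column named "Date"; the column names in the sample, the sample values and the helper name `summarize_log` are all made up for illustration.

```python
import csv
import io
from typing import Dict, Tuple


def summarize_log(log_text: str) -> Dict[str, Tuple[float, float, float]]:
    """Return (min, max, last) for every numeric column of a sensor log."""
    reader = csv.reader(io.StringIO(log_text))
    header = [h.strip() for h in next(reader)]
    stats: Dict[str, Tuple[float, float, float]] = {}
    for row in reader:
        for name, cell in zip(header, row):
            if not name or name == "Date":
                continue  # skip the timestamp and empty trailing columns
            try:
                value = float(cell.strip())
            except ValueError:
                continue  # skip cells that are not numbers
            low, high, _ = stats.get(name, (value, value, value))
            stats[name] = (min(low, value), max(high, value), value)
    return stats


# Invented sample in the assumed layout (note the trailing commas).
SAMPLE = """Date , GPU Clock [MHz] , GPU Temperature [C] ,
2019-05-02 21:14:01 , 1474.0 , 71.0 ,
2019-05-02 21:14:02 , 1590.0 , 74.0 ,
2019-05-02 21:14:03 , 852.0 , 73.0 ,
"""

if __name__ == "__main__":
    for name, (low, high, last) in summarize_log(SAMPLE).items():
        print(f"{name}: min={low} max={high} last={last}")
```

The "last" value is the interesting one after a crash: it is the final reading that made it to disk before the screen went black, and it can be compared against the min/max seen during normal use.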

Regarding your w e i r d expander and/or riser and/or adapter .. cable .. combo .. combination, yeah ...

I think that taking a snapshot and publishing the link here, or embedding it directly, would be r e a l l y helpful [ .. to at least be able to try to understand your real-life setup :) ].

... so far

---

Hope it helps. Anyhow. Thanks for reading.
 

H0PEFU11Y

Thank you very much for your reply :) I've just done the GPU-Z one and am downloading 3DMark at the moment. Will update when I've done it :)

I will also try to take a picture of how it is set up tomorrow if I have some free time. I may also check whether I already have pictures, as I did take some before, but I'm not sure if I still have them.

Today I had another strange occurrence. The monitor went black while I was browsing the Internet and had left it idle for a minute. The fans didn't speed up, and shaking the mouse brought the picture back after a few seconds. I looked in Event Viewer and it showed Event ID 4107 (A caller specified the SDC_FORCE_MODE_ENUMERATION flag in a call to the SetDisplayConfig() API). The events before it were Windows updates, at least 20 of them! Very confusing. I may try reinstalling Windows, but I've been dreading it, as it means moving all my data over to my HDD, which I've never done before (all my data is currently on my SSD, but it's nearly full, so I had to get an HDD).

Thank you again for your reply and also for reading this :) Your help has been invaluable.