[SOLVED] Troubleshooting a Crash

Jun 2, 2019
10
0
10
My PC started crashing, seemingly at random, last night. I was just browsing the internet, looking at nucleotide structures to get caught up for an internship, when it suddenly crashed. The CPU was hovering around 32 degrees C, GPU1 around 42, and GPU2 around 30, so I am nearly certain it isn't a thermal issue with those components. There was no bluescreen, just the audio cutting out and the monitors losing signal, so I wouldn't think it is a problem with my boot drive or Windows. Both of my monitors lose signal, so I don't think it is a problem with the monitors. Additionally, all the lights in my computer remain on and fans keep spinning, so I have reason to believe it isn't a power draw issue. All drivers are up to date.

Specs:
  • MSI Pro Carbon X370
  • Ryzen 5 1600
  • 2x AMD RX 480s (Sapphire Nitro 8 GB)
  • EVGA 750 G3 (750W PSU 80+ Gold)
  • EVGA CLC 120
  • 2x 8GB DDR4 2400 RAM (GeIL EVO POTENZA, OC'd to DDR4 3000)
  • TP-Link Archer T6E WiFi PCIe card
  • Intel 660p 512 GB M.2 SSD
  • Crucial BX100 250GB SATA III
  • Hitachi Ultrastar 7K3000 7200 RPM 2 TB 64MB Cache SATA 6.0Gb/s
  • LG Black Blu-ray Disc Drive SATA Model UH12NS30
  • Logitech Z313 Speakers and Subwoofer
  • Pixio PX277 1440p 144Hz monitor
  • BenQ GL2460HM Black 24" TN 2ms monitor
  • Windows 8.1 64 Bit

What I have tried so far:
  • Waiting overnight, hoping time will fix the issue.
  • Unplugging and replugging cables.
  • Spraying computer duster in the GPU output ports.
  • Cleaning inside the PC with duster.
  • Power cycling the monitors (while the lights inside the case were on but monitors lost signal).
  • Unplugging the wifi card (the newest addition to the PC, since up until last week, I have been running off of ethernet).
  • Plugging cables into different ports on GPU1.
  • Plugging DP and HDMI cables into GPU2.
  • Switching which GPU is in which PCIe port.

Built the computer in September and OC'd the RAM then, so I don't think the OC is the issue. Like I said, it's seemingly random when the signal gets lost. It could be a minute after the PC is turned on, while it is still on the desktop. It could be an hour in, while watching a video. Additionally, I have 4x USB 3.0 devices and 2x USB 2.0 devices plugged in, 2x Noctua NF-F12 industrialPPC-3000 PWM on the radiator (push-pull), 2x NF-A14 industrialPPC-3000 PWM plugged into the mobo, and 2x Corsair ML140 PRO LED Red 140mm PWM plugged into the mobo. I have checked, and CrossFire is enabled. There was a power outage yesterday, but everything is plugged into a surge protector. I'm not certain, but that should have prevented any issues due to the power outage.
 

PC Tailor

Illustrious
Ambassador
"Additionally, all the lights in my computer remain on and fans keep spinning, so I have reason to believe it isn't a power draw issue"
This could still very easily be a power issue, as it isn't just down to whether the wattage meets requirements. The PSU has to deliver stable, regulated power to all components, and it supplies practically all major components from the 12V rail. If the power isn't stable, for example, your biggest power draw (the GPU) could stop outputting a proper signal because it isn't receiving the appropriate power.

Not saying this IS the PSU, but it certainly (and fairly commonly) could be.

"Built the computer in September and OC'd the RAM then, so I don't think the OC is the issue."
A previous OC could still cause an issue. I would remove the OC and see if the issue persists.

"There was a power outage yesterday,"
This could certainly be indicative of why the issue is occurring and why it could be a PSU problem.

Additionally:
  • Does the issue occur in safe mode?
  • Is your BIOS up to date?
  • I assume your WiFi and Ethernet drivers are among the drivers that are up to date?
  • Does the issue occur when you remove any OC?
 
Jun 2, 2019
I'll try removing the OC, as well as booting into safe mode, and let you know if the problem persists. Checking the website, I am one BIOS revision behind, 2 if you include the currently open beta. Would you recommend updating to the current open beta, or sticking to non-beta BIOS release? WiFi/Ethernet drivers are up to date.

The other PC in my house is a prebuilt with one of those cheap Chinese PSUs lacking 80+ certification. Out of curiosity, if it is a PSU problem, is it just my bad luck in the PSU lottery (this would be the second PSU that failed me) that my PC is experiencing issues and the other desktop isn't? I would think my PSU, given the money I poured into getting a decent quality one, would fare better against a power outage than that one.
 

PC Tailor

Illustrious
Ambassador
Don't update to the Beta, just the last fully released version. The Beta is only really needed if it fixes a specific problem you are encountering.

Well, not necessarily. I almost always expect the cheap PSUs to fail sooner rather than later, and PSUs can fail later on if they encounter power surges or oddities. But they protect your other components (which is usually the most important matter). Not saying it IS the PSU, just saying it is perfectly possible. However, swapping it out for a known working one of the same ilk would show if it is or not.
 
Jun 2, 2019
Sorry it took me so long to get back to you. Tried booting into safe mode, and it crashed while in safe mode. Tried removing the overclock on the RAM, and it still crashed. Updated the BIOS; still crashed. After updating the BIOS, the monitors lost signal while I was in the BIOS, trying to set the RAM speed to the advertised 2400 MHz instead of the default 2133 MHz. Not certain, but would this point toward a hardware issue over a software issue?
 

PC Tailor

Illustrious
Ambassador
If the issue is occurring in safe mode, that would more likely indicate hardware.

The other problem is that the symptoms you describe can be caused by multiple things, so they're not that straightforward. I would probably start with the simplest: removing all but 1 RAM stick and running memtest on each module in turn, to ensure the modules aren't at fault.

Following that, I'd potentially look at seeing if the issue occurs after a clean Windows install.
 
Jun 2, 2019
The link below shows the memtest results. When I initially ran memtest, the same loss of signal on the monitors happened. From the mouse and keyboard LEDs going off and on again as the fans were ramping up and down, it seems like the computer rebooted a few times while the monitors had no signal. When I tried a different monitor (just to see if the monitors were going bad simultaneously), I saw this reboot occur a few times in Windows before it eventually lost signal. Back to the memtest run: maybe 20 minutes after the monitors lost signal, the signal returned. To be honest, I am surprised it behaved for 3 and a half hours (I was there watching for most of it). Since there were no errors, I am guessing the memory is fine.

Stupid question though: wouldn't this error occurring in memtest and the BIOS suggest it isn't Windows? To my knowledge, once the computer has shut down, Windows is isolated on the SSD or hard drive it is written to until Windows is booted again (pretty sure RAM and VRAM are volatile, meaning they'll lose what's written on them if they completely lose power; the CPU and GPU don't have any other memory space, while the motherboard doesn't have space for much else other than the BIOS). Since the BIOS and memtest aren't accessing my SSD with Windows on it (at least not for the purpose of running Windows), wouldn't a problem with Windows not manifest itself in those environments?

View: https://imgur.com/iPOcHij
 

PC Tailor

Illustrious
Ambassador
If memtest showed no errors, I'd be fairly confident RAM isn't the issue. Not guaranteed but I'd start shifting my focus to other areas.

Technically the GPU has VRAM and the CPU has cache, so they both do have memory, but as you say, they are volatile. The OS may be confined to your storage drive during boot, but the drivers are not necessarily: you can have boot-start drivers that are loaded on startup to ensure functionality of the computer. That, and obviously once it's started, all manner of data is moving between various components.

Since the issue was largely the display disappearing in the manner you describe, and, if I'm not mistaken, you do not have integrated graphics, I would be tempted to replace the GPU with a known working one to see if the issue reoccurs.

The fact that it happens in safe mode and even during a memtest removes most possibilities of it being an OS issue. So a clean install would likely not do much.
 
Jun 2, 2019
Replaced my RX480s with an old GTX960 I know works. I didn't see when it happened, but I returned to my desktop after roughly 40 minutes and saw the signal was lost and the mouse and keyboard were off, with no response from them when used (the caps lock light didn't light up when caps lock was pressed, and mouse clicks didn't turn on what is supposed to be a constant red LED showing power to the mouse). My guess at this point is either a bad motherboard or PSU (as you said before, this happening after a power outage could be indicative of a PSU problem). I don't have another X370 motherboard to test this out on or another functioning PSU. The PSU is definitely the cheaper of the two. Do you suggest I go for that? None of the pins are bent on my CPU, I never OC'd it, and as I said, temps are normal, so I have a hard time believing it's my CPU going bad. I know other CPU problems could be causing this, but I believe the likelihood is much lower at this point in time.
 

PC Tailor

Illustrious
Ambassador
A CPU going bad is a rare occurrence. It is far more likely that something else fails first. Not saying it doesn't happen, it's just pretty rare unless the CPU is very old.

Just to absolutely be sure RAM isn't playing a part, I would also go down to 1 stick of RAM, and if the issue reoccurs, try the other module and repeat.
But yes - I would now sooner go down the PSU route. The G3 is a fairly good series of PSU - but if this has happened after a power problem, then it could well be the cause.

Can you get a screenshot of the voltages of your PSU rails (something like HWInfo or HWMonitor can do the trick)? A better, physical test, if you have a multimeter, would be checking the output of each rail and seeing if it is within +/- 5% of that rail's nominal voltage.
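
For reference, the +/- 5% check is simple arithmetic. Here is a minimal Python sketch, assuming the common nominal rail values of 3.3 V, 5 V, and 12 V; the function names are made up for illustration and are not part of HWInfo or any other tool.

```python
# Nominal rail voltages (assumed) and the +/- 5% tolerance from the post.
NOMINAL_RAILS = (3.3, 5.0, 12.0)
TOLERANCE = 0.05


def tolerance_window(nominal, tolerance=TOLERANCE):
    """Return the (low, high) voltage limits for a rail, rounded to mV."""
    low = round(nominal * (1 - tolerance), 3)
    high = round(nominal * (1 + tolerance), 3)
    return (low, high)


def in_tolerance(reading, nominal, tolerance=TOLERANCE):
    """True if a measured voltage falls inside the rail's window."""
    low, high = tolerance_window(nominal, tolerance)
    return low <= reading <= high


# Print the acceptable window for each rail.
for nominal in NOMINAL_RAILS:
    low, high = tolerance_window(nominal)
    print(f"{nominal:>4} V rail: {low} V to {high} V")
```

A multimeter or software reading can then be passed to `in_tolerance` together with the rail it belongs to.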
 
Jun 2, 2019
Just making sure: by checking the rails, you mean making sure that, when checked with a digital multimeter set to DC voltage, the pins on the 20+4 mobo connector read within 5% of the nominal 3.3V, 5V, and 12V, according to this diagram, where COM is ground, PSU ON and a ground pin are bridged, and the PSU is on? https://www.smps.us/atx-connector-20-24pin.jpeg

I am sorry. I'm new to this whole technical troubleshooting thing and don't know the terminology quite yet. My guess is, if there is a fluctuation in one of the pins greater than 5%, it is going to occur at the time when the loss of signal to the monitor/reboot would occur. So, would I have to shut the PSU off and on 13 times in order to check all 13 pins, waiting about 40 minutes each time after I turn it on, just to get an idea if the rails are failing, or would a failing rail be self-evident as soon as I turn on the PSU? If the former, I think I would rather lose the $100 on a new PSU that doesn't fix the issue than spend the 8 and 2/3 hours testing, only to find out it wasn't the PSU.
 

PC Tailor

Illustrious
Ambassador
"if there is a fluctuation in one of the pins greater than 5%, it is going to occur at the time when the loss of signal to the monitor/reboot would occur."
Simpler, really: if any one of your rails starts moving out of the 5% tolerance range, then, at its most basic, it means the PSU is malfunctioning, as it is not outputting sustained power within tolerance. Usually any PSU that starts fluctuating outside of that tolerance will start causing issues somewhere.

This video will answer all your questions and give you a visual idea of what to do:
View: https://www.youtube.com/watch?v=ac7YMUcMjbw

Just remember that passing this test doesn't necessarily mean the PSU is working normally: you're not inducing any load, and the PSU may only malfunction when you put it under load. This test is just a first indicator.

As I said, as a starting point, download HWInfo, which should give you accurate readings and lets you monitor voltages under load. Then, if something seems off-kilter, you can physically verify it with the multimeter.
 
Jun 2, 2019
I just don't understand how, if the fluctuation happens at the time of the monitors losing signal, I will be able to tell that there is a fluctuation. It's not like I can look at my monitors to tell if there has been a fluctuation without a signal going to them, and the signal only returns after several minutes and system reboots. Troubleshooting with HWInfo just seems like a catch-22 to me. Testing with a multimeter seems to be the only way to tell, but it's imperfect: with my computer behaving for the full 3 and a half hours of memtest, it could be a long time before I actually notice the fluctuation even if I am testing the right pin, and there are 13 possible pins that it could happen on, and only for the split second when the monitors would lose signal. Or am I wrong: would the fluctuation in voltage last longer, giving me a larger margin of time to check the pins?
 

PC Tailor

Illustrious
Ambassador
The loss of signal won't necessarily occur at the exact moment a voltage fluctuates. You can often find that, if there is an issue, the rail will be outputting an out-of-tolerance voltage most of the time. This is just a test to verify that there is nothing obviously wrong with the PSU. On the other hand, if you've got a rail sitting constantly at a boundary, that could also indicate a problem.

Again, not trying to say this will identify your problem, it's just one test you can do to eliminate anything obvious.

The absolute best way to check if the PSU is failing is by simply swapping it with a known good one and seeing if the issue reoccurs.
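
One way to look for a rail that is out of tolerance most of the time, or sitting at a boundary, is to collect readings over a period and summarize them afterwards rather than watching the screen. A rough Python sketch follows, assuming the readings are already available as a plain list of voltages. The sample values, the function name, and the 1% "near the limit" margin are all assumptions for illustration, not real measurements.

```python
def summarize_rail(samples, nominal, tolerance=0.05, margin=0.01):
    """Summarize logged voltage samples for one rail.

    margin is a hypothetical "close to the limit" band, expressed as a
    fraction of the nominal voltage.
    """
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    band = nominal * margin

    # Samples outside the tolerance window.
    out = sum(1 for v in samples if v < low or v > high)
    # Samples inside the window but within `band` of either limit.
    near = sum(
        1 for v in samples
        if low <= v <= high and (v - low < band or high - v < band)
    )
    return {
        "min": min(samples),
        "max": max(samples),
        "out_of_tolerance": out / len(samples),
        "near_limit": near / len(samples),
    }


# Made-up samples for a 3.3 V rail, for illustration only.
samples = [3.31, 3.30, 3.14, 3.01, 3.30, 2.99, 3.15, 3.30]
print(summarize_rail(samples, nominal=3.3))
```

A high `out_of_tolerance` or `near_limit` share would be a hint worth following up with a multimeter or a PSU swap, not a diagnosis on its own.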
 
Solution
Jun 2, 2019
So after looking at my voltages in HWInfo, I noticed after a while that the 3.3VCC reading (I assume it is supposed to be the 3.3 V DC output to the mobo) was starting to drop. I noticed it drop initially to 3.008 V (well outside of the -5% range) and then return to 3.3 V, then again to 2.992 V. It even hovered around the 3.135 V range for a while. Even if this isn't the problem which is causing the monitors to go out and the system to reboot, that is definitely not a healthy PSU. Even with the margin of error of HWInfo, I bet it is well under the -5% range. I'm going to buy a new PSU, and I'll let you know if that fixes the issue. Thank you so much for the help thus far!
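
To put those numbers against the 5% tolerance, here is a quick Python sketch that converts the readings quoted in this post into a percentage deviation from the nominal 3.3 V. It assumes the HWInfo values are taken at face value, and the helper name is hypothetical.

```python
NOMINAL = 3.3  # volts, nominal value of the 3.3 V rail

# Readings quoted in the post.
readings = [3.008, 2.992, 3.135]


def deviation_percent(reading, nominal=NOMINAL):
    """Signed deviation of a reading from nominal, in percent."""
    return (reading - nominal) / nominal * 100


for r in readings:
    print(f"{r:.3f} V -> {deviation_percent(r):+.2f}%")
```

Under that assumption, 3.008 V and 2.992 V come out roughly 9% low, while 3.135 V sits right at the -5% limit.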
 

PC Tailor

Illustrious
Ambassador
"I noticed it drop initially to 3.008 V (well outside of the -5% range) and then return to 3.3 V, then again to 2.992 V."
Bingo. There you have a problem.

Don't get me wrong, it COULD be the software, as no software is ever perfect, but this could be a damn good indication. There are instances where the software reports something quite drastically different from what is actually being delivered on the rail, but it's usually pretty good.

But based on this, I would suggest buying a new PSU and retesting. As always, just make sure it is a good-quality unit, my friend, otherwise the issue may persist. At the very least, you may be able to return the PSU under warranty, and it will eliminate the PSU from the suspects if the problem continues. But hopefully, it shouldn't!
 
Jun 2, 2019
If you do know, out of curiosity, what does the 3.3V rail power? You said 12V is a majority of my heavy duty components, like GPU. Is 3.3V powering the RAM at all? If so, that might explain the reboots.

It also makes sense that it is the PSU, since I saw this morning that a white light was flashing next to the PCIe slot my GTX960 was plugged into after it lost connection the first time (probably indicating insufficient power), and I heard a fan separate from my chassis fans slowing down and ramping up (probably the PSU fan). When I looked at the voltage in HWInfo, I noticed the fan noise got quieter when the reported voltage dropped, too.
 
Jun 2, 2019
Just bought a Corsair RM850x (several PSU calculators were suggesting an 850W PSU, even though they stated I only use 650W, so I spent the extra 20 bucks on one instead of another 750W PSU), and that seems to have done the trick. It stayed on with no issues through the night. I know I am just asking for trouble at this point, but now that I know I have some headroom with the PSU, I kinda want to overclock that Ryzen 1600 and the RX480s.

Thanks again for the help.
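
As a rough sanity check on the headroom mentioned above, this Python sketch compares the 650W estimate from the calculators against a 750W and an 850W unit. The 650W figure is only the calculators' estimate, not a measured draw, and the function name is made up for illustration.

```python
ESTIMATED_DRAW_W = 650  # estimate quoted from the PSU calculators


def load_fraction(estimated_draw_w, psu_rating_w):
    """Share of the PSU's rated output used by the estimated draw."""
    return estimated_draw_w / psu_rating_w


for rating in (750, 850):
    fraction = load_fraction(ESTIMATED_DRAW_W, rating)
    headroom = rating - ESTIMATED_DRAW_W
    print(f"{rating} W PSU: {fraction:.0%} load, {headroom} W headroom")
```

On these numbers the 850W unit leaves about twice the headroom of a 750W one, which is the margin any overclock would eat into.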
 
