[SOLVED] Faulty PSU or something else?

Jan 14, 2022
8
0
10
Over the last few months I've been struggling with blue screens, freezes, and crashes. Sometimes they happen constantly, sometimes once every few days or less, and during anything: gaming, browsing folders, browsing the web. The blue screens were fairly random (the most frequent one said 'memory management'), but eventually they stopped showing up at all and my PC started freezing instead. I can keep doing things for a few seconds before it becomes unresponsive; after that I can usually still move the mouse, but everything else is frozen. Sometimes the mouse pointer freezes too. One time I was playing SCP: Secret Laboratory, alt-tabbed to the browser and noticed it had frozen. I couldn't open folders or close the browser. Funnily enough, the one thing I could do was go back into the game and play for 10 more seconds before it froze completely. I thought it might be software at first, as reinstalling the graphics card drivers seemed to stop the freezing, but I guess that may have been a coincidence.


Event Viewer used to give me 'volmgr' errors saying it couldn't create a dump file, but now it shows no critical errors except the 'unexpected shutdown' logged when I force a restart. BlueScreenView doesn't really show anything either. It gave me one crash on the 5th of January with two files highlighted: ndis.sys and ntoskrnl.exe. I believe I then ran an SFC scan and DISM, which helped for a few days at most.

Running an SFC scan or DISM now freezes my PC, even in safe mode (tried 4 times). I might try again tomorrow.
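For anyone following along, the system file check and DISM repairs mentioned above are normally run from an elevated Command Prompt as `DISM /Online /Cleanup-Image /RestoreHealth` and `sfc /scannow`. A minimal Python sketch that runs them in that order could look like the following. The helper name `run_repairs` and its injectable `runner` argument are my own inventions for illustration, not part of any Windows tooling.

```python
import subprocess

# Standard Windows repair commands; both need an elevated (administrator) prompt.
REPAIR_COMMANDS = [
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],  # repair the component store first
    ["sfc", "/scannow"],  # then verify and repair protected system files
]


def run_repairs(runner=subprocess.run):
    """Run each repair command in order and return (command, exit code) pairs.

    `runner` is a hypothetical hook so the flow can be dry-run on a machine
    that is not running Windows.
    """
    results = []
    for command in REPAIR_COMMANDS:
        completed = runner(command)
        results.append((" ".join(command), completed.returncode))
    return results
```

Calling `run_repairs()` on Windows from an administrator session would execute both tools for real; a non-zero exit code in the result is a hint to look at the tool's own log rather than a diagnosis in itself.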

Other things I've tried or noticed:

-switching the graphics card from a 6700 XT to an Nvidia 1060 didn't help
-Windows memory test came up with nothing
-swapping the RAM with another PC's (exactly the same memory, same brand, etc.). Interestingly, the other PC had one crash with my old RAM, which had never happened before. My PC also blue-screened for the first time in a while (the error said 'memory management') but afterwards went back to freezing. There was nothing in BlueScreenView or Event Viewer, but that might be because I had dumps set to minidumps in system settings. I switched them to full dumps, had a freeze 10 minutes ago, and there's still nothing in Event Viewer or BlueScreenView
-an hour or two ago it froze on the login screen. I could move the mouse but couldn't click anything, as it was unresponsive. I noticed some fans started spinning up (the CPU fan, I think)

Specs:

Ryzen 5 3600,
B450 Tomahawk motherboard,
MSI Nvidia 1060 6GB.
PSU is EVGA SuperNOVA 750 G3, 80 Plus Gold 750W.
OS: Windows 10

Would appreciate any tips.
 
Jan 14, 2022
8
0
10
I gave it a good clean when changing graphics cards. The only place I haven't fully cleaned is the PSU, but I used compressed air on it from all sides. I don't really think it's the temps, as sometimes I can game just fine for hours with no issues. Then I'll hang about on the desktop and it'll freeze randomly.

I did a PSU stress test for 1 minute and a memory test for 1 min too, using OCCT. No idea how long these should run normally. Guess I can stress test the CPU as well but I think that was tested along with the PSU test.
 
Jan 14, 2022
8
0
10
I replaced my PSU with a temporary one. After one day of testing I got a bluescreen with the error "KMODE EXCEPTION NOT HANDLED". I restarted the PC and it froze 5 seconds after logging in (I could only move my mouse, everything else was unresponsive). I think it's much less frequent than before, for whatever reason.
BlueScreenView shows this: https://imgur.com/a/IRhy1hv

Here's the minidump file for it: https://easyupload.io/polgzy

Any advice?
 

larkspur

Distinguished
Sounds RAM-ish. Double-check your RAM configuration: speed and timings. Make sure they are set to what the RAM is rated for. Then run MemTest86. Boot it from a USB flash drive and let it run for at least 4 passes. It's a good idea to test each stick individually (one stick installed at a time). There won't be any errors if the stick is good and your RAM is configured properly.

 

Karadjgne

Titan
Ambassador
KMODE exception errors are almost always caused by a corrupted or missing device driver.

I'd shut down the PC. Start it, but physically power it off right after Windows starts to load. Do this 3-4x and it'll drop you into Windows repair mode, which runs without Windows or any of its associated drivers having been loaded. Then follow the prompts to do the repair.
 
Jan 14, 2022
8
0
10
Tested the first stick of RAM. Default settings, 4 passes, no errors. I'll do the other one overnight, but frankly I don't expect much (many errors, that is). As I stated in the first post, I already tried replacing it with the exact same two sticks from a different computer that had zero issues. The freezes persist both on stock BIOS settings (which set the RAM to 1800 or something) and on the XMP profile.

I did what Karadjgne said: restarted, switched off 3 times and got into repair while booting. It gave me an error saying it couldn't repair the computer and that there's a log file located in E:\Windows\System32\LogFiles\Srt.
Now, no idea why it says drive 'E', as that's an almost empty 100 MB drive restricted by Windows. The log file was actually on drive C/Windows.../Srt. It says the main fault is "Can't find hard drive. If a drive is installed, it's not responding."

When changing PSUs I did leave one SSD unplugged from it but connected to the motherboard. No idea if that's it or it's something else. The temporary PSU cable I'm using isn't long enough to plug them all in. I tried using the one from the old PSU but that didn't even let me start the computer (I'm guessing it's not compatible with the other PSU).
Nonetheless, on my old PSU with all drives connected it was still freezing and crashing. I tried using windows repair back then too but it also came with the error 'unable to fix the computer'. Honestly can't remember what the srt log file said or whether I checked it.

Edit: I just double-checked the Srt log file. The error about the missing drive comes from the system disk test.
It's not in English so here's a rough translation of the last part:

Test Performed:
---------------------------
Name: System disk test
Result: Completed successfully. Error code = 0x0
Time taken = 0 ms

Root cause found:
---------------------------
Can't find hard drive. If a drive is installed, it's not responding.

---------------------------
---------------------------
 
Jan 14, 2022
8
0
10
Did a full translation of the log file. I'm very confident I had something similar with my old PSU and all drives connected. It also said to check disk E for the logfile which I ultimately found in disk C. Here's the full logfile:

---------------------------
Number of repair attempts: 1

Session details
---------------------------
System Disk =
Windows directory = E:\WINDOWS
AutoChk Run = 0
Number of root causes = 1

Test Performed:
---------------------------
Name: Check for updates
Result: Completed successfully. Error code = 0x0
Time taken = 0 ms

Test Performed:
---------------------------
Name: System disk test
Result: Completed successfully. Error code = 0x0
Time taken = 0 ms


Root cause found:
---------------------------
Can't find hard drive. If a drive is installed, it's not responding.
---------------------------
---------------------------
 

Karadjgne

Titan
Ambassador
Ahh. If you installed Windows with all the drives present, then you can end up stuck needing all those drives present. During installation, Windows Setup looks at every connected drive and doesn't necessarily keep everything on the one you picked. It'll do stupid stuff like put the boot files (the EFI system partition) and other important files onto another drive, so they aren't physically located on the C drive.

The really sucky part about that is that if the physical C drive goes bunk, you also have to wipe the other drive, or you'll get conflicts. If the D drive goes bunk, you then have to reinstall, or hopefully just repair the missing Windows files.

It's a pita. It's why it's always recommended to install Windows with just the OS drive actually connected, so Windows is contained on 1 drive and any drives added later are treated as extra storage.
 
Jan 14, 2022
8
0
10
Ran MemTest86 on the other RAM stick. Default settings, 4 passes, 0 errors.

I managed to plug in my last SSD and launch automatic repair. It worked for about 10-20 minutes, checking the disk and attempting repairs. Then the PC restarted and was stuck loading Windows indefinitely (on the loading screen where you see the motherboard logo). After 20 minutes of waiting I restarted the PC, which booted into automatic repair once more. This time it only took about 3 minutes and launched Windows normally.

The SrtTrail log is pretty long and not in English, so I'll post the important bits:
Number of repair attempts: 4
Session details
---------------------------
System Disk = \Device\Harddisk3
Windows directory = F:\WINDOWS
AutoChk Run = 0
Number of root causes = 1


Test Performed:
---------------------------
Name: Internal state check
Result: Completed successfully. Error code = 0x0
Time taken = 0 ms

Root cause found:
---------------------------
Unspecified changes to system configuration might have caused the problem.

Repair action:
Result: Failed. Error code = 0x32
Time taken = 375 ms

Repair action: Restoring system
Result: Completed successfully. Error code = 0x0
Time taken = 407156 ms

Repair action: System files integrity check and repair
Result: Failed. Error code = 0x57
Time taken = 1938 ms

Then it looks like the second repair took place. It ends abruptly before the third repair action, I guess?
---------------------------
Session details
---------------------------
System Disk = \Device\Harddisk3
Windows directory = F:\WINDOWS
AutoChk Run = 0
Number of root causes = 1

Test Performed:
---------------------------
Name: Internal state check
Result: Completed successfully. Error code = 0x0
Time taken = 0 ms

Root cause found:
---------------------------
Unspecified changes to system configuration might have caused the problem.

Repair action:
Result: Failed. Error code = 0x32
Time taken = 375 ms

Repair action: Restoring system
Result: Completed successfully. Error code = 0x0
Time taken = 407156 ms
The third session (or test, or whatever it is) says it found 1 root cause, same as before, but nowhere does it say what that root cause is. Instead it ends abruptly, with no time taken, on:
Test Performed:
---------------------------
Name: Bugcheck analysis
Result: Completed successfully. Error code = 0x0
Time taken
I really wish I could change the whole system language to English and just post it all. Somehow changing main language to English in settings didn't help.
Any advice on what to do next? Is the thing most likely resolved and I just have to wait and see? I'm really curious if it was a one off and the previous freezes were because of the old PSU or because of this.
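Since the log is long, here is a rough Python sketch for pulling just the tests, repair actions and error codes out of an SrtTrail-style excerpt like the one above. The code-to-name table only covers the three values that appear in this post, and treating them as ordinary Win32 error codes (0x32 = ERROR_NOT_SUPPORTED, 0x57 = ERROR_INVALID_PARAMETER) is an assumption on my part; the function name `summarize` is made up for this example.

```python
import re

# Win32 error code names for the values seen in this thread. Whether Startup
# Repair reports plain Win32 codes here is an assumption.
WIN32_NAMES = {
    0x0: "ERROR_SUCCESS",
    0x32: "ERROR_NOT_SUPPORTED",
    0x57: "ERROR_INVALID_PARAMETER",
}

# One entry = a "Name:" or "Repair action:" line followed by its "Result:" line.
ENTRY = re.compile(
    r"^(?:Repair action|Name):[ \t]*(?P<label>[^\n]*)\n"
    r"Result:\s*(?P<result>[^.]+)\.\s*Error code\s*=\s*(?P<code>0x[0-9A-Fa-f]+)",
    re.MULTILINE,
)


def summarize(log_text):
    """Return (label, result, code, name) for each test or repair action."""
    rows = []
    for match in ENTRY.finditer(log_text):
        code = int(match.group("code"), 16)
        label = match.group("label").strip() or "(unnamed)"
        name = WIN32_NAMES.get(code, "unknown")
        rows.append((label, match.group("result").strip(), code, name))
    return rows


# Excerpt copied from the log posted above.
SAMPLE = """\
Name: Internal state check
Result: Completed successfully. Error code = 0x0
Time taken = 0 ms

Repair action:
Result: Failed. Error code = 0x32
Time taken = 375 ms

Repair action: Restoring system
Result: Completed successfully. Error code = 0x0
Time taken = 407156 ms

Repair action: System files integrity check and repair
Result: Failed. Error code = 0x57
Time taken = 1938 ms
"""

for row in summarize(SAMPLE):
    print(row)
```

Read this way, the only step that succeeded in that session was the system restore; the unnamed action and the system file check both failed.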
 
Jan 14, 2022
8
0
10
Had a new one. Instead of a regular freeze/crash, my screen went black and the PC simply restarted itself. I was in the middle of opening Steam and Discord. There's an error in Event Viewer:
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0
A quick Google search points mostly to drivers. I'll have a go at reinstalling the GPU and chipset ones.
I'm still not sure whether the PSU was the main cause and these are things that got messed up during all the crashes and freezes. The fact is I'm crashing less often and with different symptoms, but I had that in the past too after reinstalling some drivers, only for the problems to come back in a few days. Another thing I noticed is that it crashes/freezes/resets more often in the afternoon and evening than in the morning.
 

Karadjgne

Titan
Ambassador
I can think of multiple causes: a rootkit, a corrupted EFI boot partition, a bad/failing SSD. When the OS loads, it's in 2 parts. First it loads what it needs to operate, then it loads everything else. That includes non-essential drivers. And then you do something, a corrupted driver is accessed and you crash.

Honestly I'm down to thinking you should try diskpart and wipe all the data from 1 drive, unplug all the others, install Windows onto the wiped drive, and then use that to check PC health. If it runs clean, you'll know 2 things. First, whatever the cause is, it's related to the original OS drive. Second, your issue is fixable. You could then hook the original drive back up, wipe it, install Windows again (don't bother with internet updates etc.) and check the drive: use antivirus/antimalware to check for a rootkit, use CrystalDiskInfo or similar to check the drive health, etc.
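The wipe step above can be sketched as a tiny diskpart script generator. This is only an illustration under my own assumptions: the helper name `build_diskpart_script` is made up, the disk number must be taken from diskpart's `list disk` output on the actual machine, and `clean` destroys the partition table of whichever disk is selected, so the text should be reviewed by hand before it is ever fed to diskpart.

```python
def build_diskpart_script(disk_number: int) -> str:
    """Return diskpart commands that wipe the partition table of one disk.

    Nothing is executed here; the function only builds the script text.
    """
    if disk_number < 0:
        raise ValueError("disk number must be zero or greater")
    lines = [
        f"select disk {disk_number}",  # target disk, numbered as in `list disk`
        "clean",  # remove all partition and volume information from that disk
    ]
    return "\n".join(lines) + "\n"


# Example only: disk 1 is a placeholder, not a recommendation.
print(build_diskpart_script(1))
```

Saved to a text file, the output could be run from an elevated prompt with `diskpart /s <file>`, ideally with every other drive unplugged so there is no way to select the wrong one.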
 
Solution
Jan 14, 2022
8
0
10
Did as instructed. Left only 1 formatted SSD plugged in, installed a clean copy of Win10 and tested for a few days. Tested with the other drives plugged in too (except the one with the old Win10 install). Zero issues whatsoever.
Today I plugged in the old drive (with the others unplugged) and did a clean install on it too. I plugged everything else back in and got a bluescreen 10 minutes in. The error said 'driver overran stack buffer'.

Dump file says:
DRIVER_OVERRAN_STACK_BUFFER (f7)
A driver has overrun a stack-based buffer. This overrun could potentially
allow a malicious user to gain control of this machine.
DESCRIPTION
A driver overran a stack-based buffer (or local variable) in a way that would
have overwritten the function's return address and jumped back to an arbitrary
address when the function returned. This is the classic "buffer overrun"
hacking attack and the system has been brought down to prevent a malicious user
from gaining complete control of it.
Do a kb to get a stack backtrace -- the last routine on the stack before the
buffer overrun handlers and BugCheck call is the one that overran its local
variable(s).
I did a rootkit scan with a tool from Kaspersky. All clean. CrystalDiskInfo... no idea how to really use it, but it says the drive condition is 93% and good. I also did a firmware update on the Corsair SSD, no idea if that will do anything.
Thank you for the help so far. This really pointed me in the right direction. I'll carry on testing, but if you have any more advice, shoot.

Edit: unless it's something to do with the way I removed/formatted the drive I've been testing on for the last few days? I simply plugged it in after installing Win10 on the old drive and formatted it. No idea how else I could have done it.
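To keep the different blue screens in this thread straight, here is a small Python lookup of the three stop codes mentioned so far, plus a check that a dump header line like the one above matches the table. The values are the documented bug check codes for these names; the helper names `parse_header` and `is_consistent` are just for this sketch.

```python
import re

# Bug check (stop) codes for the blue screens named in this thread.
BUGCHECKS = {
    0x1A: "MEMORY_MANAGEMENT",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0xF7: "DRIVER_OVERRAN_STACK_BUFFER",
}

# Matches the first line of a dump analysis, e.g. "SOME_NAME (f7)".
HEADER = re.compile(r"^(?P<name>[A-Z0-9_]+) \((?P<code>[0-9a-fA-F]+)\)$")


def parse_header(line):
    """Split a dump header line into (name, numeric stop code)."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a bug check header: {line!r}")
    return match.group("name"), int(match.group("code"), 16)


def is_consistent(line):
    """True when the name and code in the header agree with the table."""
    name, code = parse_header(line)
    return BUGCHECKS.get(code) == name


print(parse_header("DRIVER_OVERRAN_STACK_BUFFER (f7)"))
print(is_consistent("DRIVER_OVERRAN_STACK_BUFFER (f7)"))
```

Three unrelated stop codes on the same hardware is itself a hint that the drivers are being damaged by something underneath them rather than one driver being buggy.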
 

Karadjgne

Titan
Ambassador
Used 1 drive for a few days, no issues. Used the old drive and got a bluescreen. CrystalDiskInfo reported the drive itself as good, so that pretty much leaves 1 thing. The controller on the old drive is flaky and leaving/creating corrupted data on the drive. It's like verifying a bike tire is good and has 93% of its tread left, but the reason it keeps going flat is a leak at the valve stem.

I believe you've done due diligence, all you really can do. There's no fix for a bad or flaky controller, no matter how healthy the drive is. SSDs work by storing a tiny electrical charge in cells, and the charge level in each cell is read back as bits. You end up with a bunch of 1s and 0s at specific addresses, and that's the machine-level data the controller reads and writes. The cells can be healthy, everything good, but if the controller puts a 1 or 0 in the wrong place, especially while it shuffles data around in the background (garbage collection after TRIM, wear leveling), the data itself is now corrupted. And you get a bluescreen.
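To make the "one bit in the wrong place" point concrete, here is a toy Python sketch. It is not a model of how an SSD controller works; it just flips a single bit in a block of bytes and shows that a checksum taken before and after no longer matches, which is one way to spot silent corruption on a suspect drive. The helper names and the sample bytes are made up for the example.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    byte_index, offset = divmod(bit_index, 8)
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << offset
    return bytes(corrupted)


def digest(data: bytes) -> str:
    """SHA-256 of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for a chunk of a driver file; the contents are arbitrary.
original = b"driver-bytes" * 8
damaged = flip_bit(original, 13)

print(len(original) == len(damaged))         # same size on disk
print(digest(original) == digest(damaged))   # but the checksums differ
```

The same idea works on real files: hash a large file, copy it to the suspect drive, read it back and hash it again. A mismatch on a drive that reports itself healthy points at the kind of controller problem described above.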