Question Black Screen of Death after weeks of debugging

Julesm2

Reputable
Sep 24, 2019
23
0
4,510
0
Hi all,
My PC, which I built in 2018, has started developing issues in the last 2 months. I have been scouring forums for solutions, but I feel like I have tried everything now. The behaviour is fairly distinctive, so I'm hoping someone here has an idea of what might be going on.
Here are my hardware details:

Case: Dancase A4
PSU: Corsair SF600
MOBO: ASUS ROG STRIX X470-I
CPU: Ryzen 2700X
GPU: GTX 1080 Ti (initially), RTX A5000 (current)
RAM: Corsair Vengeance 32GB (2x16GB)
HDD: 1x Samsung evo pro M.2 500GB, 1x Samsung evo pro M.2 1TB
I should also mention that my calculated maximum load is 425 watts, my PSU is 600w.

Around 2 months ago, I started getting blue screens of death while working on a 3ds Max scene. I thought it was maybe software specific, but I found the issue also occurred in Cinema 4D. The error codes were different each time, and sometimes they would disappear before I could even read them. I thought it might be a GPU issue, so I decided it was a good time to upgrade from the 1080 Ti to the RTX A5000. The issue persisted, so I think I can rule out the GPU as the culprit.
I then ran the following checks:

Memtest86 (each stick separately) - PASS
Furmark - PASS
Prime 95 - PASS

I then updated the BIOS and tried about 5 different GPU drivers, among many other drivers.
I followed so many debugging suggestions that I actually can't remember them all.
Somewhere along the line the problem went away; however, now I'm getting a BLACK screen of death when doing certain tasks.
Using Redshift in C4D (GPU engine) causes an instant black screen.
Using Corona in 3ds Max causes an instant black screen.
Booting with 2 monitors connected instead of one causes a black screen before reaching the login screen.
Once again, FurMark passes and Prime95 passes.
I'm at my wits' end now. It seems like it must be some kind of corrupted driver or something, but I just don't know how to identify the issue.
By black screen, I mean the screen goes black but the fans stay running until I hold down the power button. The audio drops as well and I cannot access the PC using TeamViewer. I have tried multiple monitors and the issue is consistent.
Does anyone have an idea of what might be going on here?
 

Colif

Win 11 Master
Moderator
Jun 12, 2015
61,160
5,187
166,290
10,453
Is it a Black Screen of Death, or a Blue Screen of Death?

If it's the second:
Can you follow option one on the following link - here - and then do this step below: Small memory dumps - Have Windows Create a Small Memory Dump (Minidump) on BSOD - that creates a file in C:\Windows\Minidump after the next BSOD

  1. Open Windows File Explorer
  2. Navigate to C:\Windows\Minidump
  3. Copy the mini-dump files out onto your Desktop
  4. Do not use WinZip; use the built-in facility in Windows
  5. Select those files on your Desktop, right click them and choose 'Send to' - Compressed (zipped) folder
  6. Upload the zip file to the Cloud (OneDrive, DropBox . . . etc.)
  7. Then post a link here to the zip file, so we can take a look for you . . .
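
Steps 2 to 5 above could also be scripted. This is only a minimal Python sketch: the helper name `zip_minidumps` and the output location are made up for illustration, it assumes the default dump folder C:\Windows\Minidump, and reading that folder may need an elevated prompt.

```python
import zipfile
from pathlib import Path


def zip_minidumps(src_dir, dest_zip):
    """Pack every .dmp file in src_dir into dest_zip; return the packed names."""
    dumps = sorted(Path(src_dir).glob("*.dmp"))
    with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for dump in dumps:
            # Store only the file name so the archive has no folder structure.
            zf.write(dump, arcname=dump.name)
    return [dump.name for dump in dumps]


if __name__ == "__main__":
    # Assumed locations; adjust both paths to your own system.
    target = Path.home() / "Desktop" / "minidumps.zip"
    print(zip_minidumps(r"C:\Windows\Minidump", target))
```

The resulting zip is what would then be uploaded in step 6.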

Do you get black screen error on either GPU?
 

Julesm2

Thanks for the reply.
The issue is currently the black screen of death; I haven't had a blue screen of death for around 5 weeks now.
Using the 1080 Ti I had mostly blue screens, with maybe 5% of the crashes being black screens. With the A5000 they are all black screens.
My Windows OS actually looks different from about an hour ago (new UI), and I haven't experienced any crashes since.
Over the last few weeks I have continuously checked Windows for updates. I will lose my mind if I find a dodgy Windows update has cost me days of work.
 

Colif

Is the new UI because you are on Win 11 now, or did you change the shell?

Does the PC stay on when you get the black screen?

https://forum.corona-renderer.com/index.php?topic=27699.0

What are the codes for the RAM? Are they listed here for your motherboard? https://www.corsair.com/us/en/Categories/Products/Memory/c/Cor_Products_Memory?type=motherboard


Julesm2

I'm still running Windows 10. I have made no changes to the shell myself, but it seems to have changed slightly (colour scheme now dark, search bar disappeared).
My PC stays on with the fans running until I hold down the power button, but I think on one occasion it automatically restarted. The cursor disappears and the audio also stops, so I don't think it's just a problem with the display or the connection with the card driver.
The RAM sticks are on the website (VENGEANCE® LPX 32GB (2 x 16GB) DDR4 DRAM 2666MHz C16 Memory Kit).
I have fully tested both sticks individually using MemTest86.

I'm not going to count my chickens just yet, but since the change in the UI I haven't been getting black screens, even when using Corona. Maybe there is a way to identify whether a Windows update could have caused the crashes?
 

Julesm2

I just got another black screen while doing some basic stuff, after having it working well all day. I have just discovered something that may shed a bit more light on the issue. The PC seemed to 'threaten' to go to a black screen by blinking black maybe 3 or 4 times today, but it was only after 8 hours of work that it became unrecoverable.

I was running a 3D print using Cura via USB and noticed the print continued during the black screen. I tried accessing my computer using TeamViewer but it didn't load, although it did say the PC was online.
So it seems the system is still somewhat operational, but not accessible.
Seems like it's a graphics issue after all? But the issue occurs on my 1080 Ti as well as my brand new A5000. I tried all kinds of GPU driver resets and installations of various releases, so I'm not sure what to try next.
It really feels like the crashes are sort of random. There is no clear correlation now between my actions and the crashes.
 

Julesm2

What are the various temps (CPU/GPU/motherboard/drives/etc.) when you think it's about to go black?
Nothing out of the ordinary. It varies, but my CPU gets to a max of 80°C while stress testing. The GPU is an A series so it runs very cool; I haven't seen it above 60°C.
I haven't had the black screen in a few days now. It's very frustrating: one day I can't even get past the login screen, then it only happens when doing certain tasks, then not at all. It's very unnerving to have to use this for my work.
 

Colif

It could be the PSU. A black screen while the PC keeps running could mean it doesn't have enough power to run everything, so it shuts off the screen to save power.
"I should also mention that my calculated maximum load is 425 watts, my PSU is 600w."
If it's a bad PSU, what it should be able to do doesn't really matter.
It's difficult to test; it's just an option.

I wouldn't expect 2 GPUs to both have problems.
 

Julesm2

Yes, I've been really hoping it's not a bad PSU, because I have no good way to test it; I don't currently have another PSU here. If it continues I guess I will have to test that, but it would be a last resort for me.
The last black screen was on Friday, so I will have to see if it comes back. If so, I will probably try doing a clean install of Windows. Something in my gut tells me it's not a hardware issue. It seemed to get a lot better after Windows randomly updated and changed the UI.
 

Julesm2

Something strange about it swapping to a darker theme and removing search window.

See how it goes.

Do you get any errors in reliability history when screen turns black?
Wow, I didn't know reliability history existed.
There were two occasions stating hardware errors, here is what came up:

[Screenshot: the two hardware error entries from reliability history]

The rest of the errors say the computer was not shut down correctly (presumably that's me holding down the power button to restart).
I guess that would suggest a GPU issue? I'm unable to trigger a black screen through stress testing, though, and it is a brand new high-end pro card, and I had the issue with my previous card too (except with more blue screens than black screens on that one).
 

Colif

I don't look at reliability history; it can tell me things I don't want to know - it just did lol.
So it doesn't look the same as mine did. I was expecting to see something like this if it was the GPU drivers crashing

4 years ago, I don't miss it.

So it's not Windows turning the display off. It seems Windows doesn't know why it goes off, which is a clue that it's hardware.

Curious if some of the red circles are for GameInput.
 

Julesm2

Yeah, I'm not liking the way this is going, I REALLY don't want to find the A5000 has issues :(
One of the critical events was GameInput Host Service, which occurred just after 'Windows was not properly shut down', but it was just on one occasion and it seemed to occur after the crash.
Does this mean definitively that it's a hardware issue then? A GPU issue?
By 'It's not Windows turning display off', do you mean that rules out a Windows-related issue completely?
It's very difficult to reproduce the crash now, so I'm not sure how I can conclusively identify the issue... I'm feeling a bit stuck.
 

Colif

I don't think it's the GPU.
I'm actually leaning towards the PSU; the GPU is often one of the first things to be turned off if it's a power problem.

It passes Prime95, so that's one less test to run, and it passes MemTest86 too, so it's unlikely to be RAM... RAM would cause freezes, not black screens.
Passing Prime95 also means it's not likely the CPU.
Storage won't cause black screens.

GameInput is crashing on every PC at the moment, even mine. It's why I asked.

There are really only 3 tests for a PSU, and the multimeter is the only reliable one. The paperclip test just checks that the PSU works, and the BIOS one just shows the values the BIOS reads.

the paperclip method - https://forums.tomshardware.com/threads/what-is-the-paperclip-method-of-testing-a-psu.1336402/

or a multimeter - https://www.lifewire.com/how-to-manually-test-a-power-supply-with-a-multimeter-2626158

or in the BIOS, check the +3.3V, +5V, and +12V - https://www.lifewire.com/power-supply-voltage-tolerances-2624583

I don't expect you have a multimeter. I would be tempted to get a repair shop to test all your gear as well. They might have a multimeter.
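
For anyone comparing multimeter or BIOS readings against the tolerances, here is a rough Python sketch. It assumes the usual ATX figure of ±5% on the +3.3V, +5V and +12V rails; the function name `check_rails` and the sample readings are made up for illustration. A static reading inside the band does not rule out brief dips under load, which a multimeter is unlikely to catch.

```python
# Nominal voltages for the three positive rails mentioned above.
ATX_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05  # assumed ±5% band for these rails


def check_rails(readings, tolerance=TOLERANCE):
    """Return {rail: True/False} for whether each measured value is in band."""
    results = {}
    for rail, nominal in ATX_RAILS.items():
        low = nominal * (1 - tolerance)
        high = nominal * (1 + tolerance)
        results[rail] = low <= readings[rail] <= high
    return results


if __name__ == "__main__":
    # Hypothetical readings, not taken from this thread.
    sample = {"+3.3V": 3.31, "+5V": 5.02, "+12V": 11.3}
    for rail, ok in check_rails(sample).items():
        print(rail, "within tolerance" if ok else "OUT of tolerance")
```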
 

Julesm2

Interesting, I will look into that and try to find a repair shop to check things.
It's an AIO build in a Dancase A4, so it has to be taken apart in a very specific order (remove the cooler to access the RAM, etc.), and I've been reluctant to take it to a repair shop in case they are not comfortable with the setup. I'll try to find a place I can trust.
Thanks so much for all your input on this. I was feeling completely lost, but feel I have some direction now.
 

Julesm2

Black screens have returned just as I'm starting work again. The BIOS suggests the voltages are well within the tolerances.
I'm going to borrow a multimeter tonight and try to check the voltages.
The odd thing is, the black screens came after another bunch of Windows updates.
Anyway, I accept that it's most likely a hardware issue.
 

ubuysa

Honorable
Jul 29, 2016
63
9
10,565
6
It does sound like hardware, but upload your System and Application logs if you'd like to. There might be something interesting in them...?

1. Open Event Viewer (enter eventvwr command in Run box).

2. Locate the Windows Logs folder in the left hand pane and expand it by clicking on the arrow (>) to the left of it.

3. Right-click on the Application entry and select 'Save all events as...'. Choose a folder anywhere that suits you and a filename of 'Application' (an .evtx suffix will be added automatically).

4. Right-click on the System entry and select 'Save all events as...'. Choose a folder anywhere that suits you and a filename of 'System' (an .evtx suffix will be added automatically).

5. Zip the Application.evtx and System.evtx files together and upload the zip file here.
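
The same export can probably be done from a script instead of the Event Viewer GUI. The sketch below uses the Windows `wevtutil epl` command (export log) with its `/ow:true` overwrite option; the function names, output folder and archive name are made up for illustration, and an elevated prompt may be needed.

```python
import subprocess
import zipfile
from pathlib import Path

LOGS = ("Application", "System")


def export_commands(out_dir, logs=LOGS):
    """Build one `wevtutil epl` command per log (nothing is executed here)."""
    out = Path(out_dir)
    return [
        ["wevtutil", "epl", log, str(out / f"{log}.evtx"), "/ow:true"]
        for log in logs
    ]


def export_and_zip(out_dir, zip_name="eventlogs.zip"):
    """Run the exports (Windows only), then zip the resulting .evtx files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for cmd in export_commands(out):
        subprocess.run(cmd, check=True)
    archive = out / zip_name
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for log in LOGS:
            zf.write(out / f"{log}.evtx", arcname=f"{log}.evtx")
    return archive
```

Calling `export_and_zip("logs")` on the affected PC should leave a single zip holding Application.evtx and System.evtx, ready to upload.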
 

Julesm2

Just got another crash to black screen followed by an automatic restart. This time in the reliability history it was reported as 'shut down unexpectedly', technical details say 'blue screen 133'
 
You might consider removing/disabling all audio sources on your machine.
Stupid audio add-ins can cause this type of issue. Or
you can look at your audio drivers and go to

Microsoft Update Catalog
and get an update.
The method to install is on this thread: Use the Microsoft Update Catalog to find and install Windows drivers and updates (geeksinphoenix.com)

If this is the failure case, the audio driver faults and hangs the GPU driver. It happens when the driver is not updated, mostly with 32-bit subsystem audio drivers.
Debugging is a pain; it often requires special settings and a full memory dump.

Disabling all audio drivers would prevent the issue; then enable them one at a time to figure out the root cause. (Or just update from the Microsoft catalog, since it will be the only fix anyway.)
 

ubuysa

This might be a red herring, but I looked through your system log for the critical event 41 restart events and then looked to see what messages you were getting immediately prior. I've checked three (or four, I'm just about to go out and can't remember) and all of them have an information message for secnvme.sys, Samsung's NVMe driver. The message is only informational though...
Code:
Log Name:      System
Source:        secnvme
Date:          09/11/2022 14:04:48
Event ID:      11
Task Category: None
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      DESKTOP-1MAQ613
Description:
The description for Event ID 11 from source secnvme cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\RaidPort3
144d
Samsung SSD 970 PRO 512GB

The message resource is present but the message was not found in the message table
But soon afterwards, at almost the same timestamp, you get the critical event 41...
Code:
Log Name:      System
Source:        Microsoft-Windows-Kernel-Power
Date:          09/11/2022 14:04:49
Event ID:      41
Task Category: (63)
Level:         Critical
Keywords:      (70368744177664),(2)
User:          SYSTEM
Computer:      DESKTOP-1MAQ613
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
In each case, immediately before the error 41, is an NTFS informational message...
Code:
Log Name:      System
Source:        Microsoft-Windows-Ntfs
Date:          09/11/2022 14:04:49
Event ID:      98
Task Category: None
Level:         Information
Keywords:      (2)
User:          SYSTEM
Computer:      DESKTOP-1MAQ613
Description:
Volume \\?\Volume{ddb359ea-0000-0000-0000-100000000000} (\Device\HarddiskVolume1) is healthy.  No action is needed.
On its own this is a nothing message and is usually ignored, but there are NVMe messages in the log each time you have a crash.

I've seen several niggly issues with M.2 drives that were solved by re-seating the drive. I would suggest you try that.
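
The manual check described above (what was logged just before each event 41) can be sketched in Python. The `Event` class, the function name and the 5-second window are assumptions for illustration, and the date is read as day/month. Keep in mind that event 41 is written during the boot that follows the crash, so entries next to it may be normal start-up messages and not causes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    time: datetime
    source: str
    event_id: int


def events_before_crashes(events, window_seconds=5):
    """Map each Kernel-Power 41 event to the other events logged shortly before it."""
    window = timedelta(seconds=window_seconds)
    crashes = [e for e in events
               if e.source == "Microsoft-Windows-Kernel-Power" and e.event_id == 41]
    result = {}
    for crash in crashes:
        result[crash.time] = [
            e for e in events
            if e is not crash and crash.time - window <= e.time <= crash.time
        ]
    return result


# The three entries quoted above; the date is read as day/month (an assumption).
log = [
    Event(datetime(2022, 11, 9, 14, 4, 48), "secnvme", 11),
    Event(datetime(2022, 11, 9, 14, 4, 49), "Microsoft-Windows-Ntfs", 98),
    Event(datetime(2022, 11, 9, 14, 4, 49), "Microsoft-Windows-Kernel-Power", 41),
]
for when, nearby in events_before_crashes(log).items():
    print(when, [(e.source, e.event_id) for e in nearby])
```

If the same source shows up next to every crash across weeks of logs, that is a pattern worth a closer look, though it does not prove a cause.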
 

Julesm2

Thanks, I really appreciate you taking a look. I will give that a go.
I will be building a new PC next year so decided to buy the PSU now, which means I will be able to test the PSU also.
The most irritating thing about the crash is that it comes and goes. It may take weeks to know it's gone for good :(
 

Julesm2

It looks like you were right. I'm not 100% certain yet, but it very much looks like the PSU was at fault.
As I'll soon be building another PC, I decided to just buy the PSU for that build now, so I could determine whether I need to replace the SF600 in my Dancase PC. I will RMA the faulty PSU as it's still well within warranty, but as I'm traveling abroad for two months, I can't wait for the RMA and so have to replace the faulty SF600 in the meantime. The question is, do I replace it with an SF600 or an SF750? I'm trying to understand what caused the fault after 3 years of use. Would it be from running it too close to the max output, or is it just random?
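
As a rough way to compare the two options, the 425 W estimate from the first post can be set against each unit's rated output. This is only a sketch: `load_fraction` is a made-up helper, the 600 W and 750 W figures are the rated outputs of the SF600 and SF750, and the result says nothing about what actually caused the fault.

```python
def load_fraction(load_watts, rated_watts):
    """Return the estimated load as a fraction of the PSU's rated output."""
    if rated_watts <= 0:
        raise ValueError("rated_watts must be positive")
    return load_watts / rated_watts


ESTIMATED_LOAD_W = 425  # calculated maximum load quoted in the first post

for model, rated in (("SF600", 600), ("SF750", 750)):
    frac = load_fraction(ESTIMATED_LOAD_W, rated)
    headroom = rated - ESTIMATED_LOAD_W
    print(f"{model}: {frac:.0%} of rated output, {headroom} W headroom")
```

With these numbers the estimated load is about 71% of the SF600's rating and about 57% of the SF750's.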
 
