Question Continuous crashes driving me crazy

Dec 14, 2022
Praying that you fine people will be able to help me figure this out because at this point I don't know where to turn anymore.

Here is my setup:

  • ASUS ROG Strix Scar 15 with AMD Ryzen 9500 and Nvidia GeForce 3080 (Laptop Edition)
  • 32 GB RAM
  • Connected Gigabyte G34WQC monitor (the laptop has DisplayPort power on its sole USB-C port so I'm using a USB-C to DisplayPort cable)
  • Razer keyboard+mouse

Problem and symptoms:

The computer and display freeze up completely, seemingly at random, at least once or twice per day and never at the same times. Sometimes it's a gradual freeze where certain windows stop responding before everything locks up. Sometimes it's a full freeze and then the screen goes black. Sometimes it freezes and then recovers by itself. So weird. Sometimes (though not always) it also plays the sound Windows makes when a device or cable has been disconnected. Also, sometimes only the visuals appear to freeze (external and laptop monitors) while the lighting effects on my keyboard+mouse+laptop all continue. Other times the lighting on everything freezes too.

Not sure if it's related or not, but certain apps and games cause the screen to flicker like crazy while others do not. Research suggests that this may be a refresh rate issue, and it had been happening long before these freezes/crashes started, so I never thought it would be related, but maybe I'm wrong.

At first I tried figuring this out with software drivers and updates etc., but eventually I just wiped all partitions and did a full clean install of Windows 11. Updated everything properly (especially GPU). The Ryzen has onboard Radeon tech so I made sure not to install the actual Radeon software package, only the drivers. Nor did I install any of ASUS's nightmare-in-a-file (Armoury Crate and so on), save for the MyASUS app which I thought would be harmless enough. The freezes continued. I tried pausing Windows Update because I thought that might be it, no luck. I also thought maybe something could be broken with the monitor software so I turned off the Gigabyte Auto Switch Input feature. And for good measure I tried a different DisplayPort port. Nothing, it still occurs.

The main reason this is maddening is because I can't find out how to replicate the issue. It really does appear to be random. The ONLY common denominator appears to be that the freezing begins when I click on something - the notifications tab on Twitter, a tab in my browser, switching windows, I don't know. But it could also be that this is only when I notice it. Regarding times of the day - random.

After every occurrence, I check the Windows Reliability Monitor history. Oftentimes it's a LiveKernelEvent 117 or 141. Sometimes it's bad_module_info. And on several occasions, like the most recent one that made me write this post, it didn't show anything at all other than "Windows was not properly shut down".

The reason this is being posted under Windows 11 and not Hardware is that I want to try to eliminate the possibility of this being a software/driver issue; I'm still not certain either way. Can anyone recommend some sort of info/log software I can install that will help track what's going on every time there's a freeze?

Thanks so much in advance to anyone who's contributed the 10 minutes to read all this!

[Edit: just a small addition that I ran Heaven Benchmark on max settings and it was smooth as butter. Does not appear to be a "GPU load" issue]
 
Dec 14, 2022
Noted that you have looked in Reliability History/Monitor.

Also take a look in Event Viewer.

Takes more time and effort to navigate and understand.

To help:

http://www.tomshardware.com/faq/id-3128616/windows-event-viewer.html

Didn't even know about this, thank you! The data here is fascinating - and horrifying. Looks like thousands of errors in the past seven days. I don't even know where to begin.

Is there a way to export all this and upload it here so that I can have more eyes helping me?
 

Ralston18

Titan
Moderator
Probably but not likely worth the time and effort.

Just pick out, as best you can, the 3 or 4 most frequent error codes.
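
One way to do that tally, as a rough sketch: Event Viewer can save a log as CSV (Save All Events As...), and a few lines of Python can then count the most frequent source/event ID pairs. The column names `Level`, `Source` and `Event ID` are assumed from an English-language export, the `utf-8-sig` encoding is a guess, and `load_events`, `top_error_codes` and the `events.csv` file name are made up for illustration, so adjust them to whatever your export actually contains.

```python
import csv
from collections import Counter


def load_events(path, encoding="utf-8-sig"):
    """Read an exported Event Viewer CSV into a list of dicts.

    Assumption: the file has a header row; the encoding is a guess
    and may need changing for your export.
    """
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.DictReader(handle))


def top_error_codes(rows, n=4, levels=("Critical", "Error", "Warning")):
    """Return the n most frequent (Source, Event ID) pairs.

    Only rows whose "Level" column is in `levels` are counted.
    The column names are assumptions about the export format.
    """
    counts = Counter()
    for row in rows:
        if row.get("Level") in levels:
            key = (row.get("Source", ""), row.get("Event ID", ""))
            counts[key] += 1
    return counts.most_common(n)
```

Calling `top_error_codes(load_events("events.csv"))` should then give the handful of source/ID pairs worth looking up first.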

If you can find a representative day in Reliability History, select that day and take a couple of screen shots of the errors listed below the graph.

Post the screenshot(s) here via imgur (www.imgur.com).

For the most part, to me anyway, increasing numbers of errors and varying errors are a sign of power-related problems.

All the more so because of the other things you have tried in the meantime.

One key: what happened "7 days ago"?

Reliability History's time line may flag that....

Another tool is Update History: any failed or problem updates? Or even some "successful" update(s) of any sort?
 
Dec 14, 2022

Okay suspicions pretty much confirmed - at the very least, this looks like some sort of display/GPU issue. I don't know if software or hardware or both. You also mentioned power, which makes me wonder. Though I've never had problems before with this setup and I've not changed anything manually with the hardware (i.e. have not overclocked or anything like that).

Here's just a small sample (attached) of today's crash at 14:39 but there are oodles of errors and warnings going on all day every day.

View: https://imgur.com/I9AYzXa


nvlddmkm comes up in many of the instances, but other "Sources" that show up frequently around the system failures include:

  • DistributedCOM
  • LSA (LsaSrv): "LSA package is not signed as expected. This can cause unexpected behaviour with Credential Guard." Reference to various PackageName.
  • Kernel-Power with reference to various drivers failing to load: WUDFRd, IGO_VSD, rt68cx21
  • NDIS: "Miniport Microsoft Wi-Fi Direct Virtual Adapter #2, {4df2f069-f863-4a3e-9f6a-5fde5c0a4656}, had event Fatal error: The miniport has failed a power transition to operational power"

Those are the most frequent/apparent around the times/dates that freezes occurred.
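
To check which sources really cluster around a freeze rather than just being noisy all day, something along these lines could help. It is only a sketch: it assumes the events have already been loaded as dictionaries with a `time` key (a Python `datetime`) and a `source` key, which are names chosen for illustration and not anything Event Viewer produces by itself. The function name `sources_near_crash` and the five-minute window are likewise hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def sources_near_crash(events, crash_time, window_minutes=5):
    """Count event sources logged within +/- window_minutes of a crash.

    `events` is an iterable of dicts with a "time" (datetime) and a
    "source" (str) key. Both key names are assumptions for this sketch.
    """
    window = timedelta(minutes=window_minutes)
    counts = Counter()
    for event in events:
        if abs(event["time"] - crash_time) <= window:
            counts[event["source"]] += 1
    return counts
```

Running it once per freeze and comparing the results would show whether nvlddmkm is the only source that appears every time, or whether one of the others always shows up alongside it.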

Can anyone make something of all this?
Also, since it's a laptop GPU and I can't really pull it out and inspect it, is there a good software package for testing GPU health/integrity/whatever?

Thank you again.
 

Ralston18

Titan
Moderator
nvlddmkm.sys

https://answers.microsoft.com/en-us/windows/forum/all/nvlddmkmsys-problem/c36c7f51-7ee4-49e0-b6e2-e0ed4190dc4d

Read through the link and then look through your system to see what you can find.

Then try running "sfc /scannow" and "dism".
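
If you would rather run both checks in one go, a small Python wrapper is one option. This is a sketch only: the post above just says "dism", so the `/Online /Cleanup-Image /RestoreHealth` switches are an assumption (they are the ones commonly used for repairing the running Windows image), and `REPAIR_COMMANDS` and `run_repairs` are hypothetical names. Both tools need an elevated (administrator) prompt.

```python
import subprocess

# Commands are run in the order given in the post: sfc first, then DISM.
# The DISM switches are an assumption; the post only says "dism".
REPAIR_COMMANDS = [
    ["sfc", "/scannow"],
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
]


def run_repairs(commands=REPAIR_COMMANDS, runner=subprocess.run):
    """Run each command and return (command line, exit code) pairs.

    `runner` defaults to subprocess.run and can be swapped out,
    for example to do a dry run without touching the system.
    """
    results = []
    for command in commands:
        completed = runner(command, check=False)
        results.append((" ".join(command), completed.returncode))
    return results
```

A non-zero exit code from either tool is worth reporting back here along with whatever message it printed.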

https://www.lifewire.com/how-to-use-sfc-scannow-to-repair-windows-system-files-2626161

How to use DISM command tool to repair Windows 10 image | Windows Central

The immediate objective is to determine whether there is some file corruption or other problem within Windows.

If that is not the case then (barring other comments and suggestions) delve into some Nvidia related problem.
 
Something to consider: I have seen many bugchecks in Nvidia drivers where the root cause was a sound driver for the motherboard or for add-on sound support.
You might consider updating your Realtek sound drivers and your Dolby Atmos drivers, or disabling them, and seeing if your system runs better.

There were cases where these drivers would respond to the Nvidia sound drivers and would overflow the stack of the GPU and crash the GPU card.

The fix would be to update the drivers from the motherboard vendor, or disable all sound sources that do not have a speaker connected to them.

Minidumps might help, but it is hard to figure out when the cause is a sound driver messing with another sound driver. You just see stack overflows in the dump.

There used to be only one sound source; now we have a sound driver for the CPU, one for the GPU, and one in the motherboard. Then you have sound effect add-ons like positional audio and Dolby.
And tools that route audio so you can capture for streaming. All of these can mess up the GPU (as can overclocking drivers).