[SOLVED] Obscure BSOD over the past few months

Oct 7, 2019
Hey Everyone,

Been trying to solve this obscure BSOD issue for a number of months now and can't seem to get to the bottom of it. At random, while the machine is idle (either overnight or while I'm at work), it will BSOD and reboot with a bugcheck code of 126 (0x7E). Originally this was a hard hang; further down the line of troubleshooting it became a reboot.
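The 126 and 7E above are the same number in decimal and hex. Event Viewer's Kernel-Power event 41 typically shows the BugcheckCode field in decimal, while the debugger output later in this thread uses hex (1000007E, 144). A small Python sketch of the conversion, with a lookup table limited to the codes that appear in this thread; the table and the describe_bugcheck name are illustrative, not part of any tool:

```python
# Minimal sketch: turn a bugcheck code (e.g. the decimal BugcheckCode from
# Event Viewer) into hex plus a name. The table only covers codes that
# appear in this thread; it is not a complete reference.
BUGCHECK_NAMES = {
    0x7E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x144: "BUGCODE_USB3_DRIVER",
}

# The debugger output in this thread shows the "_M" variant as 0x1000007E,
# i.e. the base code with this bit set; masking it off recovers 0x7E.
M_VARIANT_FLAG = 0x10000000


def describe_bugcheck(code: int) -> str:
    """Return e.g. '126 = 0x7E (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED)'."""
    base = code & ~M_VARIANT_FLAG
    name = BUGCHECK_NAMES.get(base)
    if name is None:
        return f"{code} = 0x{code:X} (not in this table)"
    if code != base:
        name += "_M"
    return f"{code} = 0x{code:X} ({name})"


if __name__ == "__main__":
    # 126 as reported in the post, the debugger's 1000007E, and 0x144 in decimal.
    for code in (126, 0x1000007E, 324):
        print(describe_bugcheck(code))
```

Names for any other code would have to come from Microsoft's bug check reference rather than from this table.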

Before I replaced the mobo+CPU+PSU, it started one day when I plugged an external 2.5" USB drive into one of the front-header USB 3.0 ports. I was copying files to it overnight, and the next morning it had hard hung so I had to reboot it. I didn't think too much of it at the time, but over the coming months it started doing it once a day while idle, or once every few days, maybe once a week or maybe once a month, and of course it never created any dump files. This is part of the reason I can't seem to find what the issue is. It is worth noting that I bought an RTX 2080 about 2-3 months before this started happening, and my machine is custom water cooled by myself; I have monitored temps and they have never been a problem.

Things I've tried so far:
1.) Rolled back to a known-good Windows image I had; this is the same image that had been working for a few months.
2.) Reinstalled Windows from scratch (I used to be on LTSC 1809).
3.) Checked for updated drivers.
4.) Ran Driver Verifier, which found that the Logitech drivers were causing a hard hang shortly after logging in; I removed them, but it didn't make a difference.
5.) Ran memtest86+ on each of the DIMMs, no issues.
6.) Ran FurMark and Prime95 overnight, no issues.

Some of the details are a bit hazy as it's been ages and I spent hours and hours trying different things. Eventually the machine started rebooting instead of hard hanging, and just one of the times it rebooted it created a minidump file, which pointed to the nvlddmkm.sys driver (I had possibly updated/reverted the NVIDIA drivers by this point).

While dusting out the PC I found that the slightest nudge on the RAM sticks would hard hang the machine, so I put some pressure on the DIMMs (4 in total) to keep them seated firmly in place. After a few days or a few weeks it rebooted again, so that wasn't it. There was also the fact that the watercooled 2080, and the TITAN X before it, had heavy GPU sag on the PCI-E slot it sat in, and occasionally removing the 24-pin ATX connector and the DIMMs caused quite a bit of bend in the motherboard.

Ultimately it got to the point where the machine would not even POST. I stripped down the PC and put the motherboard on its own on the workbench with a known-good PSU and a known-good video card, and it still didn't POST, so I wrote that mobo off (I did not have another CPU to test with).

Anyway, I bought a new mobo+CPU (the PSU was one of the first things I replaced a while back when I started experiencing this issue) and did a new build of Windows 10 1809 with all drivers up to date. Everything was clean and working, with no problems for a few months. Then on Friday evening I came home and plugged a USB camera into one of the front-header USB 2.0 ports, and overnight, at 2:46 AM, it crashed and rebooted with the same bugcheck code 126. This time it did write a minidump file, which points to the nvlddmkm.sys driver. However, I also noticed that it created .dmp files in another location showing a driver called "USBHUB3" crashing.

I then used AppCrashView, which was interesting: at the very second the event log reported the bugcheck error (2:46:53) there were 7 events, starting with the USBHUB3 driver crashing. I've now moved the USB camera to a USB hub I have connected (I did try changing hubs months back and it did not help). I haven't had any random reboots yet, but I really just want to drill down into what exactly is causing this.

If it is a faulty RTX 2080 I'd like to find out, because the RTX 2080 has a USB Type-C connector on it and 3 of the USB devices in Device Manager are registered under NVIDIA and use the USBHUB3 driver. Also, the event log says where the .dmp, .xml etc. files for the USBHUB3 crash are, but they never exist; it doesn't actually seem to create the files.
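What AppCrashView showed above is essentially a time-window correlation: collect every logged event within a few seconds of the bugcheck timestamp and read them oldest first to see what fired first. A minimal Python sketch of that idea, assuming the events have already been exported into (time, source, message) records; the Event type, the events_near function, the 5-second default window and the sample log (including its placeholder date) are all hypothetical, made up for illustration:

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    # Hypothetical record for one exported event-log entry.
    time: datetime
    source: str
    message: str


def events_near(events, crash_time, window_seconds=5):
    """Return the events logged within +/- window_seconds of crash_time, oldest first."""
    window = timedelta(seconds=window_seconds)
    hits = [e for e in events if abs(e.time - crash_time) <= window]
    return sorted(hits, key=lambda e: e.time)


if __name__ == "__main__":
    # 2:46:53 is the bugcheck time from the post; the date is a placeholder.
    crash = datetime(2019, 10, 5, 2, 46, 53)
    # Hypothetical sample log, invented for illustration only.
    log = [
        Event(datetime(2019, 10, 5, 1, 12, 0), "SomeApp", "unrelated entry"),
        Event(datetime(2019, 10, 5, 2, 46, 53), "Kernel-Power", "bugcheck"),
        Event(datetime(2019, 10, 5, 2, 46, 52), "USBHUB3", "driver crash"),
    ]
    for e in events_near(log, crash):
        print(e.time.time(), e.source, e.message)
```

Sorting the hits by time is what makes the "which driver went first" question answerable; a wider window just pulls in more context around the crash.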

Apologies, some of the details are a bit hazy; I've been trying to remember everything I've done over the past number of months. I've attached a link to the minidump file it created for nvlddmkm.sys as well as a previous USBHUB3 .dmp file it did manage to create. Any help would be massively appreciated as I'm very much lost at this point. I have tried to look at both .dmp files using WinDbg, but I'm not experienced enough to understand what the assembly instructions are doing or what else is going on. There is also a MEMORY.DMP if it helps, but it's 1.2GB so I will upload it upon request.

Minidump/msinfo32 below:
https://mega.nz/#F!oaojCSZK!xyK3lpkE3TY_waY2LruxwA

If you need any more info let me know.



Thanks
Rainman65
 

gardenman

Splendid
Moderator
Hi, I ran the dump files through the debugger and got the following information: https://pste.eu/p/99PX.html
File information: 100519-9093-01.dmp (Oct 4 2019 - 21:45:09)
Bugcheck: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M (1000007E)
Driver warnings: *** WARNING: Unable to verify timestamp for nvlddmkm.sys
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 8 Hour(s), 03 Min(s), and 16 Sec(s)

File information: USBHUB3-20190824-1109.dmp (Aug 24 2019 - 06:09:39)
Bugcheck: BUGCODE_USB3_DRIVER (144)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 00 Min(s), and 08 Sec(s)
Comment: The GPU tweaking driver "iomap64.sys" was found on your system (AI Suite or GPU Tweak 2).

The nvlddmkm.sys file is an NVIDIA graphics card driver. There are a few things you can do to fix this problem. First off, try a full uninstall using DDU in Safe Mode, then re-install the driver (more information). Or try getting the latest version of the driver, or one of the 3 most recent drivers released by NVIDIA. Drivers can be found here: http://www.nvidia.com/ or you can allow Windows Update to download the driver for you, which might be an older/better version.

BIOS info was not included in the 1st dump file. This can sometimes mean an outdated BIOS is being used.

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.
 
Rainman65
Thanks for the response, appreciate it. A bit more history on this: the original build I had on this system, which had been working fine for 2+ months, was Windows 10 1809. I then did the in-place upgrade to 1903, which seemed to be fine for a few weeks.

Originally I was on NVIDIA driver 436.02, I think, and sometime last week I upgraded to the latest (436.48) with an Express install through GeForce Experience. It's been fine since Saturday morning. Today I installed the Intel drivers for my Intel I219-V NIC (ASUS's website did not seem to have NIC drivers, which was a bit odd), so now I've got the proper NIC drivers on; will see if any of this helps.

It just seems odd that on Friday I came home, plugged a camera into a front-header USB 2.0 port (it was working fine) and then overnight it did the same bluescreen. The dump files the event logs point to do not exist, but it's the same USBHUB3 driver that is crashing. I'm not sure about this, because the camera was plugged into a USB 2.0 port, so surely it should not have caused the USB 3 driver to crash?
 

Colif

Win 11 Master
Moderator
The September NVIDIA drivers are just bad news. It's possible the NVIDIA drivers are the cause of both, since RTX cards have USB 3.1 on them now, don't they?

Are they the newest Oculus drivers?

@gardenman
Unknown drivers:
STXII.sys - Asus Xonar drivers
MegaSas2i.sys - this appears to be part of Windows (MegaSAS RAID controller driver)
 
Rainman65
Hey Colif,

Interesting, I might try rolling back to a previous version of the NVIDIA driver then. Yes, the RTX cards have a USB Type-C connector on them; I was going to look into which driver it uses, but I'd guess it's probably USBHUB3. I don't actually use that connector on the video card though.

For the unknown drivers, I'll list my machine specs below:
Corsair 900D case
EK Pump/Res combo
400mm EK Radiator
Intel i9 9920X @ Stock Clocks/Watercooled
ASUS TUF MK2 X299 chipset motherboard
16GB DDR4 Corsair Dominator 2333MHz RAM (using XMP Profile 1 clocked at 2667MHz), with custom EK heat spreaders
Gigabyte RTX 2080 Watercooled
Asus Xonar Essence STX II sound card
Samsung 512GB NVME
LSI 9271-4i Raid card (24TB Raid 5)

A while back I noticed that the Application event log is often flooded with "Windows Error Reporting" app crash entries saying a whole bunch of things are crashing. I'm not experiencing any of these crashes and don't see them happening; sometimes it says apps that are not even running are crashing.

I'm starting to think it could be the RAM or the video card; however, the video card will be difficult to RMA if I cannot reproduce the fault.

EDIT:
Regarding Oculus, the machine was built a few months back, so the drivers would be a few months old. I can definitely try updating them, but I did experience this same random reboot without the Oculus connected or its software/drivers installed when I was troubleshooting a while back.

The motherboard was on BIOS v1902, a July 2019 BIOS; since the machine crashed I have updated to v2002, the September 2019 BIOS. I was also going to test the 4 DIMMs again one by one with memtest86+ over the next few days.
 

Colif

Win 11 Master
Moderator
didn't think too much of it at the time, over the coming months it started doing once a day while idle, or once every few days, maybe once a week or maybe once a month and of course it never created any dump files,
If it's crashing without creating dump files, it can mean the cause is hardware and it crashes too fast for the SSD to record a dump file.

Event viewer can show errors from weeks ago, and many of them can be ignored as they may have been one off crashes.

USBHUB3 appears to have been set off by a USB device, as it also mentions ucx01000.sys: https://carrona.org/drivers/driver.php?id=ucx01000.sys

What USB devices have you got attached to the NVIDIA card? Were the USB errors happening before the NVIDIA drivers were updated?
 
Rainman65

Originally, when this first started, the machine would hard hang, which did not allow it to create dump files because I had to press the reset button. Eventually this became a reboot instead of a hard hang, and just one of the times it rebooted it managed to create a dump file, which pointed to the nvlddmkm.sys driver. Most other times it did not create dump files; there were event logs saying it was not able to write out a dump file to the drive, which at the time was a Samsung 256GB SSD with no write errors and up-to-date firmware.


Regarding USB, I actually have quite a few devices connected; however, these were all connected before the issues started happening as well.

I don't have any USB devices attached to the NVIDIA card. It has a single USB Type-C video-out connector, but I don't use it, and there's no other USB functionality on the card that I'm aware of.

Below is a list of all USB devices I can remember right now connected to the machine:
USB Hub (USB 2.0) > Keyboard / Xbox Wireless Adapter
Front Header USB 2.0 > Logitech Wireless USB Receiver
Rear USB 2.0 > Audient USB Audio Interface / External 2.5" 1TB SSD
Rear USB 3.0 > 3 x Oculus Sensors / Oculus Rift USB
 
Rainman65
Try following the 2nd post in this thread and use Driver Verifier to tease the faulty driver into crashing. It will cause a BSOD (or should), as it tests drivers by putting them into situations they shouldn't be in. It is part of Win 10.

https://forums.tomshardware.com/thr...nclude-in-blue-screen-of-death-posts.3468965/

Yup, I did this yesterday evening for around 3-4 hours, testing Verifier against nvlddmkm.sys, USBHUB3.sys and the Intel Ethernet driver. No errors or BSODs in that time.
 

Colif

Win 11 Master
Moderator
That doesn't help; it puts the spotlight back on hardware again.

Did you run it only against those specific drivers, or against all of them? It could be some other driver causing it. I would run DV in default settings for about 8 hours. If you still don't crash

UsbHub3!HUBDSM_MarkingUnknownDeviceAsFailed+0x10 - seems it's a USB device.

Can you run this? I will ask someone else to look at the results; it might tell us more: https://www.sysnative.com/forums/pages/bsodcollectionapp/
 
Rainman65

Ah OK, that does seem interesting. I will try running DV in default settings overnight. I ran the BSOD collection app; link to the results below:

https://mega.nz/#!9aJB1A4Y!v_eISVIMrUE2NWcbgm9n_QcTqKIMxYviSFsV937CPM4
 
Rainman65
Small update on the Windows Error Reporting entries flooding the Application log. This initially baffled me, as it was saying applications had crashed that were not even running, even after I had renamed the .exe. After some digging, it appears that when an application crashes at some point, WER generates a dump file and a ".wer" report file, which it stores in "C:\ProgramData\Microsoft\Windows\WER\ReportQueue".

Upon reboot, Windows reads those files and creates new entries in the Application log saying those applications crashed, so it just looks like they crashed again. Deleting the files and rebooting appears to fix the issue.
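The cleanup described above (deleting the queued reports so that, per the observation above, they are not replayed into the Application log on the next boot) can be scripted. A minimal Python sketch, assuming the default ReportQueue path quoted in the post and an elevated (administrator) prompt, which is probably needed to delete anything there; queued_reports and clear_queue are hypothetical helper names, and clear_queue only lists what it would remove unless dry_run=False is passed:

```python
import shutil
from pathlib import Path

# Path as quoted in the post; assumed to be the default location.
REPORT_QUEUE = Path(r"C:\ProgramData\Microsoft\Windows\WER\ReportQueue")


def queued_reports(queue_dir: Path = REPORT_QUEUE) -> list[Path]:
    """List the .wer report files waiting in the queue (empty if the folder is missing)."""
    if not queue_dir.is_dir():
        return []
    return sorted(queue_dir.rglob("*.wer"))


def clear_queue(queue_dir: Path = REPORT_QUEUE, dry_run: bool = True) -> list[Path]:
    """Remove everything inside queue_dir (but not queue_dir itself).

    With dry_run=True (the default) nothing is deleted; the entries that
    would be removed are returned so they can be reviewed first.
    """
    if not queue_dir.is_dir():
        return []
    entries = sorted(queue_dir.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir():
                shutil.rmtree(entry)  # a report folder with its dump and .wer file
            else:
                entry.unlink()
    return entries


if __name__ == "__main__":
    # Only lists the queued reports; nothing is deleted here.
    for report in queued_reports():
        print(report)
```

Running it as a script just prints the queued .wer files; calling clear_queue(dry_run=False) does the deletion, and the reboot step from the post still follows.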
 
While going over the logs, I noticed entries related to piracy.
I would like to ask you to get rid of everything piracy related on your system, I am not saying your operating system is pirated, only that you have software installed that is pirated.

Why is pirated software a problem?
  • It is unknown what could've come with it; such programs often have a way of letting other software into your system, and who knows what it'll do.
  • Since it is unknown what could've come with such programs it is unknown what the behavior of your system will be and any potential solutions may, as a result, backfire and make your system's stability worse.
It may also be against forum rules, but I didn't find any forum rules so I didn't include it in the list.

Please remove everything piracy related on your system so we can start troubleshooting your system.
Please consider this as a warning, I may refrain from providing support if I see piracy-related entries in new logs.
 
Rainman65
Oh, that's interesting. I have been going back and forth trying a few different things, and at this point I'm simply going to rebuild my system on a clean Windows 10 1903 and see how that goes. Thanks for the support.