Question Strange and completely random BSODs, no real cause.

Jan 28, 2022
61
0
30
Hey all, after seemingly fixing my PC's stability I have been encountering random crashes here and there, about once per day and seemingly entirely at random.

They happen not while gaming or running an intensive load, but at the desktop or during general internet browsing. My system has passed just about every stress test I can throw at it, and I have done a fresh OS install twice.

Dump files can be found HERE

Any help would be greatly appreciated, thank you

Full Spec List
Motherboard - ASRock B550 Taichi
CPU - Ryzen 7 3700X
GPU - NVIDIA GTX 1070 Founders Edition
RAM - 32GB DDR4 3600 Corsair Vengeance RGB Pro
PSU - PowerSpec 850W Gold Fully Modular RGB (PSX 850GFM)

Links to previous threads can be found below:
Thread 1
Thread 2
 
It would make sense that amdryzenmasterDriver.sys was installed with the ASRock tuning software. It would override settings in the BIOS and hopefully accounts for your underclocked CPU.

Wow, yeah, I was unaware the CPU was even underclocked. The BIOS never showed anything to indicate it, and I don't think I ever opened the ASRock tuning software.

I did apply a GPU underclock for my old blower style card but that has since been removed, and only with MSI afterburner.

Should I check my BIOS and make sure everything is running at stock settings as it should?

Or maybe a CMOS reset? My mobo has an easy to press button on the rear I/O
 

Colif

Win 11 Master
Moderator
If the underclock was software-driven and overriding BIOS settings, you probably don't need to reset CMOS. Resetting the BIOS to defaults and then re-enabling XMP (if it was on) might be all it takes, and even that may not be necessary.
 
Hey again @johnbl and @Colif we are back again with a BSOD but this time it was for an error I had never seen before, "reference by pointer".

My system has never crashed for this reason; it's always been APC mismatch or various other issues.

Files Located HERE
I also changed my dump setting to a complete dump/kernel dump, and I noticed my PC took a lot longer to gather data about the crash, so I am really hoping these are larger and have the detail required to actually be useful.

Should I resort to driver verifier? That program scares me as a novice PC user.

Really can't thank you both enough for your help, I appreciate it.
 
Try running Prime95. So far we have only checked RAM, but I am just wondering if it's drivers.
https://www.guru3d.com/files-details/prime95-download.html
Prime 95 how to Guide: http://www.playtool.com/pages/prime95/prime95.html

It checks CPU & RAM.

I don't see reference by pointer very often so I don't know what it's associated with. I won't say what I think it is as John will know for sure.

I will download and run prime95 and report back.

I also remember the interesting bit about Corsair listing my memory as compatible with my motherboard, but ASRock's QVL list with my motherboard didn't actually list my specific kit.

Could that cause some unknown stability issues?

I checked over ASRock's entire list for corsair and my exact kit ( CMW32GX4M2D3600C18 ) isn't listed anywhere on the QVL list. I also checked other sources outside of ASRock's site, and I can find my memory kit listed there but not under QVL.

I know this is sort of where we left off on another thread, but I can't seem to find a definitive answer anywhere on this specific kit being unstable with my specific board.

Pangoly RAM Support List - my kit appears when "QVL only" is left unselected.

ASRock's QVL Sections
Matisse
Vermeer
Cezanne
 
the last dump provided was a mini dump
it showed iCUE.exe running,
looks like a stack overflow,
raw stack looked like it was making a DMA call right before the bugcheck.

check to see if you have the file c:\windows\memory.dmp
this file will have the proper info in it for debugging.
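
For anyone following along, here is a minimal Python sketch for checking whether that file exists and how large it is before compressing and uploading it. The helper name describe_dump is made up for illustration; the path is the one given above. It may need to be run from an elevated prompt, depending on permissions.

```python
from pathlib import Path

def describe_dump(path):
    """Return the dump size in GiB, or None if the file does not exist."""
    dump = Path(path)
    if not dump.is_file():
        return None
    return dump.stat().st_size / 1024**3

# Path from the post above; run this on the affected Windows machine.
size_gib = describe_dump(r"c:\windows\memory.dmp")
if size_gib is None:
    print("memory.dmp not found at that path.")
else:
    print(f"Found memory.dmp, about {size_gib:.1f} GiB")
```

Knowing the size up front helps when planning the compress-and-upload step, since full dumps can be tens of gigabytes.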
 
I've got the file, I misunderstood the directory until now. Sorry about that.

I did some yardwork and housework and left it to compress. It's a big 30GB file, so it will take a minute to do its thing and upload.

I will upload the full compressed file when it is done. Thanks John.
 

Colif

Win 11 Master
Moderator
I also remember the interesting bit about Corsair listing my memory as compatible with my motherboard, but ASRock's QVL list with my motherboard didn't actually list my specific kit.
I remember that part but Motherboard makers don't spend as much time testing all the combos.
Corsair checked it with motherboard - https://www.corsair.com/us/en/Categ...32&page=0&sort=popular&view=gridView&filters=
ASRock, if they had checked it, would have checked it with CPU and motherboard. They didn't.

ASRock only tested 8GB and 32GB sticks of RAM from Corsair; odd that there are no 16GB sticks there at all.

You have the same RAM as I do.
 
Jan 28, 2022
61
0
30
the last dump provided was a mini dump
it showed iCUE.exe running,
looks like a stack overflow,
raw stack looked like it was making a DMA call right before the bugcheck.

check to see if you have the file c:\windows\memory.dmp
this file will have the proper info in it for debugging.

Hey @johnbl here is the dump file from the "reference by pointer" crash.

REFERENCE BY POINTER DUMP

While attempting to upload that file, I encountered another BSOD for 'DPC watchdog violation'.

I am currently uploading the additional dump now and I will update this post with it for you when it is ready.

DPC WATCHDOG VIOLATION DUMP
 
Last edited:
I remember that part but Motherboard makers don't spend as much time testing all the combos.
Corsair checked it with motherboard - https://www.corsair.com/us/en/Categories/Products/Memory/c/Cor_Products_Memory?type=motherboard&manufacturerSelection=ASRock&systemModelSelction=B550 Taichi&maxModule=8&minModule=1&maxCapacity=128&minCapacity=32&page=0&sort=popular&view=gridView&filters=
ASRock, if they had checked it, would have checked it with CPU and motherboard. They didn't.

ASRock only tested 8GB and 32GB sticks of RAM from Corsair; odd that there are no 16GB sticks there at all.

You have the same RAM as I do.

Would it be a good idea to make a bootable USB on it with a Linux distro and see if it has a kernel panic?

A buddy suggested it to me but I had not thought of it before, wanted to get a second opinion.
 
it can't hurt really. Give it a try.

I have seen cases before where replacing RAM that works fine in tests can still fix the problem. A slight timing difference is all it takes.

In that case, do you think it would be worth finding a cheap 16GB kit specifically listed on ASRock's QVL and seeing if that improves things?
 
@Colif well, I searched every single 16GB kit on my motherboard's QVL, and every single one they tested is some upper-echelon RGB variant, so I may be SOL on finding one that fits both the manufacturer's testing and ASRock's testing. I was also able to run Prime95 for several hours last night with absolutely zero issues. I'm running out of programs to stress test with, it seems. Someone did suggest running the Windows Memory Diagnostic tool in extended mode, which I have not done. Is that something you'd consider doing overnight tonight just to see if it can pass again?

Also, I created a bootable USB with Pop!_OS on it to mess around in and see if any failures appear. The only thing I guess I won't be using while running an OS from a thumb drive is my SSD, but CrystalDisk cleared it long ago. I'll report back if that acts up as well, because that would point to a hardware failure somewhere.

Here is another dump file for @johnbl from some issues that occurred today while my PC was just resting at the desktop. This one is in the Kernel format and not the Complete format.

Two crashes, one complete system freeze/lockup. Was only able to get one memory dump out before getting another BSOD, and my whole PC froze up as I was writing this post.
 

Colif

Win 11 Master
Moderator
Have I suggested a repair store at any stage? They might have spare RAM they can put in to test if it makes any difference. They can test the other parts too. It could be the motherboard, and we have no way of testing that; the only real way is to put all the hardware on another board.

If it was me I would pay to get someone to look at it before buying any replacement parts. I see people replace almost an entire PC and you have to tell them, maybe it was the one thing you didn't swap. It's why I don't like blindly suggesting parts.
 
The system bugchecked because of a buffer overrun.
The process running at the time was
CorsairMsiPluginService.exe

(bunch of corsair threads)

you should remove this tool.
you have a lot of tools running that you might want to remove until you get your system stable:
OverwolfBrowse
iCUEDevicePlug
RTSSHooksLoade
MSIAfterburner

sorry the debugger truncates the long names
for example
OverwolfBrowser.exe
another one is
RTSSHooksLoader64.exe


I would remove as many of these tools as you can and see if the system is stable. if it is then you can start adding them back.
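
To map the truncated names back to full executables, a rough Python sketch like the one below can help. It assumes the debugger cuts names at 14 characters, which is only what the four names above suggest, not a documented limit. Of the full names, OverwolfBrowser.exe and RTSSHooksLoader64.exe come from the post; the other entries in the list are hypothetical examples.

```python
# Width observed in the truncated names above (an assumption, not a documented limit).
TRUNCATED_WIDTH = 14

def truncate(full_name):
    """Cut a full executable name the way the debugger output appears to."""
    return full_name[:TRUNCATED_WIDTH]

def match_truncated(truncated, candidates):
    """Return the candidate executables that would truncate to the given name."""
    return [name for name in candidates if truncate(name) == truncated]

truncated_names = ["OverwolfBrowse", "iCUEDevicePlug", "RTSSHooksLoade", "MSIAfterburner"]

# First two names are from the post; the rest are hypothetical examples.
running = [
    "OverwolfBrowser.exe",
    "RTSSHooksLoader64.exe",
    "iCUEDevicePluginHost.exe",
    "MSIAfterburner.exe",
    "explorer.exe",
]

matches = {name: match_truncated(name, running) for name in truncated_names}
for name, found in matches.items():
    print(name, "->", found)
```

In practice the candidate list would come from Task Manager or the installed programs list, so you know which applications to uninstall.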
 
That's the next step after I play around in Linux for another day to test hardware stability. I've got a buddy who I can leave my PC with to do some testing like that.

Smart idea to do that before throwing hardware at it.

I've also followed the advice of @johnbl and I have uninstalled all of the software that was associated with those processes and I will see if my system is stable.

Ideally I'd love to continue using iCUE, as all my peripherals, my fans, and my AIO are Corsair and I thought with iCUE I could control them all under one roof, but if that comes with an unstable system then I'll find another way to control the RGB.
 
System was stable for a few hours after uninstalling the software that was associated with those tools. I think I'm on the right track. I can now let my system rest at the desktop without encountering crashes, so far.

I have encountered 3 BSODs since removing all that software, but all were directly correlated with an action I made on the system. Removing a program, deleting a file, launching a program, etc.

The only one that is actually important is MSI Afterburner, as it controls the fan on my old blower-style 1070. Without a custom fan curve (I can't set one in the BIOS) my card pretty much roasts.
I attempted to add it again to avoid my card operating at 82C and things seemed alright, but I was greeted with another BSOD after I started the program. I restarted, and I've ended and started up MSI Afterburner about 20 times now and have not encountered an issue. Still, it's worth mentioning.

Interestingly, on these BSODs the screen tells me what failed. The past two were system files. In all of my previous crashes, I was never told by my system what may actually be failing.

Most recent BSOD dump file can be found HERE


Would either you or @Colif recommend a fresh OS install so that I can start adding programs back carefully and one by one?
 

Colif

Win 11 Master
Moderator
Noticed you said F... which converts to C... 28C? That's not hot. My RTX 2070 Super runs at 122F at idle.

I expect what it's showing you are the victims. It's rare for Microsoft files to be the actual cause; they are mostly victims.

I will wait until John has looked at recent dumps before suggesting a clean install as it would be pointless to read them after.
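
The conversion behind those numbers is simple enough to check in a few lines of Python. This is only a sketch of the standard Fahrenheit/Celsius formulas; the function names are made up for illustration.

```python
def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9

def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

print(round(f_to_c(82)))   # 82F reads as a cool card
print(f_to_c(122))         # the 122F idle figure mentioned above
print(c_to_f(82))          # 82C expressed in Fahrenheit
```

So 82F is about 28C and 122F is 50C, while 82C would be roughly 180F, which is a very different situation for a GPU.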
 
Silly mistake, I meant 82 Celsius. Didn't realize I had made that error at the time, my bad.

82C on the other hand is right on the edge of throttling from my understanding.
 

Colif

Win 11 Master
Moderator
originally I had this in last post until I noticed the F:

sounds like the thermal paste on your GPU needs replacing. I had a GTX 980 that went from always low temps to 60c all the time. My RTX 2070 Super sits on 50c most days but it doesn't do much apart from show videos or desktop. I don't think its fans start until it hits 61c.

I wonder if GPU is part of the problem (or is the problem) as those temps wouldn't be helping.
 
bugcheck was in usb code extensions.
i would guess it was from this device:
"HID\VID_1B1C&PID_0A52&MI_03&Col01\8&7552ab9&0&0000"

https://devicehunt.com/view/type/usb/vendor/1B1C

It is some Corsair device, but I did not see 0a52 as a device in the list.
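
For reference, the vendor and product IDs can be pulled out of a hardware ID string like the one above with a short Python sketch. The function name parse_usb_ids is hypothetical, and the pattern only assumes the VID_xxxx&PID_xxxx layout seen in this thread.

```python
import re

# Matches the VID_xxxx&PID_xxxx part of a Windows hardware ID string.
ID_PATTERN = re.compile(r"VID_([0-9A-Fa-f]{4})&PID_([0-9A-Fa-f]{4})")

def parse_usb_ids(hardware_id):
    """Return (vendor_id, product_id) as lowercase hex strings, or None if absent."""
    match = ID_PATTERN.search(hardware_id)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()

hardware_id = r"HID\VID_1B1C&PID_0A52&MI_03&Col01\8&7552ab9&0&0000"
vendor_id, product_id = parse_usb_ids(hardware_id)
print(f"vendor {vendor_id}, product {product_id}")  # vendor 1b1c, product 0a52
```

The vendor ID (1b1c) is what the devicehunt link above is keyed on; the product ID (0a52) is the part to search for within that vendor's list.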

this could be:
the device itself: Look for a firmware update, maybe move the device to a different USB host by itself, or remove the device, depending on what it is. (move it to usb 2.x port if you have one, usb3 ports are more likely to have bugs)

this could also be: a bug in your bios setup for the usb, or the wrong usb extension driver USBXHCI.sys or UsbHub3.sys
looks like you have the microsoft generic driver for both of these.
see if your motherboard has a custom USB 3 driver to install.
(custom drivers have to match the bios version)

so, for each of your usb devices, you need to make sure you have updated the firmware. Also check for firmware updates for the usb chips on the motherboard.

the video device indicated a Mouse. Does your mouse have lights that you can control? see if there is a firmware update for the mouse

Looks like all of the Corsair devices are listed as vid on the USB port. All have custom logs that I cannot read with the debugger. All I can really read is that one asked to see what other devices were connected to the USB bus and the bus just stopped working. Then you got a bugcheck since a bogus memory location (address 19) was used.
I think it is just going to be a stupid bug where some device passed the contents of a memory pointer rather than the memory pointer itself, i.e. the number 19 is the size of the memory block, not the address.

you might consider going into windows control panel, device manager and find every single USB port and right clicking on them and find the power management tab and tell windows not to put them to sleep. just in case this is some sleep issue with one of the devices on the USB bus.

Found an internal error for device
USB\VID_1B1C&PID_0A52&REV_2041
It indicated that it was doing something that is only allowed for an audio device, but the device was a video device. Look for a firmware update.
I am guessing it is Corsair speakers that were set up as a video device.
------------
debugger internal log:
496: HUBPDO_CreatePdoInternal - Device Context 0xFFFF878B68E3A170 - USB\VID_1B1C&PID_0A52&REV_2041 - Port Path 6:0:0:0:0:0
497: HUBPDO_EvtDevicePrepareHardware - DeviceHackFlags:0x8
498: HUBDESC_InternalValidateEndpointDescriptor - HW_COMPLIANCE:Configuration Descriptor Validation warning due to the bLength value (0x9) of the endpoint descriptor at offset (0xb7) is greater than expected (0x7). This is allowed by the audio class, but others should avoid it.
499: HUBDESC_InternalValidateEndpointDescriptor - HW_COMPLIANCE:Configuration Descriptor Validation warning due to the bLength value (0x9) of the endpoint descriptor at offset (0xfa) is greater than expected (0x7). This is allowed by the audio class, but others should avoid it.
500: HUBDESC_InternalValidateLastInterface - HW_COMPLIANCE:Configuration Descriptor Validation Failed due to the number of endpoint descriptors found (0x0) for an interface was not equal to the number specified (0x1) in the corresponding interface descriptor (bInterfaceNumber 0x2 and bAlternateSetting 0x2). Ignoring failure for further validation.
501: HUBPDO_EvtDeviceWdmIrpPreprocess - Failing the MS OS Descriptor request by client as the device does not support it
------------
usb c port was reset in the logs right before the crash.
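
If you want to pick the descriptor warnings out of a longer log, a small Python sketch along these lines would do it. It only parses the wording of the two warning lines shown above (498 and 499); the pattern and the function name oversized_endpoints are assumptions for illustration, not part of any debugger tooling.

```python
import re

# The two warning lines from the log above, message text copied as-is.
log_lines = [
    "498: HUBDESC_InternalValidateEndpointDescriptor - HW_COMPLIANCE:Configuration Descriptor "
    "Validation warning due to the bLength value (0x9) of the endpoint descriptor at offset (0xb7) "
    "is greater than expected (0x7). This is allowed by the audio class, but others should avoid it.",
    "499: HUBDESC_InternalValidateEndpointDescriptor - HW_COMPLIANCE:Configuration Descriptor "
    "Validation warning due to the bLength value (0x9) of the endpoint descriptor at offset (0xfa) "
    "is greater than expected (0x7). This is allowed by the audio class, but others should avoid it.",
]

PATTERN = re.compile(
    r"bLength value \((0x[0-9a-fA-F]+)\) of the endpoint descriptor "
    r"at offset \((0x[0-9a-fA-F]+)\) is greater than expected \((0x[0-9a-fA-F]+)\)"
)

def oversized_endpoints(lines):
    """Return (bLength, offset, expected) tuples for each matching warning line."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.append(tuple(int(group, 16) for group in match.groups()))
    return found

found_warnings = oversized_endpoints(log_lines)
for blength, offset, expected in found_warnings:
    print(f"offset {offset:#x}: bLength {blength} bytes, expected {expected}")
```

Both warnings report a 9-byte endpoint descriptor where 7 bytes were expected, which is the mismatch the log says is only allowed by the audio class.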
 
Device USB\VID_1B1C&PID_0A52&REV_2041 = VOID ELITE USB Gaming Headset

look for firmware update
---------------
You may find that installing the AMD chipset drivers for your motherboard replaces the Microsoft generic USB drivers and extensions. The AMD drivers might not check and block the audio calls on a video USB device like the Microsoft driver does, i.e. the AMD driver might work better with all of the Corsair drivers claiming to be video devices (headset, mouse, and keyboard), just in case there is not a firmware update.

Also, USB drivers are installed via plug and play and get associated with each port.
When the device is removed the driver gets hidden but still runs. You have to go into Windows Control Panel, Device Manager, and find the option in the menu to show hidden devices. Then you can remove the software for the device for each port. You might even have to remove the driver from your machine.
Plug and play will reinstall the driver if the device is connected.
(Update the driver if you can.)
 