[SOLVED] Help troubleshooting Daily BSOD

Page 2 - Tom's Hardware community

maomaobear

Prominent
Nov 13, 2021
Hi all, I've been getting BSODs once or twice a week on my build for the last few months. Last week I upgraded my CPU cooler and cleaned out the PC, and since then the BSODs have become more frequent, about once a day. And temperatures are down from 50-80C to 30-60C, which confuses me!

The BSOD can happen during gaming, when closing a Microsoft Word file, or at almost any time. The errors are all different: IRQL NOT LESS OR EQUAL, MEMORY MANAGEMENT, KERNEL FAILURE (or something like that).

I ran a Prime95 test and it passed the Small FFTs but failed the Large FFTs with a "Rounding error", so I'm thinking it may be the memory. What's weird is that it always fails on one worker within 2-3 minutes, but then the test runs perfectly fine for the next hour or more. A couple of times the crash even resulted in a no-POST and I had to reseat the CMOS battery to get into the BIOS.

Down-clocking the memory from 3200MHz to 3133MHz (same timings) seems to reduce the Prime95 rounding errors. However, I still got crashes/BSODs.
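For context (this is not from the original post): the "3200MHz" rating on DDR4 kits is really the data rate in MT/s, and the actual memory clock is half that. A minimal sketch, assuming the CL16 timing listed further down stays fixed, converts the CAS latency to nanoseconds at both speeds. It shows that dropping to 3133 with the same timings slightly relaxes the effective latency (about 10.0 ns vs 10.2 ns), which puts a bit less demand on the memory. The JEDEC timings used at 2133 aren't given in the post, so that case is left out.

```python
def first_word_latency_ns(cas_latency: int, data_rate_mts: float) -> float:
    """CAS latency in nanoseconds.

    DDR performs two transfers per clock, so one clock cycle lasts
    2000 / data_rate ns when the data rate is given in MT/s.
    """
    return cas_latency * 2000.0 / data_rate_mts


# CL16 at the two speeds tried in the post
for rate in (3200, 3133):
    print(rate, round(first_word_latency_ns(16, rate), 2))
# 3200 10.0
# 3133 10.21
```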

I also reinstalled Windows and all the drivers, including running DDU and reinstalling the graphics drivers from AMD. No luck.

Today I down-clocked all the way to 2133MHz (same settings as if XMP was never enabled) and had no crashes for a while... then it just crashed while I was editing this post 🙁

It SEEMS like a memory problem to me, but before I RMA it, I'm wondering if you guys could help shed more light on the problem. Could voltages be the cause, and if so, which voltages should I increase? Could it be the motherboard DIMM slots, and how would I test that if my motherboard manual specifically says to install the RAM in a certain order/slots? Is it definitely not a CPU issue?

How do I 100% confirm it's the RAM, so I don't RMA it and still have the issues?

Appreciate any help. PC Specs below.

Cheers,
Pin

MSI X570 Gaming Plus
AMD Ryzen 5 3600X - no overclock
2x 8GB Corsair Vengeance LPX DDR4 3200MHz
Sapphire Pulse Radeon 5700XT
EVGA Supernova G3 550
Windows 10 Pro

RAM Timings
16-20-20-20-38
Voltage Settings:
MEM VDDIO 1.35v
MEM VTT Auto
VDDCR SOC 1.1v
CLDO VDDP 0.9v
CLDO VDDG 1.1v

All other motherboard settings on auto (no PBO)
 
Hmmm... OK.

I have two Samsung SSD Drives:

Samsung SSD 860 EVO
Samsung SSD 850 EVO

---

I think I'll just start replacing hardware soon, probably starting with the CPU and motherboard together (cheaper to buy them together here).
 
Saw this stat highlighted in red in HWiNFO:

[HWiNFO screenshot]


Maybe this has something to do with my problems?

 
So the full diagnostic passed on my 860 Evo drive, but for the 850 Evo the diagnostic section says "this drive is not supported". It's a 6-year-old drive, so maybe that feature isn't available for it. Both drives are listed as "healthy" though.

It wasn't the DisplayPort cable, as I got another BSOD with it unplugged and only one monitor connected.

Made the following changes to follow up on possible disk drive problems:
  • Moved the drives to different SATA ports on the motherboard side (from 1/2 to 5/6)
  • Reseated all the SATA connections on both ends. I noticed they were slightly loose, but they don't seem to click in very tightly.
  • Swapped the PSU power cable feeding both SSDs for another power cable
  • Ordered new SATA cables, will swap them in after the next BSOD
Also updated the Realtek LAN driver as Colif suggested. But I'm definitely thinking it's a connection issue with a cable or port somewhere, since the BSOD frequency goes up or down whenever I go in and reseat things or install/reinstall things.
 
No BSODs for two days! I definitely think it has to do with the SATA cables or ports.

Going to switch out the cables right now with new ones. Will report back in a few days if no more BSODs!
 
OK, plugged in the new cables. They are noticeably snugger than the old ones and don't wiggle at all (old ones wiggled a lot).

Considering when my issues started and how they increased or decreased after working on the PC, this really feels like the root cause. Again, will report back in a few days to confirm.

Colif, I really have to thank you for being a sounding board on here and helping with the troubleshooting. It definitely helped preserve my sanity! In a weird way this entire process was kind of fun.

Cheers!
 
Yup, premature celebration. The adventure continues with another BSOD.

Dump: https://drive.google.com/file/d/1uWdeG8us2jipteIIhr7lHWjVEJRpFK2K/view?usp=sharing

Another driver failure pointing at a network-related problem.

Code:
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the BugCheck
Arg2: fffff80083279a5f, Address of the instruction which caused the BugCheck
Arg3: ffffa004b28c2c90, Address of the context record for the exception that caused the BugCheck
Arg4: 0000000000000000, zero.

Debugging Details:
------------------


KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 2718

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 7406

    Key  : Analysis.Init.CPU.mSec
    Value: 327

    Key  : Analysis.Init.Elapsed.mSec
    Value: 10024

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 76


FILE_IN_CAB:  112521-6765-01.dmp

DUMP_FILE_ATTRIBUTES: 0x8
  Kernel Generated Triage Dump

BUGCHECK_CODE:  3b

BUGCHECK_P1: c0000005

BUGCHECK_P2: fffff80083279a5f

BUGCHECK_P3: ffffa004b28c2c90

BUGCHECK_P4: 0

CONTEXT:  ffffa004b28c2c90 -- (.cxr 0xffffa004b28c2c90)
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=ffffa004b28c3738 rsi=ffffc187817194b0 rdi=0000000000000009
rip=fffff80083279a5f rsp=ffffa004b28c3690 rbp=ffffa004b28c3739
 r8=0000000000000000  r9=ffffc18784f02870 r10=fffff800740bf720
r11=000000000000019b r12=ffffc1877eadb640 r13=0000000000000000
r14=ffffc1877f23bd00 r15=ffffc18777efc1c0
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00050246
afd!AfdPoll64+0x2db:
fffff800`83279a5f 4c8b7818        mov     r15,qword ptr [rax+18h] ds:002b:00000000`00000018=????????????????
Resetting default scope

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

CUSTOMER_CRASH_COUNT:  1

PROCESS_NAME:  firefox.exe

STACK_TEXT:  
ffffa004`b28c3690 fffff800`8327976b     : 00000000`00000000 ffffc187`81719600 00000000`00000000 fffff800`00000009 : afd!AfdPoll64+0x2db
ffffa004`b28c37a0 fffff800`832791fc     : ffffc187`81719658 00000000`00000000 00000000`0000020c 00000000`00000000 : afd!AfdPoll+0x2b
ffffa004`b28c37d0 fffff800`7408f6f5     : ffffc187`817194b0 00000000`00000000 ffffa004`20206f49 00000000`00000001 : afd!AfdDispatchDeviceControl+0x7c
ffffa004`b28c3800 fffff800`74475a68     : ffffc187`817194b0 00000000`00000000 00000000`00000000 ffffc187`812f5080 : nt!IofCallDriver+0x55
ffffa004`b28c3840 fffff800`74475335     : 00000000`00012024 ffffa004`b28c3b80 00000000`00000005 ffffa004`b28c3b80 : nt!IopSynchronousServiceTail+0x1a8
ffffa004`b28c38e0 fffff800`74474d36     : 00000000`00000000 00000000`000004b8 00000000`00000000 00000000`00000000 : nt!IopXxxControlFile+0x5e5
ffffa004`b28c3a20 fffff800`74208cb8     : 000001e8`d7f3f000 000001e8`c9d29d60 ffffc187`6a72b040 ffffd780`63a00180 : nt!NtDeviceIoControlFile+0x56
ffffa004`b28c3a90 00007ffa`29ccce54     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x28
00000014`625bc698 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffa`29ccce54


SYMBOL_NAME:  afd!AfdPoll64+2db

MODULE_NAME: afd

IMAGE_NAME:  afd.sys

IMAGE_VERSION:  10.0.19041.1387

STACK_COMMAND:  .cxr 0xffffa004b28c2c90 ; kb

BUCKET_ID_FUNC_OFFSET:  2db

FAILURE_BUCKET_ID:  0x3B_c0000005_afd!AfdPoll64

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {d37025e0-0827-9818-560b-38665532b661}

Followup:     MachineOwner
---------

3: kd> lmvm afd
start             end                 module name
fffff800`83220000 fffff800`832c5000   afd      # (pdb symbols)          C:\ProgramData\Dbg\sym\afd.pdb\FD84075009F20B65FAD8CD09BA167AB51\afd.pdb
    Loaded symbol image file: afd.sys
    Mapped memory image file: C:\ProgramData\Dbg\sym\afd.sys\176DC00Ca5000\afd.sys
    Image path: afd.sys
    Image name: afd.sys
    Image was built with /Brepro flag.
    Timestamp:        176DC00C (This is a reproducible build file hash, not a timestamp)
    CheckSum:         000A1539
    ImageSize:        000A5000
    File version:     10.0.19041.1387
    Product version:  10.0.19041.1387
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        3.7 Driver
    File date:        00000000.00000000
    Translations:     0409.04b0
    Information from resource tables:
        CompanyName:      Microsoft Corporation
        ProductName:      Microsoft® Windows® Operating System
        InternalName:     afd.sys
        OriginalFilename: afd.sys
        ProductVersion:   10.0.19041.1387
        FileVersion:      10.0.19041.1387 (WinBuild.160101.0800)
        FileDescription:  Ancillary Function Driver for WinSock
        LegalCopyright:   © Microsoft Corporation. All rights reserved.
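Two lines in the dump above carry most of the information. Arg1 0xc0000005 is the NTSTATUS code STATUS_ACCESS_VIOLATION, and the faulting instruction `mov r15,qword ptr [rax+18h]` ran with rax=0, so afd.sys tried to read address 0x18, which the debugger shows as unreadable (`????????????????`). The sketch below decodes the NTSTATUS bit fields and computes that effective address. The "first 4 KiB page" threshold used to flag a NULL-plus-offset read is a simplifying assumption for illustration, not a Windows rule.

```python
# NTSTATUS layout: Sev[31:30] C[29] N[28] Facility[27:16] Code[15:0]
def decode_ntstatus(status: int) -> dict:
    return {
        "severity": (status >> 30) & 0x3,  # 3 = error
        "customer": (status >> 29) & 0x1,  # 0 = Microsoft-defined code
        "facility": (status >> 16) & 0xFFF,
        "code": status & 0xFFFF,
    }


def effective_address(base: int, displacement: int) -> int:
    # x64 address arithmetic wraps at 64 bits
    return (base + displacement) & 0xFFFF_FFFF_FFFF_FFFF


def looks_like_null_deref(address: int, page_size: int = 0x1000) -> bool:
    # Assumption: a read inside the first page almost always means a NULL
    # pointer plus a small structure-field offset.
    return address < page_size


print(decode_ntstatus(0xC0000005))
# {'severity': 3, 'customer': 0, 'facility': 0, 'code': 5}
addr = effective_address(0x0, 0x18)  # rax=0, displacement 18h
print(hex(addr), looks_like_null_deref(addr))
# 0x18 True
```

A NULL pointer read inside a Microsoft driver can come from a genuine driver bug or from data that was corrupted in memory, which is consistent with the "memory_corruption" verdict on this dump later in the thread.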
 
You use ethernet? There isn't a USB dongle in the mix?

Maybe run an antivirus scan.

There used to be a program I could get you to run to let me see what Windows Error Reporting is showing, but the last time I got someone to run it, both my browsers told me the file contained a virus, so I can't really use that now.

Just wonder what I am missing here.
 
Yes I use ethernet, no USB dongle or WiFi involved.

Had the same thought as you about viruses, ran:
  • Windows full virus scan - no threats
  • Malwarebytes scan - no threats
  • Windows Defender offline scan - no threats
 
Below is from a previous dump that gardenman already converted for me; it's Sunday's crash - https://jsfiddle.net/wLmrkajq/show - and it's how I found the driver info last time. I don't expect you to read it.
Gardenman will reply later with a link I can use to look at this crash.

I have seen these cause errors before, but not LAN-related ones as far as I recall:
Jul 20 2017 - SteamStreamingSpeakers.sys - Steam Streaming Speakers driver (Valve Corporation)
Jul 28 2017 - SteamStreamingMicrophone.sys - Steam Streaming Microphone driver (Valve Corporation)

This is part of the WireGuard VPN software:
Dec 10 2019 - wintun.sys - Wintun Driver (WireGuard LLC)

The only other drivers I haven't mentioned before are the Realtek sound drivers and the AMD drivers. Oh, and HWiNFO, but I know it's not the cause; it came after the dumps.
 
I've disabled the steam streaming devices in the Device Manager.

The Wireguard driver is for a VPN which is pretty important in my usage. I can try using another VPN protocol instead for a while and see if that helps.

I keep going back to the fact that the problems started after I installed a cooler and cleaned the PC, and that after I mess around with connections inside, it sometimes starts crashing every couple of hours, or drops to every few days. Could it just be a coincidence?
 
What cooler did you install?
It could be a coincidence; I'm not sure how installing a CPU cooler (guessing) would cause network problems. The LAN runs off a Realtek chip on the motherboard, so touching the CPU shouldn't affect it.

The LAN thing could be a symptom of something else. I don't have any suggestions as to what.

I wonder if uninstalling the Realtek drivers and reinstalling them will fix it... it's not a suggestion I would normally make. Obviously it helps to have the drivers ready to reinstall before you start, or you won't have internet... http://www.uninstallhelps.com/how-to-uninstall-realtek-ethernet-controller-driver.html
 
Hi, I ran the dump file through the debugger and got the following information: https://jsfiddle.net/nus9d3az/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.

File information: 112521-6765-01.dmp (Nov 24 2021 - 23:22:14)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Probably caused by: memory_corruption (Process: firefox.exe)
Uptime: 1 Day(s), 1 Hour(s), 54 Min(s), and 32 Sec(s)

Possible Motherboard page: https://www.msi.com/Motherboard/MPG-X570-GAMING-PLUS
You are using the latest BETA BIOS.

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
 
Since yesterday:
  • Uninstalled and re-installed WeGame (Steam for China), suspecting driver was problematic
  • Moved all my used programs off the suspected bad 860 Evo SSD
  • Re-secured ethernet cable connection??
  • Disabled both steam streaming audio devices
While troubleshooting the LAN, I found a faulty Ethernet cable in the network that was cutting transfer speeds by about 80%. Replaced that.

New PCI Express LAN card has arrived, won't install until after next BSOD.
 
Question: Have you run Prime95 recently?
Are you still getting rounding errors?

That will cause this sort of thing if you are.

I didn't see this driver in previous crashes, or I would have mentioned it:
Jul 29 2013 - QMTgpNetflow764.sys (Tencent Technology (Shenzhen) Company Limited)

It might be okay, but anything from before 2015 might not be a Windows 10 driver.
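The "before 2015" rule of thumb above can be applied mechanically to the driver list gathered in this thread. A minimal sketch, assuming the dates as posted here; the 2015-01-01 cutoff is Colif's heuristic, not an official compatibility boundary, and an old date only flags a driver for a closer look rather than proving it is at fault.

```python
from datetime import date

# Driver dates as listed in this thread (module -> date)
DRIVERS = {
    "SteamStreamingSpeakers.sys": date(2017, 7, 20),
    "SteamStreamingMicrophone.sys": date(2017, 7, 28),
    "wintun.sys": date(2019, 12, 10),
    "QMTgpNetflow764.sys": date(2013, 7, 29),
}

# Rule-of-thumb cutoff from the reply above, not an official boundary
CUTOFF = date(2015, 1, 1)


def pre_cutoff_drivers(drivers: dict[str, date], cutoff: date = CUTOFF) -> list[str]:
    """Return driver names dated before the cutoff, oldest first."""
    old = [(d, name) for name, d in drivers.items() if d < cutoff]
    return [name for d, name in sorted(old)]


print(pre_cutoff_drivers(DRIVERS))  # ['QMTgpNetflow764.sys']
```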
 
Just ran Prime95 Large FFTs and it crashed. Didn't see the error message or anything the program just closed.

In the results log:
Code:
[Fri Nov 26 12:59:33 2021]
Self-test 480K passed!
Self-test 480K passed!
Self-test 480K passed!
Self-test 480K passed!
Self-test 480K passed!
Self-test 480K passed!
Self-test 480K passed!
Self-test 480K passed!
Self-test 480K passed!
Self-test 480K passed!
Self-test 480K passed!
Self-test 480K passed!

Running it again. The last 4 times I ran it there were no rounding errors.

Update: Ran again for about half an hour and all tests passed.
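When comparing several Prime95 runs, it can help to summarize the results log rather than eyeball it. A minimal sketch: the "Self-test 480K passed!" line format is taken from the log above, but the failure pattern (any line mentioning "error" or "rounding") is an assumption, so check the exact wording in your own results log before relying on it. To use it on a real file, pass in the file's text, e.g. via `pathlib.Path(...).read_text()`.

```python
import re

# Success line format as shown in the results log above
PASS_RE = re.compile(r"Self-test (\d+K) passed!")
# Assumption: failure lines mention "error" or "rounding"
FAIL_RE = re.compile(r"error|rounding", re.IGNORECASE)


def summarize(log_text: str) -> dict:
    """Count passed self-tests per FFT size and collect suspected failure lines."""
    passed: dict[str, int] = {}
    failures: list[str] = []
    for line in log_text.splitlines():
        m = PASS_RE.search(line)
        if m:
            passed[m.group(1)] = passed.get(m.group(1), 0) + 1
        elif FAIL_RE.search(line):
            failures.append(line.strip())
    return {"passed": passed, "failures": failures}


sample = """[Fri Nov 26 12:59:33 2021]
Self-test 480K passed!
Self-test 480K passed!
"""
print(summarize(sample))  # {'passed': {'480K': 2}, 'failures': []}
```

Note that a run where Prime95 simply closes, as happened above, leaves no failure line at all, so an empty failures list does not prove the system is stable.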
 