Question: Continued blue screens even after fresh Windows install?

Sep 13, 2023
I'd like to preface this by saying that I am inexperienced and this is the first build I've had that isn't "hand-me-down" parts sorted out beforehand by a friend. For about a week and a half I have been dealing with various blue screens (initially mostly memory management) after upgrading my CPU, RAM and motherboard. At first I thought one of my parts was faulty upon delivery and installation. I reinstalled the latest BIOS and various drivers, ran cmd commands and tests like memtest86 and windows memory diagnostic with only the latter saying I had a memory problem and memtest coming back nothing. After a bit more prodding I was stumped and ended up reinstalling Windows via USB, but still had problems. Lastly, I enabled XMP along with the profile that matched my RAM, and I thought my problem appeared to be almost over. Nope!

So now my PC will run a bit longer before it blue screens. I've again run Windows Memory Diagnostic (no problems came back), run the cmd tools (dism, sfc, chkdsk), reinstalled drivers, etc.

Available dump files

Build as a whole

MSI MAG B760 Tomahawk WIFI DDR4 - 7D96v15

CPU - i5 12600k

(F4-3200C16D-32GVK) Ripjaws V DDR4-3200 CL16

GPU - AMD XFX RX6600 - Adrenalin 23.9.1

New Dump files - 9/18/23
 
Welcome to the forums, newcomer!

Did you recreate the installer for your OS to rule out a corrupt installer? Speaking of installer, where did you source the installer for the OS?

You should also install the OS in offline mode, meaning with no internet connection, then manually install all relevant drivers for your platform with elevated privileges, i.e., right-click the installer > Run as administrator.
 
When I reinstalled, I used the Windows 10 Media Creation Tool from Microsoft's site, as directed in this video. As for a corrupt installer, I think I had that on my first attempt at an installation, so I ran the tool again on a separate computer with no known issues. I've considered another reinstallation but wanted to post somewhere like here before trying anything else. As for your last point, I don't think I did it in offline mode. I might have installed as admin, but I can't totally recall.
 
ran cmd commands and tests like memtest86 and windows memory diagnostic with only the latter saying I had a memory problem and memtest coming back nothing.
That's a HUGE clue right there. When any tool indicates a problem you MUST investigate.

Have you checked that your RAM and CPU are compatible with the motherboard you installed? Check the QVL lists for the motherboard.

It appears that you have two 16GB RAM cards? In that case remove one and just run on the one card for a while. If it BSODs then swap the RAM cards over and run on the other card for a while. If it BSODs on one card and not the other then you've found the flaky card.
 
That part of the issues I've had was probably because I hadn't enabled XMP with the profile matching my RAM. I tried a single stick and still had a BSOD during that time, before enabling XMP. As I stated later on, I enabled XMP and ran Windows Memory Diagnostic a second time with no memory problems coming back. Something I should have mentioned is that I'm unsure whether my RAM is compatible with my CPU: it's not listed on the motherboard end under K-series CPUs, but it is listed as compatible under non-K-series CPUs. I've tried googling it but got mixed answers as to whether it matters.
 
With both sticks of ram installed in the proper slots run a pass of memtest86.

If that completes with no errors remove the gpu and connect to the igp.
Test.
 
If it blue screens with the gpu and does not with the igp that would seem to point to something with the gpu.
That doesn't quite make sense to me in this case? I only ran memtest86 and no more. Can you explain?

Edit: Though when it comes to my GPU, I've had prior error messages involving Adrenalin and the noise reduction feature, if I remember correctly. One of the dump files from the handful I've posted also mentions something about AMD when I opened it in BlueScreenView; "091323-60531-01" is the file name.
 
First you run memtest and make sure it passes.

Then you remove the gpu and run your other stuff using the igp.
You're looking to see if you get a blue screen using the igp.
 
You have Driver Verifier enabled, why is that? Has someone else asked you to enable Driver Verifier? This is a troubleshooting tool and not something you would normally run enabled.

One of the dumps (the 0xC1 bugcheck) is a Driver Verifier BSOD that happened during a storage device access. It's reporting that a driver has corrupted memory (RAM); however, in this case there is no obvious third-party driver at fault. The other two dumps are for different bugchecks: one is a 0x3B that bugchecked during a graphics operation, and the other is a 0xD1 that bugchecked during a USB device access operation.

These three dumps show that the BSODs happened during different operations and for different reasons. The only thing they have in common is that there are no third-party drivers referenced in the dumps, which is a strong indication of a hardware problem.
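To make that reasoning concrete, here is a minimal Python sketch of the heuristic. The bugcheck names are the standard Windows names for these codes. The per-dump driver sets are left empty because no third-party driver was referenced in any of these dumps, and the function name `likely_cause` is purely illustrative, not part of any tool.

```python
# Standard Windows names for the bugcheck codes seen in this thread.
BUGCHECK_NAMES = {
    0x1A: "MEMORY_MANAGEMENT",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0xC1: "SPECIAL_POOL_DETECTED_MEMORY_CORRUPTION",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

def likely_cause(dumps):
    """dumps: list of (bugcheck_code, set of third-party drivers named in the dump).

    Heuristic only (hypothetical helper): a third-party driver that shows up
    in every dump is the first suspect; differing bugchecks with no
    third-party driver named at all lean toward hardware.
    """
    if not dumps:
        return "no data"
    common = set.intersection(*(set(drivers) for _, drivers in dumps))
    if common:
        return "suspect driver(s): " + ", ".join(sorted(common))
    codes = {code for code, _ in dumps}
    if len(codes) > 1 and all(not drivers for _, drivers in dumps):
        return "suspect hardware"
    return "inconclusive"

# The three dumps discussed above: different bugchecks, no third-party drivers.
dumps = [(0xC1, set()), (0x3B, set()), (0xD1, set())]
for code, _ in dumps:
    print(hex(code), BUGCHECK_NAMES[code])
print(likely_cause(dumps))
```

This is only a rule of thumb; a driver can still be at fault without being named in a dump, which is why the single-stick test is the real confirmation.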

When you add these dumps to your earlier statement that...
ran cmd commands and tests like memtest86 and windows memory diagnostic with only the latter saying I had a memory problem and memtest coming back nothing.
...I'm even more convinced now that you have some sort of RAM issue. Since Memtest cannot find a problem (and no memory tester can be 100% accurate) I suggest you remove one stick of RAM and run on just one stick for long enough to have experienced a BSOD. Then run on just the other stick for long enough to have experienced a BSOD. Does it BSOD on one stick but not the other? Does it BSOD on either stick? Does it not BSOD on either stick?
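The three questions above amount to a small decision table. A rough sketch in Python follows; the function name and the wording of each verdict are my own, and the "both crash" and "neither crashes" branches are assumptions about sensible next steps rather than firm diagnoses.

```python
def interpret_single_stick_test(stick_a_crashed: bool, stick_b_crashed: bool) -> str:
    """Map the outcome of running on each stick alone to a next step.

    Hypothetical helper: each argument is True if the PC blue-screened while
    running on that stick alone for long enough to normally see a BSOD.
    """
    if stick_a_crashed and stick_b_crashed:
        # Assumption: two bad sticks at once is less likely than a problem
        # with the slot, the memory controller, or the motherboard.
        return "both crash: suspect slot, memory controller, or motherboard"
    if stick_a_crashed:
        return "stick A is the likely faulty stick"
    if stick_b_crashed:
        return "stick B is the likely faulty stick"
    # Assumption: no crash on either stick alone suggests the fault only
    # appears with both installed, so retest as a pair.
    return "neither crashes alone: retest both sticks together"

print(interpret_single_stick_test(True, False))
```

For the result to mean anything, each stick needs to be tested in the same slot and for a similar length of time.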
 
I'm actually not sure why Driver Verifier is enabled. I can't recall trying anything that directly mentioned enabling or using it, unless one of the cmd commands I mentioned does that? I also said at one point that I was unsure of my RAM's compatibility, as it's listed as compatible under the non-K-series CPUs but not the K-series ones. I was trying to find a definitive answer but couldn't quite find one (or just didn't look well enough 🤷‍♀️). I also believe I tried running things with one stick and still experienced a blue screen, but I can try again.

As for USB, I only have a mouse, keyboard and portable hard drive regularly plugged in, plus an Xbox controller that I plug in since Bluetooth caused a crash or two.

As for graphics, I did have an instance playing Starfield where I was changing locations and it crashed.
 
We may want to enable Driver Verifier at some point (possibly), but with very specific options. For now I suggest you disable it by opening a command prompt and entering the command verifier /reset. Then reboot.

I'd also suggest that you try the PC with the xbox controller unplugged (there are some potential issues with the Windows xbox driver package). Also unplug the external drive so that only the mouse, keyboard and one monitor are connected. See how it is in that state.

Run Windows Update in case there are new driver updates available, then click the 'View optional updates' link, expand the Driver Updates section, and decide whether any of the driver updates there need installing (post a screenshot if you're not sure).

I also looked up the QVL lists for that RAM and board and did see that it's not in the K series list. The motherboard is however listed in the RAM QVL list, so I don't think it's a compatibility issue.
 
Verifier disabled and rebooted. As for updates, there wasn't much, other than 3 that I'm unsure of: 2 in Windows Update (Intel and Western Digital) and 1 on Intel's site via their driver assistant, which I'd assume is for the integrated graphics, but I couldn't find anything in a quick search. The same goes for the Western Digital one, which is the brand of my external drive. On another note (and sorry I didn't fit this in somewhere earlier), when I did the fresh install I cleared everything so it started anew, in case there are settings that may need changing 🤔

Edit: Another restart after i stepped away while a game was launching

Dump for what happened
message from adrenalin
and another dump as i was about to get off

 
The first dump (the 'what happened' dump) is a 0xD1, DRIVER_IRQL_NOT_LESS_OR_EQUAL bugcheck, and appears to point at the graphics driver amdkmdag.sys...
Code:
0: kd> knL
 # Child-SP          RetAddr               Call Site
00 fffff803`77c83b08 fffff803`7501ee29     nt!KeBugCheckEx
01 fffff803`77c83b10 fffff803`7501a9e3     nt!KiBugCheckDispatch+0x69
02 fffff803`77c83c50 fffff80f`4b82a570     nt!KiPageFault+0x463
03 fffff803`77c83de8 fffff80f`4b81a4fc     amdkmdag+0x1fba570
04 fffff803`77c83df0 ffffdf8f`e3510000     amdkmdag+0x1faa4fc
05 fffff803`77c83df8 000002db`6da82794     0xffffdf8f`e3510000
06 fffff803`77c83e00 ffffdf8f`e3510a08     0x000002db`6da82794
07 fffff803`77c83e08 fffff80f`4f49b3b8     0xffffdf8f`e3510a08
08 fffff803`77c83e10 fffff803`77c84601     amdkmdag+0x5c2b3b8
09 fffff803`77c83e18 ffffdf8f`d9d02fa0     0xfffff803`77c84601
0a fffff803`77c83e20 00000000`00000000     0xffffdf8f`d9d02fa0
In the stack trace above (which you read from the bottom up), you can see amdkmdag.sys called several times. It's also the final driver called prior to the page fault and bugcheck.

However, there are several apparent function calls in that stack for which we don't have symbols, and one of them (the 0x000002db`6da82794 function call) is a call to a user-mode address (it begins with 0x0000, whilst kernel-mode code addresses begin with 0xFFFF). This would never happen normally: the kernel never calls user-mode code directly, so something is seriously amiss with this call stack.
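As a rough illustration of that check, the Python sketch below picks out the frames whose call site is a raw address and classifies each as user-mode or kernel-mode. It assumes the usual x64 canonical split (lower half user, upper half kernel). The helper names and the regular expression are my own, and the sample lines are copied from the stack trace above.

```python
import re

# On x64 Windows, canonical user-mode addresses sit in the lower half of the
# address space and kernel-mode addresses in the upper half.
USER_MAX = 0x00007FFFFFFFFFFF
KERNEL_MIN = 0xFFFF800000000000

def classify(address: int) -> str:
    if address <= USER_MAX:
        return "user"
    if address >= KERNEL_MIN:
        return "kernel"
    return "non-canonical"

def unsymbolized_frames(stack_text: str):
    """Yield (frame_number, address, mode) for frames whose call site is a raw address."""
    pattern = re.compile(
        r"^([0-9a-f]{2})\s+\S+\s+\S+\s+0x([0-9a-f]{8})`([0-9a-f]{8})\s*$"
    )
    for line in stack_text.splitlines():
        m = pattern.match(line.strip())
        if m:
            addr = int(m.group(2) + m.group(3), 16)
            yield m.group(1), addr, classify(addr)

# A few frames copied from the 0xD1 dump's stack trace.
STACK = """\
03 fffff803`77c83de8 fffff80f`4b81a4fc     amdkmdag+0x1fba570
05 fffff803`77c83df8 000002db`6da82794     0xffffdf8f`e3510000
06 fffff803`77c83e00 ffffdf8f`e3510a08     0x000002db`6da82794
07 fffff803`77c83e08 fffff80f`4f49b3b8     0xffffdf8f`e3510a08
09 fffff803`77c83e18 ffffdf8f`d9d02fa0     0xfffff803`77c84601
"""

for frame, addr, mode in unsymbolized_frames(STACK):
    print(frame, f"{addr:#018x}", mode)
```

Frame 06 is the only one classified as user-mode, which is the oddity described above.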

The second dump file (the 'as I was about to get off' dump) contains the clue that we need to see what's most likely going wrong here. The dump is a 0x1A, a MEMORY_MANAGEMENT bugcheck. The argument 1 value of 0x4179 indicates that a corrupt PTE (Page Table Entry) was encountered. A PTE is part of the page table structure that is used in translating virtual addresses into real RAM page addresses.
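For anyone curious how a virtual address relates to its PTE, here is a minimal Python sketch of the arithmetic under 4-level paging with 4 KiB pages and 8-byte PTEs. The virtual address and PTE address are the ones from the `!pte` output in this dump. The PTE base value is an assumption that I inferred from that pair (it is randomized per boot and is not printed in the output quoted here), and the function names are illustrative.

```python
PAGE_SHIFT = 12          # 4 KiB pages
PTE_SIZE_SHIFT = 3       # each PTE is 8 bytes
VA_MASK = (1 << 48) - 1  # 48-bit virtual addresses (4-level paging)

def pte_address(va: int, pte_base: int) -> int:
    """Address of the PTE that maps `va`, given the base of the PTE region."""
    return pte_base + (((va & VA_MASK) >> PAGE_SHIFT) << PTE_SIZE_SHIFT)

def va_from_pte(pte_addr: int, pte_base: int) -> int:
    """Inverse mapping: the virtual address that a PTE describes."""
    va = ((pte_addr - pte_base) >> PTE_SIZE_SHIFT) << PAGE_SHIFT
    if va & (1 << 47):
        # Sign-extend addresses in the kernel half to 64 bits.
        va |= ~VA_MASK & 0xFFFFFFFFFFFFFFFF
    return va

# Assumption: PTE base inferred from the VA/PTE pair in this dump.
PTE_BASE = 0xFFFFC68000000000
VA = 0x00007FFF21DC3000
PTE = 0xFFFFC6BFFF90EE18

print(hex(pte_address(VA, PTE_BASE)))
print(hex(va_from_pte(PTE, PTE_BASE)))
```

The point is only that every virtual page has exactly one PTE at a predictable location, so a damaged entry there breaks the translation for that page.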

If we display the PTE in question we can see that it's clearly incomplete...
Code:
2: kd> !pte ffffc6bfff90ee18
                                           VA 00007fff21dc3000
PXE at FFFFC6E371B8D7F8    PPE at FFFFC6E371AFFFE0    PDE at FFFFC6E35FFFC870    PTE at FFFFC6BFFF90EE18
Unable to get PXE FFFFC6E371AFFFE0
We see no details of the values that the data areas contain, which we would expect to see in a valid PTE. We can see that the PTE is for a user-mode virtual address, 0x00007fff21dc3000, which is useful information. If we look at what was going on at the time via the call stack...
Code:
2: kd> knL
 # Child-SP          RetAddr               Call Site
00 ffff9988`4f9feeb8 fffff806`7948124a     nt!KeBugCheckEx
01 ffff9988`4f9feec0 fffff806`7942f7b6     nt!MiDeleteVa+0x153a
02 ffff9988`4f9fefc0 fffff806`7942f8cb     nt!MiWalkPageTablesRecursively+0x776
03 ffff9988`4f9ff060 fffff806`7942f8cb     nt!MiWalkPageTablesRecursively+0x88b
04 ffff9988`4f9ff100 fffff806`7942f8cb     nt!MiWalkPageTablesRecursively+0x88b
05 ffff9988`4f9ff1a0 fffff806`7942c8cb     nt!MiWalkPageTablesRecursively+0x88b
06 ffff9988`4f9ff240 fffff806`7947fae1     nt!MiWalkPageTables+0x36b
07 ffff9988`4f9ff340 fffff806`794407b0     nt!MiDeletePagablePteRange+0x4f1
08 ffff9988`4f9ff650 fffff806`798b07c9     nt!MiDeleteVad+0x360
09 ffff9988`4f9ff760 fffff806`7981f3c8     nt!MiUnmapVad+0x49
0a ffff9988`4f9ff790 fffff806`7981d83f     nt!MiCleanVad+0x30
0b ffff9988`4f9ff7c0 fffff806`7989aa18     nt!MmCleanProcessAddressSpace+0x137
0c ffff9988`4f9ff840 fffff806`798aeb6e     nt!PspRundownSingleProcess+0x20c
0d ffff9988`4f9ff8d0 fffff806`798817ee     nt!PspExitThread+0x5f6
0e ffff9988`4f9ff9d0 fffff806`796105f5     nt!NtTerminateProcess+0xde
0f ffff9988`4f9ffa40 00007fff`2d64d3d4     nt!KiSystemServiceCopyEnd+0x25
10 0000001c`3e7ff618 00000000`00000000     0x00007fff`2d64d3d4
From the function calls we can see that a process had ended (nt!NtTerminateProcess) and we're cleaning up the memory and page tables that were used by that process. You can see the VAD (Virtual Address Descriptor) being deleted, and then the page tables are 'walked', clearing up all references to the now deleted address space. Finally there is an attempt to delete the address space itself (nt!MiDeleteVa), which causes the bugcheck because of that corrupted PTE (which must be for the user-mode address space being deleted).
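Since these traces are read from the bottom up, it can help to flip them into call order. The Python sketch below does that for a few frames copied from the stack above; the helper name `call_order` is my own, and it simply assumes four whitespace-separated columns per frame.

```python
def call_order(stack_text: str):
    """Return symbol names oldest call first, i.e. reading the trace bottom-up."""
    symbols = []
    for line in stack_text.splitlines():
        parts = line.split()
        # Expect: frame number, Child-SP, RetAddr, call site (module!function+offset).
        if len(parts) == 4 and "!" in parts[3]:
            symbols.append(parts[3].split("+")[0])
    return list(reversed(symbols))

# Selected frames from the 0x1A dump's stack trace.
STACK = """\
00 ffff9988`4f9feeb8 fffff806`7948124a     nt!KeBugCheckEx
01 ffff9988`4f9feec0 fffff806`7942f7b6     nt!MiDeleteVa+0x153a
06 ffff9988`4f9ff240 fffff806`7947fae1     nt!MiWalkPageTables+0x36b
08 ffff9988`4f9ff650 fffff806`798b07c9     nt!MiDeleteVad+0x360
0b ffff9988`4f9ff7c0 fffff806`7989aa18     nt!MmCleanProcessAddressSpace+0x137
0e ffff9988`4f9ff9d0 fffff806`796105f5     nt!NtTerminateProcess+0xde
"""

print(" -> ".join(call_order(STACK)))
```

Read this way, the sequence runs from process termination, through address space cleanup and the page table walk, to nt!MiDeleteVa and the bugcheck.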

This is all pointing very strongly at a RAM problem, just as everything else suggests. I think we can be clear now and say with some certainty that there is a problem with your RAM, either in the RAM itself, in the RAM slots, in the motherboard memory controller, or even in the motherboard itself.

I can't remember the last time I saw a set of dumps that pointed more clearly at a RAM problem than these do.

Have you tried removing one RAM stick at a time as I suggested in post#13?
 
How would you like me to test the RAM? One stick at a time in each slot, or just in the slots they are currently in? If it matters, allergies have me out to lunch.
Edit: Running the 1st stick in the A2 slot: 3 dumps. I ended up losing connection to my mouse and keyboard here.
The 2nd stick in the A2 slot has been fine from about 9:30 to 12, from watching videos to playing GTA V. Guess the one stick is bad?
 