Question Looping BSODs on new build

Dec 18, 2023
Ryzen 9 7950X3D
2x16GB DDR5-6000 G.Skill Flare
XFX 7900 XT
MSI Tomahawk X670E (7E12v14)
2x Samsung 980 2TB NVMe
Corsair RM1000x PSU

Hello. I started getting blue screens after a new build. It seemed as though I had fixed them by turning off Memory Context Restore in the BIOS: the BSODs stopped, and the PC ran for five days without a problem. Then I started getting looping BSODs, where the computer would crash within a few seconds to a few minutes of booting.

I tried to write down the BSODs as they occurred:

Kmode exception not handled – dxgkrnl.sys/netio.sys/afd.sys/iorate.sys
System service exception – wdfilter.sys/fltmgr.sys/ntfs.sys
system thread exception not handled
irql not less or equal – ntoskrnl
memory management
PFN list corrupt
registry error
page fault in non paged area
kernel security check failure
DPC watchdog violation
kernel mode heap corruption
critical process died
amdkmdag.sys
Cache Manager

Initial googling suggested RAM problems, so I tried running the PC with each RAM stick individually, but I still got BSODs with either stick alone. Next I ran memtest86, and both sticks passed without error. I then booted into safe mode and ran DISM RestoreHealth and sfc /scannow, but those came up clean as well.

Next I guessed it must be some kind of driver conflict, but I didn’t want to go through the process of using Driver Verifier to find out which driver. So I decided to try a clean install of Windows 11, but I ended up with BSODs during the reinstall. I saw some posts online saying that it could be a Windows 11 issue, so I made a boot USB for Windows 10, which installed successfully. The system ran normally for about 4 hours until the BSODs resumed, again looping on startup. They persisted until Windows 10 eventually brought up the recovery screen suggesting Windows may not be installed correctly, and from there I reinstalled Windows 10. Again I got repeating blue screens, along with a few hard locks that cut to a black screen.

Now I’m at a loss. Are there any other steps for me to take before concluding hardware failure? And if it is hardware failure, which part is the likely culprit, given that my RAM passed memtest86?

Thanks for any help.
 
Update: I checked MSI's memory compatibility list for my motherboard and found that the RAM I had purchased wasn't on it. Originally, PCPartPicker had said that my RAM was compatible with my motherboard, and I hadn't bothered to look into it further. I sent back my original set for a refund and ordered a set of RAM that is listed on MSI's compatibility list. Unfortunately, that hasn't solved my problem.

After receiving the replacement RAM, I cleared the CMOS, installed the new sticks, and attempted a fresh install of Windows 10. My first two attempts resulted in a memory management BSOD. My third popped up a Windows installation error 0xC0000005 message, and clicking OK resulted in a System service exception - ntfs.sys BSOD.

This new RAM seems worse, in that I can't even get Windows installed like I could before. Not sure what steps to take next.
 
Check this out. Follow the steps.

View: https://youtu.be/dlYxmRcdLVw?si=9uIQt10DVsSdWS1c
 
Thanks for the vid. I watched it and am wondering about his tRAS setting. He sets it to 28, but the manufacturer sets tRAS to 96. Is the recommended manual timing really that much lower?

Also, he states that these timings will only work on Hynix dies and will fail to POST with others. The G.Skill site for my RAM doesn't state the dies, but the MSI compatibility page lists them as SK Hynix. I trust that means it will work?
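
To put the two tRAS values side by side, it can help to convert cycles to nanoseconds. This is only a minimal sketch, assuming both values are applied at DDR5-6000 (a 3000 MHz memory clock); the function name `cycles_to_ns` is made up for illustration, and the conversion says nothing about which value is actually stable on a given kit.

```python
def cycles_to_ns(cycles: int, data_rate_mts: int) -> float:
    """Convert a DRAM timing from clock cycles to nanoseconds.

    DDR memory transfers data twice per clock, so the memory clock in MHz
    is half the data rate in MT/s (assumption: DDR5-6000 -> 3000 MHz).
    """
    clock_mhz = data_rate_mts / 2
    # One cycle lasts 1000 / clock_mhz nanoseconds.
    return cycles / clock_mhz * 1000


# Compare the video's manual value with the kit's rated value.
for tras in (28, 96):
    print(f"tRAS {tras} at DDR5-6000 = {cycles_to_ns(tras, 6000):.2f} ns")
# prints:
# tRAS 28 at DDR5-6000 = 9.33 ns
# tRAS 96 at DDR5-6000 = 32.00 ns
```

So at the same speed, the manual setting asks for a bit under a third of the rated row-active time.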
 
Well, after setting the timings and voltages exactly as they were in that video, I still got a memory management BSOD during my Windows install. Possibly of note: after setting the timings and restarting my PC, it failed to POST. I left it running for about 5 minutes before shutting it off by holding the power button. A second boot attempt did reach the BIOS, and after checking that my settings had saved, I attempted to install Windows.
 
Update: I ran memtest86 with the custom RAM settings, and after about ten minutes the test aborted due to too many errors. I then ran the test twice more, once with the BIOS reset to defaults and once with EXPO on. Both aborted due to errors.

My original RAM passed memtest, but I returned it for not being listed as compatible with my mobo. This new set is compatible (F5-6000J3038F16GX2-TZ5N), so if it's failing memtest that suggests a physical defect, right?
 
Since it sounds like you're getting failures with or without EXPO I'd have to say it does sound like a faulty kit of DRAM.

I haven't really heard of this happening on Zen 4, but sometimes mounting pressure on the CPU can cause problems (either too tight or not tight enough). It would be worthwhile checking to make sure you've got good mounting pressure on the CPU, because while the other kit might not have been on the QVL it should have worked fine at base JEDEC timings/settings.

It's also not impossible there's something wrong with the actual DIMM slots themselves, but that's pretty hard to diagnose unless there's obvious marks/damage.
 
I imagine that the memtest results can't be due to mounting pressure, given the differing results between sets. But it could be that I have both a faulty set and a bad mount. I had tried to be pretty careful in applying even pressure when I put on my cooler, but that doesn't mean I didn't make a mistake.

So I guess my next steps are to memtest each stick separately, return them if necessary, and potentially reseat the cooler?

It's unlikely that both sticks are faulty and both will fail their tests, right? So if they do, does that actually suggest a mobo/CPU problem? Also, when reseating the cooler, is it ok to just loosen the screws and then attempt to retighten them evenly/firmly? Or should I completely pull the cooler, then clean and reapply thermal paste before remounting? I'm averse to this because I think I'd have to pull the mobo out to do it.
 
"So I guess my next steps are to memtest each stick separately, return them if necessary, and potentially reseat the cooler?"
Yeah this would be the way to go first. If you test each stick individually and only one fails then just return the kit.

If they do both fail that doesn't necessarily mean that there's a CPU/mobo problem, but I'd say it's a distinct possibility just because you had problems with another kit.
"Also, when reseating the cooler, is it ok to just loosen the screws and then attempt to retighten them evenly/firmly? Or should I completely pull the cooler, then clean and reapply thermal paste before remounting? I'm averse to this because I think I'd have to pull the mobo out to do it."
Depends on what cooler it is and how the mounting is done. For the most part I think you should be fine loosening the screws and then tightening making sure there is good tension.
 
If you're getting memtest errors @ default settings I'd say the RAM is bad.
 
Update: I ran memtest86 on each RAM stick individually. One passed and the other failed, so I sent them back for a replacement. When the replacement arrived, I immediately ran memtest86 on the sticks together to make sure they were good. They passed, and then I applied the RAM timings from the video above, since I'd read many comments on the video stating that they solved instability issues. I ran memtest86 again with the custom timings and it passed.

Windows installed without a problem, along with all the drivers and Windows updates. Next I downloaded HCI memtest and ran it without issue (32 instances, one for each logical core, with the available RAM divided among them; each instance ran to 400% coverage without error).
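
For anyone repeating the HCI memtest run, the per-instance size is just the free RAM split across the instances. A rough sketch of that arithmetic follows; the function name `per_instance_mb`, the 28000 MB free figure, and the optional headroom are illustrative assumptions, not values from this system.

```python
def per_instance_mb(available_mb: int, instances: int, headroom_mb: int = 0) -> int:
    """Return how many MB each memory-test instance should be given.

    available_mb: free RAM reported by the OS (assumed value, check Task Manager)
    instances:    number of test instances, e.g. one per logical core
    headroom_mb:  RAM deliberately left untested so the OS stays responsive
    """
    if instances <= 0:
        raise ValueError("instances must be positive")
    usable = available_mb - headroom_mb
    if usable <= 0:
        raise ValueError("no RAM left to test after headroom")
    # Round down so the instances never request more than what is usable.
    return usable // instances


# Hypothetical example: 28000 MB free, 32 instances, no headroom.
print(per_instance_mb(28000, 32))  # prints 875
```

Leaving some headroom tests slightly less memory but lowers the chance of the OS paging during the run.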

Next I tried downloading a game off Steam, but about halfway through the download the system crashed to black and rebooted. After that, I've again been suffering looping BSODs (memory management, system service exception ntfs.sys, irql not less or equal) within seconds of booting into Windows.

I reverted the RAM timings to default and was able to get into Windows. Since I had no errors during the HCI memtest yet hard crashed during a download, I suspected NVMe issues. So I downloaded Samsung Magician, ran its diagnostic scan, and got a BSOD (PFN list corrupt). I attempted the scan again and Samsung Magician crashed; on the third attempt the system hard crashed to black and rebooted. Additionally, Firefox began to crash frequently, so it took me several attempts to download Samsung Magician without the browser crashing.

After a whole day, I'm left unclear. On one hand, the fact that switching my RAM settings back to default somewhat stabilized things suggests that the custom RAM timings were a problem (even though they passed HCI memtest). On the other, the crashes during the NVMe diagnostic scan suggest a drive issue. Should I replace the NVMe drive, or does my experience indicate something else?
 
Installed Windows to my second NVMe drive, along with drivers and updates. Ran HCI memtest again to 200% coverage with no errors. Then ran the diagnostic scan in Samsung Magician and got a BSOD within seconds. Tried twice more with the same result. Tried to run sfc /scannow and DISM RestoreHealth in Command Prompt, but got a BSOD from that as well.

It's hard to imagine that both NVMe drives are faulty, and hard to imagine that either my RAM or CPU is the problem after running HCI memtest twice without error. At this point I feel like the only remaining avenue is to replace the mobo.
 
Sorry if it’s bad form to bump a thread this old, but my problems with the system returned and I wasn’t sure if I should make a new thread for help. Back in February I refunded my MSI board and bought an ASRock X670E Steel Legend. One full rebuild later, I was able to get W11 running, and I used the system without issue until May, when my problems reappeared.

This time the crashes weren’t looping but intermittent, sometimes BSODs and sometimes black screens of death (yet Event Viewer and BlueScreenView still logged these as BSODs). I ran both memtest86 and HCI memtest without error, ran SFC and DISM scans which found nothing, uninstalled and reinstalled my chipset and GPU drivers (wiping with DDU in safe mode first), and updated my BIOS, but still suffered intermittent BSODs. Finally I reformatted and reinstalled W11, but still got BSODs.

I saved the minidumps from before reformatting and zipped them up here, along with the ones created after reformatting:
https://www.mediafire.com/file/9zbddmjm8frco81/minidumps.zip/file

Any help in diagnosing what could be wrong would be immensely appreciated. Seven months after the initial build and still having issues has me miserable. Thanks.
 
