Question: Random BSODs related to memory, yet no errors reported in Memtest86+?

Thread starter: Deleted member 2969713 (Guest)
I recently upgraded my CPU and heatsink with some difficulty, and a number of times accidentally struck the motherboard with the screwdriver when it slipped. I'm worried that this is the cause of the crashes I'm experiencing, but I wonder if perhaps someone who knows more than me would know different. I also changed my NVMe SSD and reinstalled Windows at the same time as the CPU swap.

I've experienced two system crashes in the last couple of days. They're explained in more detail below.

After installing the CPU, I first manually set the memory in the BIOS to 3200, since I read that that's what the Ryzen 5600x, my new processor, officially supports. Then I experienced a BSOD, and the screen went by too fast for me to see any error code. But I learned you could view the error using Event Viewer. "The bugcheck was: 0x0000001a (0x0000000000041790, 0xffffe5000c52ed20, 0x0000000000000000, 0x0000000000000001)." According to Microsoft, it's a memory management error, indicating that a page table was corrupted.

After that happened, I went into the BIOS and disabled my manual clock setting and enabled the first XMP setting, which set the memory to 3600. Then I ran memtest86+ for 6 hours and eight passes, with no errors detected. I played a PC game for a while in the evening with no issues.

This morning I started up my PC, and when I turned on my monitor I was greeted with another BSOD (I had disabled auto-restart on crash). It was another memory-related crash. I went into Event Viewer expecting to find the same bugcheck, but this time it was 0x0000001a (0x0000000000041792, 0xffff97000005cc88, 0x0000000000040000, 0x0000000000000000). This indicates a corrupted PTE (page table entry?).
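For anyone following along, the two Event Viewer strings above can be picked apart mechanically. This is only a rough sketch: the two subtype descriptions paraphrase Microsoft's bug check 0x1A reference, the table is deliberately incomplete, and the function names are made up for illustration.

```python
import re

# Subtypes of bug check 0x1A (MEMORY_MANAGEMENT) seen in this thread.
# Descriptions paraphrase Microsoft's bug check reference; many other
# subtypes exist and are not listed here.
SUBTYPES_0X1A = {
    0x41790: "a page table page has been corrupted",
    0x41792: "a corrupt PTE has been detected (parameter 2 is the PTE address)",
}


def parse_bugcheck(message):
    """Return (code, [p1, p2, p3, p4]) from an Event Viewer bugcheck string."""
    values = [int(h, 16) for h in re.findall(r"0x[0-9a-fA-F]+", message)]
    if len(values) != 5:
        raise ValueError("expected a bugcheck code followed by four parameters")
    return values[0], values[1:]


def describe(message):
    """Give a one-line, human-readable summary of a 0x1A bugcheck string."""
    code, params = parse_bugcheck(message)
    if code != 0x1A:
        return f"bug check {code:#x} (not covered by this sketch)"
    meaning = SUBTYPES_0X1A.get(params[0], "subtype not in this table")
    return f"MEMORY_MANAGEMENT, subtype {params[0]:#x}: {meaning}"


first = ("The bugcheck was: 0x0000001a (0x0000000000041790, 0xffffe5000c52ed20, "
         "0x0000000000000000, 0x0000000000000001).")
second = ("0x0000001a (0x0000000000041792, 0xffff97000005cc88, "
          "0x0000000000040000, 0x0000000000000000)")

print(describe(first))
print(describe(second))
```

The subtype only says what kind of corruption Windows noticed, not what caused it.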

Given that I ran memtest86+ for a long time with a thorough 8 passes and encountered zero issues, I'm skeptical that my RAM sticks are the real source of the problem, especially since this issue only started after swapping my CPU and installing a new cooler. It might be worth mentioning that I initially used the stock cooler, and had removed and reapplied it several times without applying new thermal paste because I didn't have any on hand and I was having trouble installing the heatsink.

I didn't encounter any crashes with the stock cooler installed, but I didn't use my computer much either, because I ran stress tests and watched as the CPU temperature exceeded 90 degrees Celsius and knew I had to at least buy and reapply thermal paste. But I decided to actually buy an aftermarket cooler since I had read they could be much better than the stock one, and I specifically bought the Thermalright Assassin X120 Refined SE tower cooler.

I installed the new aftermarket cooler without any major difficulties, and I also removed my case's front acrylic panel to provide better airflow. I ran a stress test and was pleased to see that the CPU temps were now hovering around 60 degrees after minutes of full load instead of exceeding 90.

But now I'm getting BSODs.

Sorry for the long post, but I'm hoping someone might have insights I do not. I'm currently typing this post on my PC after rebooting after the second BSOD and it hasn't crashed yet...

I suppose I could try running the memory at the stock speeds and see if the BSODs keep happening, but I don't really want to do that since without tweaking the speed is at 2400 or something low like that, and I was previously running it at 2933 with my Ryzen 3200g.
 
Solution
It's unwise to run without a paging file - because you won't be able to write any dumps. Dumps are written initially to the paging file. Enter the command sysdm.cpl at the Run command prompt, click the Advanced tab, click the top Settings button (Performance), click the Advanced tab, click the Change button in Virtual Memory. In there ensure that the top checkbox (Automatically manage paging file size for all drives) IS checked. Windows will then size the paging file appropriately and place it on your fastest drive - which should be the system drive in any case.

That most recent dump (the 0x7A) is very useful because it indicates that the problem was in paging in a paged-out page. That means that the failure was either in RAM or in the...
Did the screwdriver "slips" cause any visible damage?

= = = =

Update your post to include full system hardware specs and OS information.

Include PSU: make, model, wattage, age?

Disk drive(s): make, model, capacity, how full.

Also look in Reliability History/Monitor. Much more end user friendly and the time line format may reveal patterns.

Any reported entries can be clicked for more details. The details may or may not be helpful.

Overall: increasing numbers of errors and varying errors make the PSU suspect.
 
Not anything obvious, but the lighting wasn't great to begin with because of the motherboard being surrounded by my case. The motherboard did end up flexing a little as well when I was struggling to screw in the stock heatsink without the backplate there. 🙄

If I did actually score the motherboard enough that traces were broken, would system instability be the likely result or would the PC more likely just not boot up or fail on boot?
 
Unless any damaged traces can be specifically identified via schematics etc., it may be difficult to determine what may or may not happen.

Depending on the full extent of "flexing a little" it is also possible that solder joints cracked somewhere.

Just continue using the tools and hopefully there will be a pattern of some sort revealed over time.
 
If you're getting BSODs please upload all dump files to a cloud service with a link to them here. You'll find the dumps in C:\Windows\Minidump.
What important information would the dump files reveal that the bugcheck data I posted wouldn't? If there's something in them that would be useful besides the type/cause of the crash, I'll post them, but if not, that information is already in my initial post.

My best guess is that no traces were actually damaged, since I didn't see any visible marks and, when I first built this PC a few years ago, I recall the screwdriver slipping a few times then as well and I never encountered any issues as a result.

As for the flex, while it made me mildly concerned when it happened, I didn't hear any alarming cracks or other noises and the amount of bend was small. Still, I wish I had realized sooner about the backplate. It had been years since I put the PC together as my first build, so I completely forgot about its existence and it fell off when I was unscrewing the old heatsink. I heard a clunk, but at the time I had assumed it was the processor falling off of the heatsink as I lifted it up, which I should have realized was impossible since the CPU lever was down and it was locked in place. Anyway.

I haven't had a BSOD since the second one. As far as system instability goes, this doesn't seem too terrible. Hopefully it was just a one (two?)-off issue, but even if it happens again it seems to happen infrequently enough that it won't be too much of a bother. I'll just have to be mindful to save my work more frequently just in case. And if it starts to get worse, I suppose I could always replace the motherboard, if indeed that is the issue...

I appreciate the help. I'll post again if I encounter another BSOD.
 
What important information would the dump files reveal that the bugcheck data I posted wouldn't? If there's something in them that would be useful besides the type/cause of the crash, I'll post them, but if not, that information is already in my initial post.
Seriously? You think I have nothing better to do than ask people for memory dumps that aren't needed? Don't worry yourself, I'll move on to someone else who wants to be helped.
 
... Okay, does anyone else want to clarify what helpful information the minidump files contain besides the bugcheck info? It would be helpful to learn. I used BlueScreenView to examine the files and there's the bugcheck info I already knew about, as well as the driver that caused the crash. There's also some memory addresses and timestamps, but it's hard for me to imagine how exactly those would be useful.

For the first BSOD (corrupted page table) the driver was NT File System Driver (ntfs.sys).

For the second BSOD (corrupted PTE) the "driver" was NT Kernel & System (ntoskrnl.exe).
 
Well, if you don't hand the dumps over you're back to blindly guessing; MS didn't release their WinDbg toolkit just for kicks. BlueScreenView can be useful but it's also very limited without an experienced pair of eyes scanning the full results.

Someone with experience using WinDbg can decipher the codes stored and backtrace to find cause and work out fixes to most BSOD types.

If you have a few spare weeks:- https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools
 
... Okay, does anyone else want to clarify what helpful information the minidump files contain besides the bugcheck info? It would be helpful to learn. I used BlueScreenView to examine the files and there's the bugcheck info I already knew about, as well as the driver that caused the crash. There's also some memory addresses and timestamps, but it's hard for me to imagine how exactly those would be useful.

For the first BSOD (corrupted page table) the driver was NT File System Driver (ntfs.sys).

For the second BSOD (corrupted PTE) the "driver" was NT Kernel & System (ntoskrnl.exe).
No they weren't. This is where BlueScreenView is useless. Those are Microsoft modules and they are not at fault. For example, ntoskrnl.exe is the Windows kernel; if that was faulty the whole world would know!

What has likely happened here is that a third-party driver (or device) has fouled-up in some way and that the error wasn't detected until a Microsoft high-level driver (like ntfs.sys - the filesystem driver), or the Windows kernel (ntoskrnl.exe) got control and validated what they were being asked to do. Because the kernel has no idea what that third-party driver was trying to do, nor whether data corruption is likely, the kernel BSODs - with the high-level driver or kernel as the failing module. A good 80% of BSODs flag ntoskrnl.exe as the failing module for example.

A BSOD always writes a memory dump so that someone with the knowledge, skills and experience to analyse the dump can walk back through the operations to attempt to discover what caused the original failure. That's why we ask for the dumps.

TBH if you're going to come on a forum and ask for help it's not wise to try and tell the people trying to help what they do and don't need....
 
if ntoskrnl was the cause, windows wouldn't work

NTOSKRNL = Windows kernel. It handles all driver requests, power management, and memory management. It sits between hardware and applications. It got blamed but it's not the cause.

It's involved in almost every action on your PC.

Do yourself a favor and supply the dump ubuysa asked for. You can stop guessing then :)
 
Well, if you don't hand the dumps over you're back to blindly guessing; MS didn't release their WinDbg toolkit just for kicks. BlueScreenView can be useful but it's also very limited without an experienced pair of eyes scanning the full results.

Someone with experience using WinDbg can decipher the codes stored and backtrace to find cause and work out fixes to most BSOD types.

If you have a few spare weeks:- https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools
Thank you. I may not be able to do anything advanced with the dump files now, but if I can learn, it could be useful in the future.

No they weren't. This is where BlueScreenView is useless. Those are Microsoft modules and they are not at fault. For example, ntoskrnl.exe is the Windows kernel; if that was faulty the whole world would know!

What has likely happened here is that a third-party driver (or device) has fouled-up in some way and that the error wasn't detected until a Microsoft high-level driver (like ntfs.sys - the filesystem driver), or the Windows kernel (ntoskrnl.exe) got control and validated what they were being asked to do. Because the kernel has no idea what that third-party driver was trying to do, nor whether data corruption is likely, the kernel BSODs - with the high-level driver or kernel as the failing module. A good 80% of BSODs flag ntoskrnl.exe as the failing module for example.

A BSOD always writes a memory dump so that someone with the knowledge, skills and experience to analyse the dump can walk back through the operations to attempt to discover what caused the original failure. That's why we ask for the dumps.
Thank you for providing more information. The reason I didn't immediately post the dump files was because I had heard there are privacy/security concerns with publicly posting those, and I was unaware of how those would be useful beyond providing the basic bugcheck information, which is why I asked.

TBH if you're going to come on a forum and ask for help it's not wise to try and tell the people trying to help what they do and don't need....
Except I didn't:
What important information would the dump files reveal that the bugcheck data I posted wouldn't? If there's something in them that would be useful besides the type/cause of the crash, I'll post them, but if not, that information is already in my initial post.
I asked a clarifying question, and specifically said that I would post them if there was a need. I'll admit that I assumed that you didn't read my full post, since your response looked like a stock BSOD-support post.

if ntoskrnl was the cause, windows wouldn't work

NTOSKRNL = Windows kernel. It handles all driver requests, power management, and memory management. It sits between hardware and applications. It got blamed but it's not the cause.

It's involved in almost every action on your PC.

Do yourself a favor and supply the dump ubuysa asked for. You can stop guessing then :)
Indeed. Well, I was hoping that the problem would go away, but once again this morning I started up my PC and was greeted with a blue screen after turning on my monitor. It was the same crash type as the second one, but with different parameters 2 and 3.

So, here are the minidump files:

Crash 1
Crash 2
Crash 3 (most recent)
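If it helps anyone collecting their own dumps for upload, here is a small sketch that lists what is in the minidump folder mentioned earlier in the thread (C:\Windows\Minidump), oldest first. The function name is made up for illustration, and reading that folder may need an elevated prompt.

```python
from pathlib import Path


def list_minidumps(folder=r"C:\Windows\Minidump"):
    """Return the .dmp files in `folder`, oldest first, as (name, size_bytes) pairs.

    Returns an empty list if the folder does not exist. Reading the real
    Minidump folder may require administrator rights.
    """
    directory = Path(folder)
    if not directory.is_dir():
        return []
    dumps = sorted(directory.glob("*.dmp"), key=lambda p: p.stat().st_mtime)
    return [(p.name, p.stat().st_size) for p in dumps]


for name, size in list_minidumps():
    print(f"{name}  {size / 1024:.0f} KiB")
```

An empty result just means no .dmp files were found at that path.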

Here is WinDbg's Bugcheck analysis of the most recent dump file:

Code:
MEMORY_MANAGEMENT (1a)
    # Any other values for parameter 1 must be individually examined.
Arguments:
Arg1: 0000000000041792, A corrupt PTE has been detected. Parameter 2 contains the address of
    the PTE. Parameters 3/4 contain the low/high parts of the PTE.
Arg2: ffff89806d2c0750
Arg3: 0000000000100000
Arg4: 0000000000000000

Debugging Details:
------------------


KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 2218

    Key  : Analysis.Elapsed.mSec
    Value: 9611

    Key  : Analysis.IO.Other.Mb
    Value: 27

    Key  : Analysis.IO.Read.Mb
    Value: 16

    Key  : Analysis.IO.Write.Mb
    Value: 48

    Key  : Analysis.Init.CPU.mSec
    Value: 171

    Key  : Analysis.Init.Elapsed.mSec
    Value: 49982

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 100

    Key  : Bugcheck.Code.LegacyAPI
    Value: 0x1a

    Key  : Dump.Attributes.AsUlong
    Value: 808

    Key  : Dump.Attributes.KernelGeneratedTriageDump
    Value: 1

    Key  : Failure.Bucket
    Value: MEMORY_CORRUPTION_ONE_BIT

    Key  : Failure.Hash
    Value: {e3faf315-c3d0-81db-819a-6c43d23c63a7}

    Key  : Hypervisor.Enlightenments.ValueHex
    Value: 1497cf94

    Key  : Hypervisor.Flags.AnyHypervisorPresent
    Value: 1

    Key  : Hypervisor.Flags.ApicEnlightened
    Value: 1

    Key  : Hypervisor.Flags.ApicVirtualizationAvailable
    Value: 0

    Key  : Hypervisor.Flags.AsyncMemoryHint
    Value: 0

    Key  : Hypervisor.Flags.CoreSchedulerRequested
    Value: 0

    Key  : Hypervisor.Flags.CpuManager
    Value: 1

    Key  : Hypervisor.Flags.DeprecateAutoEoi
    Value: 0

    Key  : Hypervisor.Flags.DynamicCpuDisabled
    Value: 1

    Key  : Hypervisor.Flags.Epf
    Value: 0

    Key  : Hypervisor.Flags.ExtendedProcessorMasks
    Value: 1

    Key  : Hypervisor.Flags.HardwareMbecAvailable
    Value: 1

    Key  : Hypervisor.Flags.MaxBankNumber
    Value: 0

    Key  : Hypervisor.Flags.MemoryZeroingControl
    Value: 0

    Key  : Hypervisor.Flags.NoExtendedRangeFlush
    Value: 0

    Key  : Hypervisor.Flags.NoNonArchCoreSharing
    Value: 1

    Key  : Hypervisor.Flags.Phase0InitDone
    Value: 1

    Key  : Hypervisor.Flags.PowerSchedulerQos
    Value: 0

    Key  : Hypervisor.Flags.RootScheduler
    Value: 0

    Key  : Hypervisor.Flags.SynicAvailable
    Value: 1

    Key  : Hypervisor.Flags.UseQpcBias
    Value: 0

    Key  : Hypervisor.Flags.Value
    Value: 4853999

    Key  : Hypervisor.Flags.ValueHex
    Value: 4a10ef

    Key  : Hypervisor.Flags.VpAssistPage
    Value: 1

    Key  : Hypervisor.Flags.VsmAvailable
    Value: 1

    Key  : Hypervisor.RootFlags.AccessStats
    Value: 1

    Key  : Hypervisor.RootFlags.CrashdumpEnlightened
    Value: 1

    Key  : Hypervisor.RootFlags.CreateVirtualProcessor
    Value: 1

    Key  : Hypervisor.RootFlags.DisableHyperthreading
    Value: 0

    Key  : Hypervisor.RootFlags.HostTimelineSync
    Value: 1

    Key  : Hypervisor.RootFlags.HypervisorDebuggingEnabled
    Value: 0

    Key  : Hypervisor.RootFlags.IsHyperV
    Value: 1

    Key  : Hypervisor.RootFlags.LivedumpEnlightened
    Value: 1

    Key  : Hypervisor.RootFlags.MapDeviceInterrupt
    Value: 1

    Key  : Hypervisor.RootFlags.MceEnlightened
    Value: 1

    Key  : Hypervisor.RootFlags.Nested
    Value: 0

    Key  : Hypervisor.RootFlags.StartLogicalProcessor
    Value: 1

    Key  : Hypervisor.RootFlags.Value
    Value: 1015

    Key  : Hypervisor.RootFlags.ValueHex
    Value: 3f7

    Key  : MemoryManagement.PFN
    Value: 100


BUGCHECK_CODE:  1a

BUGCHECK_P1: 41792

BUGCHECK_P2: ffff89806d2c0750

BUGCHECK_P3: 100000

BUGCHECK_P4: 0

FILE_IN_CAB:  010724-20250-01.dmp

TAG_NOT_DEFINED_202b:  *** Unknown TAG in analysis list 202b


DUMP_FILE_ATTRIBUTES: 0x808
  Kernel Generated Triage Dump

MEMORY_CORRUPTOR:  ONE_BIT

CUSTOMER_CRASH_COUNT:  1

PROCESS_NAME:  svchost.exe

STACK_TEXT:
ffffd404`5d0cf1e8 fffff801`040821e2     : 00000000`0000001a 00000000`00041792 ffff8980`6d2c0750 00000000`00100000 : nt!KeBugCheckEx
ffffd404`5d0cf1f0 fffff801`03e505ef     : ffffc283`b73e4740 00000000`00000000 00000000`00000002 ffffd404`5d0cf3e0 : nt!MiDeleteVa+0x231a62
ffffd404`5d0cf2e0 fffff801`03e4a67a     : ffffc283`b73e4590 ffffc283`b68440c0 ffffd404`5d0cf680 ffffc283`b68447c0 : nt!MiDeletePagablePteRange+0x2cf
ffffd404`5d0cf5f0 fffff801`042e5107     : 00000000`00000000 00000000`00000001 00000000`0da57a7f 000000da`57980000 : nt!MiDeleteVirtualAddresses+0x4e
ffffd404`5d0cf640 fffff801`03ec6977     : 000000da`58080000 00000000`00000000 ffffd404`5d0cf8c0 00000000`00000000 : nt!MiDeleteVad+0x1b7
ffffd404`5d0cf700 fffff801`04309360     : 00000000`00000000 00000000`00000000 ffffd404`5d0cf860 fffff801`00000000 : nt!MiFreeVadRange+0xa3
ffffd404`5d0cf760 fffff801`045b1ae1     : 00000000`00000000 ffffd404`5d0cfaa0 000000da`57720000 ffffc283`b68440c0 : nt!MmFreeVirtualMemory+0x3b0
ffffd404`5d0cf8a0 fffff801`044c17bc     : 00000000`00000000 000000da`58080000 00000000`00000001 00000000`00000000 : nt!PspFreeCurrentThreadUserShadowStack+0x81
ffffd404`5d0cf910 fffff801`04370613     : 00000000`00000000 00000000`00000000 ffffc283`b6844134 000000da`57720000 : nt!PspExitThread+0x1e5778
ffffd404`5d0cfa10 fffff801`0437059a     : 00000000`00000000 00000000`00000000 ffffc283`b68440c0 fffff801`043403a9 : nt!PspTerminateThreadByPointer+0x53
ffffd404`5d0cfa50 fffff801`0402bbe8     : 00000000`00000000 ffffc283`b68440c0 ffffd404`5d0cfb20 ffffc283`00000000 : nt!NtTerminateThread+0x4a
ffffd404`5d0cfaa0 00007ffe`a2e4fdd4     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x28
000000da`57a7f9a8 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffe`a2e4fdd4


MODULE_NAME: hardware

IMAGE_NAME:  memory_corruption

STACK_COMMAND:  .cxr; .ecxr ; kb

FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_ONE_BIT

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {e3faf315-c3d0-81db-819a-6c43d23c63a7}

Followup:     MachineOwner
---------
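As a side note on the MEMORY_CORRUPTION_ONE_BIT bucket: the PTE contents reported in parameter 3 of the second and third crashes (0x40000 and 0x100000) each have exactly one bit set. The sketch below just checks that arithmetic. It assumes the PTE should have been all zeros, which the dump does not actually state, and it is not how WinDbg itself decides on the bucket.

```python
def set_bits(value):
    """Return the positions of the bits that are set in `value` (bit 0 = lowest)."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def looks_like_single_bit_flip(observed, expected=0):
    """True if `observed` differs from `expected` in exactly one bit.

    `expected=0` is an assumption made for this sketch.
    """
    return len(set_bits(observed ^ expected)) == 1


# PTE contents reported as bugcheck parameter 3 in the second and third crashes.
for pte in (0x40000, 0x100000):
    print(f"{pte:#x}: bits set {set_bits(pte)}, "
          f"single-bit difference from zero: {looks_like_single_bit_flip(pte)}")
```

A lone stray bit in an otherwise empty entry is at least consistent with a hardware-style bit flip, although it does not say which component flipped it.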

According to this SuperUser Q&A, the issue might be with my SSD, which makes sense since I recently changed the SSD. Is there some kind of equivalent of Memtest86+ that I could run to test my SSD?
 
Did the screwdriver "slips" cause any visible damage?

= = = =

Update your post to include full system hardware specs and OS information.

Include PSU: make, model, wattage, age?

Disk drive(s): make, model, capacity, how full.

Also look in Reliability History/Monitor. Much more end user friendly and the time line format may reveal patterns.

Any reported entries can be clicked for more details. The details may or may not be helpful.

Overall: increasing numbers of errors and varying errors make the PSU suspect.
I just noticed all the stuff below the "====". I automatically skipped over that because my brain interpreted that as "forum signature."

Here's the full system specs (GPU might actually be XLR8 version, but that's irrelevant for this):

PCPartPicker Part List: https://pcpartpicker.com/list/Yp3fVW

CPU: AMD Ryzen 5 5600X 3.7 GHz 6-Core Processor
CPU Cooler: Thermalright Assassin X 120 Refined SE 66.17 CFM CPU Cooler
Motherboard: MSI B450M PRO-M2 MAX Micro ATX AM4 Motherboard
Memory: G.Skill Ripjaws V 16 GB (2 x 8 GB) DDR4-3600 CL18 Memory
Storage: Western Digital Green 480 GB 2.5" Solid State Drive
Storage: MSI SPATIUM M371 1 TB M.2-2280 PCIe 3.0 X4 NVME Solid State Drive
Video Card: PNY VCG16606SSFPPB GeForce GTX 1660 SUPER 6 GB Video Card
Case: Cougar MG120 MicroATX Mini Tower Case
Power Supply: Corsair CX450 (2017) 450 W 80+ Bronze Certified ATX Power Supply
Operating System: Microsoft Windows 11 Home Retail - Download 64-bit

Generated by PCPartPicker 2024-01-07 13:11 EST-0500

The PSU was bought in 2020. The system drive, the M371, has 792 GB free.

And here are my system specs prior to upgrading, when my system was stable without BSODs:

PCPartPicker Part List: https://pcpartpicker.com/list/sQwwCd

CPU: AMD Ryzen 3 3200G 3.6 GHz Quad-Core Processor
Motherboard: MSI B450M PRO-M2 MAX Micro ATX AM4 Motherboard
Memory: G.Skill Ripjaws V 16 GB (2 x 8 GB) DDR4-3600 CL18 Memory
Storage: Crucial P2 250 GB M.2-2280 PCIe 3.0 X4 NVME Solid State Drive
Storage: Western Digital Green 480 GB 2.5" Solid State Drive
Video Card: PNY VCG16606SSFPPB GeForce GTX 1660 SUPER 6 GB Video Card
Case: Cougar MG120 MicroATX Mini Tower Case
Power Supply: Corsair CX450 (2017) 450 W 80+ Bronze Certified ATX Power Supply
Operating System: Microsoft Windows 11 Home Retail - Download 64-bit

Generated by PCPartPicker 2024-01-07 13:34 EST-0500

I didn't know about the Reliability Monitor, that's cool.
 
The Corsair PSU is a likely suspect being only 450 watts and 4 years old.
I wasn't aware that PSUs could start to fail that soon. If it's the real cause of the problems, that's good, since a PSU won't be as expensive as a motherboard to replace. But would the only way to know for certain that it was the source be to buy a new PSU, swap it out, and wait for (a lack of) BSODs?
 
Being four years old likely means that the PSU may be nearing its designed-in EOL (End of Life), especially if there is a history of heavy gaming use, video editing, or even crypto mining.

Things that can be done:

1) Borrow and install another known working (no problems on current host system) PSU into your computer. If modular be sure to use only the cables that come with the modular PSU.

2) Test the PSU. If you have a multimeter and know how to use it (or know someone who does) you can check the voltages. Not a full test because the PSU is not under load. However, any voltages out of spec make the PSU suspect.

FYI:

https://www.lifewire.com/how-to-manually-test-a-power-supply-with-a-multimeter-2626158

3) Continue looking at Reliability History/Monitor and Event Viewer. Increasing numbers of varying errors are a sign of a faltering/failing PSU.
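For the multimeter check in option 2, "out of spec" can be made concrete with a small sketch. The nominal voltages and tolerances below are taken from the ATX specification as I understand it (±5% on the positive rails, ±10% on -12V). They are an assumption of this example and are not stated in the thread, and the readings are invented.

```python
# Nominal rail voltage and allowed tolerance (fraction of nominal).
# Assumption: values follow the ATX specification; check your PSU's documentation.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "+5VSB": (5.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def allowed_range(rail):
    """Return the (low, high) voltage limits for a rail."""
    nominal, tolerance = ATX_RAILS[rail]
    margin = abs(nominal) * tolerance
    return nominal - margin, nominal + margin


def in_spec(rail, measured):
    """True if a measured voltage falls inside the allowed range for the rail."""
    low, high = allowed_range(rail)
    return low <= measured <= high


# Hypothetical multimeter readings, for illustration only.
readings = {"+12V": 11.72, "+5V": 5.08, "+3.3V": 3.05}
for rail, volts in readings.items():
    low, high = allowed_range(rail)
    verdict = "ok" if in_spec(rail, volts) else "OUT OF SPEC"
    print(f"{rail}: {volts:.2f} V (allowed {low:.2f} to {high:.2f} V) {verdict}")
```

As noted above, a reading taken without load is not a full test, so an in-spec result does not clear the PSU.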
 
Thanks for the info!

I don't think my system has a history of heavy use. I have used it for gaming, but not daily and sometimes going long stretches of time without, and definitely haven't done any crypto-mining. I have powered on and off my system a lot, probably more than a typical user.

I unfortunately don't know anyone to borrow a PSU from, and don't have a multimeter nor the know-how to use one, and wouldn't trust myself not to electrocute myself and/or my computer components. I think I'd be better off just buying a new PSU and hoping that gets rid of the problem, if no other likely cause surfaces. In the meantime I'll watch out for increasing variance of errors like you said with the Reliability Monitor.
 
It's only just 4 years old. Everything bought in 2020 doesn't suddenly become 4 years old just because it's 2024 now... months exist. (Pats his PC, made in August 2020: you're not that old yet. That, and I replaced most of it last year.)

Wait and see what the dumps reveal. I can read them, but I'll leave that to ubuysa, as he understands them better than I do.
 
There is no issue with posting dumps. No personally identifying information is contained in a kernel dump or minidump.

The three dumps you uploaded may indicate a RAM problem. All are 0x1A MEMORY_MANAGEMENT bugchecks indicating a corrupt page table entry (PTE), however when we look at the affected PTE we see a couple of different things.

In two dumps the referenced page was referenced after it had already been freed. Here's one of the referenced PTEs...
Code:
 5: kd> !pte ffff97000005cc88
                                           VA 000000000b991000
PXE at FFFF974BA5D2E000    PPE at FFFF974BA5C00000    PDE at FFFF974B800002E0    PTE at FFFF97000005CC88
contains 0A0000040C50B867  contains 0A0000040C50C867  contains 1A0000041AFF7867  contains 0000000000040000
pfn 40c50b    ---DA--UWEV  pfn 40c50c    ---DA--UWEV  pfn 41aff7    ---DA--UWEV  not valid
                                                                                  Page has been freed
You can see at the bottom there that the page has been freed. In both of these dumps the bugcheck happened during address space cleanup as the memory used by the deleted address space was being freed up. This might be a RAM issue, we'd need to eliminate RAM first in any case, but it could potentially be a third-party driver issue.
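For the curious, the addresses in the !pte output above follow from simple arithmetic on the virtual address. This sketch reproduces them under two assumptions: the usual x64 layout with 4 KiB pages and 8-byte entries, and a PTE base of 0xFFFF970000000000, which is inferred from this particular dump (recent Windows versions randomize it at each boot). The function names are made up for illustration.

```python
PAGE_SHIFT = 12                 # 4 KiB pages
ENTRY_SIZE = 8                  # bytes per page table entry on x64
INDEX_MASK = (1 << 36) - 1      # bits 47..12 of a 48-bit virtual address
PFN_MASK = (1 << 36) - 1        # page frame number field, bits 47..12 of an entry

# Assumption: inferred from the !pte output above; only valid for that dump.
PTE_BASE = 0xFFFF970000000000


def pte_address(va, pte_base=PTE_BASE):
    """Return the virtual address of the PTE that maps `va`."""
    index = (va >> PAGE_SHIFT) & INDEX_MASK
    return pte_base + index * ENTRY_SIZE


def pfn_of(entry):
    """Extract the page frame number from a page table entry value."""
    return (entry >> PAGE_SHIFT) & PFN_MASK


va = 0x000000000B991000
pte = pte_address(va)
# The tables are self-mapped, so the entry one level up is "the PTE of the PTE".
pde = pte_address(pte)
ppe = pte_address(pde)
pxe = pte_address(ppe)

print(f"PTE at {pte:016X}")
print(f"PDE at {pde:016X}")
print(f"PPE at {ppe:016X}")
print(f"PXE at {pxe:016X}")
print(f"pfn of PXE contents: {pfn_of(0x0A0000040C50B867):x}")
```

Run in the other direction, this is how bugcheck parameter 2 (the PTE address) leads back to VA 000000000b991000.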

The third dump also fails with a 0x1A bugcheck and a corrupt PTE, but the problem here is slightly different. The bugcheck happened whilst memory was being freed, but not during address space deletion. This was memory being freed by the application (address space) that allocated it. If we look at the PTE in this dump we can see that it was a large page...
Code:
5: kd> !pte ffffe5000c52ed20
                                           VA ffffe5000c52ed20
PXE at FFFF924924924E50    PPE at FFFF9249249CA000    PDE at FFFF924939400310    PTE at FFFF927280062970
contains 0A0000043F37F863  contains 0A0000043F37E863  contains 8A00000438C009E3  contains 0000000000000000
pfn 43f37f    ---DA--KWEV  pfn 43f37e    ---DA--KWEV  pfn 438c00    -GLDA--KW-V  LARGE PAGE pfn 438d2e
Memory pages are normally 4K in size but for some purposes it makes sense to allocate memory in larger units called large pages, these are typically 2MB in size but can be larger. It's the application that decides whether to use large pages or not and so I think the corruption in this PTE is more likely to be RAM related rather than a rogue third-party driver.
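To make the large page case concrete, here is a rough sketch of how the "LARGE PAGE pfn 438d2e" figure in the output above can be reproduced from the PDE contents and the virtual address. It assumes the standard x64 page table layout (bit 7 of a PDE marks a 2MB page, and the frame number sits in bits 47..12). The names are invented for this example.

```python
LARGE_PAGE_BIT = 1 << 7         # "PS" bit in an x64 page directory entry
PAGE_SHIFT = 12                 # 4 KiB small pages
PFN_MASK = (1 << 36) - 1        # page frame number field, bits 47..12
SMALL_PAGES_PER_LARGE_PAGE = (2 * 1024 * 1024) // 4096   # 512


def is_large_page(pde):
    """True if the page directory entry maps a 2MB large page directly."""
    return bool(pde & LARGE_PAGE_BIT)


def large_page_pfn(pde, va):
    """Return the 4K page frame number that `va` falls on inside a large page."""
    base_pfn = (pde >> PAGE_SHIFT) & PFN_MASK
    offset_in_pages = (va >> PAGE_SHIFT) % SMALL_PAGES_PER_LARGE_PAGE
    return base_pfn + offset_in_pages


pde = 0x8A00000438C009E3        # PDE contents from the !pte output above
va = 0xFFFFE5000C52ED20         # virtual address from the same output

print(f"large page: {is_large_page(pde)}")
print(f"pfn: {large_page_pfn(pde, va):x}")
```

For comparison, the PDE in the earlier output (1A0000041AFF7867) has bit 7 clear, so that mapping goes through an ordinary 4K page table.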

In any case you need to eliminate RAM as a cause before we look elsewhere.
  1. Download Memtest86 (free), use the imageUSB.exe tool extracted from the download to make a bootable USB drive containing Memtest86 (1GB is plenty big enough). Do this on a different PC if you can, because you can't fully trust yours at the moment.
  2. Then boot that USB drive on your PC, Memtest86 will start running as soon as it boots.
  3. If no errors have been found after the four iterations of the 13 different tests that the free version does, then restart Memtest86, and do another four iterations.
This will find about 95% of RAM issues. Even a single error is a failure.

If Memtest86 finds no RAM problems we'll look at other options.
 
Alright, thanks. I did already run Memtest86+, but I'll run Passmark's Memtest86 like you asked just to be extra sure, and I'll run it again if the first run passes.

I created the boot drive using a different PC. I saw that there was a 14th test that could be enabled, but that it was experimental, so I left it off. I'm running the test now and will update with the results.

Are there potential memory errors that the Pro version would catch that the free version wouldn't?

Oh, and there was another BSOD this morning after startup, memory management again. I tend to turn my computer on and then leave it for a while before turning on my monitor and starting to use it, and as was the case before, when I turned on the screen I was greeted with the BSOD. I restarted the computer and immediately logged in with no issues. I'll post the dump file after the tests are done running, but I doubt it'll be much different from the other ones.
 
Ok then, the other possibility is a rogue driver that trashes the PTE and then ends so that we don't get a BSOD until that page is referenced later. To try and catch this driver you need to enable Driver Verifier.

Driver Verifier subjects selected drivers (typically all third-party drivers) to extra tests and checks every time they are called. These extra checks are designed to uncover drivers that are misbehaving. If any selected driver fails any of the Driver Verifier tests/checks then Driver Verifier will BSOD. The resulting minidump should contain enough information for us to identify the flaky driver. It's thus essential to keep all minidumps created whilst Driver Verifier is enabled.

To enable Driver Verifier do the following:

1. Take a System Restore point and/or take a disk image of your system drive (with Acronis, Macrium Reflect, or similar). It is possible that Driver Verifier may BSOD a driver during the boot process (some drivers are loaded during boot). If that happens you'll be stuck in a boot-BSOD loop.

If you should end up in a boot-BSOD loop, boot the Windows installation media and use that to run system restore and restore to the restore point you took, to remove Driver Verifier and get you booting again. Alternatively you can use the Acronis, Macrium Reflect, or similar, boot media to restore the disk image you took.

Please don't skip this step. It's the only way out of a Driver Verifier boot-BSOD loop.

2. Start the Driver Verifier setup dialog by entering the command verifier in either the Run command box or in a command prompt.

3. On that initial dialog, click the radio button for 'Create custom settings (for code developers)' - the second option - and click the Next button.

4. On the second dialog check (click) the checkboxes for the following tests...
  • Special Pool
  • Force IRQL checking
  • Pool Tracking
  • Deadlock Detection
  • Security Checks
  • Miscellaneous Checks
  • Power framework delay fuzzing
  • DDI compliance checking
Then click the Next button.
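As a side note for anyone who prefers the command line: each checkbox in step 4 corresponds to one bit in the value passed to verifier /flags. The sketch below only adds up those bits and prints the resulting command; it does not change anything on the system. The bit values are taken from Microsoft's Driver Verifier command syntax documentation as best recalled, so treat them as an assumption and check them against verifier /? on your build. The name example.sys is a hypothetical placeholder, not a real driver.

```python
# Bit values as listed in Microsoft's Driver Verifier documentation
# (assumption: verify against your Windows build before relying on them).
STEP4_FLAGS = {
    "Special Pool": 0x00000001,
    "Force IRQL checking": 0x00000002,
    "Pool Tracking": 0x00000008,
    "Deadlock Detection": 0x00000020,
    "Security Checks": 0x00000100,
    "Miscellaneous Checks": 0x00000800,
    "Power framework delay fuzzing": 0x00008000,
    "DDI compliance checking": 0x00020000,
}


def combined_flags(flags=STEP4_FLAGS):
    """OR together the individual option bits into one /flags value."""
    value = 0
    for bit in flags.values():
        value |= bit
    return value


def verifier_command(drivers, flags=STEP4_FLAGS):
    """Build (but do not run) a verifier command line for the given drivers."""
    return "verifier /flags 0x{:X} /driver {}".format(
        combined_flags(flags), " ".join(drivers)
    )


if __name__ == "__main__":
    # example.sys is a hypothetical placeholder for a third-party driver.
    print(verifier_command(["example.sys", "Wdf01000.sys", "ndis.sys",
                            "fltMgr.sys", "Storport.sys"]))
```

The GUI steps remain the safer route, since the driver list in step 7 is much easier to pick from the sorted dialog than to type by hand.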

5. On the next dialog click the radio button for 'Select driver names from a list' - the last option - and click the Next button.

6. On the next dialog click on the 'Provider' heading. This will sort the drivers by that column (it makes it easier to isolate Microsoft drivers).

7. Now check (click) ALL drivers that DO NOT have Microsoft as the provider (i.e. check all third-party drivers).

8. Then, on the same dialog, check the following Microsoft drivers (and ONLY these Microsoft drivers)...
  • Wdf01000.sys
  • ndis.sys
  • fltMgr.sys
  • Storport.sys
These are high-level Microsoft drivers that manage lower-level third-party drivers that we otherwise wouldn't be able to trap. That's why they're included.

9. Now click Finish and then reboot. Driver Verifier will be enabled.

Be aware that Driver Verifier will remain enabled across all reboots and shutdowns. It can only be disabled manually.

Also be aware that we expect BSODs. Indeed, we want BSODs, to be able to identify the flaky driver(s). You MUST keep all minidumps created whilst Driver Verifier is running, so disable any disk cleanup tools you may have.

10. Leave Driver Verifier running for 48 hours. Use your PC as normal during this time, but do try and make it BSOD. Use every game or app that you normally use, and especially those where you have seen it BSOD in the past. If Windows doesn't automatically reboot after each BSOD then just reboot as normal and continue testing.

Note: Because Driver Verifier is doing extra work each time a selected driver is called, you will notice some performance degradation with Driver Verifier enabled. This is a price you'll have to pay in order to locate any flaky drivers. And remember, Driver Verifier can only test drivers that are loaded, so you need to ensure that every third-party driver gets loaded by using all apps, features and devices.

11. To turn Driver Verifier off, enter the command verifier /reset in either the Run command box or a command prompt and reboot.

Should you wish to check whether Driver Verifier is enabled or not, open a command prompt and enter the command verifier /query. If drivers are listed then it's enabled; if no drivers are listed then it's not.

12. When Driver Verifier has been disabled, navigate to the folder C:\Windows\Minidump and locate all .dmp files in there that were created during the period when Driver Verifier was running (check the timestamps). Zip these files up if you like, or not as you choose. Upload the file(s) to the cloud and post a link to it/them here (be sure to make it public).
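
If picking the dumps out by hand is tedious, the timestamp check in step 12 can be sketched in Python as below. This is only an illustration, not a required part of the procedure: the start and end dates and the output file name are hypothetical placeholders, and reading C:\Windows\Minidump may need an elevated prompt.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def dumps_in_window(folder, start, end):
    """Return .dmp files in `folder` whose modified time is within [start, end]."""
    selected = []
    for path in sorted(Path(folder).glob("*.dmp")):
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if start <= modified <= end:
            selected.append(path)
    return selected


def zip_dumps(paths, archive):
    """Write the selected dump files into a single zip archive."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=path.name)
    return archive


if __name__ == "__main__":
    # Hypothetical dates: replace with when Driver Verifier was turned on and off.
    start = datetime(2024, 1, 1, 9, 0)
    end = datetime(2024, 1, 3, 9, 0)
    found = dumps_in_window(r"C:\Windows\Minidump", start, end)
    if found:
        zip_dumps(found, Path.home() / "verifier_minidumps.zip")
    print([p.name for p in found])
```

The script only copies the dumps into a zip; it leaves the originals in place, so nothing is lost if the date window turns out to be wrong.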
 
All right, thanks. I've created a system restore point, got a USB ready with Windows 11, and have driver verifier running. No boot loop occurred after restarting the system after enabling driver verifier. I did run a scheduled system update after the restore point and before turning on driver verifier.

"but do try and make it BSOD. Use every game or app that you normally use, and especially those where you have seen it BSOD in the past."
Not sure quite how to do that, as all BSODs except one occurred without me doing anything except starting the PC and waiting a bit before sitting down to use it. The one that occurred while I was using the PC was the outlier with the different bugcheck info that hasn't occurred again since. Weird to say this, but hopefully I get a BSOD so the cause can be ascertained.