MEMORY_MANAGEMENT BSODs, accompanied by occasional app crashes and GPU TDRs.

Status
Not open for further replies.

square-synth

Distinguished
Jan 14, 2012
6
0
18,510
System Specs
===================
PSU: Corsair HX 850w
MB: Gigabyte H55-USB3
CPU: Core i7 875k unlocked
GPU: Galaxy Geforce GTX 460
HD: OCZ Vertex2 SSD
RAM: 4x2GB Corsair XMS3
OS: Win7 64bit

Hey guys,

I've been having BSOD issues with my PC every several months. This may not make any sense, but it seems that if I reboot or power-cycle enough times, the PC will hit its "sweet spot" and run without any issues for weeks or months. Then, when I have to reboot, it will be sketchy again until I go through another several rounds of rebooting, power-cycling, and reseating parts. I had to move my desk around a couple of days ago, and since then it has started acting up again, and my usual plug/unplug antics do not seem effective. I didn't make any driver updates that I know of before this, and it was running fine under heavy use for weeks before.

When it is acting flaky and ready to BSOD, I get random application crashes, whether it's games crashing to desktop or Firefox crashing. I'm sometimes even getting temporary video card failures: the screen goes black for 5 seconds and an alert comes up saying that my GPU failed and successfully recovered (TDR, I believe). Whenever these crashes happen, a BSOD will usually soon follow. I see from the other sticky thread that Nvidia is having these TDR issues with its most recent driver versions, but I only ever get them when they are accompanied by BSODs. The last thing I want is my RAM to crap out halfway through a driver reinstallation, so I'm trying not to worry about the GPU too much unless someone has a valid suggestion.

I ran memtest86 overnight for over 8 hours and there were zero errors. Back in June, I replaced the motherboard, an Intel DP55KG, because it had a bad RAM slot according to a PC shop tech. It had BAD_POOL_HEADER and ACCESS_VIOLATION BSODs, and at that time memtest actually showed errors. I'm dreading that there has always been an issue with my CPU (or maybe the BSODs on this mobo have a completely different cause?). On the DP55KG I used the RAM's XMP profile, which overclocks my processor slightly to run the RAM at 1600 MHz. An Intel rep tried to tell me that this could have damaged my processor, but I've never seen it over 36 degrees. Since getting the new MB I have left the RAM on the default settings (1333 MHz, I believe), but I want to hold off on experimenting with the BIOS settings, especially after the clean memtest.

BlueScreenView reports are below. I had another MEMORY_MANAGEMENT crash today, but it did not save the minidump for some reason.

011212-11029-01.dmp
2012-01-12 9:52:15 PM
===================
Bug Check String : MEMORY_MANAGEMENT
Bug Check Code: 0x0000001a
Parameter 1: 00000000`00041284
Parameter 2: 00000000`fff85001
Parameter 3: 00000000`00000000
Parameter 4: fffff700`01080000
Caused By Driver: ntoskrnl.exe
Caused By Address: ntoskrnl.exe+705c0
File Description: NT Kernel & System
Product Name: Microsoft® Windows® Operating System
Company: Microsoft Corporation
File version: 6.1.7600.16841 (win7_gdr.110622-1503)
Processor: x64
Crash Address: ntoskrnl.exe+705c0


011312-10483-01.dmp
2012-01-13 8:35:00 PM
===================
Bug Check String : MEMORY_MANAGEMENT
Bug Check Code: 0x0000001a
Parameter 1: 00000000`00041790
Parameter 2: fffffa80`00e1a890
Parameter 3: 00000000`0000ffff
Parameter 4: 00000000`00000000
Caused By Driver: ntoskrnl.exe
Caused By Address: ntoskrnl.exe+705c0
File Description: NT Kernel & System
Product Name: Microsoft® Windows® Operating System
Company: Microsoft Corporation
File version: 6.1.7600.16841 (win7_gdr.110622-1503)
Processor: x64
Crash Address: ntoskrnl.exe+705c0

Any insight is greatly appreciated and I will try to be prompt on getting any additional information or troubleshooting steps that might help. Thanks.
 
Solution
After reading the post three times, I'm leaning toward a RAM problem over a GPU problem. While TDR points squarely at the GPU, the fact that normal applications are randomly crashing points squarely at RAM.

Parameter 1: 00000000`00041284

0x41284: A PTE or the working set list is corrupted.

Parameter 1: 00000000`00041790

Other: An unknown memory management error occurred. [i.e., an undocumented code...]

So it looks like something is corrupting your RAM [in the first case, the Page Table Entry is corrupted], rather than a RAM failure itself. And given the TDR problems, it's possible the GPU or GPU driver is at fault. Do you have another GPU to test with?

Also, might want to run a chkdsk.
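
For anyone who wants to repeat the Parameter 1 lookup above on their own dumps, here is a minimal Python sketch that reads a BlueScreenView text block and maps the 0x1A subtype to a description. The function names are made up for illustration. The 0x41284 text is the one quoted in this thread; the 0x41790 text is taken from Microsoft's current bug check reference, which is an assumption relative to this thread, where that code was treated as undocumented.

```python
import re

# Subtype meanings for bug check 0x1A (MEMORY_MANAGEMENT), keyed by Parameter 1.
# 0x41284: description quoted in this thread.
# 0x41790: from Microsoft's current bug check reference (assumption; the
#          thread itself treated this code as undocumented).
SUBTYPES = {
    0x41284: "A PTE or the working set list is corrupted.",
    0x41790: "A page table page has been corrupted.",
}

# Matches "Key: value" and "Key : value" lines whose key starts with a letter,
# so timestamp lines such as "2012-01-12 9:52:15 PM" are skipped.
FIELD_RE = re.compile(r"\s*([A-Za-z][\w ]*?)\s*:\s*(.*)")


def parse_param(text):
    """Convert a value such as 00000000`00041284 to an int."""
    return int(text.replace("`", ""), 16)


def parse_report(report):
    """Collect the labelled fields from one BlueScreenView text block."""
    fields = {}
    for line in report.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def describe(fields):
    """Return a human-readable description of a 0x1A crash subtype."""
    code = int(fields["Bug Check Code"], 16)
    if code != 0x1A:
        return f"Not MEMORY_MANAGEMENT (bug check {code:#x})"
    subtype = parse_param(fields["Parameter 1"])
    return SUBTYPES.get(subtype, f"Subtype {subtype:#x} is not in this table")


if __name__ == "__main__":
    sample = """Bug Check String : MEMORY_MANAGEMENT
Bug Check Code: 0x0000001a
Parameter 1: 00000000`00041284
Parameter 2: 00000000`fff85001
"""
    print(describe(parse_report(sample)))
```

The table only covers the two subtypes seen in this thread; anything else falls through to the "not in this table" message rather than guessing.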

square-synth

I could try installing Windows on my backup HD, but what makes you suspect this? Is there any other test I can do to confirm?
 

square-synth

No more BSODs since removing Microsoft Security Essentials on Saturday. I noticed in Event Viewer that it tended to log a lot of errors around the same times as the crashes. I replaced it with AVG. However, I'm still getting random TDRs while browsing in Firefox and in Windows.
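
One way to check the "errors around the same times as crashes" observation less by eye is to compare timestamps directly. The sketch below is only an illustration: the crash times are the two from the BlueScreenView reports in the first post, but the event entries and source names are hypothetical placeholders, not real Event Viewer data, and the 15-minute window is an arbitrary assumption.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %I:%M:%S %p"  # matches "2012-01-12 9:52:15 PM"


def parse_ts(text):
    """Parse a timestamp in the format BlueScreenView shows in this thread."""
    return datetime.strptime(text, TS_FORMAT)


def events_near(crashes, events, window=timedelta(minutes=15)):
    """Map each crash time to the events logged within `window` of it."""
    nearby = {}
    for crash in crashes:
        nearby[crash] = [
            (when, source)
            for when, source in events
            if abs(crash - when) <= window
        ]
    return nearby


# Crash times from the two minidumps listed in the first post.
crashes = [
    parse_ts("2012-01-12 9:52:15 PM"),
    parse_ts("2012-01-13 8:35:00 PM"),
]

# Hypothetical error events; in practice these would be exported from Event Viewer.
events = [
    (parse_ts("2012-01-12 9:47:30 PM"), "ExampleSourceA"),
    (parse_ts("2012-01-12 3:10:00 PM"), "ExampleSourceB"),
    (parse_ts("2012-01-13 8:33:10 PM"), "ExampleSourceA"),
]

if __name__ == "__main__":
    for crash, hits in events_near(crashes, events).items():
        print(crash, [source for _, source in hits])
```

A source that keeps showing up inside the window is a candidate to look at, not proof of a cause.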

I am just getting around to backing up all my data so I can do a full reformat of both my SSD and secondary HDD. I'm going to install Windows on the HDD, both to test for crashes and to be able to update the Vertex 2's firmware, as it was running 1.25.

Before I go through with the reformats, I'm going to do some quick testing: rolling back the video card drivers (even though 285.62 was fine for weeks), switching the memory timings to XMP (and thus overclocking a bit, but cooling isn't an issue), and maybe running prime95 to force BSODs so I have more minidumps to look at.

Are any of these a bad idea? Any other opinions on those bluescreen reports?
 

square-synth

No TDRs since upgrading from Nvidia 285.62 to their beta 290.xx drivers. Still no bluescreens since removing Security Essentials. I have run chkdsk and it found no errors. I also uninstalled a lot of the extra Gigabyte software that came with my motherboard.

I will see how it runs until the weekend. If it's smooth until then, I'm going to try a few reboots to test stability, as the system might be on one of its "good" bootups by fluke (as I mentioned, it runs fine for weeks or months running 24/7 before acting up after a reboot).
 

square-synth

Hey guys, sorry for the late update.

This ended up being a RAM problem. Although the initial memtest passed, I tried it again right after another bluescreen and it showed errors right away. It seemed like memtest would only pass if the computer was "cold"; when I ran it right after prime95 or a gaming session, it usually failed. I removed 2 DIMMs and it ran fine for several days. I memtested overnight for a while and it passed every time, with no app crashes or BSODs.
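
For anyone wondering what a pass or fail in a memory test actually means, here is a rough Python sketch of the basic write-then-verify idea. It is only an illustration and not a substitute for memtest86: a user-space script tests whatever pages the OS happens to hand it, cannot choose physical addresses, and goes through the CPU caches. The function names and the pattern list are assumptions made up for this example.

```python
def fill(buf, pattern):
    """Write the repeating byte pattern across the whole buffer."""
    reps = len(buf) // len(pattern) + 1
    buf[:] = (pattern * reps)[:len(buf)]


def find_mismatches(buf, pattern):
    """Return the offsets where the buffer no longer matches the pattern."""
    return [i for i, b in enumerate(buf) if b != pattern[i % len(pattern)]]


def pattern_test(size_bytes=1 << 20,
                 patterns=(b"\x00", b"\xff", b"\xaa", b"\x55")):
    """Fill a buffer with each pattern in turn and read it back.

    Returns a list of (pattern, offset) pairs for every byte that read back
    differently from what was written; an empty list means a pass.
    """
    buf = bytearray(size_bytes)
    failures = []
    for pattern in patterns:
        fill(buf, pattern)
        failures += [(pattern, off) for off in find_mismatches(buf, pattern)]
    return failures


if __name__ == "__main__":
    failures = pattern_test()
    print("pass" if not failures else f"{len(failures)} mismatches")
```

The "cold passes, warm fails" behaviour described above is why a single clean run proves little: the same test has to be repeated under the conditions where the crashes actually happen.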

To get back up to 8 GB of RAM, I replaced the 2 remaining DIMMs with G.Skill Ripjaws 4 GB sticks that are designed for H55 chipsets at 1333 MHz. They have been working fine as well.

My GPU TDR frequency went down significantly, but it still happens in rare situations (once when applying an overclock, big whoop). It's not correlated with temperature or anything. I think it's just a driver issue that Nvidia will eventually sort out, and my poor stability somehow exacerbated it.
 