Getting BSODs - different error messages every time

eggsodus

Distinguished
Sep 17, 2011
9
0
18,510
Hello!

I started getting random BSODs on my computer a few weeks ago. In all cases but one, they seem to have been triggered while running a game in fullscreen. A couple of times the computer also just froze, without any BSOD. The funny part is that I've gotten at least four different kinds of error messages on those BSODs. Here are the details:

BSOD error messages:

- BAD_POOL_HEADER
- SYSTEM_SERVICE_EXCEPTION
- IRQL_NOT_LESS_OR_EQUAL
- PFN_LIST_CORRUPT

I have suspected faulty RAM and the GPU possibly overheating, but Memtest ran through 6 passes with zero errors, and FurMark pushed the GPU temperatures higher than any of the games did, without any problems.

Here are four minidumps: Minidumps.zip

Here are my specs:

OS
MS Windows 7 Ultimate 64-bit SP1

CPU
AMD Phenom II X6 1090T 28 °C
Thuban 45nm Technology

RAM
8.0GB Dual-Channel DDR3 @ 668MHz (10-10-10-25)

Motherboard
ASUSTeK Computer INC. M4A87TD/USB3 (AM3) 42 °C

Graphics
LA2405 (1920x1200@60Hz)
AMD Radeon HD 6800 Series

Hard Drives
313GB SAMSUNG SAMSUNG HD321KJ ATA Device (SATA) 35 °C
977GB Western Digital WDC WD10EALX-229BA0 ATA Device (SATA) 38 °C
313GB SAMSUNG SAMSUNG HD321KJ ATA Device (SATA) 31 °C

Optical Drives
TSSTcorp CDDVDW SH-S203B ATA Device

Audio
M-Audio Fast Track Ultra
 

Paul Tomato

Distinguished
Sep 17, 2011
15
0
18,520
Hey eggsodus,

I've got a few questions for you:

1) What's your power supply, and how old is it? I'm not sure if this is the issue in your case, but your PS information is always a good thing to know. I am in the camp that believes that (at least for a gaming/high-performance machine) the power supply is the most important part of your system.

2) What brand of memory and model number do you have? Is it overclocked? (Doesn't look like it, but more information is better.)

3) What is your CPU's temperature when this happens? Have you monitored it under load? How's your case ventilation?

4) (Not applicable if you have not overclocked your processor) - The maximum CPU power supported by your MB is 140 W. Now I might be wrong here as I've never worked w/overclocking a Phenom x6. I hope that someone w/real world experience will chime in if I am incorrect about this:

I believe that the power consumed by your processor rises roughly linearly with clock speed, i.e. if you increase the speed by 20% then you increase the power by about 20%, and that raising the voltage on the processor increases power consumption even faster (roughly with the square of the voltage). If so, then if your processor uses 125 watts @stock speed and voltage, an increase to 3.6 GHz (w/stock voltage) would push your power consumption above 140 watts. I think - hopefully someone can confirm or correct this.
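
As a rough sanity check of that estimate, here is a minimal Python sketch. It assumes the common rule of thumb that dynamic CPU power scales with clock speed times the square of the voltage, and it assumes a 3.2 GHz stock clock for the 1090T (the thread does not state it). The function name `scaled_power` is made up for illustration, and real chips also draw static power that this ignores.

```python
def scaled_power(base_watts, base_ghz, new_ghz, base_volts=1.0, new_volts=1.0):
    """Rule-of-thumb estimate: dynamic power scales with f * V^2."""
    return base_watts * (new_ghz / base_ghz) * (new_volts / base_volts) ** 2


STOCK_WATTS = 125.0        # rated power quoted in the post
STOCK_GHZ = 3.2            # assumption: 1090T stock clock, not stated in the thread
BOARD_LIMIT_WATTS = 140.0  # motherboard CPU power limit quoted in the post

# Same voltage, higher clock: only the frequency ratio matters here.
estimate = scaled_power(STOCK_WATTS, STOCK_GHZ, new_ghz=3.6)
over_limit = estimate > BOARD_LIMIT_WATTS

print(f"Estimated power at 3.6 GHz, stock voltage: {estimate:.1f} W")
print("Above the board limit" if over_limit else "Within the board limit")
```

Under these assumptions the estimate lands just over the 140 W limit, which matches the concern raised above. Treat it as an order-of-magnitude check rather than a measurement.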

So, here are the troubleshooting steps that I would try:

A) If your case doesn't have good airflow, open it up (and test)! Also, this is a good time to make sure that your CPU cooler is not choked with dust - if so, clean it up!

B) Disconnect any HDDs that you don't need to run these games and test. This step is primarily testing the power supply rather than the HDDs themselves...

C) Run the system @stock speeds and voltages for all devices and see if the errors still occur. If you still get an error, then any processor overclock (if you use one) is most likely not an issue, so the next step I'd take is:

D) Run w/one RAM module at a time (best way) and see if the error occurs, though you could also just swap the positions of your RAM modules (weaker way).

E) If you have one, put in a low(er) power video card and test (one that doesn't require a direct connection to the PS would be best). I am pretty sure that you would need to lower game video resolutions and settings when testing, but I'm sure that you'll live through it! BTW, this step is really for helping to determine if you have power issues on your 12V rail(s) as opposed to a faulty video card. If you have another video card, swapping it in is just a lot easier than pulling and replacing the power supply to test.


If the error still occurs at this point then I'd say that your problem is most likely either your Power Supply, or you have a motherboard issue. For your motherboard you could google/search for overheating issues (or anything else) w/one of the on-board chips (sorry, I'm not doing that right now).

For the power supply, that's usually my first suspect when I get inconsistent error messages like this - something happens where the power supply can no longer supply the required power on one of the rails - it's just often so much easier to test the other items first. IF you have another power supply of sufficient rated power, this would be the time to replace and test. BTW, even if a PS tester shows that its voltage levels are correct, that doesn't prove much: most PS test units only check voltage output because they are not capable of putting a load on the power supply to test the amperage output for each rail.

I have had rare instances where I'd get random error messages and BSOD's where a Power Supply's 3.3V and/or 5V rails were not able to supply sufficient power for a HDD, though if this is the case then there would be a good chance of having issues booting (HDD doesn't always spin up properly).


Eggsodus, I hope that this helps! Let us know what happens - hopefully it'll work out quickly, easily and cheaply for you!
 

eggsodus

Can you run chkdsk? Could be HDD related, since your RAM passed.
Actually, chkdsk runs at startup every time the computer reboots after a BSOD, and I've also scheduled a run of it once, without any errors.

--

For Paul, thanks for the lengthy response. Here are my answers:

1. My power supply is Fractal Design Newton R2 1000W - bought it this summer.

2. My memory is Kingston HyperX blu 1600MHz DDR3 CL9 XMP (two 2x2GB) - I have tried running them both at the stock CL9 speed, and also CL10. (Also, running at 1333MHz)

3. Here's a screenshot of HWmonitor while running Crysis: screenshot

4. I have in fact overclocked nothing (even though I have the typical components for it). Still, as you can see from the HWmonitor screenshot above, my CPU power does peak at ~141W. :eek:

One thing I'd also like to add: the only recent change I can think of that could possibly have something to do with these errors is that I disabled the pagefile at one point. But after a game required it, I turned it back on. Could that have any effect?

Also, do you think these errors are definitely related to hardware, and not by any chance software? The trickiest part is that I can't replicate the error. Even when running the games, they may happen, and may not happen. They also may happen almost immediately after starting the game and not after stressing the system out for long.

Thanks again for your response, I will continue to test out the things I can. (Already tested the airflow by opening the case on one side and using a big floor fan directly at it. XD Still got a BSOD though.)

ADDITION: The computer has also just rebooted itself a couple of times, without any BSOD. And one time my BIOS-settings were back on default.
 

Paul Tomato

Hey Eggsodus!!

Read a couple of reviews on your power supply - I certainly doubt that that's the issue! It gets very good reviews, and it supplies much more power than your current system needs (thumbs up to that). Efficient and some headroom for the future!

Likewise, you are also using RAM that should not be a problem - at least not from the standpoint of its specs. It seems that you have gone out of your way to get components that won't be stressed by your configuration. My compliments. QUESTION: in your last note you said that you had 2x2GB modules - did you mean 2x4? Your first note said that you had 8GB. I would still recommend testing w/ one module removed at a time - have you tried this?

Also, you may want to test w/PRIME95 - it allows you to stress your CPU constantly, though I honestly don't know how it stresses RAM (if at all).

RE. your pagefile - I've removed mine (and replaced it) before w/o a problem. Removing it can speed things up but there would be a real problem if you ran out of RAM resources (but I've never done that, so I don't know exactly what would happen). Does it reside on a secondary HDD? If so you might want to try to put it back onto your main HDD if you unplug the HDDs that you don't need attached for this test. When I troubleshoot a difficult-to-diagnose PC (which yours has become) I always try to run with as few components as possible - 1 HDD, no optical drives, etc.

Are your two samsungs in a RAID array for booting? I just noticed that they're the same...

RE: this being a S/W issue - It's plausible, but I doubt it. I'll try to look at your dump files later (I need to get some actual work for money done today) though. My experience with S/W issues has shown that the errors are more consistent - I think that "IRQL_NOT_LESS_OR_EQUAL" is actually a CPU (or at least a systemboard) error, but I need to do some research on your errors before I can make an exact statement. Also, the spontaneous re-boots don't bode well, but I saved the important part for last.

Your CPU power is exceeding the SPECS of your motherboard. This is not good. Although your CPU is rated at 1.4V I have an idea that it's running at a voltage of ~1.485-1.5V. My 3 GHz Athlon II X4 does the same in my MSI MB - taking it all the way down to 1.4V did seem to cause some problems (as I recall). I don't know if the voltage is somehow set by AMD or if it's a BIOS issue for your MB. You COULD try updating your MB's BIOS - there's always some risk w/doing that as you never know if doing so will create new problems, so I'd leave that as a last option. I did a search on your MB and found nothing to worry about, though it seems that I've read people talking about ASUS MBs and how some users have complained about voltage stepping issues - then again, MY human RAM may be faulty here...

So, here's the only really important part of my note:

1) I'd recommend going into BIOS and lowering CPU voltage by ~0.02V (to ~1.47V if your CPU is currently ~1.49V) and then testing. This would lower your CPU's total WATTS consumed at MAX load by ~3.5 WATTS, which would bring your CPU's wattage below your MB's max CPU power rating. This may cause a problem for your CPU but I'd doubt it (though going down to 1.40V may not work). Another possibility would be to disable a couple of cores of your CPU - if Crysis stresses all 6 cores to near or fully 100% usage this would lower power consumption (and unfortunately, performance, but that's what testing is all about).

2) If you haven't tested your RAM modules individually, please do so (unless you lower your voltage on your CPU first and that gets rid of your errors). Your RAM has great SPECS but bad RAM is bad RAM, and the more components that you can eliminate from the "potential problems", the easier it is to isolate the problem.
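
The voltage suggestion in item 1 can be checked with a back-of-the-envelope calculation. The Python sketch below assumes power scales with the square of the core voltage at a fixed clock (a common approximation, not something measured in this thread), and it takes the ~1.49V starting point and the ~141W peak from the posts above. Both helper names are hypothetical.

```python
import math


def power_at_voltage(watts, old_volts, new_volts):
    """Approximate power after a voltage change at the same clock (P ~ V^2)."""
    return watts * (new_volts / old_volts) ** 2


def max_voltage_for_limit(watts, volts, limit_watts):
    """Highest voltage that keeps the estimate at or below limit_watts."""
    return volts * math.sqrt(limit_watts / watts)


PEAK_WATTS = 141.0         # peak reported by HWMonitor in the thread
CURRENT_VOLTS = 1.49       # guessed in the post, not measured
BOARD_LIMIT_WATTS = 140.0  # motherboard CPU power limit quoted in the thread

after = power_at_voltage(PEAK_WATTS, CURRENT_VOLTS, 1.47)
saved = PEAK_WATTS - after
ceiling = max_voltage_for_limit(PEAK_WATTS, CURRENT_VOLTS, BOARD_LIMIT_WATTS)

print(f"Estimated power at 1.47 V: {after:.1f} W (saves about {saved:.1f} W)")
print(f"Voltage that just meets the {BOARD_LIMIT_WATTS:.0f} W limit: {ceiling:.3f} V")
```

With these inputs the estimated saving is a few watts, in the same range as the ~3.5 W figure quoted above, and the result drops below the 140 W limit. The real saving depends on the actual voltage and on static power, so it is worth confirming in HWMonitor after the change.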

Let me know how it goes - I should have time later to research your dump files and see if there's any indication of S/W problems.

Paul
 

eggsodus

A quick update before I'm off to work.

Yesterday I did a fresh install of a new Windows 7 Home Premium x64 on a new Western Digital Caviar Blue SATA III drive. I also installed Security Essentials instead of Avast. (Some people told me that some of my crashes were related to my old antivirus, Avast.)

I left my Dropbox syncing overnight, only to find out that the computer had crashed during the night - in idle state. I've attached a new minidump here. I will try lowering the CPU voltage when I get back home from work.

This is really starting to get to me. I have a ton of work to do on my computer, but I'm afraid to work with it due to the crashes. :/
 

Paul Tomato

Good idea to try reinstall. I've never had problems like that w/avast - ran it on XP Pro & Win7 ultimate, but that's just me...

I'll try taking a look at those dump files and see if I can figure anything out - I sure wish that there was someone who was better at deciphering those things than me trying to help out - perhaps someone bored @work this week will do so...
 

eggsodus



Ah, sorry for being unclear. I have bought two 2x2GB kits, so I do have four 2GB modules. :) I have also only run Memtest with all the modules attached. Just to be clear, did you mean I should start with all of them and then remove one at a time, or start with one and add the rest? :)



The pagefile was always on the same HDD as the OS, and the two drives aren't set up for RAID at the moment, though I did have that in mind when I bought them. :)



I've already updated my BIOS to the newest version.



Will try these, maybe even starting with the RAM, as I did in fact buy the other 2x2GB kit after the initial components (though the crashes didn't start to occur immediately after installing the two new modules).
 

eggsodus


Please don't, as that was merely an observation, not a generalization. This is just how it has been with my BSODs. :)
 

Paul Tomato

Eggsodus, After a quick look at your dump files I can tell you this:

1) All errors originate in ntoskrnl.exe. One is also associated with aswSnx.SYS (an avast file) but one is associated with dxgmms1.sys, which is a DirectX file. So, I wouldn't say that these errors are caused by avast.

2) Your error meanings that I could find are (listed w/relevant bug check strings):

BAD_POOL_HEADER: The pool block header size is corrupt.

DRIVER_POWER_STATE_FAILURE: A device object has been blocking an IRP for too long a time.

MEMORY_MANAGEMENT: An unknown memory management error occurred.

SYSTEM_SERVICE_EXCEPTION: The main parameter code had no meaning but it gives this explanation - "This error has been linked to excessive paged pool usage and may occur due to user-mode graphics drivers crossing over and passing bad data to the kernel code."

PFN_LIST_CORRUPT: A driver attempted to free a page that is still locked for IO.

The three crashes caused by ntoskrnl.exe were all caused by the same memory address (ntoskrnl.exe+7cc40), and ALL 5 crashes occurred at this address. Did you try testing your RAM by running one module at a time? If errors still occur I'd have to say that one of your RAM sticks is the most likely suspect.
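
To make that comparison across dumps concrete, here is a small Python sketch that tallies the blamed module and the crash address per dump. The five records are transcribed from the summary above (three dumps blamed on ntoskrnl.exe, one on aswSnx.SYS, one on dxgmms1.sys, all at the same address); the record layout and the `tally` helper are assumptions for illustration, not the output of any dump tool.

```python
from collections import Counter

CRASH_ADDRESS = "ntoskrnl.exe+7cc40"

# (blamed module, crash address) per minidump, as read from the summary above.
crashes = [
    ("ntoskrnl.exe", CRASH_ADDRESS),
    ("ntoskrnl.exe", CRASH_ADDRESS),
    ("ntoskrnl.exe", CRASH_ADDRESS),
    ("aswSnx.SYS", CRASH_ADDRESS),
    ("dxgmms1.sys", CRASH_ADDRESS),
]


def tally(records, field):
    """Count how often each value of one tuple field appears."""
    return Counter(record[field] for record in records)


by_module = tally(crashes, 0)
by_address = tally(crashes, 1)
shared_address = len(by_address) == 1

print("Blamed modules:", dict(by_module))
print("All dumps share one crash address:", shared_address)
```

Different bug check names that all land at one kernel address, with no single third-party driver blamed repeatedly, fit the hardware suspicion better than a one-driver explanation. Keep in mind that ntoskrnl.exe is the Windows kernel itself, so it is often where a fault surfaces rather than where it originates.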

These errors all seem unrelated except possibly for the DirectX and video driver errors. If you tested each RAM module individually and have lowered your CPU voltage as I recommended, then you should strive to get the newest drivers for your video card, DirectX, on-board motherboard devices, etc. Since they do not seem related overall I'd still guess that it's a hardware issue but if you've reinstalled on a new hdd it's a great time to see if new drivers can help.

Please let me know what happens, and good luck.

 

eggsodus



Ah, thank you again for a quick and in-depth response, Paul. It's definitely starting to look like one of the RAM modules might in fact be bad. Going to test them one by one as soon as I get home from work. That would even be kind of the best case, as those are easy to replace, either under warranty or by just purchasing new ones. :)
 

Paul Tomato

Just saw your reply - I started writing before you posted! To test your RAM you can use two modules at a time - just be sure that the two pairs don't share a module in the first round of tests (i.e. test each pair separately). If you find a set of two that gives you errors and a set that does not, then you can use one of the good ones (in the 2nd position is usually best) and try each of the ones that gave you an error in the first position - if RAM is the problem then probably only one module is bad and you can isolate it this way.
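
The pair-testing procedure described above can be sketched in Python as follows. The `pair_is_stable` callback is a hypothetical stand-in for "install only these two modules, run Memtest or the game, and report whether it stayed error-free"; in real life that is hours of manual testing per call, so the sketch runs each pair test only once.

```python
def isolate_bad_module(modules, pair_is_stable):
    """Find one bad module among four by testing two disjoint pairs.

    modules: four module labels.
    pair_is_stable(first_slot, second_slot) -> True if no errors occurred.
    Returns the suspect label, or None if nothing could be pinned down.
    """
    pairs = [tuple(modules[0:2]), tuple(modules[2:4])]
    results = {pair: pair_is_stable(*pair) for pair in pairs}

    failing = [pair for pair, ok in results.items() if not ok]
    passing = [pair for pair, ok in results.items() if ok]

    if not failing:
        return None  # both pairs pass: RAM is probably not the culprit
    if not passing:
        raise ValueError("both pairs fail; no known-good module to test against")

    known_good = passing[0][0]
    for suspect in failing[0]:
        # Suspect goes in the first slot, the known-good module in the second.
        if not pair_is_stable(suspect, known_good):
            return suspect
    return None  # errors only appear with the original pairing


# Simulated run: pretend module "C" is the faulty one.
found = isolate_bad_module(["A", "B", "C", "D"], lambda a, b: "C" not in (a, b))
print("Suspect module:", found)
```

The sketch assumes at most one bad module and a fault that shows up reliably during a test. Intermittent crashes like the ones in this thread may need several runs per pair before a pass can be trusted.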

Also, are all four modules the same model? I think so from what you said but if not you could have a memory mismatch - some modules require different voltages...

Bedtime - very late - good luck!
 

eggsodus

Update.

After testing the RAM-modules in pairs with 0 errors, I noticed I had messed up my fresh installation as the bootloader was still on the old HDD. XD

I did a new install and completely removed the old HDD, hoping it was the culprit - no crashes so far, but I really need to have an uptime of at least a week before I am convinced. :)

Also, regarding the CPU voltage setting you suggested Paul: I set the CPU and RAM clocking to manual instead of auto. Now HWMonitor shows the CPU power usage as a steady ~141W, and no problems, so I guess we can rule that out?

- Jani
 

Paul Tomato

If the CPU can be run at that voltage and the motherboard can handle the power then this can certainly be ruled out.

The thing that I do not like about errors like this is that they can be hard to diagnose. I've worked on a few dead machines (wouldn't even POST) - when this happens I strip it down to power supply, MB & CPU only - power it up to see if it beeps. Then I progressively add components one at a time to see what component keeps it from booting, only to spend an hour rebuilding it and it works just fine (so final diagnosis is at best static electricity buildup or dust). Then 4 months later I end up replacing RAM or a power supply...

Hopefully it'll work out OK - if you have no issues without the HDD that doesn't necessarily even mean the HDD is dead...

A buggy PC can be a frustrating thing (I've had my share)! Let me know how it works out.

Now I gotta see if anyone's posted about hotmail's apparent GIANT SNAFU...

Paul
 

tulx

Distinguished
May 17, 2009
220
0
18,690
I think I'm experiencing something similar with my secondary PC. I (actually my girlfriend) was getting different BSOD messages all the time - sometimes it was memory management, other times some ATI driver file. At first I suspected RAM issues. I ran the Windows memory diagnostic and it did indeed find some errors (when testing all four sticks together). When I tested them one by one, the tests ran fine. After putting all four back in, the errors did not reappear. I just ran Memtest86 for 9 passes and no errors occurred.
Now I suspect that it MIGHT have something to do with the video card/drivers. The card is a 6770 in an Asus M2N-SLI Deluxe. The mobo has an nForce chip and is quite old. Might be some incompatibility.
 

eggsodus



Hi tulx!

I cannot say for sure, but in the end I believe my BSODs were due to the USB ports on the mobo having some sort of meltdown - figuratively speaking. I ended up replacing my motherboard after everything else and haven't had a problem since!