Question Seemingly random BSOD's

Status
Not open for further replies.
Jan 28, 2020
39
4
45
Hey!

So I made a new build back in September, and have been getting what seem to be random BSODs since. It varies, but it happens once or twice a day on average. It might be a coincidence, but it does not seem to happen while gaming.

I have tried updating various drivers (Realtek, video, LAN, as well as every single one in Device Manager). Windows Update is up to date and the BIOS is up to date. A couple of weeks ago I tried resetting Windows. I've run sfc /scannow and chkdsk. I've tried to run MemTest86 but get a black screen every time (changing the boot options in BIOS didn't help). I ran Windows Memory Diagnostic instead with 10 passes, extended, with cache on for everything, and it found nothing.
I've disabled fastboot, I've changed the system power settings.
XMP-profile was not set initially, but setting it didn't help either. I've tried moving the RAM-sticks from A2/B2 to A1/B1 and then to B2/A2.

I have not done any overclocking.

Hardware:

Motherboard: Gigabyte Aorus Elite x570
CPU: AMD Ryzen 7 3700x
GPU: AMD Radeon RX 5700 XT
SSD: Samsung SSD 970 EVO 1TB
RAM: G.Skill TridentZ F4-3600C17D-16GTZSW

Dump files: https://drive.google.com/open?id=1YKyVwygJvsA_dBkZIfgi3j9LVkPeHT5x
Memory-dump: https://drive.google.com/open?id=1_xu7cNbrr77cncze-VB4kwqVFvdu8CE3

I have tried a lot of things from various threads on various forums, and still feel no closer to a solution.

I'd be grateful if anyone here can pinpoint the error.
 
Jan 28, 2020
39
4
45
I'll give it a go. Would you also think I should avoid using the browser version of it? It's the VOIP of choice for my entire friend group, so ditching it completely would be an issue.

EDIT: Never mind this, my headset can connect to both my phone and PC at the same time, so I have a workaround.
 
Jan 28, 2020
39
4
45
The Intel detection software claims it doesn't find any Intel drivers or software, which I find strange since it did identify the Intel® I211 Gigabit Network Connection. I went to find the driver manually, but I'm not sure if I found the correct one: https://downloadcenter.intel.com/do...k-Adapter-Driver-for-Windows-10?product=36773
There's also this one, which seems very similar but has a different product ID: https://downloadcenter.intel.com/do...k-Adapter-Driver-for-Windows-10?product=64403

Sysnative results: https://1drv.ms/u/s!AnVqCzqrKBXsqaFwxjW5LqD-uYpwgA?e=ceQ5PW
 

Colif

Win 11 Master
Moderator
According to your system, you have Intel I211 Gigabit Network Controller drivers dated 25/2/20, version number 12.18.9.6. The only result I can find for these is in Chinese, and it's on the Intel site. It seems they had problems with the LAN not working with those drivers - link - and the current version of Windows.

Note: the date above might be the creation date, not the actual driver date. According to the signatures below, that driver was signed by Intel in November and by Microsoft in December last year.

Issued by: Intel External Issuing CA 7B
Issued to: Intel(R) INTELND1820
Revocation Status: OK
Serial number: 560000077b478c76c9afcafcaf00000000077b
Signing time: Monday, November 25, 2019 8:22:00 AM
Valid from: 8/9/2018 to 8/8/2020

Issued by: Microsoft Windows PCA 2010
Issued to: Microsoft Windows Hardware Compatibility Publisher
Revocation Status: OK
Serial number: 330000061929b7720a7076e4b4000000000619
Signing time: Monday, December 2, 2019 5:53:44 PM
Valid from: 2/20/2019 to 7/31/2020
 
Jan 28, 2020
39
4
45
I'm sorry if I'm being a little dense here, but what should my takeaway from your reply be? Go back to the driver suggested on the MB-page?

EDIT: Can now confirm Discord is not the culprit.
 

Colif

Win 11 Master
Moderator
My answer seems to confirm you are using a Microsoft driver: not one from Intel directly, but one which came as an update in Win 10. I don't think trying to update it will fix the error.

I could be chasing my tail, I am not always right in my deductions.

Have I suggested running Driver Verifier yet? It beats guessing what the cause could be. See post 2 here - https://forums.tomshardware.com/thr...nclude-in-blue-screen-of-death-posts.3468965/

There are others around I linked to above but they aren't as regular as I would appreciate. But we are all volunteers here so I can't expect anything from them.
 
Jan 28, 2020
39
4
45
Ahh, right, gotcha!

You might have. I've had it running at some point at least, but I'll set it to run again when I boot up next time.

And again, thank you all for the time you're spending, it's much appreciated!
 

gardenman

Splendid
Moderator
I ran the dump file through the debugger and got the following information: https://unconqueredbaboon.htmlpasta.com/

File information: 022920-17031-01.dmp (Feb 28 2020 - 19:24:45)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 13 Min(s), and 02 Sec(s)

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.
 

Colif

Win 11 Master
Moderator
The error mentioned nonpaged pool. Nonpaged pool is kernel memory which can't be paged out to the pagefile when Windows runs out of free physical memory. It always stays in RAM. While it is an IRQL error, it's also a page fault. Page faults are not errors - they are badly named actions, and they are necessary for virtual memory systems. So try running this on your SSD: run the benchmarks, check its SMART score and see if there are any firmware updates - https://www.samsung.com/semiconductor/minisite/ssd/download/tools/ (you want Samsung Magician)

Windows treats storage as an extension of memory, since that's where the page file is. Not getting a dump file after DV could also mean it's the SSD, as it's not going to record an error if it's the cause. The PSU being less well known could (maybe) have damaged other parts. That is the risk of a cheap PSU. I learned that long ago and killed a few HDDs along the way. I could be wrong, but I am leaning towards a hardware cause, as drivers are usually not this hard to find.

slaps DV for not producing a dump. It could mean it isn't a driver error.

Do you get any errors in safe mode?

You replaced RAM & PSU so far? Ram is on motherboard list now.
Newest BIOS on motherboard. Newest LAN drivers (not going there again)
 
Jan 28, 2020
39
4
45
I've added another dump to the previously linked folder.

Magician benchmark and SMART-score: https://1drv.ms/u/s!AnVqCzqrKBXsqaF60-MbdOuLbQJzdw?e=SDzlMU

Firmware is up to date.

I have yet to try in safe mode, but it might be a bitch to test out, since I've had four days of no crashes at least twice. I'll try to leave the computer running in safe mode overnight and while at work after the weekend.

I have indeed replaced RAM and PSU.
 
Jan 28, 2020
39
4
45
Scratch that.

Yes, I do get errors in safe mode.

EDIT: I've added the two safe-mode BSOD dumps to the Magician result folder linked in the previous post, in case they are of any use.
 

Colif

Win 11 Master
Moderator
I have to assume SMART is okay, since only one column shows OK and the rest say NA.

I do hate problems that don't happen all the time. I lived with one for 6 months.

Errors in safe mode point to a hardware problem.

CD drive - Asus DRW 24F
Sound - AMD sound device? obviously onboard.
GPU - AMD Radeon RX 5700 XT (who made card?)
Motherboard: Gigabyte Aorus Elite x570
CPU: AMD Ryzen 7 3700x
SSD: Samsung SSD 970 EVO 1TB

What mouse/keyboard do you have?
Any other USB devices?

I assume you have run memtest on the new RAM by now?
Tried Prime 95? https://www.mersenne.org/download/
 
Jan 28, 2020
39
4
45
GPU is a PowerColor Red Dragon
Keyboard is a CM Storm Devastator
Mouse is a Logitech G502 Lightspeed (Wireless)
I have a Steelseries Arctis Pro Wireless headset.
My monitor's USB ports are also connected to the system. They aren't in use, though.

I have not actually run Memtest on these sticks. I'll run it overnight when I go to bed tomorrow.

Very small sample size, but errors in safe mode seemed to happen way faster, and while not doing anything with the PC.
 

Colif

Win 11 Master
Moderator
I asked about the GPU as I was curious whether it was made by Asus. I heard those were problem cards.

I only asked about USB as my odd problem ended up being an old mouse, but I wasn't getting BSODs and you are. How old are they all?

3rd party drivers don't run in safe mode. So it rules out software, as the drivers built into Windows are generally rock solid (if you disregard the possibly buggy LAN drivers).

Which safe mode did you run? With or without networking?

Not sure why it would happen faster in safe mode... perhaps some of the AMD drivers delay the error in normal mode. Safe mode puts less stress on hardware, so it's odd that your PC crashes faster in it.
 
Jan 28, 2020
39
4
45
As mentioned, extremely small sample size, so it's probably a coincidence.

Safe mode without networking

Mouse is a few weeks older than the rig, so about six months.

Keyboard is as old as the old rig, so about five and a half years.

Headset is a little over a year.
 

gardenman

Splendid
Moderator
Please put new dumps into a new folder each time, or provide a link directly to the new dump file(s) each time. It makes it difficult for me to determine which files are old and which ones I've already done when they are all mixed up. Thanks.

Results from newest dump: https://asphericcatfish.htmlpasta.com/

File information: 022920-9796-01.dmp (Feb 29 2020 - 08:27:07)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: chrome.exe)
Uptime: 0 Day(s), 13 Hour(s), 01 Min(s), and 41 Sec(s)

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.
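
For anyone who wants to compare several of these summaries side by side, here is a minimal Python sketch that turns the four-line blocks into records. It assumes the "Key:value" layout exactly as pasted in this thread (it is not an official debugger output format), and the names `SUMMARIES`, `parse_summaries` and `uptime_seconds` are made up for illustration.

```python
import re

# Summary text as posted in this thread. The "Key:value" layout is taken
# from those posts; it is an assumption that other summaries look the same.
SUMMARIES = """\
File information:022920-17031-01.dmp (Feb 28 2020 - 19:24:45)
Bugcheck:IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by:memory_corruption (Process: System)
Uptime:0 Day(s), 0 Hour(s), 13 Min(s), and 02 Sec(s)

File information:022920-9796-01.dmp (Feb 29 2020 - 08:27:07)
Bugcheck:IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by:memory_corruption (Process: chrome.exe)
Uptime:0 Day(s), 13 Hour(s), 01 Min(s), and 41 Sec(s)
"""

UPTIME_RE = re.compile(
    r"(\d+) Day\(s\), (\d+) Hour\(s\), (\d+) Min\(s\), and (\d+) Sec\(s\)"
)


def parse_summaries(text):
    """Split the text on blank lines and turn each block into a dict."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            # Split on the first colon only; values may contain colons too.
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        match = UPTIME_RE.search(record.get("Uptime", ""))
        if match:
            days, hours, minutes, seconds = map(int, match.groups())
            record["uptime_seconds"] = (
                ((days * 24 + hours) * 60 + minutes) * 60 + seconds
            )
        records.append(record)
    return records


for record in parse_summaries(SUMMARIES):
    print(record["File information"], record["Bugcheck"], record["uptime_seconds"])
```

Sorting the records by `uptime_seconds` or grouping them by bugcheck is then a one-liner, which may make it easier to see whether crashes cluster shortly after boot.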
 

Colif

Win 11 Master
Moderator
My interpretation of that error is probably wrong... I saw an almost identical error yesterday and at the time I thought it was GPU errors. A lot of your errors have made me check (every time) whether you have an Nvidia GPU, as they look like the errors I see from Nvidia.

GPU drivers don't normally cause IRQL errors, and yet I have seen a few recently. This error isn't pointing at a GPU driver.

I am probably reading this incorrectly - win32kbase!DirectComposition::CAnimationMarshaler::SetReferenceProperty

Direct Composition uses the GPU for flashy transitions, but later on, the error itself happened in virtual memory.

How did memtest go? Did you run Prime 95? It tests CPU & memory (since the memory controller is on the CPU).

Only so many things deal with data: CPU, RAM, GPU RAM (to a lesser degree), storage & PSU (because it affects everything). I expect Darkbreeze didn't let you buy a cheap PSU so it shouldn't be that, and you were having the errors before you swapped them.
 
Jan 28, 2020
39
4
45
Will do, gardenman.

I am planning on running Memtest tonight when I go to bed, and let it run until I get home from work tomorrow. Just to be sure since I see different recommendations in various places, but should I run Memtest86 or Memtest86+?

I ran Prime95 a while back on Darkbreeze's recommendation, and it showed no errors. That was before the hardware changes, however. I'll run it tomorrow night before I go to bed.

He did not. As far as I understood, it's not top of the line, but should be solid.
 

Colif

Win 11 Master
Moderator
I don't know what the difference is between the 2 memtest versions. @Darkbreeze probably knows more about that than I do.

So the last 2 dumps we read were in safe mode? That doesn't seem right, as they included 3rd party drivers in the listing, and if it was in safe mode none should be listed. Least of all the GPU drivers.

Shame it's a Ryzen and we can't run it without the GPU. I still get the impression the GPU is involved. The 2 crashes in safe mode are what hold me back from suggesting running DDU.

I have to go but will look through thread tomorrow and see if I missed anything.
 
Short version - have you considered a faulty CPU?

Read below why I ask that, sorry for the wall of text, I have tried to make it easy to read, as I think it could be relevant.


I've been following this thread for a while, because it reminds me of a series of BSODs I had. But I wasn't too sure, and I'm still not, whether my experiences can actually contribute anything.
But I'll try and share it anyway - it might be a bit long, but maybe it is easier to disregard if it sounds completely unlike what OP experiences, or maybe it sounds familiar and could be helpful - you never know.

I know OP has a Ryzen CPU, and mine is an Intel - but everything OP describes sounds very similar to what I experienced.

I built a completely new system, built from entirely new parts, except for the PSU, which was a 2 year old Corsair AX860 that didn't show any signs of being faulty with my previous build.

In the beginning I had a random BSOD here and there, but nothing that I really considered a problem. I felt it might be some driver issue - and there would be several days between the BSODs occurring.

But the BSODs kept coming, so after a while I did a clean re-install of Windows 10.

Still I had BSODs, so after a period of time with 4 complete re-installs, and several different driver versions for various components, I felt there had to be something more seriously wrong.

The problem was, the BSODs primarily occurred when I was away from the computer, or when it was under very light load. There were no real obvious indications of what it could be.

Some of the BSODs, but not all, shared the stop code 0x124 - which indicates hardware failure, but the dump files identified something new as the cause every time - which I later learned can point towards CPU error.

When analyzing the dump files, it pointed to every possible thing you can think of. Firefox, Samsungs NVMe drivers, ntoskrnl.exe, hal.dll, Nvidia drivers, LAN drivers... -and the list goes on and on.

I spent a lot of time chasing dead ends, trying different drivers for all the stuff the dump files indicated as being the cause of the BSODs.

I finally realized that when a certain file was singled out in the dump file, it actually never meant anything other than that the program happened to be running at the time (when it wasn't Firefox, it could be iTunes, VLC Player, Origin, explorer.exe or pretty much any program that happened to be running).

The most difficult bit was that although some BSODs returned the same faulty items, most of them were completely random.
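
To make "a different culprit every time" a bit more concrete, here is a small Python sketch that tallies what each dump blamed and shows how concentrated the blame is. The list is a hypothetical sample: the names are ones mentioned in this post, but the order and counts are made up for illustration, and `culprit_spread` is just an illustrative helper name.

```python
from collections import Counter

# Hypothetical sample: the names come from this post, the sequence of
# crashes is invented for illustration only.
blamed = [
    "Firefox",
    "ntoskrnl.exe",
    "hal.dll",
    "Nvidia drivers",
    "LAN drivers",
    "ntoskrnl.exe",
    "Samsung NVMe drivers",
]


def culprit_spread(names):
    """Return (counts, share of crashes attributed to the most blamed item)."""
    counts = Counter(names)
    top_share = counts.most_common(1)[0][1] / len(names)
    return counts, top_share


counts, top_share = culprit_spread(blamed)
print(counts.most_common())
print(f"{len(counts)} distinct culprits in {len(blamed)} crashes, "
      f"top share {top_share:.0%}")
```

If one driver accounts for most of the crashes, that driver is worth chasing. If the blame is spread thinly across unrelated components, as it was for me, that is only a hint and not a diagnosis, but it fits a hardware fault better than a single bad driver.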

All stress tests and benchmarks passed error free.

I couldn't in any way provoke the BSODs while benchmarking. Prime95, both Small FFTs and Blend Test, with or without AVX, passed 12 hours of testing, OCCT ran for 20 hours error free, and the AIDA64 System Stability Test ran without problems for 12 hours.

Even Intel's own "Processor Diagnostic Tool" gave the CPU a "Pass" with no signs of error.

But I was beginning to suspect the CPU being faulty. But up to that point in time, I had only once in 20 years come across a faulty (factory new) CPU, so I was reluctant to conclude that it was a faulty CPU.

After 4 passes of MemTest86 it returned no errors. I bought the Pro version of MemTest86, which lets you run more than 4 passes, and after 8 passes I received a warning, but no errors.

Googling the warning, it was referred to as being something that could be ignored, a warning that can be the result of high frequency RAM, and isn't generally considered a big issue.

But I bought new RAM anyways, which didn't help.

Even a second set of new RAM didn't help.

Then I bought a new PSU, which didn't help.

So I bought a new motherboard, which didn't help either.

I tried my previous graphics card, which didn't help.

I tried a different system drive, which didn't help.

Then I bought a dirt cheap 9th Gen i3 9100F CPU, and the errors went away.

Finally I contacted the store where I bought the CPU, and they recommended RMA'ing it.

When I received a brand new CPU from Intel, no BSODs or errors ever returned, and it was almost as if it had only been a bad dream :p

I had kept all the new parts of the original setup, and I had the new parts bought for troubleshooting. So even though I was positive it was the CPU being faulty, I actually - out of my own curiosity - rebuilt the original setup, and the setup built from the troubleshooting parts, and both systems were stable as a rock, finally confirming that it actually WAS a faulty CPU, and nothing else.

I could have saved myself a lot of money on new hardware if I had listened to my gut feeling concerning the faulty CPU instead of replacing so many things for no reason.

I'm sorry for how long my post is, I've tried to make it easily readable.

When I read this thread for the first time, I thought it might be a faulty CPU, but I have very little experience with Ryzen CPUs, so I decided I wouldn't be of much help.

But still, every time I return to this thread, it sounds a lot like the process I went through with my faulty CPU.

It took me almost two months, from the first BSOD until I finally RMA'ed the CPU.

Although I couldn't identify 100% what was faulty on the CPU, everything pointed towards the integrated memory controller, but I can't say for sure.

What made this process so difficult was that the system passed every single test designed to identify instability or faulty hardware, with no hint of problems.

The best way to provoke the errors, was actually to leave the computer alone for a while.

So I might be completely wrong, but maybe you should consider if the CPU is the cause of all this.
 