Question: Seemingly random BSODs

Status
Not open for further replies.
Jan 28, 2020
Hey!

So I made a new build back in September, and have been getting what seem to be random BSODs since. It varies, but on average it happens once or twice a day. It might be a coincidence, but it does not seem to happen while gaming.

I have tried updating various drivers (Realtek, video, LAN, as well as every single one in Device Manager). Windows Update is up to date and the BIOS is up to date. A couple of weeks ago I tried resetting Windows. I've run sfc /scannow and I've run chkdsk. I've tried to run MemTest86 but get a black screen every time (changing the boot options in the BIOS didn't help), so I ran Windows Memory Diagnostic instead with 10 passes, the Extended test mix and cache on, which found nothing.
I've disabled Fast Boot and I've changed the system power settings.
The XMP profile was not set initially, but setting it didn't help either. I've tried moving the RAM sticks from A2/B2 to A1/B1 and then to B2/A2.

I have not done any overclocking.

Hardware:

Motherboard: Gigabyte Aorus Elite x570
CPU: AMD Ryzen 7 3700x
GPU: AMD Radeon RX 5700 XT
SSD: Samsung SSD 970 EVO 1TB
RAM: G.Skill TridentZ F4-3600C17D-16GTZSW

Dump files: https://drive.google.com/open?id=1YKyVwygJvsA_dBkZIfgi3j9LVkPeHT5x
Memory-dump: https://drive.google.com/open?id=1_xu7cNbrr77cncze-VB4kwqVFvdu8CE3

I have tried a lot of things from various threads on various forums, and still feel no closer to a solution.

I'd be grateful if anyone here can pinpoint the error.
 

gardenman

Splendid
Moderator
Hi, I ran the dump files through the debugger and got the following information: https://transeuntvideo.htmlpasta.com/
File information: 012820-5750-01.dmp (Jan 28 2020 - 14:06:32)
Bugcheck: KERNEL_SECURITY_CHECK_FAILURE (139)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 2 Hour(s), 50 Min(s), and 55 Sec(s)

File information: 012820-17812-01.dmp (Jan 28 2020 - 15:49:20)
Bugcheck: KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 52 Min(s), and 18 Sec(s)

File information: 012820-14625-01.dmp (Jan 28 2020 - 14:56:13)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 19 Min(s), and 23 Sec(s)

File information: 012720-5765-01.dmp (Jan 27 2020 - 17:18:13)
Bugcheck: DRIVER_OVERRAN_STACK_BUFFER (F7)
Probably caused by: memory_corruption (Process: System)
Uptime: 1 Day(s), 9 Hour(s), 40 Min(s), and 47 Sec(s)

File information: 012620-6250-01.dmp (Jan 26 2020 - 07:36:39)
Bugcheck: APC_INDEX_MISMATCH (1)
Probably caused by: memory_corruption (Process: chrome.exe)
Uptime: 0 Day(s), 11 Hour(s), 09 Min(s), and 05 Sec(s)

File information: MEMORY.DMP (Jan 28 2020 - 15:49:20)
Bugcheck: KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 52 Min(s), and 18 Sec(s)
Possible Motherboard page: https://www.gigabyte.com/us/Motherboard/X570-AORUS-ELITE-rev-10/support#support-dl
It appears you have the latest BIOS already installed, as you said.

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.
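A minimal Python sketch for tallying the debugger summaries above (an editorial illustration, not part of the original post). The entries are transcribed by hand, and treating two dumps with the same timestamp, stop code and uptime as one crash is an assumption. Read that way, MEMORY.DMP looks like the full kernel dump of the 15:49:20 crash already listed as 012820-17812-01.dmp, which leaves five crashes with five different stop codes, all attributed to memory_corruption.

```python
from collections import Counter

# Debugger summaries from the post above, transcribed by hand:
# (file name, crash timestamp, bugcheck name, bugcheck code, uptime in seconds)
DUMPS = [
    ("012820-5750-01.dmp", "2020-01-28 14:06:32", "KERNEL_SECURITY_CHECK_FAILURE", 0x139, 2 * 3600 + 50 * 60 + 55),
    ("012820-17812-01.dmp", "2020-01-28 15:49:20", "KMODE_EXCEPTION_NOT_HANDLED", 0x1E, 52 * 60 + 18),
    ("012820-14625-01.dmp", "2020-01-28 14:56:13", "IRQL_NOT_LESS_OR_EQUAL", 0xA, 19 * 60 + 23),
    ("012720-5765-01.dmp", "2020-01-27 17:18:13", "DRIVER_OVERRAN_STACK_BUFFER", 0xF7, 86400 + 9 * 3600 + 40 * 60 + 47),
    ("012620-6250-01.dmp", "2020-01-26 07:36:39", "APC_INDEX_MISMATCH", 0x1, 11 * 3600 + 9 * 60 + 5),
    ("MEMORY.DMP", "2020-01-28 15:49:20", "KMODE_EXCEPTION_NOT_HANDLED", 0x1E, 52 * 60 + 18),
]


def distinct_crashes(dumps):
    """Collapse entries that share timestamp, bugcheck code and uptime.

    Assumption: such entries are different dump files of the same crash
    (for example a minidump plus the full MEMORY.DMP).
    """
    seen = {}
    for name, timestamp, bugcheck, code, uptime in dumps:
        # Keep the first file name seen for each crash.
        seen.setdefault((timestamp, code, uptime), (name, bugcheck))
    return list(seen.values())


crashes = distinct_crashes(DUMPS)
codes = Counter(bugcheck for _, bugcheck in crashes)
print(f"{len(crashes)} distinct crashes, {len(codes)} different stop codes")
for bugcheck, count in codes.most_common():
    print(f"  {bugcheck}: {count}")
```

Nothing here says which component is at fault; it only makes the spread of stop codes easier to see.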
 

Colif

Win 11 Master
Moderator
Almost refreshing to find a BSOD I can't blame on Nvidia drivers. I am helping someone who is having problems with G Hub, so it could be the cause (but then I am using it too, though that doesn't prove anything).

I hate how Gigabyte sorts the memory list; we're supposed to know which family a CPU is part of (even if we don't own one). Lucky I remembered it's Matisse.

These are the G.Skill RAM kits it has on the site for this CPU and motherboard at that RAM speed:
3600 G.SKILL 4GB F4-3600C17Q-16GVK SS 17-18-18-38 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C17Q-32GVK SS 17-18-18-38 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C16D-16GVK SS 16-16-16-36 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C16Q-32GFX SS 16-18-18-38 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C19Q-32GSXW SS 19-19-19-39 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C17Q-32GTZR SS 17-18-18-38 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C18D-16GTZRX SS 18-22-22-42 1.35v V V v 2133

Yours, according to the system, is F4-3600C17-8GTZSW. I can't find it on the actual G.Skill page, but I know it exists.

Stats: 16-16-16-32-48-1 (CL-tRCD-tRP-tRAS-tRC-CR)

It could be a simple matter of the RAM not being right for the motherboard. I can't see any obvious drivers in the results.
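Two quick checks on those numbers, sketched in Python (an editorial illustration, not something posted in the thread). First-word latency is approximated with the common rule CL × 2000 / data rate in MT/s, and the second check uses the usual DRAM relationship tRC ≥ tRAS + tRP; the 48 in the reported string equals 32 + 16, which is what tRC (row cycle time) normally works out to. Whether the system-reported 16-16-16-32-48 string reflects the XMP settings is not known from the thread.

```python
def first_word_latency_ns(cas_latency: int, data_rate_mts: int) -> float:
    """Approximate CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so the clock in MHz is half the
    data rate in MT/s; CL cycles at that clock take CL / MHz microseconds.
    """
    clock_mhz = data_rate_mts / 2
    return cas_latency / clock_mhz * 1000


def trc_consistent(trp: int, tras: int, trc: int) -> bool:
    """Row cycle time should cover row active time plus precharge."""
    return trc >= tras + trp


# Rated CL at 3600 MT/s for the OP's C17 kit vs. the C16 kits on the list.
print(f"CL17 @ 3600: {first_word_latency_ns(17, 3600):.2f} ns")  # 9.44 ns
print(f"CL16 @ 3600: {first_word_latency_ns(16, 3600):.2f} ns")  # 8.89 ns
# System-reported string 16-16-16-32-48: tRP=16, tRAS=32, fifth value 48.
print(trc_consistent(trp=16, tras=32, trc=48))                    # True
```

The figures only show that the kits differ in timings at the same speed; they say nothing about which kit a given board will run stably.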
 
Jan 28, 2020
Hey both, thanks for your replies!

I'll try uninstalling G-Hub when I get home. No harm in trying, I guess.

A friend of mine has the same build, and while he has had some issues (stuttering while gaming, iirc), he doesn't get BSODs. But hey, computers be computers, I guess.

If I had a compatible pair of RAM-sticks to test out I would, but it's also a decent chunk of money to throw at it if it turns out not to be the issue. But at this point I guess it's a small price to pay for my sanity. I'll look into it.
 

Colif

Win 11 Master
Moderator
After looking at the other guy's problem with G Hub, his isn't similar. His appears to be a case of G Hub and LGS both trying to run the same hardware. It's dumb that G Hub doesn't uninstall LGS on install, so if you didn't notice, you can have both running. I doubt that's the problem here.

Ryzen can be picky about RAM, especially if it's not on the list. I know part codes change, but none of the kits on the list even have the same speed ranges, so it could be the timings.
 
Jan 28, 2020
I just double-checked my receipt, and I will double-check the box when I get home if I still have it, but the ones I paid for are these: https://www.gskill.com/product/165/168/1536220507/F4-3600C17D-16GTZSW-Overview
I take it the main difference is that the system recognizes the individual sticks, whereas G.Skill lists them as a pair?
The timings are also different though.

The box (And XMP in BIOS) also says that the ones linked are what I have.

Additionally: I got Memtest running, and after 9 passes over ~18 hours I got 0 errors. If I understood Memtest's documentation correctly, it should give errors on incompatibility, right?
 
Jan 28, 2020
1: Wouldn't memtest produce errors if incompatibility is the issue at hand?

2: A friend of mine has the exact same setup, and does not have this issue. Shouldn't he if the components are incompatible?
 
Not necessarily. Incompatible memory kits CAN work, but they often are at the mercy of other factors. No two systems are EVER exactly the same. Ever.

The memory kits are not the same. Ever.

The CPU is not the same. Ever.

The motherboard is not the same. Ever.

What they are, is similar. Same model, means "similar", not same. Two different memory kits can't always achieve the same results when overclocking OR when being used together, and by the same token they can't always achieve the same results when being used with other hardware. In fact, unless they came off the same production run and have serial numbers that are in series with each other, and possibly not even then, they may not even be compatible with each other. If they didn't come from the same production run they might not even use the same components to make up the module, because at different points in time the manufacturer may change up the makeup of the model BUT keep the model number because technically the overall aspect, being the speed and timings, primarily remains the same. That does not however mean that they "are the same".

As seen here:

https://forums.tomshardware.com/threads/amd-ram-compatibility.3210050/#post-19785792

Probably, if they work on his system they OUGHT to work on yours, in theory, but what makes sense in theory does not always prove truthful in reality. Also, there may be differences in your silicon, meaning your CPUs. No two CPUs are exactly the same either, hence the "silicon lottery". Some are better than others. One may be stronger than another. One might be more capable at any given clock speed than another, and in fact be able to run at that clock speed with a higher margin of stability WITH a lower amount of voltage. Lots of variables COULD be in play. I'm not able to say that they ARE; however, the basic rule of thumb is that if the memory kit is not listed on the MEMORY MANUFACTURER'S compatibility list, you are going to chase ghosts and are setting yourself up for a bad day. It's as simple as that.

Yes, memory, any memory that is the right "kind" of memory, CAN usually work on any given board given an adept enough tinkerer and enough time. Whether it can run at the rated speed and timings, or with all the sticks in the kit, is another story.

As far as memtest is concerned, Memtest is a good basic indicator of problems with the physical memory. It can tell you if the memory ITSELF is screwed up or not, AND it can sometimes tell you if there are problems with the configuration. There are however a lot of things it can't tell you and this is highlighted by the fact that there are additional tests provided in the PAID version of Memtest86 that are not included in the free version. Obviously, if there were not OTHER problems that could exist that the free version does not detect there would be no NEED for the additional tests OR a paid version.

Fortunately there are OTHER ways to test the memory further, although it is STILL not an absolutely conclusive end result. I've seen, MANY times, memory that passed ALL software testing but the problem was only resolved by replacement of the memory kit with a new kit. It's just that simple. You can however try running the Windows extended memory diagnostic test AND this test, outlined below.



Final testing with Prime95

It is highly advisable that you do a final test using Prime95 WITH AVX and AVX2 disabled, and run a custom configured Blend test. You can also use the Blend mode option as is, but after a fair amount of personal testing, asking questions from some long time members with engineering level degrees that have forgotten more about memory architectures than you or I will ever know, and gathering opinions from a wide array of memory enthusiasts around the web, I'm pretty confident that the custom option is a lot more likely to find errors with the memory configuration, and faster, if there are any to be found.

Please note, as this is rather important: if you prefer, or have problems running version 26.6 because you have a newer platform that doesn't want to play nice with it, you can use the latest version of Prime95 with the Custom test selected, but you will need to make the following change.

In the bottom of the Torture test selection popup menu there will be some options for disabling AVX. I recommend that you do so, not because we are doing thermal testing and require a steady state workload (Which AVX wouldn't affect anyhow, as Computronix explained to me), but because the last thing you need during memory testing is having to worry about CPU temperatures, and you will, with AVX enabled.

So, uncheck the option for AVX2. That will un-gray the option for AVX, and uncheck that box as well.

Now open Prime95.

Click on "Custom". Input a value of 512k in the minimum FFT size field. Leave the maximum FFT size field at 4096k. In the "Memory to use" field you should take a look at your current memory allocation in either HWinfo or system resource monitor. Whatever "free" memory is available, input approximately 75% of that amount. So if you currently have 16GB of installed memory, and approximately 3GB are in use or reserved leaving somewhere in the neighborhood of 13GB free, then enter something close to 75% of that amount.

So if you have 13GB free, or something reasonably close to that, then 75% of THAT would be 9.75GB, which, when multiplied by 1024, works out to 9984MB. You can keep it simple by selecting the closest multiple of 1024 to that amount, so we'll say 10 x 1024 = 10240MB and enter that amount in the "Memory to use (MB)" field. We are still well within the 13GB of unused memory, BUT we have left enough memory unused that if Windows decides to load some other process or background program, or an already loaded one suddenly needs more, we won't run into a situation where the system errors out due to lack of memory because we've dedicated it all to testing.

I've experienced false errors and system freezes during this test from over allocating memory, so stick to the method above and you should be ok.
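The allocation arithmetic above fits in a few lines of Python. This is only a sketch of the rule of thumb from this post, assuming you read the free memory off HWinfo or Resource Monitor by hand; the function name is hypothetical and not part of Prime95.

```python
def prime95_memory_mb(free_gb: float, fraction: float = 0.75) -> int:
    """Suggested value for Prime95's "Memory to use (MB)" field.

    Takes roughly 75% of the currently free memory, converts GB to MB
    (x1024), then rounds to the nearest multiple of 1024 MB.
    """
    target_mb = free_gb * fraction * 1024
    # round() sends exact halves to the even neighbour; either is fine here.
    return round(target_mb / 1024) * 1024


# Worked example from the text: 16 GB installed, about 13 GB free.
print(13 * 0.75 * 1024)       # 9984.0
print(prime95_memory_mb(13))  # 10240
```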


Moving right along, do not change the time to run each FFT size. Leave that set to 15 minutes.

Click Run and run the Custom test for 8 hours. If it passed Memtest86 and it passes 8 hours of the Custom test, the memory is 100% stable, or as close to it as you are ever likely to get. That said, a lot of experts in the area of memory configuration suggest that running the extended Windows memory diagnostic test is also a pretty good idea.

If you get errors, then you need to either change the timings, change the DRAM voltage or change the DRAM termination voltage, which should be approximately half of the full DRAM voltage. You will want to run HWinfo alongside Prime95 so you can periodically monitor each thread: Prime will not stop running just because one worker drops out, so watch HWinfo for any threads not showing 100% usage, which means one of the workers errored and was dropped.
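The "watch for a thread that drops off 100%" step can be sketched in Python as a check over per-thread usage samples. The thread names and readings below are made up for illustration, and the 95% threshold and three-sample window are assumptions (live readings rarely sit at exactly 100%). This does not parse any real HWinfo log format.

```python
def dropped_threads(usage: dict[str, list[float]],
                    threshold: float = 95.0,
                    window: int = 3) -> list[str]:
    """Return threads whose last `window` usage samples are all below `threshold` %.

    A thread whose Prime95 worker has errored out stops sitting near 100%
    usage, while the remaining workers keep running.
    """
    dropped = []
    for thread, samples in usage.items():
        recent = samples[-window:]
        if len(recent) == window and all(s < threshold for s in recent):
            dropped.append(thread)
    return dropped


# Hypothetical per-thread usage readings (%), as copied from a monitoring log.
readings = {
    "Core 0 T0": [100, 100, 100, 100],
    "Core 0 T1": [100, 100, 3, 2, 4],  # worker appears to have stopped
    "Core 1 T0": [100, 99, 100, 100],
}
print(dropped_threads(readings))  # ['Core 0 T1']
```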

There are also other BIOS settings that can affect the memory configuration AND stability, such as the SOC voltage (AMD) or the VCCIO and system agent voltages (Intel), so if you have problems with stability at higher clock speeds you might want to look at increasing those slightly. Usually, for Intel at least, something in the neighborhood of 1.1v on both VCCIO and system agent is pretty safe. There are a substantial number of guides out there covering those settings, but most of them are found within CPU overclocking guides, so look there in guides relevant to your platform.

As a further measure of assurance that your WHOLE configuration is stable, you can download and run Realbench for 8 hours. If the system freezes or fails when running Realbench with your full memory amount set, try running it again but select only half your amount of installed memory.


There might be something else that you find useful here, IDK.

 
Jan 28, 2020
Thank you for the detailed response.

I'd just like to start out by saying, I'm not at all certain it's not a compatibility issue, I'd just very much like to be fairly certain before spending another ~$200.

I've started prime95 and will leave it on overnight.
I've already run Windows Memory Diagnostics extended testing for 10 passes with no errors.

I'll give Realbench a shot as well if Prime95 doesn't produce any errors.

Again, thank you so much for your time. Should all these come back error-free I'll just eat the cost, and see if I know anyone who could use a pair of memory sticks.
 
Move your memory modules to the second and fourth slots, A2 and B2, and then LEAVE them there. There are NO other population combinations that are correct. This is the two DIMM population assignment for ALL, ALL four DIMM dual channel DDR4 (and MOST if not ALL four DIMM dual channel DDR3) motherboard architectures on consumer platforms. In other words, every single motherboard you could use your CPU and memory on that accepts four memory modules recommends using those same two DIMM slots, no matter what they CALL the slots. Even the ASUS X570 boards, where they have idiotically reversed the single DIMM population location, STILL have the same two DIMM population rule, which is the second and fourth slots over from the CPU socket.

So, put them there, and leave them there. If they do not work there, then something ELSE is the problem.

Other things that you can and should check for are a CPU cooler that is not evenly tightened, which can affect the way the CPU rides in the socket and of course anything related to the CPU can affect both memory and BSOD occurrences.

Bent CPU pins.

Motherboard standoffs that are located in standoff locations in the case portion of the motherboard tray that do not correspond with a matching mounting hole in the motherboard itself.

The WRONG standoffs for the CPU cooler mounting hardware. As seen here:

https://forums.tomshardware.com/thr...ual-channel-ram-problem.3483682/post-21058497

A weak or POS power supply.

What is the EXACT model of your power supply and how LONG has it been in service?
 

DMAN999

Honorable
Ambassador
As per dumps:
BIOS
VENDOR: American Megatrends Inc.
VERSION: F11
DATE: 12/06/2019

He has it already. Thanks for the inputs anyway :)
Good catch, that was just a Hail Mary. ;)
Since the RAM kit he has isn't listed as being compatible (by G.Skill or Gigabyte) the OP's best bet is to sell (or return) that kit and buy a kit that is listed as compatible.
Or
He can spend untold hours trying to adjust the speed and timings in the BIOS in the hope that he can get this kit stable.

Personally I would sell this kit (F4-3600C17D-16GTZSW) and get a compatible kit to save myself the headache.
This kit should work out of the box:
F4-3600C16D-16GVKC
https://www.gskill.com/configurator...524715126&chipset=1562634988&model=1562637486

I have that (F4-3600C16D-16GVKC) kit (it has Hynix DJR ICs) in my Asus ROG Strix X470-F / 3700x rig running at 3733, and it is 100% stable.
 
Compatibility is the most probable reason.

I'm wondering though, when this system was built back in September, was a CLEAN install of the OS done or does this Windows installation predate the build? Resetting Windows is not the same as a clean install because a Reset simply puts things back to how they were when the installation was originally installed, which is not the same as how the installation would be configured upon the first installation with new hardware.

If a CLEAN install has never been done since the new build was done, that should be the very FIRST thing that gets done before you go ANY further, and even if you DID a clean install originally, it might be a good idea to do it again as something you've installed since then might be responsible for the BSODs you are experiencing and even uninstalling it might not remove the problem it has created in the registry in some cases.

 
Jan 28, 2020
Right, so..

Prime95 gave no errors after ~8.5 hours, but as you might have alluded to, the amount of tests (~320) varied from worker to worker. Not sure if I should interpret that as errors or not.

The memory sticks are in A2/B2. I tried moving them to the other slots at one point as part of the troubleshooting, but moved them back (and swapped them) when it didn't change anything.

I'll open it up when I get the chance and check the CPU-cooler.

Now, the power supply... It's a leftover from the old rig, which means we're closing in on 5 and a half years. I just checked my receipt, and all it says is "FSP 750W". I'll check the unit itself when I open up the case. It should be powerful enough, but I have no idea whether or not it's garbage. We did also clean it, and I guess we might have screwed it up in the process. I'm not at all opposed to replacing it. If you have any recommendations, I'm all ears.

We did a clean install of Windows back in September, and I then did a reset in, I think, early January. I'll look into doing a clean installation again.


I'll probably also go to an electronics store nearby and make sure they'll let me return memory I've used in case a new set of sticks doesn't solve it. While I'd like to save the money, I'd also like to not spend an awful lot of time tinkering with the BIOS, should the above options fail ;)
 
Get that PSU out of there. It's an absolute piece of trash.

It used to be a tier four unit on our old tier list, which hasn't even been around for at least two or three years, and was on there for many years prior to that. In fact, the FSP Raider was on the fourth tier of the Dottorent tier list for as long as I've been a member of Tom's so at least five or six years minimum. Reviews of the FSP raider 750w unit are from 2012 so that unit was probably manufactured about 7-8 years ago which means that the capacitors in it are likely 8 or 9 years old on top of being a poor quality unit to begin with.

I would absolutely replace it, immediately, with a good quality unit. You really only need a 550w unit, but I'd give myself some headroom and get a high quality 650w unit because these cards are known to experience spikes, which is likely where your BSODs are coming from, maybe. Could also be that the spikes are pulling down voltage and causing memory errors.

This should help with finding a good model.



Personally, based on cost, I'd suggest your bang for the buck is likely going to be a Corsair RMx, Antec Earthwatts Gold or maybe a Corsair TX unit, but sales and rebates as well as your region often affect pricing, so using PCPartPicker is usually the best way to find a good price on one of the better units.
 
Jan 28, 2020
Damn, didn't imagine it was that bad :D

I can get the Corsair RMi on sale for $140, which is $15 more than the RMx at the moment. Would you reckon that's worth it?
Alternatively I could get the Antec Earthwatts Gold for around $110, $30 less than the RMi.

Sidenote: the RMx seems to have a regular and a 2018 version. Does that make any difference?


And if the prices seems off, I'm not from the US, which I imagine might make a difference.

EDIT: Redid a sentence that might have been confusing
 