Question: Seemingly random BSODs

Status
Not open for further replies.
Jan 28, 2020
Hey!

So I made a new build back in September, and have been getting what seem to be random BSODs since. It varies, but it happens once or twice a day on average. It might be a coincidence, but it does not seem to happen while gaming.

I have tried updating various drivers (Realtek, video, LAN, as well as every single one in Device Manager). Windows Update is up to date and the BIOS is up to date. A couple of weeks ago I tried resetting Windows, and I've run sfc /scannow and chkdsk. I've tried to run Memtest86 but get a black screen every time (changing boot options in the BIOS didn't help), so I ran Windows Memory Diagnostic instead with 10 passes, using the Extended test with cache on, which found nothing.
I've disabled Fast Boot and changed the system power settings.
The XMP profile was not set initially, but setting it didn't help either. I've tried moving the RAM sticks from A2/B2 to A1/B1 and then to B2/A2.

I have not done any overclocking.

Hardware:

Motherboard: Gigabyte Aorus Elite x570
CPU: AMD Ryzen 7 3700x
GPU: AMD Radeon RX 5700 XT
SSD: Samsung SSD 970 EVO 1TB
RAM: G.Skill TridentZ F4-3600C17D-16GTZSW

Dump files: https://drive.google.com/open?id=1YKyVwygJvsA_dBkZIfgi3j9LVkPeHT5x
Memory-dump: https://drive.google.com/open?id=1_xu7cNbrr77cncze-VB4kwqVFvdu8CE3

I have tried a lot of things from various threads on various forums, and still feel no closer to a solution.

I'd be grateful if anyone here can pinpoint the error.
 

gardenman

Splendid
Moderator
Hi, I ran the dump files through the debugger and got the following information: https://transeuntvideo.htmlpasta.com/
File information: 012820-5750-01.dmp (Jan 28 2020 - 14:06:32)
Bugcheck: KERNEL_SECURITY_CHECK_FAILURE (139)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 2 Hour(s), 50 Min(s), and 55 Sec(s)

File information: 012820-17812-01.dmp (Jan 28 2020 - 15:49:20)
Bugcheck: KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 52 Min(s), and 18 Sec(s)

File information: 012820-14625-01.dmp (Jan 28 2020 - 14:56:13)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 19 Min(s), and 23 Sec(s)

File information: 012720-5765-01.dmp (Jan 27 2020 - 17:18:13)
Bugcheck: DRIVER_OVERRAN_STACK_BUFFER (F7)
Probably caused by: memory_corruption (Process: System)
Uptime: 1 Day(s), 9 Hour(s), 40 Min(s), and 47 Sec(s)

File information: 012620-6250-01.dmp (Jan 26 2020 - 07:36:39)
Bugcheck: APC_INDEX_MISMATCH (1)
Probably caused by: memory_corruption (Process: chrome.exe)
Uptime: 0 Day(s), 11 Hour(s), 09 Min(s), and 05 Sec(s)

File information: MEMORY.DMP (Jan 28 2020 - 15:49:20)
Bugcheck: KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by: memory_corruption (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 52 Min(s), and 18 Sec(s)
Possible Motherboard page: https://www.gigabyte.com/us/Motherboard/X570-AORUS-ELITE-rev-10/support#support-dl
It appears you have the latest BIOS already installed, as you said.

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.
 

Colif

Win 11 Master
Moderator
Almost refreshing to find a BSOD I can't blame on Nvidia drivers. I am helping someone who is having problems with G Hub, so it could be the cause (but then, I am using it too, though that doesn't prove anything).

I hate how Gigabyte sorts the memory list; we're supposed to know which family the CPU is part of (even if we don't own one). Lucky I remembered it's Matisse.

These are the G.Skill RAM kits Gigabyte's site lists for this CPU and motherboard at your RAM speed:
3600 G.SKILL 4GB F4-3600C17Q-16GVK SS 17-18-18-38 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C17Q-32GVK SS 17-18-18-38 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C16D-16GVK SS 16-16-16-36 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C16Q-32GFX SS 16-18-18-38 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C19Q-32GSXW SS 19-19-19-39 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C17Q-32GTZR SS 17-18-18-38 1.35v V V V v 2133
3600 G.SKILL 8GB F4-3600C18D-16GTZRX SS 18-22-22-42 1.35v V V v 2133

Yours, according to the system, is F4-3600C17-8GTZSW. I can't find it on the actual G.Skill page, but I know it exists.

Stats: 16-16-16-32-48-1 (tCL-tRCD-tRP-tRAS-tRC-CR)
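A quick way to sanity-check a timing string like this is a small sketch, assuming the six numbers follow the usual DDR4 order (tCL-tRCD-tRP-tRAS-tRC-CR). A common rule of thumb is that tRC should be at least tRAS + tRP (row active time plus precharge). The function names here are hypothetical, for illustration only.

```python
# Sketch: label a DDR4 timing string and check the rule of thumb
# tRC >= tRAS + tRP. The field order is an assumption (the usual
# tCL-tRCD-tRP-tRAS-tRC-CR ordering), not something read from the dump.
TIMING_NAMES = ("tCL", "tRCD", "tRP", "tRAS", "tRC", "CR")


def parse_timings(timing_string: str) -> dict:
    """Split a dash-separated timing string into named integer fields."""
    values = [int(part) for part in timing_string.split("-")]
    if len(values) != len(TIMING_NAMES):
        raise ValueError(f"expected {len(TIMING_NAMES)} fields, got {len(values)}")
    return dict(zip(TIMING_NAMES, values))


def trc_is_consistent(timings: dict) -> bool:
    """True when tRC covers at least tRAS + tRP."""
    return timings["tRC"] >= timings["tRAS"] + timings["tRP"]


reported = parse_timings("16-16-16-32-48-1")
print(reported)
print(trc_is_consistent(reported))
```

Here 32 + 16 = 48, so the reported numbers are at least internally consistent with that ordering.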

It could be a simple matter of the RAM not being right for the motherboard. I can't see any obvious drivers in the results.
 

Colif

Win 11 Master
Moderator
After looking at the other guy's problem with G Hub, his isn't similar. His appears to be a case of having G Hub & LGS both trying to run the same hardware. It's dumb that G Hub doesn't uninstall LGS on install, so if you didn't notice, you can have both running. I doubt that's the problem here.

Ryzen can be picky about RAM, especially if it's not on the list. I know part codes change, but none of the kits in the list even have the same speed ranges, so it could be the timings.
 
Not necessarily. Incompatible memory kits CAN work, but they often are at the mercy of other factors. No two systems are EVER exactly the same. Ever.

The memory kits are not the same. Ever.

The CPU is not the same. Ever.

The motherboard is not the same. Ever.

What they are, is similar. Same model, means "similar", not same. Two different memory kits can't always achieve the same results when overclocking OR when being used together, and by the same token they can't always achieve the same results when being used with other hardware. In fact, unless they came off the same production run and have serial numbers that are in series with each other, and possibly not even then, they may not even be compatible with each other. If they didn't come from the same production run they might not even use the same components to make up the module, because at different points in time the manufacturer may change up the makeup of the model BUT keep the model number because technically the overall aspect, being the speed and timings, primarily remains the same. That does not however mean that they "are the same".

As seen here:

https://forums.tomshardware.com/threads/amd-ram-compatibility.3210050/#post-19785792

Probably, if they work on his system they OUGHT to work on yours, in theory, but what makes sense in theory does not always prove true in reality. Also, there may be differences in your silicon, meaning your CPUs. No two CPUs are exactly the same either, hence the "silicon lottery". Some are better than others. One may be stronger than another. One might be more capable at a given clock speed than another, and in fact be able to run at that clock speed with a higher margin of stability WITH a lower voltage. Lots of variables COULD be in play. I'm not able to say that they ARE; however, the basic rule of thumb is that if the memory kit is not listed on the MEMORY MANUFACTURER'S compatibility list, you are going to chase ghosts and are setting yourself up for a bad day. It's as simple as that.

Yes, any memory that is the right "kind" of memory CAN work on any given board, given an adept enough tinkerer and enough time, usually. Whether it can run at the rated speed and timings, or with all the sticks in the kit, is another story.

As far as memtest is concerned, Memtest is a good basic indicator of problems with the physical memory. It can tell you if the memory ITSELF is screwed up or not, AND it can sometimes tell you if there are problems with the configuration. There are however a lot of things it can't tell you and this is highlighted by the fact that there are additional tests provided in the PAID version of Memtest86 that are not included in the free version. Obviously, if there were not OTHER problems that could exist that the free version does not detect there would be no NEED for the additional tests OR a paid version.

Fortunately there are OTHER ways to test the memory further, although it is STILL not an absolutely conclusive end result. I've seen, MANY times, memory that passed ALL software testing but the problem was only resolved by replacement of the memory kit with a new kit. It's just that simple. You can however try running the Windows extended memory diagnostic test AND this test, outlined below.



Final testing with Prime95

It is highly advisable that you do a final test using Prime95 WITH AVX and AVX2 disabled, and run a custom configured Blend test. You can also use the Blend mode option as is, but after a fair amount of personal testing, asking questions from some long time members with engineering level degrees that have forgotten more about memory architectures than you or I will ever know, and gathering opinions from a wide array of memory enthusiasts around the web, I'm pretty confident that the custom option is a lot more likely to find errors with the memory configuration, and faster, if there are any to be found.

Please note, as this is rather important: if you prefer, or have problems running version 26.6 because you have a newer platform that doesn't play nice with it, you can use the latest version of Prime95 with the Custom test selected, but you will need to make the following change.

In the bottom of the Torture test selection popup menu there will be some options for disabling AVX. I recommend that you do so, not because we are doing thermal testing and require a steady state workload (Which AVX wouldn't affect anyhow, as Computronix explained to me), but because the last thing you need during memory testing is having to worry about CPU temperatures, and you will, with AVX enabled.

So, uncheck the option for AVX2. That will un-gray the option for AVX, and uncheck that box as well.

Now open Prime95.

Click on "Custom". Input a value of 512k in the minimum FFT size field. Leave the maximum FFT size field at 4096k. In the "Memory to use" field you should take a look at your current memory allocation in either HWinfo or system resource monitor. Whatever "free" memory is available, input approximately 75% of that amount. So if you currently have 16GB of installed memory, and approximately 3GB are in use or reserved leaving somewhere in the neighborhood of 13GB free, then enter something close to 75% of that amount.

So if you have 13GB free, or something reasonably close to that, then 75% of THAT is 9.75GB, which, multiplied by 1024, equals 9984MB. To keep it simple you can round to the nearest multiple of 1024, so we'll say 10 x 1024 = 10240MB and enter that amount in the "Memory to use (MB)" field. We are still well within the 13GB of unused memory, BUT we have left enough memory unused so that if Windows decides to load some other process or background program, or an already loaded one suddenly needs more, we won't run into a situation where the system errors out due to lack of memory because we've dedicated it all to testing.
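The allocation arithmetic in that example can be written as a small sketch. It assumes you have already read the free-memory figure (in GB) from HWinfo or Resource Monitor; `prime95_memory_mb` is a hypothetical helper name, not part of Prime95.

```python
import math


def prime95_memory_mb(free_gb: float, fraction: float = 0.75) -> int:
    """Suggest a 'Memory to use (MB)' value: ~75% of free RAM,
    rounded to the nearest whole GB and converted to MB (x1024)."""
    target_gb = free_gb * fraction
    rounded_gb = math.floor(target_gb + 0.5)  # nearest whole GB, halves round up
    mb = rounded_gb * 1024
    # Assumption: never hand the test all of the free memory.
    if mb >= free_gb * 1024:
        mb = math.floor(target_gb) * 1024
    return mb


free = 13
print(free * 0.75 * 1024)        # exact 75% in MB
print(prime95_memory_mb(free))   # rounded value to enter in Prime95
```

For 13GB free this gives 9984.0 for the exact 75% figure and 10240 as the rounded value, leaving roughly 3GB of headroom for Windows.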

I've experienced false errors and system freezes during this test from over allocating memory, so stick to the method above and you should be ok.


Moving right along, do not change the time to run each FFT size. Leave that set to 15 minutes.

Click Run and run the Custom test for 8 hours. If the memory passed Memtest86 and it passes 8 hours of the Custom test, it is 100% stable, or as close to it as you are ever likely to get. That said, a lot of experts in the area of memory configuration suggest that also running the extended Windows memory diagnostic test is a pretty good idea.

If you get errors, then you need to change the timings, the DRAM voltage, or the DRAM termination voltage, which should be approximately half of the full DRAM voltage. Run HWinfo alongside Prime95 so you can periodically monitor each thread: Prime95 will not stop running just because one worker drops out, so watch HWinfo for any thread that is not showing 100% usage, which means that worker errored and was dropped.
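The "about half of DRAM voltage" rule for termination voltage is simple arithmetic; here is a sketch assuming the kit's rated 1.35v (the exact name of the termination setting, often VTT, varies by BIOS vendor). The function name is hypothetical.

```python
def dram_termination_voltage(dram_voltage: float) -> float:
    """Rule of thumb from the post: termination voltage ~ half of DRAM voltage."""
    return round(dram_voltage / 2, 4)


for vdimm in (1.35, 1.40):
    print(f"DRAM {vdimm:.2f} V -> termination ~{dram_termination_voltage(vdimm):.3f} V")
```

This is only a starting point for tuning; the board's own auto value may differ slightly.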

There are also other BIOS settings that can affect the memory configuration AND stability, such as the SOC voltage on AMD platforms and the VCCIO and system agent voltages on Intel platforms, so if you have stability problems at higher clock speeds you might want to look at increasing those slightly. For Intel at least, something in the neighborhood of 1.1v on both VCCIO and system agent is usually pretty safe. There are a substantial number of guides out there covering those settings, but most of them are found within CPU overclocking guides, so look there in guides relevant to your platform.

As a further measure of assurance that your WHOLE configuration is stable, you can download and run Realbench for 8 hours. If the system freezes or fails when running Realbench with your full memory amount set, try running it again but select only half your amount of installed memory.


There might be something else that you find useful here, IDK.

 
Move your memory modules to the second and fourth slots, A2 and B2, and then LEAVE them there. There are NO other population combinations that are correct. This is the two-DIMM population assignment for ALL, and I mean ALL, four-DIMM dual-channel DDR4 (and MOST if not ALL four-DIMM dual-channel DDR3) motherboard architectures on consumer platforms. In other words, every single motherboard you could use your CPU and memory on that accepts four memory modules recommends using those same two DIMM slots, no matter what they CALL the slots. Even the ASUS X570 boards, where they have idiotically reversed the single-DIMM population location, STILL use the same two-DIMM population rule: the second and fourth slots over from the CPU socket.

So, put them there, and leave them there. If they do not work there, then something ELSE is the problem.

Other things that you can and should check for are a CPU cooler that is not evenly tightened, which can affect the way the CPU rides in the socket and of course anything related to the CPU can affect both memory and BSOD occurrences.

Bent CPU pins.

Motherboard standoffs installed in positions on the case's motherboard tray that do not line up with a matching mounting hole in the motherboard itself.

The WRONG standoffs for the CPU cooler mounting hardware. As seen here:

https://forums.tomshardware.com/thr...ual-channel-ram-problem.3483682/post-21058497

A weak or POS power supply.

What is the EXACT model of your power supply and how LONG has it been in service?
 

DMAN999

Dignified
Ambassador
As per dumps:
BIOS
VENDOR: American Megatrends Inc.
VERSION: F11
DATE: 12/06/2019

He has it already. Thanks for the inputs anyway :)
Good catch, that was just a Hail Mary. ;)
Since the RAM kit he has isn't listed as being compatible (by G.Skill or Gigabyte) the OP's best bet is to sell (or return) that kit and buy a kit that is listed as compatible.
Or
He can spend untold hours trying to adjust the speed and timings in the BIOS in the hope that he can get this kit stable.

Personally I would sell this kit (F4-3600C17D-16GTZSW) and get a compatible kit to save myself the headache.
This kit should work out of the box:
F4-3600C16D-16GVKC
https://www.gskill.com/configurator...524715126&chipset=1562634988&model=1562637486

I have that kit (F4-3600C16D-16GVKC, which uses Hynix DJR ICs) in my Asus ROG Strix X470-F / 3700X rig running at 3733, and it is 100% stable.
 
Compatibility is the most probable reason.

I'm wondering, though: when this system was built back in September, was a CLEAN install of the OS done, or does this Windows installation predate the build? Resetting Windows is not the same as a clean install, because a Reset simply puts things back to how they were when Windows was originally installed, which is not the same as how the installation would be configured on a first install with the new hardware.

If a CLEAN install has never been done since the new build was done, that should be the very FIRST thing that gets done before you go ANY further, and even if you DID a clean install originally, it might be a good idea to do it again as something you've installed since then might be responsible for the BSODs you are experiencing and even uninstalling it might not remove the problem it has created in the registry in some cases.

 
Get that PSU out of there. It's an absolute piece of trash.

It was on our old tier list as a tier four unit for many years; that list itself hasn't been around for at least two or three years. In fact, the FSP Raider was on the fourth tier of the Dottorent tier list for as long as I've been a member of Tom's, so at least five or six years minimum. Reviews of the FSP Raider 750W unit are from 2012, so that unit was probably manufactured about 7-8 years ago, which means that the capacitors in it are likely 8 or 9 years old, on top of it being a poor quality unit to begin with.

I would absolutely replace it, immediately, with a good quality unit. You really only need a 550W unit, but I'd give myself some headroom and get a high quality 650W unit, because these cards are known to produce power spikes, which may well be where your BSODs are coming from. It could also be that the spikes are pulling down voltage and causing memory errors.

This should help with finding a good model.



Personally I'd suggest that based on cost, your bang for the buck is likely going to be a Corsair RMx, Antec Earthwatts Gold or maybe a Corsair TX unit but sales and rebates as well as your region often affect pricing so using PC part picker is the best way usually to find a good price on one of the better units.
 
If you don't mind my asking, where ARE you from? I might be able to better guide you towards a good unit if I know. Any of those units is fine; I'd definitely go for the 2018 version of the RMx over the older version, though. Either the RMi or RMx is better than the RM, which has an inferior capacitor selection and an inferior fan compared to the other two.

The Earthwatts Gold is a fine unit too. Not as good as the RMx, but better by FAR than what you had.


This is what our old tier list, that was fairly accurate as compared to ALL other existing or defunct tier lists, had to say. The old HardOCP review for the FSP Raider said it was a POS too, but that review is gone now because all of the old HardOCP reviews are lost now that the entire site has been taken down.

Looking at its companions there, the quality of its company on that level of the tier list was extremely poor. Not the worst, as they weren't tier five, but all the units listed below were just really bad power supplies. Any time I come across one of these, it goes straight into the trash can.



Tier Four - Not for overclocking systems or high end gaming rigs. May not even output labeled power and may fail standard ATX specifications slightly. May even use cheap components to meet a price point.
Aerocool
GT Series 500s 700s 1050s
Integrator 600w
Strike-X series 800w 1100w
Templarius Imperator series

E-power units
Firepower Fatal1ty 2013
FSP Raider series
InWin Glacier
LC power
LEPA MX-F1 series (Trigger-happy Overcurrent protection and very poor quality capacitors)

NZXT
Hale82N 650w
Hale82 V2 700w

Thermaltake Smart series
Zalman US Fanless 400w (HardOCP reports some poor quality issues at such a high price)
 
Either of them is good. The only REAL difference is that the RMi integrates Corsair Link for some internal monitoring, while the RMx does not. For the most part this is not a necessary feature, as only a handful of power supplies across the industry offer it.

This looks like your best choice right now.


PCPartPicker Part List

Power Supply: Corsair RMx (2018) 650 W 80+ Gold Certified Fully Modular ATX Power Supply (873.00kr @ Alternate)
Total: 873.00kr
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2020-02-06 18:53 CET+0100
 
I think it's about a 50/50 chance to be honest. But if the PSU is not the problem, I think it makes it a lot easier to figure out what it IS from that point on though. Plus, given what you had as a power supply, it was really a necessity anyhow and you can sleep better at night, well, I can anyhow, knowing that your machine isn't likely to burn up anytime soon due to the PSU.
 
As per my good friend and fellow moderator Computronix:

Memory specifications and overclocking can be very deceiving. If you're unaware, here's the formula for "True Latency":

1 / Frequency (the actual clock in GHz, i.e. half the DDR data rate) x CAS Latency = True Latency (nanoseconds).

Stock 3200 @ 14 is faster than stock 3600 @ 16:

1 / 1.600GHz x 14 = 8.75ns
1 / 1.800GHz x 16 = 8.89ns

A stable overclock to 3733 @ 16 is faster still:

3733 @ 16 is 1 / 1.867GHz x 16 = 8.57ns
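The true-latency formula above is easy to script for comparing kits. This is a minimal sketch; the 3600 CL17 entry corresponds to the rated CAS latency in the OP's part number (F4-3600C17D-16GTZSW), and `true_latency_ns` is a hypothetical name.

```python
def true_latency_ns(data_rate_mts: float, cas_latency: int) -> float:
    """CAS latency in nanoseconds.

    DDR transfers twice per clock, so the real clock (MHz) is half the
    advertised data rate (MT/s); latency = cycles / clock.
    """
    clock_mhz = data_rate_mts / 2
    return cas_latency / clock_mhz * 1000  # cycles / MHz = microseconds; x1000 = ns


kits = [(3200, 14), (3600, 16), (3733, 16), (3600, 17)]
for rate, cl in kits:
    print(f"DDR4-{rate} CL{cl}: {true_latency_ns(rate, cl):.2f} ns")
```

This reproduces the 8.75, 8.89 and 8.57ns figures quoted above and gives about 9.44ns for a 3600 CL17 kit.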

So I'd go with the CL14 3200MHz kit, which in fact is a Samsung B-die kit anyhow, meaning it is a terrific memory kit using the absolute best memory chips on the market. Even though it is Ripjaws and not Trident Z or Dominator Platinum, these are still superb memory modules and should actually be faster than the 3600MHz kit in terms of actual latency, which is where the rub lies anyhow.
 
Jan 28, 2020
Hey both, thanks for your replies!

I'll try uninstalling G-Hub when I get home. No harm in trying, I guess.

A friend of mine has the same build, and while he has had some issues (stuttering while gaming, iirc), he doesn't get BSODs. But hey, computers be computers, I guess.

If I had a compatible pair of RAM sticks to test with I would, but it's also a decent chunk of money to throw at it if it turns out not to be the issue. But at this point I guess it's a small price to pay for my sanity. I'll look into it.
 
Jan 28, 2020
I just double-checked my receipt, and I will check the box when I get home if I still have it, but the ones I paid for are these: https://www.gskill.com/product/165/168/1536220507/F4-3600C17D-16GTZSW-Overview
I take it the main difference is that the system recognizes the individual sticks, whereas G.Skill lists them as a pair?
The timings are also different though.

The box (And XMP in BIOS) also says that the ones linked are what I have.

Additionally: I got Memtest running, and after 9 passes over ~18 hours I got 0 errors. If I understood Memtest's documentation correctly, it should give errors on incompatibility, right?
 