Question Strange PC Crashes? (No BSOD)

The_Gen_Eric

Reputable
Aug 23, 2016
59
0
4,530
After installing a refurbished motherboard from Amazon, I've been having strange crashes (though I might have had them before, as I was watching a YouTube video and it froze up, gave a BSOD for about a second and then crashed out of that). One crash caused the computer to shut itself down and reboot to a black screen, which I needed to hard reboot out of in order to get back to Windows. The most recent one froze up the computer entirely with no BSOD, requiring a reboot. After this, I updated my chipset drivers, and while waiting to see if it worked, I wanted to ask around to see if this is a hardware issue, software issue, or sounds familiar to a solved issue.
For reference, here's my Speccy: http://speccy.piriform.com/results/wnECGTVxL7xeOF7c4cjDKGB
 

The_Gen_Eric

Did you reinstall Windows after the change?
I did not. However, one of the bigger things that concerns me, beyond the system freezing, is that it crashed out of MemTest, and I don't know whether that should be attributed to the Hammer Test or to general system instability. Is it possible, though, that a previously installed copy of Windows could cause freezing in the manner I've been seeing?
 

Karadjgne

Titan
Ambassador
Crashing like that is usually a driver conflict. When you say you updated the chipset drivers, do you mean the GPU drivers (for some odd reason this is the only driver most people ever update) or the motherboard chipset drivers from MSI support? If it's the chipset, which ones? Audio, LAN, PCIe, USB family, SATA, all of them?

Have you used CCleaner or a similar tool to go through and fully clean up the PC, getting rid of orphaned files and other junk in the temp folders, and used a registry tool to clean that out too?
 

The_Gen_Eric

When I said chipset, yeah, I meant Audio, LAN, PCIe, USB, and SATA, all done through the tool provided by AMD. I haven't updated the graphics driver, though, as I'd updated that before changing out the board. I also haven't gone through with CCleaner to clean the registry and get rid of orphans, but that seems like something I should do.
 

The_Gen_Eric

What slots is the ram in? (1 next to cpu, 4 furthest away)
When using registry tool, say 'Yes' to backup, and just run it at default levels.
RAM was in slots 2 and 4, but after the subsequent crash and freeze, I shifted them over to 1 and 3, which the motherboard warned me weren't optimal slots, but I might as well try to see if it gives me any more stability.
 

Karadjgne

Yeah, those are the secondary slots. If anything, they'll make performance and stability worse. You should stick with 2/4.

If your BIOS lists it, check Geardown Mode (GDM). It's usually enabled by default, but with CAS 15 timings it doesn't look like it is, or they'd be registering as 16-15-15-39 instead.
 

Karadjgne

OP has 2x8GB G.Skill 2400, most likely Aegis or Ripjaws V. Programs (and now Windows) read the JEDEC tables and report the actual memory clock rather than the doubled transfer rate. DDR4-2400 is double data rate, so its 2400 MT/s rating corresponds to a 1200MHz clock. A 1200MHz reading is correct; it just isn't the doubled figure.
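To make the data rate point concrete, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and the only assumption is the standard DDR relationship of two transfers per clock cycle.

```python
def memory_clock_mhz(data_rate_mts: float) -> float:
    """DDR moves data twice per clock, so the clock is half the data rate."""
    return data_rate_mts / 2


def cas_latency_ns(cas_cycles: int, data_rate_mts: float) -> float:
    """Convert a CAS latency given in clock cycles to nanoseconds."""
    clock_mhz = memory_clock_mhz(data_rate_mts)
    # cycles / MHz gives microseconds, so scale by 1000 for nanoseconds
    return cas_cycles * 1000 / clock_mhz


print(memory_clock_mhz(2400))    # 1200.0, the figure monitoring tools report
print(cas_latency_ns(15, 2400))  # 12.5
```

So a tool showing 1200MHz for a DDR4-2400 kit is reporting the real clock, not a kit running at half speed.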

Sided is wrong. What you're describing is single or dual rank, which is the way the ICs are grouped and addressed on the DIMM. Channel is the way the DIMMs are arranged on the mobo. With single-rank RAM, data coming into the RAM has full access to the entire set of ICs in series, so for 8GB it'd start in the first IC and roll on down to the last. Dual rank is like SLI: there are 2 sets of single rank, so for 8GB you've got access to only 4GB at a time, then it flips to the other 4GB, back and forth. Basically a parallel setup. And even that doesn't determine sides. That's up to the vendor/OEM. A 1GB IC is cheaper than a 2GB or 4GB IC, so they may use 8x 1GB ICs (8 on one side) vs 4x 2GB ICs (4 on one side), or may have a contract for 16x 512MB ICs (8 per side) instead.
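As a quick check on the chip-count examples, the sketch below multiplies out the three layouts mentioned, reading the per-chip sizes as gigabytes and megabytes. The dictionary and function names are hypothetical; the point is only that very different physical layouts land on the same 8GB module.

```python
def module_capacity_mb(chip_count: int, chip_size_mb: int) -> int:
    """Total module capacity is the chip count times the per-chip capacity."""
    return chip_count * chip_size_mb


# (number of ICs, capacity per IC in MB) for the layouts in the post
layouts = {
    "8 x 1GB": (8, 1024),
    "4 x 2GB": (4, 2048),
    "16 x 512MB": (16, 512),
}

for name, (count, size_mb) in layouts.items():
    print(name, "->", module_capacity_mb(count, size_mb) // 1024, "GB")
```

All three come out to 8GB, which is why counting chips or sides tells you nothing certain about rank.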
 

The_Gen_Eric

It doesn't look like GDM is enabled, otherwise it'd be running at 15-15-15-39, right?
 

The_Gen_Eric

Close, but the G.Skill RAM is Flare X, which, if it's a faulty component, may be my fault for going with a relatively cheaper set as opposed to something known to work like the Corsair Vengeance kit.
 

Karadjgne

No, the Flare X is designed specifically for compatibility with Ryzen. The Corsair Vengeance isn't, and has needed AGESA microcode updates ever since 1st gen Ryzen to work right. It's very decent RAM, regardless of price point.

GDM allows the RAM to use a clock that’s one half the true DRAM frequency for the purposes of latching (storing a value) on the memory’s command or address buses. This conservative latching can potentially allow for higher clock speeds, broader compatibility, and better stability. But doing so requires some things. Notably, it forces the BIOS to ignore the set command rate (the 1T or 2T at the end of the primary timings) and attempts to keep the command rate at 1T if possible. So it's flexible for best stability. In order to accomplish this, GDM needs to set an 'even' CAS like 14 or 16, because it cannot run at one half of an 'odd' number like CAS 15. With GDM enabled, you'd see your RAM listed as 16-15-15-39, not the JEDEC set 15-15-15-39.
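The even-CAS rule can be sketched in a few lines of Python. This is only an illustration of the rounding described above, not how any BIOS actually implements it; the function names are hypothetical, and it assumes GDM touches only the first (CAS) value of the primary timings.

```python
def gdm_effective_cas(cas: int, gdm_enabled: bool = True) -> int:
    """With GDM on, an odd CAS is rounded up to the next even value."""
    if gdm_enabled and cas % 2 == 1:
        return cas + 1
    return cas


def apply_gdm(timings: tuple, gdm_enabled: bool = True) -> tuple:
    """Adjust only the CAS entry and leave the other primary timings alone."""
    cas, *rest = timings
    return (gdm_effective_cas(cas, gdm_enabled), *rest)


jedec = (15, 15, 15, 39)
print(apply_gdm(jedec))         # (16, 15, 15, 39)
print(apply_gdm(jedec, False))  # (15, 15, 15, 39)
```

That's the tell: if the reported timings start with 15, GDM is most likely off.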
 

The_Gen_Eric

As a note, I did go through and run the registry cleaner, cleaning about 450 items over the course of 3 runs. As I said, though, due to the nature of issues like this, I kind of just have to wait to see if it crashes again in the meantime. Otherwise, is there any other information that would be good to know with issues like these, like whether they're generally attributed to hardware or software, whether they can come about due to chipset reasons, etc.?
 

The_Gen_Eric

If you wouldn't mind, go into the Windows Start menu, Administrative Tools, Event Viewer, and look for any red flags, especially if you've hit a very recent BSOD. Those are critical errors and Windows will log them if it's able.
Well, that's one of the issues. Because the crashes are either full system freezes or go to a BlueScreen and THEN crash again, Windows never generates a clear dump report. The two most useful events (I think) are these: in the case of the freeze, the Kernel Power event said that the system's shutdown at 10:07 was unexpected, even though the clock read 10:11 when the system was frozen, and in the case of the BlueScreen, the event just said that a dump file could not be generated.
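For anyone who would rather pull those Kernel Power entries from a script than click through Event Viewer, here is a rough Python sketch that shells out to the built-in `wevtutil` tool. Nothing in this thread names the tool or an event ID, so treat both as assumptions: it filters on Event ID 41 from the Microsoft-Windows-Kernel-Power provider, which is the entry Windows normally logs after an unclean shutdown. The function names are hypothetical.

```python
import subprocess
from typing import List


def kernel_power_query(max_events: int = 5) -> List[str]:
    """Build a wevtutil command listing recent Kernel-Power 41 events."""
    # Assumption: Event ID 41 is the "rebooted without cleanly shutting down" entry
    xpath = ("*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] "
             "and (EventID=41)]]")
    return ["wevtutil", "qe", "System", f"/q:{xpath}",
            f"/c:{max_events}", "/rd:true", "/f:text"]


def recent_unexpected_shutdowns(max_events: int = 5) -> str:
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(kernel_power_query(max_events),
                            capture_output=True, text=True, check=False)
    return result.stdout
```

On the affected machine, `print(recent_unexpected_shutdowns())` should list the newest entries first, which makes it easy to line the logged times up against when each freeze was actually noticed.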
 

Karadjgne

A BSOD is a critical error, and should register as a red-flagged critical error every time. About the only thing it'll generate is that error with an error code; it'll say something like nt-kernel error 0x0041 or some such. Being a critical system error, it won't be able to create a dump file. Those usually come from sudden shutdowns or freezes in programs, where Windows can point a finger and say 'hey! This thing here messed up'.

Sudden shutdown is slightly different, that's when windows is working, and then isn't. So it cannot generate anything other than to notice that it stopped mid sentence, and will only notice that on restart of windows.

It's like if your hand quit working, you could look at your hand and it'd register in your head that your hand is no longer doing what it should be doing, but if your brain quit working, you'd not have a clue until you woke up in the hospital.
 

The_Gen_Eric

With that in mind, I guess I should elaborate on every time Windows has stopped working in the past week.

In addition, to contextualize something, I had been having BlueScreens a few months back, which were probably caused by Windows updating chipset drivers on its own, but for the time, were seemingly resolved when only using one stick of RAM as opposed to two. Whether this was a RAM, driver, or Motherboard issue, I wasn't sure, so up until I installed my new MB, I was using a single stick of RAM.

1. Before I put in my new Motherboard (around a week ago), I was watching a YouTube video when Windows froze up, including sound. This crash didn't throw a BlueScreen, and restarted the computer on its own after about 10 seconds.
1.5. When installing the Motherboard, I ran MemTest86 in order to check the RAM. I was surprised when it threw many errors in the Hammer Test, but evidently people said this is normal, especially after having gotten a message that said my RAM may be susceptible to high-frequency bit flipping. However, the other issue cropped up here when MemTest unexpectedly shut off in the middle of the second Hammer Test, and I don't know whether this should be hugely concerning or not.
2. After putting in the new Motherboard, with both sticks of RAM, after around 10 hours of uptime, my system threw a BlueScreen. However, before it could give me an error code or create a dump file, the system restarted on its own to a black screen (essentially, all I saw of the BSOD was the sad face before it cut out), which required me to reboot to get back to Windows.
3. After about 48 hours of uptime after this, I went to shower, and this is the point where Windows was frozen up completely. A reboot was necessary, and after this point is when I switched DIMM slots, cleaned the registry, and updated the chipset drivers, and that brings me to where I am now, which is watching and waiting to see if the system goes down again. Like I said, due to the nature of these issues, I basically have to wait to see if anything is going wrong.
 

Karadjgne

Ok then lol. That was a handful, but I guess I asked for it 😅
Row hammering is a design vulnerability that an attacker can exploit to write to memory that it would normally not be permitted to write to. Although it has been demonstrated, it's unreliable and virtually impossible to exploit in the wild; all proof-of-concepts have been carefully crafted and are run in laboratory conditions.
From everything I've read on row hammer bit flipping, all DDR4 is susceptible and you'll get that warning every time you run the latest memtest since that includes a row hammer test.
Ultimately, the researchers also carried out successful Rowhammer attacks against other Crucial- and Micron-branded DDR4 modules, as well as DIMMs from Geil. Interestingly, DIMMs from G.Skill were able to withstand the tests.
 

The_Gen_Eric

So I heard after I was asking about it, which is why I figured people were telling me it was normal to error the hell out of that test lol. Anything to dissect out of the other crashes, like out of MemTest or the BlueScreen?
 

Karadjgne

The RAM situation was odd from the start. Usually, when removing 1 stick makes the problems go away, it's a decent indication of 1 of 2 things: either the slot is bad, or the removed stick is bad. Then you changed mobos. So my only guess would be to try each stick individually in A2 and run MemTest on each. If you get the same results from both, then either both are bad or both are good, depending on the results. The likelihood of both sticks failing is pretty small, so if both do fail, that could point to a bad CPU, or maybe there's not enough voltage on the system agent/memory controller. Maybe try manually changing the CAS to 16.
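The one-stick-at-a-time reasoning above can be written out as a small decision table. This is only a sketch of that logic in Python, with a made-up function name, and the suggested suspects are the ones raised in this thread rather than a complete diagnosis.

```python
def interpret_stick_tests(stick_a_errors: bool, stick_b_errors: bool) -> str:
    """Interpret MemTest results from testing each stick alone in slot A2."""
    if stick_a_errors and stick_b_errors:
        # Two bad sticks at once is unlikely, so look past the RAM itself
        return ("both fail: suspect the CPU or memory controller voltage "
                "rather than two bad sticks")
    if stick_a_errors or stick_b_errors:
        bad = "A" if stick_a_errors else "B"
        return f"only stick {bad} fails: suspect that stick"
    return ("both pass: the sticks look fine on their own, so look at "
            "timings, voltage, or the PSU")


print(interpret_stick_tests(False, False))
print(interpret_stick_tests(True, False))
print(interpret_stick_tests(True, True))
```

It only works if each stick is tested in the same slot with the same settings, otherwise a bad slot and a bad stick look identical.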

Or maybe the PSU isn't putting out stable enough voltages when the CPU goes into lower power states.

Are you running Ryzen Master? Maybe try going back in BIOS revisions to the last revision before 3rd gen CPUs were included. I've heard rumors that some stuff was dumped to make room for 3rd gen stuff/fixes.

Is Windows on the Balanced power plan?
 

The_Gen_Eric

Wall of text again, so I apologize in advance.

I guess I'll dial it back a bit further to my previous issues; while I'm still waiting to see if this random stop happens again, it can never hurt to give more information.

I bought my parts around June of this summer, and everything seemed to work fine up until the Windows Fall Creators Update, at which point it began to throw frequent, seemingly random BlueScreens. I don't know if they were SPECIFICALLY caused by a Windows update, but they only occurred after this update. The BlueScreens almost always threw blame at ntoskrnl (the system kernel) or nvlddmkm (NVIDIA driver files), which apparently isn't too uncommon for 1060 owners, but uncommon enough that there doesn't seem to be a clear answer. However, because the errors commonly had to do with memory, I decided to run MemTest86. It errored out hard on Test 1 only, and only when both sticks were in. Otherwise, after around 4 passes on each stick, in each slot, it seemed like each stick tested fine individually (I didn't end up getting errors again until the first crash I talked about, before installing my board).

However, after settling on the one-stick solution, I decided to go back and see which drivers were updated when the Creators Update came around, and the only significant one that came up was the AMD Ryzen chipset driver. This is why I keep speculating about whether the chipset driver, specifically the PCIe one, could have caused these issues, potentially causing the graphics card to miscommunicate with the RAM. I decided not to jeopardize things and left it on the single stick as long as it ran stably.

However, the two that concern me the most are the MemTest crash and the crash out of BlueScreen. If Windows wasn't actually running when these two hit the fan, it worries me that the issues may be on a hardware level as opposed to a software level, but as you can tell, I'm not the most educated on issues like this.