Question BSOD after BSOD after BSOD


RainOfPain125

Honorable
Feb 24, 2017
125
0
10,680
https://drive.google.com/open?id=1-YFeHtM1MdsrAsiV6aFVN4a8h26lult6
https://valid.x86.fr/63rxyj

jg37cDu.png

I run multiple servers on this machine, and it's such a painnnnnnnnnn aaaaaaa
 

RainOfPain125

If you don't see useplatformclock at all, then it's up to specific software to address the HPET if it's designed to.
I see this

Have you updated any drivers recently?
Yeah, quite a few. Not that I remember them all, lol. There was one for AHCI: I updated it from the default Standard SATA AHCI Controller to the AMD SATA Controller.

I've seen cases where a SATA controller could not properly handle requests after a hard drive was put into sleep mode and would throw a BSOD when trying to address them.
My HDD is constantly running and never really sleeps. But how do I disable it?

I would also stop using any of the affected processes where possible, such as "unturned.exe", and see if the issue persists without those processes trying to access memory.
"Unturned.exe" is the server software I am running.
 
My HDD is constantly running and never really sleeps. But how do I disable it?
To disable hard disk sleep, you need to find your way to the Advanced settings dialog for the power plan you wish to customize.

Right-click the Start button, select Run, type control.exe powercfg.cpl,,3 and click OK.

Select the power plan you wish to modify from the drop-down list. Each plan has its own settings for hard disks, so you need to change this setting in every plan you use. Also note that software can switch power plans automatically without your consent, such as when using VR headsets, so even if you never manually switch to the High performance power plan, it may be used from time to time.

Expand the Hard disk branch, followed by the Turn off hard disk after branch.

Set the timeout interval to Never (Never is used in place of 0.)
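For anyone who would rather script this than click through the dialog, here is a minimal sketch in Python. It assumes the built-in powercfg tool and its /change disk-timeout-ac and disk-timeout-dc options, where 0 minutes means Never. The function names are made up for illustration, and as far as I know /change only touches the currently active plan, so you would still need to repeat it for each plan you use.

```python
import subprocess
import sys


def disk_timeout_commands(minutes=0):
    """Build powercfg commands for the 'Turn off hard disk after' setting.

    0 minutes corresponds to 'Never' in the power options dialog.
    Hypothetical helper: only the powercfg options themselves are real.
    """
    if minutes < 0:
        raise ValueError("minutes must be 0 (never) or a positive number")
    return [
        # Plugged in (AC power)
        ["powercfg", "/change", "disk-timeout-ac", str(minutes)],
        # On battery (DC power); harmless on a desktop
        ["powercfg", "/change", "disk-timeout-dc", str(minutes)],
    ]


def apply_commands(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False on Windows."""
    for cmd in commands:
        if dry_run or sys.platform != "win32":
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing changes until you have read the output.
    apply_commands(disk_timeout_commands(0))
```

Run it once as a dry run to see the exact commands, then call apply_commands(..., dry_run=False) from an elevated prompt if they look right.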

If you don't see useplatformclock at all, then it's up to specific software to address the HPET if it's designed to.

I see this
I would recommend deleting the useplatformclock BCD variable using the instructions I provided earlier. There is generally no benefit to an external hardware clock on newer CPUs as their internal timers are much lower latency, and you can experience weird issues such as mouse cursors not disappearing over media playback windows and multimedia sync issues if HPET latency is not well handled. Software that specifically targets the HPET in your system will still be able to access the HPET (provided you leave it enabled in BIOS), but rather than forcing Windows to use timing routines based around the HPET, it's best to let it use the constant / invariant TSC (Time Stamp Counter) in your CPU instead.
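As a rough way to double-check whether the variable is set at all, the sketch below scans text copied from a bcdedit /enum listing for a useplatformclock line. The sample listing is illustrative only, not taken from this system, and the helper name is hypothetical. The removal command in the last line is the standard bcdedit /deletevalue form and needs an elevated prompt.

```python
def find_bcd_value(enum_output, name="useplatformclock"):
    """Return the value of a BCD element from bcdedit /enum text, or None.

    Each element is printed as 'name   value' on its own line, so the
    first whitespace-separated token is compared against the name.
    """
    for line in enum_output.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0].lower() == name.lower():
            return parts[1].strip() if len(parts) > 1 else ""
    return None


# Illustrative listing only; a real one has many more entries.
SAMPLE = """\
Windows Boot Loader
-------------------
identifier              {current}
description             Windows 10
useplatformclock        Yes
"""

# Command to remove the variable if it is present (run as administrator).
DELETE_COMMAND = ["bcdedit", "/deletevalue", "useplatformclock"]

if __name__ == "__main__":
    value = find_bcd_value(SAMPLE)
    if value is None:
        print("useplatformclock is not set; nothing to delete")
    else:
        print("useplatformclock =", value, "->", " ".join(DELETE_COMMAND))
```

If the function returns None for your own listing, the variable is not present and Windows is already free to pick its own timer.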

"Unturned.exe" is the server software I am running.
That's what I was suspecting. If you can run the system for a period, with as little of the aforementioned software active as you can, and the problem goes away, I would suspect the software you weren't running was causing the memory corruption.

An anecdotal example I've run into is Windows 8.1 Pro with Media Center on Ryzen. After Microsoft decided to nix support for new CPUs (including Ryzen) in Windows 8.1, calling them unsupported, allowing the Media Center icon to show up in the System Notification area will trigger random memory corruption, which of course leads to a BSOD. Go figure. Turning off the notification icon doesn't break Windows Media Center, and without the icon the system is otherwise perfectly stable. It only takes one tiny bit of misbehaving code to make Windows very upset.

You might check with the devs for your software to see if there are updates available, or recent updates that might need to be rolled back.

Have you updated any drivers recently?

Yeah, quite a few. Not that I remember them all, lol. There was one for AHCI: I updated it from the default Standard SATA AHCI Controller to the AMD SATA Controller.
Some drivers may give you the option to roll them back, but the only ones I would take much interest in would be those specific to the modules that caused Windows to generate BSOD crash dumps.
 

RainOfPain125

I would recommend deleting the useplatformclock BCD variable using the instructions I provided earlier. There is generally no benefit to an external hardware clock on newer CPUs as their internal timers are much lower latency, and you can experience weird issues such as mouse cursors not disappearing over media playback windows and multimedia sync issues if HPET latency is not well handled. Software that specifically targets the HPET in your system will still be able to access the HPET (provided you leave it enabled in BIOS), but rather than forcing Windows to use timing routines based around the HPET, it's best to let it use the constant / invariant TSC (Time Stamp Counter) in your CPU instead.
I see this, as in it DOESN'T show up in the console. useplatformclock is already disabled and/or is not in that list.

You might check with the devs for your software to see if there are updates available, or recent updates that might need to be rolled back.
Unturned is a Steam game, so there is no such thing as "rolling back" updates. I'm not sure why Unturned would be causing the crashes, but I agree it probably is the cause, considering I've been crash-free for a while. I could of course try to report this to the dev, but what use is that if there's nothing to pinpoint the cause of the crash beyond the small memory dumps?

Some drivers may give you the option to roll them back, but the only ones I would take much interest in would be those specific to the modules that caused Windows to generate BSOD crash dumps.
Then, according to the crash reports you guys have given me, none of the new drivers have caused the BSODs.
 

gardenman

Splendid
Moderator
I ran the dump files through the debugger and got the following information: https://pste.eu/p/qynd.html

File information: 042019-31687-01.dmp (Apr 20 2019 - 13:35:41)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: conhost.exe)
Uptime: 3 Day(s), 9 Hour(s), 28 Min(s), and 44 Sec(s)

File information: 042019-26968-01.dmp (Apr 20 2019 - 14:19:52)
Bugcheck: KERNEL_SECURITY_CHECK_FAILURE (139)
Driver warnings: *** WARNING: Unable to verify timestamp for WdFilter.sys
Probably caused by: WdFilter.sys (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 40 Min(s), and 42 Sec(s)

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.
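If you collect a lot of these summaries, a small script can tally which bugchecks and suspected culprits keep coming up. The sketch below is only an illustration: it assumes the "Key: value" layout of the two summaries above with a blank line between dumps, and the function name is made up. It is not the tool used to produce them.

```python
from collections import Counter

# The two summaries from this post (uptime and warning lines omitted).
SUMMARY = """\
File information: 042019-31687-01.dmp (Apr 20 2019 - 13:35:41)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: memory_corruption (Process: conhost.exe)

File information: 042019-26968-01.dmp (Apr 20 2019 - 14:19:52)
Bugcheck: KERNEL_SECURITY_CHECK_FAILURE (139)
Probably caused by: WdFilter.sys (Process: System)
"""


def parse_dump_summaries(text):
    """Split blank-line separated 'Key: value' blocks into dicts."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current dump record.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only; values may contain colons too.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    dumps = parse_dump_summaries(SUMMARY)
    culprits = Counter(
        d.get("Probably caused by", "unknown").split(" (")[0] for d in dumps
    )
    for name, count in culprits.most_common():
        print(count, name)
```

A different culprit in every dump, as here, is the kind of pattern that tends to point at hardware or memory corruption rather than one bad driver.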
 

gardenman

The debugger is part of the Windows SDK and it's available from Microsoft. I'm using an older version that's included in SDK version 10.0.14393.795 and it's available here. You can download the SDK setup and install only the debugger, which will allow you to open dump files and "analyze" them. Or you can download the newer one from the Windows Store.

The debugger is a complicated piece of software. I only know a few basic commands with it (lmv to list drivers and a few others). Other people on here know much more than I do about the debugger.

My own software just takes the info from the debugger, makes it more human readable, assigns descriptions to drivers, and puts the info in a webpage format that I can use online.
 
Everything is pointing to memory corruption. You're getting errors in all sorts of processes, leading me to believe the processes are not the culprit.

Dedicated memory tests cannot track down all types of memory issues. Under normal usage conditions on modern systems, there can be a high degree of fluctuation on the power rails of a power supply, and a high degree of both clock gating and frequency adjustment in a CPU. Plus, the components are very likely operating in different temperature ranges than they do during a memory test. A memory test is usually a fairly controlled environment where these extra influences are not likely to be present, so memory that appears stable in a test may not be stable under normal usage conditions. Also, some errors occur very infrequently and only show up under extended testing. Like Colif suggested, go for at least 8 consecutive passes for each memory module, and if an extended test is available, use it.

You mention successfully completing a chkdsk of drive D and encountering BSODs when checking drive C. Have you successfully completed a chkdsk on drive C? If you consistently BSOD when running chkdsk on drive C, I would use this as a simple test to find the source of what seems to be manifesting as memory corruption rather than running a memory test after each setting you change. Of course, this doesn't help as much if a chkdsk of drive C is only inconsistently triggering stop errors.

Things I would start with:

  • Replace any SATA cables in the system with new ones, or at least replace the SATA cable to your primary OS drive, even if all you can do is swap the cable the drive is using with a different drive.
  • If you have an unused hard drive or SSD, swap that in place of the one you're currently using followed by a fresh Windows installation to test for a few days if the drive itself is the source of your corruption. If the drive is the problem, you can try recommissioning the problem drive by removing all current partitions from it and letting the Windows Installer create partitions as necessary during installation to the drive. Make sure there are no other connected drives with available partitions on them however, or the Windows Installer will likely dump important boot information on one of those drives too.
  • It doesn't appear your primary OS drive is an SSD, but if it were I would recommend checking the manufacturer's site for updated firmware.
  • Make sure your CPU is running at stock settings.
  • Finally, if none of the above works, I would start tinkering with the memory modules and CPU memory voltages:
    • Try running your memory modules at DDR4-2133 for a while
    • Try loosened timings (bigger numbers, not smaller) such as 17-17-17-40-60-2T for a while
    • If you run at 1T command rate, run with Gear Down Mode turned on
    • Bump the memory module voltage to 1.3 - 1.35V in BIOS
    • If available on your B350 board, try boosting the CPU SOC voltage a few mV at a time to 1.1V (can go higher, but depending on cooling solution, watch temps under load)
    • If available on your B350 board, try boosting the CPU VDDP a few mV at a time, up to 0.9V, or +0.2 over default
    • If available on your B350 board, DDRVtt should be as close to half of memory module voltage as you can get (ex. if modules are 1.3V then DDRVtt would be 0.65V)
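The DDRVtt rule of thumb in the last bullet is just half the module voltage, snapped to whatever increment the BIOS offers. A minimal sketch, where the 0.005 V step size is an assumption for illustration and not something taken from any particular B350 board:

```python
def ddr_vtt_target(module_voltage):
    """Ideal DDRVtt: half of the memory module voltage."""
    return round(module_voltage / 2, 4)


def snap_to_step(voltage, step=0.005):
    """Nearest value the BIOS can actually set.

    The step size is hypothetical; check what your board exposes.
    """
    return round(round(voltage / step) * step, 4)


if __name__ == "__main__":
    for vdimm in (1.2, 1.3, 1.35):
        target = ddr_vtt_target(vdimm)
        print(f"modules at {vdimm} V -> DDRVtt target {target} V, "
              f"nearest step {snap_to_step(target)} V")
```

For modules at 1.3V this gives 0.65V, matching the example in the list.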
 

RainOfPain125

OK...

This may sound like the dumbest shit to ever exist, but I opened my case up, unplugged the SATA and power cables from each HDD, then reconnected them, and I have yet to get a BSOD despite running all the servers, playing intensive games, etc., all at the same time.

I guess the idea of 'turning it off and on again' really works in the context of 'unplug and replug', although I did not actually try new cables like suggested. Thanks. I'll be back if a BSOD sneaks up on me again.

Replace any SATA cables in the system with new ones, or at least replace the SATA cable to your primary OS drive, even if all you can do is swap the cable the drive is using with a different drive.
 
That sounds like some good news. :)
There's still the option to try new or different cables, in the event you start bumping into errors again.

If you move a cable that you suspect is contributing to the problem, putting a piece of tape on it or writing a note somewhere can serve as a handy reminder of which cable you suspected was the issue.

The earliest SATA cables had connectors without metal clips on them. That connector style has a tendency to deform over time and weaken the electrical connection, sometimes to the point of the drive disappearing. If you are using this older style of connector with no metal reinforcement or clip, I would recommend a plan to replace them eventually.