Question System stability: Having trouble pinning down issue.

Feb 25, 2019
4
0
10
Hey guys, been trying quite a bit of troubleshooting on my own, but I've finally run out of ideas, and I'm hoping someone might have something I've not thought of.

System Specs:
CPU: Intel i7-9700K
MB: Gigabyte Ultra Durable Z370 P D3
P/S: EVGA 700W Gold
RAM: 4x8 GB Crucial DDR4-2400
GPU: Gigabyte GTX 1070 Ti Gaming

Issue I'm having:
In general use my PC is perfectly stable. It's only once I start booting up more intensive games that the error rears its head, which is annoying when gaming is what I built it for. When it hits, my entire system freezes on whatever screen I currently have up, and the audio breaks into a buzzing loop. A BSOD doesn't always take over, but when it does, I get either:
A: Clock Watchdog Timeout (Occurs more often)
Or
B: DPC Watchdog Violation.
More often than not, though, it will sit on whatever screen I have up until I manually restart my system via the power button. As another note, when the error hits, all the fans on my 1070 Ti and my CPU liquid cooler jump into overdrive like they're fighting 100 degree temps, which neither my CPU nor my GPU ever comes close to.
I've read multiple posts around the net saying that with Clock Watchdog Timeouts you want to examine your RAM for memory errors. I've run 4 memory tests through Windows, none of which found an error.
I've also tried reinstalling the IDE ATA/ATAPI controller driver, no dice.
My system always reboots without any issue, and there can be times where it stays stable for hours or even a day without issue, only to then lock multiple times within hours. Anyone have any ideas? At this point I'm pretty well open to anything.
Final note: I have my OS loaded onto a 500 GB Samsung 860 EVO SSD, along with the two games in question that seem to cause the crashes (Rainbow Six Siege with the HD texture pack, and Monster Hunter World).

I also apologize for the long-windedness of this post. I just hope to provide as many clues and details as possible, in case any of it helps nail down the issue.
 
Running latest mainboard BIOS?

Set to optimal defaults, go with everything to auto regarding CPU and RAM.... (avoid XMP profiles, no automated overclocking, etc...) for now....

Drop down to just one pair of RAM sticks in the recommended slots (prob 2nd and 4th to the right/away from the CPU) for a dual channel config, and run default RAM speeds/timings/voltages, even if slow 1333/1666/2000 MHz, and retest for a few days...repeat with the other pair of RAM sticks in the same slots....

Infrequent lockups like that can be maddening to narrow down, so let's rule out the easy stuff that requires almost no effort first...(RAM is easy, just be sure to completely remove power from the rig when swapping modules; don't swap them when the rig is merely shut down but still has power applied to the rear of the PSU..)

The fans spinning like mad is normal during a hard reset/initial power up; it's not until the MB can read temp data and begin to make some sense of it that it will lower fan speeds per the cooling profile...

Skip any Windows RAM tests, and try Memtest86 on a USB.....let it run for hours...

Or, run P95 v26.6 Blended mode, which utilizes lots of data in and out of RAM...

Check HWMonitor for temps during the P95 run....blended mode should induce temps of around 70C, if you have adequate cooling....
 
Update:
I've reset my BIOS to outright defaults by pulling the CMOS battery while powered down and disconnected.
I did notice something about my two pairs of RAM modules: while both are Crucial DDR4 2400, one pair is single rank and the other is dual rank. I've made a point of only testing the two single rank sticks together, and the same for the dual rank sticks.
I've run 4 Memtest86 sets, for a total of 8 passes on each pair, with no errors ever detected.
In terms of real performance, I seem to get the largest windows of uptime while using my single rank pair in the second and fourth slots from the CPU, as you suggested. The dual rank pair also gives a fair length of time so long as they are the only pair in the slots, but single rank seems to edge it out.
Both pairs seem to outright flounder in the other two slots, first and third from the CPU, where lockups occur within an hour of any real load.
Unfortunately, though, simply testing them out doesn't seem to have cleared the error. Can I trouble you for more guidance?
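For anyone tracking this kind of pair-and-slot testing, a small script can make the comparison less anecdotal. Below is a minimal Python sketch, assuming a hand-kept log of runs; the function names and the hours in `runs` are hypothetical placeholders for illustration, not measurements from this system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log entries: (RAM pair, slots used, hours until lockup).
# The hours are placeholder values, not real measurements.
runs = [
    ("single-rank", "2+4", 6.0),
    ("single-rank", "1+3", 0.5),
    ("dual-rank", "2+4", 4.0),
    ("dual-rank", "1+3", 0.75),
]


def summarize(runs):
    """Return the mean hours-to-lockup for each (pair, slots) configuration."""
    grouped = defaultdict(list)
    for pair, slots, hours in runs:
        grouped[(pair, slots)].append(hours)
    return {config: mean(hours) for config, hours in grouped.items()}


def rank_configs(runs):
    """Return configurations sorted from longest to shortest mean uptime."""
    return sorted(summarize(runs).items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (pair, slots), avg_hours in rank_configs(runs):
        print(f"{pair:12s} slots {slots}: {avg_hours:.2f} h average before lockup")
```

Logging several runs per configuration matters here, since a single long or short session can easily mislead with lockups this intermittent.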
 
What temps and clock speeds does HWMonitor show during an actual CPU-Z bench/stress run? (If it's pushing 100C, you've found your issue...) You'd expect to see at least one or two cores hitting 4.8 GHz or so. I'd disable MCE if that's an option on your mainboard, and leave everything CPU-wise in the BIOS on 'auto'...

(As your RAM is but 2400 MHz spec, let's run at that speed/default timings, or at 2133, no fancy XMP profiles above RAM specs...)

I'm pretty sure I just read something on one of Gamers Nexus' YouTube thumbnails about a disappointing BIOS update for a Z370 or Z390...; might want to see if your version applies.
 
Was this a fresh full Windows reinstall (recommended), or did the drive with your OS on it come from a previous build, with a CPU/MB/RAM swap, a reboot, and hoping the correct drivers work out...? :)
With a CPU-Z Stress Test, running for 15 minutes solid, my CPU itself averaged at 70 degrees, with each individual core fluctuating 1~2 degrees.
Each core comes in at 4.6 GHz, not 4.8, but it does sit at that peak consistently during the test.
MCE, Multi-Core Enhancement(?), was set to Auto, and is now disabled.
All RAM timings and voltages are at defaults, at least per the BIOS. Would underclocking the RAM make a difference?
Originally, my SSD was cloned from a HDD to carry the OS over, but as the problems started to happen, one of the first things I did was fully format the drive and start with a new and fresh install.
I will check out the video you mentioned, hopefully it can shed some light.
One thing I did notice while I was stressing my CPU:
While the CPU itself doesn't exceed 70 degrees, one of my six motherboard thermal sensors climbed to a peak of 105 degrees before I shut the test down.
Looking at the placement of the sensors, there is one near the power supply, but my case is elevated and otherwise well cooled. No other sensor climbed to any alarming numbers, and my GPU didn't even rise above its ambient 28 degrees from the heat of the test.
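To spot an outlier sensor like this without watching the monitor window for the whole test, the readings can be checked after the fact. The following is a minimal Python sketch, assuming the readings have been collected as (sensor name, temperature) pairs; the sensor labels and the 90 degree limit are assumptions for illustration, while the 70, 28, and 105 degree values are the ones reported above.

```python
def flag_hot_sensors(readings, limit_c=90.0):
    """Return {sensor: peak} for sensors whose peak reading reaches limit_c.

    readings is an iterable of (sensor_name, temperature_c) pairs.
    limit_c is an assumed alarm threshold, not a vendor specification.
    """
    peaks = {}
    for name, temp_c in readings:
        peaks[name] = max(temp_c, peaks.get(name, float("-inf")))
    return {name: peak for name, peak in peaks.items() if peak >= limit_c}


# Sensor labels are hypothetical; temperatures are the values from the post.
readings = [
    ("CPU", 70.0),
    ("GPU", 28.0),
    ("MB sensor (unidentified)", 88.0),
    ("MB sensor (unidentified)", 105.0),
]

if __name__ == "__main__":
    for name, peak in flag_hot_sensors(readings).items():
        print(f"{name} peaked at {peak:.0f} C")
```

Tracking the peak per sensor rather than the latest value keeps a brief spike from being missed if the test is stopped after temperatures have already dropped.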