Question Constant BSODs and stress test crashes. At my wit's end

Aug 6, 2022
21
0
10
I have been having constant BSODs, seemingly out of nowhere, for the last few weeks and I can't seem to diagnose the problem. I even bought new RAM, as I thought that my old DIMMs might have been the culprit, to no avail. My PC will also completely freeze during OCCT CPU and power stress tests, within 30 seconds. I am not overclocking ANY components (XMP is off too).

Things I've tried:
Resetting CMOS
Updating BIOS
Rolling back GPU drivers and installing newer ones
Buying new RAM
Switching DIMMs to different slots
MemTest (no errors)
Reinstalling Windows (10 and 11) on fully formatted drives
Installing Windows on a separate drive
System file checker (sfc /scannow)
CHKDSK
Enabling/disabling MCE
Disabling C-States
Installing chipset drivers
Running through onboard graphics

Setup:
ASUS Z690-E
Corsair Dominator DDR5
Intel i9 12900k
ASUS TUF RTX 3080 Ti
SuperFlower 1000w Leadex Platinum 80+
Custom loop cooling (GPU+CPU block, front mounted distro plate, 3x360mm rads)

Temps:
Idle - CPU = 25c, GPU = 25c
Load - CPU = 65c, GPU = 45c

Here are the two latest dump files, if anyone wants to take a crack at it:
https://files.catbox.moe/153at9.dmp
https://files.catbox.moe/4q3ysm.dmp
 

Aeacus

Titan
Ambassador
My PC will also completely freeze during OCCT CPU and power stress tests, within 30 seconds. I am not overclocking ANY components (XMP is off too).

It could be a bad CPU. :unsure:

A few days ago, we had a person here with similar symptoms and, after extensive troubleshooting, it did come down to a faulty CPU,
topic: https://forums.tomshardware.com/threads/system-bsods-and-crashes.3787658/

He had an i9-13900KF, so a different CPU, but the symptoms are similar. And you don't have sub-par hardware either.

What you could do is try some of the troubleshooting steps we did in that topic, e.g. using Unigine Superposition to see how the GPU fares. Or try 3DMark, and if it crashes too, like it did for him, it may be time to look towards CPU replacement.


To convert the dump files, I need to summon our resident Win10/11 expert, @Colif .
 

Ralston18

Titan
Moderator
Take a look in Reliability History and Event Viewer. Either one or both tools may be capturing some error code, warning, or even an informational event just before or at the time of the BSODs.

Start with Reliability History: much more user friendly and the timeline format can be very revealing.

Event Viewer is not as easy to use and/or understand.

FYI:

http://www.tomshardware.com/faq/id-3128616/windows-event-viewer.html

How old is that SuperFlower 1000w Leadex Platinum 80+ ?

History of heavy gaming use?

= = = =

Power down, unplug, open the case.

Clean out dust and debris.

Verify by sight and feel that all connectors, cards, RAM, jumpers, and case connections are fully and firmly in place.

Inspect all for signs of damage.
 
Aug 6, 2022
21
0
10
The entire build is under a year old and spotless. I keep everything super clean.

These are some things that I forgot to mention, which could be noteworthy:
  • I had a fair number of blackouts (power outages) in the house I was living in before
  • I only have a single 8-pin connector plugged in for the CPU (as opposed to both 8-pin slots; I was told it would be sufficient unless OCing)
  • I am using a double 8-pin Y-split cable for my GPU (as opposed to two single 8-pins)
  • There was a day where I was struggling with my AIO and my CPU hit >100c a few times. I had to repeatedly restart my PC every time it thermal throttled and shut down, in order to get into the BIOS to fix a messed up fan curve
  • I did recently ship my PC abroad but it was extremely well packed and wrapped in anti-static bubblewrap (GPU and RAM removed and carried separately, in carry-on)
  • Sometimes there will be some pretty strange graphical 'glitches' or artifacting on my screen JUST prior to a BSOD or when it freezes up (PSU problem?)
 

Colif

Win 11 Master
Moderator
conversion of dumps

report- click run as fiddle to read


File: 153at9.dmp (Dec 10 2022 - 02:35:50)
BugCheck: [VIDEO_SCHEDULER_INTERNAL_ERROR (119)]
Probably caused by: dxgmms2.sys (Process: System)
Uptime: 0 Day(s), 0 Hour(s), 08 Min(s), and 46 Sec(s)

File: 4q3ysm.dmp (Dec 10 2022 - 03:06:06)
BugCheck: [DRIVER_IRQL_NOT_LESS_OR_EQUAL (D1)]
Probably caused by: ntkrnlmp.exe (Process: svchost.exe)
Uptime: 0 Day(s), 0 Hour(s), 06 Min(s), and 51 Sec(s)

File: ib7fje.dmp (Dec 10 2022 - 07:55:43)
BugCheck: [PFN_LIST_CORRUPT (4E)]
Probably caused by: memory_corruption (Process: AquaComputerSe)
Uptime: 0 Day(s), 0 Hour(s), 10 Min(s), and 37 Sec(s)

Crash 1 was caused by the GPU driver, which is not the newest one out there:
Sep 14 2021 - nvlddmkm.sys - Nvidia Graphics Card driver http://www.nvidia.com/
Try running DDU in safe mode to uninstall the GPU drivers, then boot back into normal mode and grab newer drivers.

The processes mentioned are victims.

Any idea what this is?
Aquasuite? Seems too old for win 10/11

run this? https://www.intel.com/content/www/us/en/download/15951/19792/intel-processor-diagnostic-tool.html?

What speed is the RAM running at? I see 4800 but it's rated at 5200
you have newest bios

I can't tell if this is the newest version, as Asus have taken all the dates off their download files:
Jul 27 2021 - TeeDriverW10x64.sys - Intel Management Engine Interface driver
The newest version on the Asus site is 2229.3.2.0. You probably need to look in Device Manager to confirm that yours is the same version. It helps if it matches the BIOS version.

could run this and see if anything newer - https://www.intel.com.au/content/www/au/en/support/intel-driver-support-assistant.html
 

Ralston18

Titan
Moderator
Any observed error codes, warnings, or information events in Reliability History and Event Viewer?

If the original house had blackouts it could be possible that some damage was done to the build and that damage continues to cause the problems - still more needs to be known.

Regarding:

"I only have a single 8-pin connector plugged in for the CPU (as opposed to both 8-pin slots, I was told it would be sufficient unless OCing)
I am using a double 8-pin Y-split cable for my GPU (as opposed to two single 8-pins)"

May not be "sufficient". Take another look at the Motherboard and CPU User Guides/Manuals. Verify that the current connections and configuration are indeed supported.

PSU = "SuperFlower 1000w Leadex Platinum 80+ "

Do you have another known working PSU that could be swapped in for testing?
 
Aug 6, 2022
21
0
10
I checked Reliability History and there are a ton of hardware errors (it always coincides with the time of "shut down unexpectedly") :

Problem signature
Problem Event Name: LiveKernelEvent
Code: 124
Parameter 1: 7
Parameter 2: ffffd68ff8d02020
Parameter 3: 0
Parameter 4: 0
OS version: 10_0_22621
Service Pack: 0_0
Product: 768_1
OS Version: 10.0.22621.2.0.0.768.101
Locale ID: 2057

Problem signature
Problem Event Name: LiveKernelEvent
Code: 124
Parameter 1: 7
Parameter 2: ffff93821bbf1030
Parameter 3: 0
Parameter 4: 0
OS version: 10_0_22621
Service Pack: 0_0
Product: 768_1
OS Version: 10.0.22621.2.0.0.768.101
Locale ID: 2057

Problem signature
Problem Event Name: LiveKernelEvent
Code: 193
Parameter 1: 804
Parameter 2: ffffffffc0000001
Parameter 3: 108
Parameter 4: fffff8026739cfe0
OS version: 10_0_22621
Service Pack: 0_0
Product: 768_1
OS Version: 10.0.22621.2.0.0.768.101
Locale ID: 2057

Problem signature
Problem Event Name: LiveKernelEvent
Code: 124
Parameter 1: 7
Parameter 2: ffffba82e3514020
Parameter 3: 0
Parameter 4: 0
OS version: 10_0_22621
Service Pack: 0_0
Product: 768_1
OS Version: 10.0.22621.2.0.0.768.101
Locale ID: 2057

Problem signature
Problem Event Name: LiveKernelEvent
Code: 117
Parameter 1: ffffcf8a074c5010
Parameter 2: fffff803a70da9c4
Parameter 3: 0
Parameter 4: 0
OS version: 10_0_22621
Service Pack: 0_0
Product: 768_1
OS Version: 10.0.22621.2.0.0.768.101
Locale ID: 2057
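
For anyone who wants to tally entries like these rather than eyeball them, here is a minimal Python sketch. It assumes the "Problem signature" blocks have been copied out of Reliability History as plain text in the layout shown above. The function name `tally_codes` and the shortened sample text are illustrative only and are not part of any Windows tool.

```python
import re
from collections import Counter


def tally_codes(report: str) -> Counter:
    """Count how often each 'Code:' value appears in pasted problem signatures."""
    # Match whole lines such as "Code: 124". The value is kept as a string
    # because the report itself does not say which number base is used.
    return Counter(re.findall(r"^Code:\s*(\w+)\s*$", report, flags=re.MULTILINE))


# Shortened sample (assumption: only the event name and code lines are kept).
sample = """\
Problem Event Name: LiveKernelEvent
Code: 124
Problem Event Name: LiveKernelEvent
Code: 124
Problem Event Name: LiveKernelEvent
Code: 193
Problem Event Name: LiveKernelEvent
Code: 124
Problem Event Name: LiveKernelEvent
Code: 117
"""

counts = tally_codes(sample)
for code, n in counts.most_common():
    print(f"LiveKernelEvent code {code}: {n} occurrence(s)")
```

A tally like this only shows which code dominates; it says nothing by itself about the cause.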
 
Aug 6, 2022
21
0
10
Unfortunately I don't have another PSU to use. I added some of the errors found in Reliability History in the comment above. Will take a look at suggested configs in the manual.
 
Aug 6, 2022
21
0
10
@Colif Aquasuite is used to modulate my custom loop parameters and RGB. It was crashing before that too. I also flashed to latest BIOS last night, still no improvement (on the latest version). Tried DDU too but not in safe mode, will give it a shot and see if there's any improvement. The RAM is running at default speed (XMP disabled).
 

Colif

Win 11 Master
Moderator
I checked Reliability History and there are a ton of hardware errors (it always coincides with the time of "shut down unexpectedly") :

Reliability history sees unexpected shutdowns as Hardware errors, so that makes sense

event 193 can be outdated chipset drivers, or just drivers in general
event 117 appears to be hardware

Really, it's hardware of some sort, as you had the same problems on 2 clean installs. Generally people aren't that unlucky.

ASUS Z690-E
Corsair Dominator DDR5
Intel i9 12900k
ASUS TUF RTX 3080 Ti
SuperFlower 1000w Leadex Platinum 80+
Custom loop cooling (GPU+CPU block, front mounted distro plate, 3x360mm rads)
replaced ram
replaced boot drive

run intel processor diagnostic tool?


run Prime 95? It will check ram, CPU & PSU (takes a long time so run overnight)
https://www.guru3d.com/files-details/prime95-download.html
Prime 95 Instructions - https://appuals.com/how-to-run-a-cpu-stress-test-using-prime95/
 

Ralston18

Titan
Moderator
Unexpected shutdowns, varying error codes, and increasing numbers of errors are an indication to me that the PSU is a likely culprit.

And, unexpected shutdowns, in turn, can and do cause file corruption which makes the situation all the worse.

I am not familiar with Aquasuite so will defer accordingly. However, that said, such software can often be problematic and depending on the environment quite possibly counterproductive.
 
Aug 6, 2022
21
0
10
How confident are you of this? I'm actually hoping it's the PSU. Rebuilding my PC with this custom loop is going to be an absolute fn nightmare (if the mobo or CPU is the culprit). It might be worth noting that the power test in OCCT is the one which fails the quickest (always within 15 secs or so).
 

Ralston18

Titan
Moderator
Confident enough to recommend the installation of another PSU for testing purposes.

Some other thoughts:

Use some of the calculators listed in the following link to determine the applicable PSU wattage for the build:

https://www.tomshardware.com/reviews/best-psus,4229.html

Plus do your own calculation based on installed components. If a component has a range of wattage values use the high end value.
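
The do-it-yourself calculation described above could be sketched in Python roughly as follows. Every wattage figure here is a placeholder assumption for illustration (none are measurements from this build, so substitute values from each component's spec sheet), and the 20% headroom factor and the function names are likewise arbitrary choices.

```python
import math

# Each component maps to a (low, high) wattage range in watts.
# All numbers are illustrative placeholders, not figures from this thread.
components = {
    "CPU": (125, 241),
    "GPU": (300, 350),
    "Motherboard": (40, 80),
    "RAM": (5, 15),
    "Drives": (5, 15),
    "Pump and fans": (20, 50),
}


def estimate_peak_watts(parts: dict) -> int:
    """Sum the high end of every component's wattage range."""
    return sum(high for _low, high in parts.values())


def recommended_psu_watts(parts: dict, headroom: float = 0.20) -> int:
    """Add a safety margin on top of the peak estimate and round up."""
    return math.ceil(estimate_peak_watts(parts) * (1 + headroom))


print("Estimated peak draw:", estimate_peak_watts(components), "W")
print("Suggested PSU size: ", recommended_psu_watts(components), "W")
```

Using the high end of each range, as suggested, keeps the estimate conservative.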

Test the PSU:

https://www.lifewire.com/how-to-manually-test-a-power-supply-with-a-multimeter-2626158

Any voltages out of tolerance make the PSU suspect.

And continue to delve into the dumps and other data available.

There are remaining questions and concerns regarding AquaComputerServices, the Intel Management Engine, etc., as I understand the previous posts.

The PSU, as stated in the Tom's Hardware link, "plays a significant role in determining your system's reliability, depending on its performance "

And, like many other products, PSUs do degrade over time. All the more so as they near the designed in EOL (End of Life).
 
Aug 6, 2022
21
0
10
Thanks a lot for all your input and advice. BTW, I was experiencing the same problems before the custom loop and Aquasuite, as well as IME. Going to go ahead and buy a new PSU and get it refunded if I still have problems.
 
Aug 6, 2022
21
0
10
----------------------------------------------
-- Testing
----------------------------------------------
CPU 1 - Genuine Intel - Pass.
CPU 1 - BrandString - Pass.
CPU 1 - Cache - Pass.
CPU 1 - MMXSSE - Pass.
CPU 1 - IMC - Pass.
CPU 1 - Prime Number - Pass.
CPU 1 - Floating Point - Pass.
CPU 1 - Math - Pass.
CPU 1 - GPUStressW - Pass.
CPU 1 - CPULoad - Fail.

IPDT64 Failed
--- IPDT64 - Revision: 4.1.7.39
--- IPDT64 - End Time: 12/10/2022 5:35:03 PM

----------------------------------------------
FAIL
 

Aeacus

Titan
Ambassador
Possible? Yes. Especially when PSU struggles to provide stable voltage to PC components. Then all sorts of issues can happen.

According to the ATX PSU standard, safe voltage ranges are:
+12V DC rail - tolerance ±5% ; +11.40V to +12.60V
+5V DC rail - tolerance ±5% ; +4.75V to +5.25V
+3.3V DC rail - tolerance ±5% ; +3.14V to +3.47V
-12V DC rail - tolerance ±10% ; -10.80V to -13.20V
+5V SB rail - tolerance ±5% ; +4.75V to +5.25V

HWinfo64 is a great piece of software that shows what voltages your components are getting,
link: https://www.hwinfo.com/download/
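
Readings can be compared against the tolerance list above by hand, or with a small script. The following is a minimal Python sketch of that comparison. The rail names and percentages come from the list above, while the sample readings and the function names are made up for illustration.

```python
# Nominal voltage and allowed tolerance per rail, taken from the list above.
ATX_RAILS = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def rail_limits(rail: str) -> tuple:
    """Return the (lowest, highest) acceptable voltage for a rail."""
    nominal, tolerance = ATX_RAILS[rail]
    delta = abs(nominal) * tolerance
    return nominal - delta, nominal + delta


def in_tolerance(rail: str, measured: float) -> bool:
    """True if the measured voltage lies within the rail's allowed range."""
    low, high = rail_limits(rail)
    return low <= measured <= high


# Hypothetical readings, e.g. copied by hand from a sensor tool or multimeter.
readings = {"+12V": 11.85, "+5V": 5.02, "+3.3V": 3.10}

for rail, volts in readings.items():
    status = "OK" if in_tolerance(rail, volts) else "OUT OF TOLERANCE"
    print(f"{rail}: {volts:.2f} V -> {status}")
```

Software sensor readings depend on the motherboard's sensors, so a multimeter reading is the better tie-breaker if a value looks borderline.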
 

Colif

Win 11 Master
Moderator
From what I can find, CPULoad failures appear to be caused by heat. Have you cleaned inside the PC recently? Removing dust from heatsinks can help a lot.

It can also be caused by case airflow or the mobo's VRM config/cooling.
If it's a K chip, tone down its OC or tweak the voltage (unstable + cool = raise voltage) (too hot = lower voltage)
link

Load - CPU = 65c, GPU = 45c
those aren't what I call hot but I would check temps during the tests.

something isn't stable. try running this and check rest of system temps when you run scans - https://forums.tomshardware.com/threads/how-to-use-hwinfo-to-track-sensor-values-on-ryzen.3693704/

I wasn't very awake yesterday, better now.
 
Aug 6, 2022
21
0
10
I am a clean freak so I tend to keep things surgically clean. I also have no heatsinks since my entire rig is watercooled (gpu and cpu block). No dust whatsoever inside the case, only a little inside the PSU but it's too stubborn to blow out with a datavac. I was running HWinfo during the tests, all temps are within normal ranges. I am going to try to raise the cpu voltage slightly to see if that offers some better stability under stress.
 
Aug 6, 2022
21
0
10
if you ran prime 95, it might tell you if its a voltage problem

No heatsinks? so what is the radiator exactly :)

I assume its clean, mine is.
Tried running Prime95 and it bluescreened after 1 minute, this time it was Hypervisor error :(

Yes, my radiators are spotless; they are brand new
 

Aeacus

Titan
Ambassador
Yes, my radiators are spotless; they are brand new

Are you running your rad fans in pull or push?

Because when they are in push, this will happen (at 3:45 in the video):

View: https://www.youtube.com/watch?v=UyC3lZ5WFMk#t=3m45s


I was running HWinfo during the tests, all temps are within normal ranges.

Temps, yes, but how about voltages? :unsure:
Also, care to share a screenshot of HWinfo64 showing all the voltage values, taken right after testing? (E.g. launch HWinfo64, start the test and, when the test ends, screenshot what the HWinfo64 voltage sensors show.)