Question Strange Windows crashes and high GPU temperature spikes?

BurntToasters

Prominent
Mar 13, 2023
Hello. Apologies for the long post; I just wanted to include all the information about this weird issue in hopes of figuring out the problem.

TLDR at the bottom.

SPECS:
  • Computer Type: Desktop
  • GPU: RTX 4080 Super Founders
  • CPU: RYZEN 9 7900x
  • Motherboard: ASUS ProArt X670E-CREATOR WiFi
  • BIOS Version: 2704
  • RAM: CORSAIR Dominator Titanium 64GB DDR5 6000MHz C30
  • PSU: Seasonic Prime TX 1000W
  • Case: Antec Flux Pro
  • Operating System & Version: WINDOWS 11 PRO 24H2 (Current update: KB5048667 as newer updates have current unresolved issues)
  • GPU Drivers: GEFORCE GAME READY DRIVER - WHQL Driver Version: (Prev: 566.36/Current: 566.14)
  • Chipset Drivers: AMD X670E CHIPSET DRIVERS VERSION 6.10.17.152
Explanation:

At the beginning of last month, I was playing Marvel Rivals and when I exited the game, I got a BSOD. I thought that was strange, but after that no more crashes or weirdness happened... until yesterday. Yesterday, I was playing Balatro and watching a YouTube video, when out of nowhere my computer force-restarted.

It didn't BSOD the first time, so I didn't get a dump file and couldn't see any potential error. After that, I re-installed my GPU drivers using DDU (in safe mode) and ran FurMark and the accompanying CPU burner to quickly see if it was a PSU issue.

I ran FurMark and CPU burner for 30 minutes and everything seemed fine until I stopped them, at which point Windows instantly BSODed, this time with the following error: Kernel Security Check Failure. I installed BlueScreenView to check the dump file, and the item that failed was ntoskrnl.exe. I researched this and found that kernel crashes are commonly associated with memory issues. I ran MemTest86 and it passed with 0 errors.

My next order of business was to update the BIOS to the newest available version. After that, I re-enabled EXPO and things seemed to be working again for the time being. I then re-seated my RAM just in case and used DDU again to uninstall the current drivers and install a previous version for troubleshooting purposes.

Now today, I tried the furmark + cpu burner test combo again to see if those worked, and unfortunately the system hard-restarted again without a BSOD the moment I pressed "stop" on cpu burner.

I started furmark again when the system restarted but this time used cinebench 30-min benchmark to stress the cpu and noticed something possibly concerning regarding my temps.

I know the max temp of the 7900X is 95C, which is why I was concerned to see in HWInfo64 that during the stress test, the CCD2 Tdie was briefly at 100.6C! It then went back down to around 88-95C (sometimes 96C).

The weird thing though is that HWInfo64 did not report any thermal throttling due to the temperature. After the cinebench benchmark was done, the system force-restart crashed again.

I had a hunch that the issue was due to thermals, but it only occurred when the stress tests STOPPED. Furthermore, the second time was when the computer wasn't under any heavy load (playing balatro and watching youtube). Additionally, no thermal throttling was reported by HWInfo64.

I have also done the classic dism /online /cleanup-image /restorehealth and sfc /scannow to no avail.

TLDR:
  • BSOD after exiting marvel rivals
  • BSOD while playing balatro and watching youtube
  • Force-restart crash after stopping CPU burner
  • BSOD after stopping CPU burner
  • Force-restart crash after cinebench 30 min benchmark completed
  • High peak temperatures on CCD2 of 100.6C; no thermal throttling reported
  • System ALWAYS crashes AFTER CPU benchmark/stress test is completed/stopped, not during.
Troubleshooting steps I have done:
  • DDU (twice, two different driver versions)
  • DISM and SFC commands
  • Re-seated ram
  • Updated BIOS
  • Ran Memtest86; no errors
Any ideas or insight is greatly appreciated!
 
After you updated the BIOS, did you THEN do a hard reset to ensure that none of the prior setting information was retained (As sometimes the BIOS refuses to "forget" old settings without doing so) and force the hardware tables to be reset? If not, start there.


BIOS Hard Reset procedure

Power off the unit, switch the PSU off and unplug the PSU cord from either the wall or the power supply.

Remove the motherboard CMOS battery for about three to five minutes. In some cases it may be necessary to remove the graphics card to access the CMOS battery.

During that five minutes while the CMOS battery is out of the motherboard, press the power button on the case, continuously, for 15-30 seconds, in order to deplete any residual charge that might be present in the CMOS circuit. After the five minutes is up, reinstall the CMOS battery making sure to insert it with the correct side up just as it came out.

If you had to remove the graphics card you can now reinstall it, but remember to reconnect your power cables if there were any attached to it as well as your display cable.

Now, plug the power supply cable back in, switch the PSU back on and power up the system. It should display the POST screen and the options to enter CMOS/BIOS setup. Enter the bios setup program and reconfigure the boot settings for either the Windows boot manager or for legacy systems, the drive your OS is installed on if necessary.

Save settings and exit. If the system will POST and boot then you can move forward from there including going back into the bios and configuring any other custom settings you may need to configure such as Memory XMP, A-XMP or D.O.C.P profile settings, custom fan profile settings or other specific settings you may have previously had configured that were wiped out by resetting the CMOS.

In some cases it may be necessary when you go into the BIOS after a reset, to load the Optimal default or Default values and then save settings, to actually get the hardware tables to reset in the boot manager.

It is probably also worth mentioning that for anything that might require an attempt to DO a hard reset in the first place, IF the problem is related to a lack of video signal, it is a GOOD IDEA to try a different type of display as many systems will not work properly for some reason with displayport configurations. It is worth trying HDMI if you are having no display or lack of visual ability to enter the BIOS, or no signal messages.

Trying a different monitor as well, if possible, is also a good idea if there is a lack of display. It happens.
 
Also, what CPU cooler are you using?

What is your case model?

How many case fans are installed and EXACTLY how are each of them oriented/configured (Intake, exhaust)?

If you have an AIO cooler, where is the radiator mounted and in what direction are the fans oriented, intake or exhaust?

EXACTLY which slots are your memory modules installed in? Starting with 1 being the closest to the CPU and 4 being the closest to the edge of the motherboard, 1, 2, 3, 4? Which slots the memory is installed in is critical due to termination issues.

I've seen quite a number of systems do exactly what yours is doing when the memory was installed in slots 1 and 3 or 3 and 4, rather than in 2 and 4 as intended. I've also seen systems crash right after a game or stress test stops. The CPU thermal sensor cools quickly once the load ends, so the cooling fan slows down in kind, yet other parts of the CPU package OR the motherboard VRMs may still be absorbing heat. With the fan no longer running at high speed, those components can become heat saturated and trigger a shutdown.
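
To illustrate the fan spin-down point above, here is a minimal Python sketch of a fan curve with a hold (a step-down delay). The fan duty follows the hottest reading from the last few samples rather than the instantaneous CPU temperature, so the fan keeps spinning for a while after the load stops. The curve points, the `hold_samples` parameter, and the temperature trace are all made up for illustration; they are not taken from any BIOS or from this system.

```python
from collections import deque

# Hypothetical fan curve: (temperature in C, fan duty in %). Illustrative values only.
CURVE = [(40, 30), (60, 50), (80, 80), (90, 100)]

def duty_for(temp_c):
    """Linearly interpolate the fan duty for a temperature on CURVE."""
    if temp_c <= CURVE[0][0]:
        return CURVE[0][1]
    for (t0, d0), (t1, d1) in zip(CURVE, CURVE[1:]):
        if temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    return CURVE[-1][1]

def duties_with_hold(temps, hold_samples):
    """Fan duty per sample, driven by the hottest of the last hold_samples readings."""
    window = deque(maxlen=hold_samples)
    out = []
    for t in temps:
        window.append(t)
        out.append(duty_for(max(window)))
    return out

# Made-up trace: load stops after the third sample and the CPU sensor drops quickly.
trace = [90, 90, 90, 50, 45, 40]
print(duties_with_hold(trace, 1))  # no hold: duty drops as soon as the sensor cools
print(duties_with_hold(trace, 3))  # hold of 3 samples: fan stays fast a little longer
```

Whether a given board exposes a setting like this, and under what name, varies; the sketch only shows the idea of delaying the fan's slowdown.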
 
You didn't say which CPU cooler and general cooling you have. It takes quite a cooler to tame a 7900X. If it's an air cooler, case ventilation is critical; if it's liquid, pump speed is.
HWInfo64 may not pick up transient temperature peaks if the polling interval is at the default 2000 ms. Lower it to 200 ms or less.
If you are short on cooling, use Curve Optimizer in the BIOS (PBO section) to set some negative voltage bias, usually -20 to -30. As a first step, you can also set ECO mode in the BIOS, which should lower temps considerably with little or no performance loss.
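
On the polling interval point above, here is a small Python sketch of why a slow polling interval can miss a short spike. It uses a synthetic temperature trace (not real sensor data) with one value per millisecond, and compares the peak seen when reading it every 2000 ms versus every 200 ms. The function name `sampled_peak` and the shape of the spike are assumptions made up for the example.

```python
def sampled_peak(trace_ms, interval_ms):
    """Highest value seen when reading the trace once every interval_ms milliseconds."""
    return max(trace_ms[i] for i in range(0, len(trace_ms), interval_ms))

# Synthetic trace, one value per millisecond for 10 seconds: steady 90 C
# with a single 300 ms excursion to 100 C starting at t = 2500 ms.
trace = [90.0] * 10_000
for t in range(2500, 2800):
    trace[t] = 100.0

print(sampled_peak(trace, 2000))  # reads at 0, 2000, 4000, ... ms and misses the spike
print(sampled_peak(trace, 200))   # the read at 2600 ms lands inside the spike
```

Real sensors also update at their own rate, so a faster polling interval improves the odds of catching a transient but does not guarantee it.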