Question: Random reboots on Windows 11?

neto333

Jun 17, 2015
Hello, for the past few days I have been experiencing random reboots within the first 2 or 3 minutes after Windows loads; sometimes it's so bad that I can't even log in to Windows. I have already tried changing my PSU and RAM, and I reinstalled Windows. The weird thing is that I'm not getting these reboots (as far as I have tested) when running Windows in Safe Mode.

Here is the dump analysis I got using WinDbg Crash Analyzer https://textbins.online/kphycnnqyg and the dump file itself https://files.catbox.moe/tnb1fn.dmp

CPU: i7 14700K

Motherboard: Z790 AORUS ELITE AX rev 1.x

Motherboard BIOS: FLb (latest BIOS available)

RAM: Corsair Vengeance RGB DDR5 RAM 32GB (2x16GB) 6000MHz CL30 AMD Expo iCUE

PSU: Corsair RMx Series RM850x 850W black 127V

Operating System & Version: Windows 11

I'm attaching the errors I'm getting in Event Viewer:

View: https://imgur.com/a/GC2DUU7

View: https://imgur.com/a/qbKIVUW
 
The error listed in the analyzer has this:
Arg2: ffffa80fd6d38028

Does the failure always occur with this argument, exactly? I couldn't tell you what the specific error is, but if the address is the same each time, then likely it is a physical address to some particular hardware. If this changes, then the issue may not be a particular piece of hardware.
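One way to answer that is to save the `!analyze -v` text for each crash dump and compare the bugcheck arguments across them. Below is a minimal Python sketch, assuming each analysis was saved as a text file and contains lines of the form `Arg2: ffffa80fd6d38028`, as in the output quoted above; the script and function names are hypothetical. Note that Microsoft's bug check reference documents Arg2 of a 0x124 as the address of the WHEA_ERROR_RECORD structure, so it may legitimately differ from crash to crash; when Arg1 is 0 (a machine check exception), Arg3 and Arg4 hold the high and low 32 bits of the MCi_STATUS register, which can be just as useful to compare.

```python
import re
from pathlib import Path

# Matches lines such as "Arg2: ffffa80fd6d38028, Address of the WHEA_ERROR_RECORD structure."
ARG_PATTERN = re.compile(r"^\s*Arg([1-4]):\s*([0-9a-fA-F`]+)", re.MULTILINE)


def parse_args(analysis_text: str) -> dict[int, int]:
    """Return {argument number: value} for Arg1..Arg4 found in one analysis."""
    args = {}
    for number, value in ARG_PATTERN.findall(analysis_text):
        # WinDbg sometimes prints 64-bit values with a backtick separator.
        args[int(number)] = int(value.replace("`", ""), 16)
    return args


def compare_crashes(analyses: list[str]) -> dict[int, bool]:
    """For each argument, report whether it had the same value in every analysis."""
    parsed = [parse_args(text) for text in analyses]
    result = {}
    for number in range(1, 5):
        values = {args.get(number) for args in parsed}
        # Identical only if every analysis has the argument and all values match.
        result[number] = len(values) == 1 and None not in values
    return result


if __name__ == "__main__":
    import sys

    paths = sys.argv[1:]  # e.g. python compare_args.py crash1.txt crash2.txt
    if len(paths) < 2:
        print("usage: compare_args.py ANALYSIS1.txt ANALYSIS2.txt [...]")
    else:
        texts = [Path(p).read_text(errors="replace") for p in paths]
        for number, same in compare_crashes(texts).items():
            status = "identical" if same else "differs"
            print(f"Arg{number}: {status} across {len(texts)} dumps")
```

If only Arg2 changes while Arg1, Arg3 and Arg4 stay the same, the same kind of hardware error is most likely being reported each time.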

A triple fault is designed to reboot the machine instantly. This is to protect your disk, and storage in general, from being corrupted further. The x86 uses interrupts and exceptions to tell the CPU where to go next (through its interrupt descriptor table), and when one of those handlers fails it has ways to deal with that. Unfortunately, once you reach a triple fault there is nothing that can be done to recover without risking complete corruption.

You replaced the power supply, and the power supply can indeed be the cause (more often than people think). What even people who know this often overlook is that the power line feeding the power supply is part of that power circuitry. A surge protector protects against short spikes, but something more insidious is a brownout, where the voltage is high enough for the system to run but not high enough for it to be stable. This is one reason some people buy a UPS with brownout protection. If you see that address changing rather than sticking to one value, you might consider that your power line is part of the problem, and perhaps get a brownout-protected UPS. (An incorrectly wired socket can also be an issue: it still works, but it doesn't necessarily behave correctly relative to ground, which makes it a bit of a "wild card" in power delivery.)
 
Thank you very much for the detailed explanation; your message really helped me understand my problem better. Right now I'm in Safe Mode without any reboots. I get two types of reboots: one that just reboots without any error, and the other a blue screen with WHEA_UNCORRECTABLE_ERROR. Also, I have this UPS: CyberPower CST135XLU. I don't know much about it, but its specifications mention "Line Interactive Topology" and brownouts.
 
Just for testing, try running your system without the UPS. Sometimes a UPS can itself start failing (e.g., because of an old battery) and not quite be up to the task. See whether it fails the same way when plugged directly into the wall socket (you could put it on a surge protector, but remember never to plug a surge protector into a UPS, nor a UPS into a surge protector).
 
The dump bugcheck is a 0x124, WHEA_UNCORRECTABLE_ERROR, and I can also see WHEA errors in those Event Viewer logs. WHEA is the Windows Hardware Error Architecture; its job is to detect hardware errors and recover from them where possible. The specific WHEA error in that dump is a machine check exception, which is a fatal hardware error.
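For a machine check (Arg1 = 0), Microsoft's bug check reference gives Arg3 and Arg4 as the high and low 32 bits of the MCi_STATUS register of the machine-check bank that reported the error. Below is a minimal Python sketch, assuming you copy those two values from the analyzer output, that rebuilds the 64-bit value and reads the architectural flag bits described in Intel's Software Developer's Manual (bits 63 to 57, plus the MCA error code in bits 15:0). The function name is hypothetical, and the sketch does not decode model-specific fields or identify the failing component.

```python
def decode_mci_status(arg3: int, arg4: int) -> dict:
    """Combine Arg3 (high 32 bits) and Arg4 (low 32 bits) and decode common MCi_STATUS flags."""
    status = ((arg3 & 0xFFFFFFFF) << 32) | (arg4 & 0xFFFFFFFF)
    flag_bits = {
        "VAL": 63,    # the register holds a valid error
        "OVER": 62,   # an earlier error was overwritten
        "UC": 61,     # the error was not corrected by hardware
        "EN": 60,     # error reporting was enabled for this error
        "MISCV": 59,  # MCi_MISC holds additional information
        "ADDRV": 58,  # MCi_ADDR holds the address of the error
        "PCC": 57,    # processor context may be corrupt
    }
    decoded = {name: bool((status >> bit) & 1) for name, bit in flag_bits.items()}
    decoded["status"] = status
    decoded["mca_error_code"] = status & 0xFFFF                # bits 15:0, architectural
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16, CPU-specific
    return decoded


# Hypothetical values for illustration only, not taken from this thread's dump.
print(decode_mci_status(0xBE000000, 0x00800400))
```

When both UC and PCC are set, the processor could neither correct the error nor guarantee its own state, which is consistent with Windows treating it as fatal.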

It's encouraging that it doesn't BSOD in Safe Mode but this could still be a hardware problem.

The first suspect is always RAM. I can see that, although you have 6000MHz RAM installed, it's not overclocked and is running at its native speed of 4800MHz. From a troubleshooting point of view that's good news. Since you have 2 x 16GB sticks, I suggest you remove one stick (the one in B2) and see whether it's stable on just one stick. After a day or so, swap them: place the other stick in A2 and see whether it's stable on that stick alone.

If it BSODs on each stick on its own, then it's probably not RAM. Before we get to stressing the CPU it's worth looking for a flaky driver. In rare circumstances bad drivers can cause 0x124 BSODs, and the fact that it's stable in Safe Mode supports that. I'd suggest you enable Driver Verifier, but it has to be enabled in a specific way...

Driver Verifier subjects selected drivers (typically all third-party drivers) to extra tests and checks every time they are called. These extra checks are designed to uncover drivers that are misbehaving. If any selected driver fails any of the Driver Verifier tests/checks then Driver Verifier will BSOD. The resulting minidump should contain enough information for us to identify the flaky driver. It's thus essential to keep all minidumps created whilst Driver Verifier is enabled.

To enable Driver Verifier do the following:

1. Take a System Restore point and/or take a disk image of your system drive (with Acronis, Macrium Reflect, or similar). It is possible that Driver Verifier may BSOD a driver during the boot process (some drivers are loaded during boot). If that happens you'll be stuck in a boot-BSOD loop.

If you should end up in a boot-BSOD loop, boot the Windows installation media and use that to run system restore and restore to the restore point you took, to remove Driver Verifier and get you booting again. Alternatively you can use the Acronis, Macrium Reflect, or similar, boot media to restore the disk image you took.

Please don't skip this step. It's the only way out of a Driver Verifier boot-BSOD loop.

2. Start the Driver Verifier setup dialog by entering the command verifier in either the Run command box or in a command prompt.

3. On that initial dialog, click the radio button for 'Create custom settings (for code developers)' - the second option - and click the Next button.

4. On the second dialog check (click) the checkboxes for the following tests...
  • Special Pool
  • Force IRQL checking
  • Pool Tracking
  • Deadlock Detection
  • Security Checks
  • Miscellaneous Checks
  • Power framework delay fuzzing
  • DDI compliance checking
Then click the Next button.

5. On the next dialog click the radio button for 'Select driver names from a list' - the last option - and click the Next button.

6. On the next dialog click on the 'Provider' heading, this will sort the drivers on this column (it makes it easier to isolate Microsoft drivers).

7. Now check (click) ALL drivers that DO NOT have Microsoft as the provider (ie. check all third-party drivers).

8. Then, on the same dialog, check the following Microsoft drivers (and ONLY these Microsoft drivers)...
  • Wdf01000.sys
  • ndis.sys
  • fltMgr.sys
  • Storport.sys
These are high-level Microsoft drivers that manage lower-level third-party drivers that we otherwise wouldn't be able to trap. That's why they're included.

9. Now click Finish and then reboot. Driver Verifier will be enabled.

Be aware that Driver Verifier will remain enabled across all reboots and shutdowns. It can only be disabled manually.

Also be aware that we expect BSODs. Indeed, we want BSODs, to be able to identify the flaky driver(s). You MUST keep all minidumps created whilst Driver Verifier is running, so disable any disk cleanup tools you may have.

10. Leave Driver Verifier running for 48 hours, use your PC as normal during this time, but do try and make it BSOD. Use every game or app that you normally use, and especially those where you have seen it BSOD in the past. If Windows doesn't automatically reboot after each BSOD then just reboot as normal and continue testing. The Driver Verifier generated BSODs are these...
  • 0xC1: SPECIAL_POOL_DETECTED_MEMORY_CORRUPTION
  • 0xC4: DRIVER_VERIFIER_DETECTED_VIOLATION
  • 0xC6: DRIVER_CAUGHT_MODIFYING_FREED_POOL
  • 0xC9: DRIVER_VERIFIER_IOMANAGER_VIOLATION
  • 0xD6: DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION
  • 0xE6: DRIVER_VERIFIER_DMA_VIOLATION
If you see any of these BSOD types then you can disable Driver Verifier early because you'll have caught a misbehaving driver.
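The bug check codes in the list above can also be checked mechanically, for example against the bug check code WinDbg reports for each new minidump. Below is a minimal Python sketch; the codes and names are copied from the list above, and the function name is hypothetical.

```python
from typing import Union

# Bug checks that Driver Verifier raises when a verified driver fails a check
# (codes and names copied from the list above).
VERIFIER_BUGCHECKS = {
    0xC1: "SPECIAL_POOL_DETECTED_MEMORY_CORRUPTION",
    0xC4: "DRIVER_VERIFIER_DETECTED_VIOLATION",
    0xC6: "DRIVER_CAUGHT_MODIFYING_FREED_POOL",
    0xC9: "DRIVER_VERIFIER_IOMANAGER_VIOLATION",
    0xD6: "DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION",
    0xE6: "DRIVER_VERIFIER_DMA_VIOLATION",
}


def is_verifier_bugcheck(code: Union[str, int]) -> bool:
    """Accept 0xC4, 'c4', '0xC4' or '0x000000C4' and report whether it is a Driver Verifier bug check."""
    # int(..., 16) also accepts an optional "0x" prefix.
    value = int(code, 16) if isinstance(code, str) else code
    return value in VERIFIER_BUGCHECKS


print(is_verifier_bugcheck("0x000000C4"), is_verifier_bugcheck("124"))
```

A 0x124 is not on the list, so another WHEA_UNCORRECTABLE_ERROR while Driver Verifier is running does not by itself point at a driver.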

Note: Because Driver Verifier is doing extra work each time a third-party driver is loaded you will notice some performance degradation with Driver Verifier enabled. This is a price you'll have to pay in order to locate any flaky drivers. And remember, Driver Verifier can only test drivers that are loaded, so you need to ensure that every third-party driver gets loaded by using all apps, features and devices.

11. To turn Driver Verifier off, enter the command verifier /reset in either the Run command box or a command prompt and reboot.

Should you wish to check whether Driver Verifier is enabled or not, open a command prompt and enter the command verifier /query. If drivers are listed then it's enabled, if no drivers are listed then it's not.

12. When Driver Verifier has been disabled, navigate to the folder C:\Windows\Minidump and locate all .dmp files in there that are related to the period when Driver Verifier was running (check the timestamps). Zip these files up if you like, or not as you choose. Upload the file(s) to the cloud with a link to it/them here (be sure to make it public).
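Step 12 can also be scripted. Below is a minimal Python sketch, assuming you know roughly when Driver Verifier was enabled and disabled: it picks the .dmp files in C:\Windows\Minidump whose last-modified time falls inside that window and zips them. The function name and the archive name verifier_dumps.zip are hypothetical, and reading that folder may require running the script from an elevated (administrator) prompt.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def collect_minidumps(start: datetime, end: datetime,
                      folder: Path = Path(r"C:\Windows\Minidump"),
                      archive: Path = Path("verifier_dumps.zip")) -> list[Path]:
    """Zip every .dmp file in `folder` last modified between `start` and `end` (inclusive)."""
    selected = []
    for dump in sorted(folder.glob("*.dmp")):
        modified = datetime.fromtimestamp(dump.stat().st_mtime)  # local time
        if start <= modified <= end:
            selected.append(dump)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dump in selected:
            zf.write(dump, arcname=dump.name)  # store the file without its folder path
    return selected


if __name__ == "__main__":
    # Hypothetical window: replace with the times you enabled and disabled Driver Verifier.
    dumps = collect_minidumps(datetime(2025, 1, 10, 9, 0), datetime(2025, 1, 12, 9, 0))
    print(f"Zipped {len(dumps)} minidump(s)")
```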
 
My apologies, I did not read all of this or every single reply.

However, if none of the above replies work, one option to consider is to boot from your OS installation media, whether that's a physical disc or a bootable USB drive, and try the built-in automatic repair features in the installer's menus.

And if none of the above replies, nor this, works, then I would just fully reformat your drives and reinstall Windows. Sorry.