Question NUC with Debian - random poweroff/restart can't find cause

Jan 30, 2025
Hi Everyone,

I'm experiencing an issue with my Intel NUC 11TNKi5 running Debian. The system randomly restarts after running normally for several hours, and after the restart it often fails to boot properly, ending up on the BIOS boot selection screen. I've been troubleshooting this issue extensively, but I'm running out of ideas.
  • The server runs fine for hours (3, 5 and even 8 hours) before shutting down or restarting.
  • When the restart happens, the power button (sometimes) blinks, and the NUC seems to attempt multiple restarts; some succeed, but others end up in the BIOS boot menu.
  • I have disabled automatic power-on in BIOS after a power failure, yet the NUC still restarts after shutting down.
  • The issue does not seem to be triggered by a specific workload—it occurs even under minimal load.
  • Normally the NUC is only connected to a power cable and an Ethernet cable; Bluetooth, Wi-Fi, and USB devices are not used. For troubleshooting purposes there is a monitor attached at the moment.

What I've Tried So Far

  1. Power Supply Swap: I replaced the power supply with one from another working NUC to rule out a failing adapter.
  2. Checked Logs:
    • systemd-logind logs show multiple "Power key pressed short." events moments before the shutdown, even though no one physically pressed the button.
    • The shutdown is attempted repeatedly but blocked (I masked poweroff.target hoping that would prevent the power-off, but instead the NUC just restarts after several failed power-off attempts).
    • No memory failures found.
  3. BIOS Settings Adjustments:
    • Disabled automatic power-on after power loss.
    • Checked event logs in BIOS but did not find a clear reason for the shutdowns.
  4. Hardware checks:
    • Removed the RAM and SSD and reseated them
    • Cleaned it out with compressed air
    • Wiggled the power button to see if it was loose (it's attached directly to the motherboard, so I can't easily remove it entirely)
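
To see how the phantom presses cluster in time, the journal text can be scanned for the logind message quoted above. This is only a rough sketch: it assumes the default `journalctl` line layout (`Mon DD HH:MM:SS host unit[pid]: message`) and an English locale for month names. The function names, the `year` parameter, and the sample lines are made up for illustration and are not taken from the pastebin logs.

```python
import re
from datetime import datetime

# Assumed default journalctl layout: "Jan 30 23:41:12 host systemd-logind[512]: message"
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ systemd-logind\[\d+\]: (.*)$"
)
MESSAGE = "Power key pressed short."


def power_key_times(lines, year=2025):
    """Return timestamps of all 'Power key pressed short.' lines.

    The default journal format omits the year, so it is supplied by the caller.
    """
    times = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(2) == MESSAGE:
            stamp = " ".join(match.group(1).split())  # collapse padded day numbers
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Seconds between consecutive presses; very regular or very short gaps hint at a hardware glitch."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Hypothetical sample lines, NOT copied from the real logs.
SAMPLE = [
    "Jan 30 23:41:12 nuc systemd-logind[512]: Power key pressed short.",
    "Jan 30 23:41:12 nuc systemd[1]: Started something unrelated.",
    "Jan 30 23:41:15 nuc systemd-logind[512]: Power key pressed short.",
    "Jan 30 23:41:21 nuc systemd-logind[512]: Power key pressed short.",
]

if __name__ == "__main__":
    presses = power_key_times(SAMPLE)
    print(len(presses), "presses, gaps in seconds:", gaps_seconds(presses))
```

On a real system the input could come from something like `journalctl -b -1 > last_boot.log`, with the file's lines passed to `power_key_times`. Comparing the gaps across several crashes may show whether the presses arrive in bursts, which would fit an intermittent contact better than a software trigger.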

I am kind of stuck in troubleshooting this any further. Here are the most recent logs:

https://pastebin.com/fwf7vjNq
Log from last night: I booted the server around 8 PM and it ran until a bit before midnight, when it started receiving power button presses for a while; then the log stops because the server restarted.

https://pastebin.com/heDDbhB0
This one starts three minutes after the first log ends, but this boot does not seem to complete.

This morning I found the server on the following screen:

f6JExg7R42Z7e3tcnDMkB5Wh.png


Hope to get some new insights/ideas on how to troubleshoot further!

Thanks in advance😀
 
Two sentences that I noted:

"systemd-logind logs show multiple "Power key pressed short." events moments before the shutdown, even though no one physically pressed the button."

and

"Kind of wiggled the power button to see if it was loose or anything (its directly attached to MB so i cant (easily) remove it entirely"

My thought is that there is a problem with the power button.

Provided I correctly followed your troubleshooting thus far, it seems to narrow down to removing the power button and swapping in another, even if it is not easily removable.

That said, and I am not familiar with the wiring etc., is it possible to check the power button's continuity (or lack thereof) with a multimeter?

All should be powered off and unplugged before testing the power button.
 

Thank you for your reply!

I don't think there is an easy way for me to remove the power button; I'm pretty sure it's soldered onto the motherboard (if all else fails I'll grab the old soldering iron and give it a try). What I'm wondering is how an issue with the power button could cause the device to restart. Judging from the logs, the power button is 'pressed' a few times. I've tried to replicate this by repeatedly pressing the physical button myself, even for 30 seconds straight, with nothing happening.
 
Depending on the switch there could be internal corrosion, carbon buildup from arcing, etc. It is likely a cheap generic switch: poorly designed, made of low-quality materials, and slapped together.

Basically, the end result is that the switch may or may not conduct, regardless of whether it is "on" or "off".

Even if the switch is "off" a short could falsely present an "on" condition and the device restarts.

The problem could also be occurring somewhere else: a solder connection, a conductor break, or a crack somewhere.

That is why a multimeter can be helpful. You can do continuity checks between points (power off, unplugged) while wiggling, pressing, and so forth.

Intermittent problems are difficult to troubleshoot.

With the switch being difficult to remove, it is likely that Murphy's law is kicking in: the most likely source of the problem is the hardest to get to and fix.
 
Ah, so basically a shorted power button will turn the device off, but while it stays shorted it will also turn it back on. That could explain why it ends up at the BIOS boot selection after several failed attempts.

I'm going to run the server tonight on a live image, just to make absolutely sure it isn't software, even though that is unlikely given the lack of errors in the logs. After that I'll inspect the entire motherboard and the soldered connections to see what's going on.