GRUB2: How to Avoid Boot Loops by Limiting Retries
When it comes to systems where manual intervention is a rare luxury – think remote servers or embedded systems – managing boot slots in a resilient manner is critical. This article introduces a refined GRUB2 configuration designed to limit the number of boot attempts, helping you steer clear of infinite boot loops.
GRUB2 Configuration
The GRUB2 setup outlined below uses a MAX_TRIES variable to limit the number of boot attempts for each available slot, A or B. Because GRUB's scripting language has no built-in arithmetic, the try counters are incremented by hand with an if/elif ladder. Some GRUB2 builds include Lua scripting support, but it is usually not part of a standard GRUB2 installation.
set default=0
set timeout=3
set MAX_TRIES=3
set ORDER="A B"
set A_OK=0
set B_OK=0
set A_TRY=0
set B_TRY=0
set A_INDEX=0
set B_INDEX=1
load_env
# Select bootable slot
# FOUND records whether the loop picked a slot. Checking the counters
# afterwards would also trigger on a slot's final permitted attempt.
set FOUND=0
for SLOT in $ORDER; do
    eval "INDEX=\${${SLOT}_INDEX}"
    eval "OK=\${${SLOT}_OK}"
    eval "TRY=\${${SLOT}_TRY}"
    # If bootable and has less than MAX_TRIES
    if [ "$OK" -eq 1 -a "$TRY" -lt $MAX_TRIES ]; then
        set default=$INDEX
        set FOUND=1
        # Increment attempts and save back to slot.
        # GRUB script has no arithmetic: extend this ladder if
        # MAX_TRIES is raised above 3.
        if [ "$TRY" -eq 0 ]; then
            set TRY=1
        elif [ "$TRY" -eq 1 ]; then
            set TRY=2
        elif [ "$TRY" -eq 2 ]; then
            set TRY=3
        fi
        eval "${SLOT}_TRY=$TRY"
        break
    fi
done
# Disable timeout if no slot is safe to boot
if [ "$FOUND" -eq 0 ]; then
    set timeout=-1
fi
save_env A_OK A_TRY B_OK B_TRY
CMDLINE="panic=60 quiet"
menuentry "Slot A (OK=$A_OK TRY=$A_TRY)" {
    linux (hd0,2)/kernel root=/dev/sda2 $CMDLINE rauc.slot=A
}
menuentry "Slot B (OK=$B_OK TRY=$B_TRY)" {
    linux (hd0,3)/kernel root=/dev/sda3 $CMDLINE rauc.slot=B
}
Updated Boot Logic Explained
In this improved GRUB2 configuration, the bootloader is instructed to find a suitable bootable slot according to a set of conditions. To get started, a series of variables are initialized and loaded from the GRUB environment. These variables include:
- ORDER: Specifies the boot sequence, for example "A B".
- A_OK and B_OK: Indicate whether Slot A or Slot B is bootable (1) or not (0).
- A_TRY and B_TRY: Store the number of boot attempts made for Slot A or Slot B.
- MAX_TRIES: Specifies the maximum number of boot attempts allowed for a slot.
Boot Slot Selection
A for loop iterates over the slots specified in the ORDER variable. Within the loop, the following steps occur:
- The script uses the eval command to read the variables of the current slot into generic names (INDEX, OK and TRY).
- It then checks whether the slot is bootable (OK=1) and whether it has been tried fewer than the maximum number of times (TRY < MAX_TRIES).
- If both conditions are met, the script sets default to the index of the slot, so that its menu entry is the one booted.
- The TRY count for the selected slot is incremented and written back to the slot's own variable, and the loop stops.
- The timeout of 3 seconds set at the top of the configuration stays in effect, allowing automatic booting.
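The steps above can be traced outside of GRUB with a small shell simulation. This is only a sketch under assumptions: it mirrors the selection logic in POSIX shell rather than GRUB script, both slots start out as bootable, userspace never resets a counter, and the select_slot function and the SELECTED and HISTORY variables are names invented for illustration.

```shell
#!/bin/sh
# Userspace simulation of the slot-selection loop from the GRUB config.
# Not GRUB code: it only mirrors the logic so it can be traced in a shell.

MAX_TRIES=3
ORDER="A B"
A_OK=1; B_OK=1
A_TRY=0; B_TRY=0

# Pick the first slot in ORDER that is OK and has tries left, bump its
# counter, and store its name in SELECTED ("none" if nothing qualifies).
select_slot() {
    SELECTED=none
    for SLOT in $ORDER; do
        eval "OK=\${${SLOT}_OK}"
        eval "TRY=\${${SLOT}_TRY}"
        if [ "$OK" -eq 1 ] && [ "$TRY" -lt "$MAX_TRIES" ]; then
            # A regular shell has arithmetic, so no if/elif ladder is needed.
            eval "${SLOT}_TRY=$((TRY + 1))"
            SELECTED=$SLOT
            break
        fi
    done
}

# Simulate seven consecutive boots in which every boot attempt fails,
# i.e. nothing ever resets A_TRY or B_TRY.
HISTORY=""
for BOOT in 1 2 3 4 5 6 7; do
    select_slot
    HISTORY="$HISTORY$SELECTED "
done
echo "$HISTORY"
```

Run as is, the script prints A A A B B B none: slot A is tried three times, then slot B three times, after which no slot qualifies and GRUB would wait at the menu.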
Once the system has booted and is healthy, the TRY variable of the booted slot (A_TRY or B_TRY) must be reset to 0 from userspace; otherwise it keeps incrementing toward MAX_TRIES with each reboot. After installing an update, the entries in ORDER need to be swapped so that the updated slot is tried first.
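A minimal sketch of that reset step follows. It assumes the booted slot can be read from the rauc.slot= kernel parameter that the menu entries pass, and that the environment file is /boot/grub/grubenv; the function names slot_from_cmdline and mark_good are hypothetical, and on some distributions the tool is installed as grub2-editenv.

```shell
#!/bin/sh
# Sketch of a "mark good" step, run once userspace considers the boot healthy.
# Assumptions: slot name comes from rauc.slot= on the kernel command line,
# and the GRUB environment file is /boot/grub/grubenv unless GRUBENV is set.

GRUBENV="${GRUBENV:-/boot/grub/grubenv}"

# Print the value of rauc.slot= found in the command line string "$1".
# Returns non-zero if the parameter is absent.
slot_from_cmdline() {
    for ARG in $1; do
        case "$ARG" in
            rauc.slot=*) echo "${ARG#rauc.slot=}"; return 0 ;;
        esac
    done
    return 1
}

# Reset the try counter of slot "$1" (A or B) in the GRUB environment.
mark_good() {
    case "$1" in
        A|B) grub-editenv "$GRUBENV" set "${1}_TRY=0" ;;
        *)   echo "unknown slot: $1" >&2; return 1 ;;
    esac
}

# Typical use on the target, after health checks have passed:
#   SLOT=$(slot_from_cmdline "$(cat /proc/cmdline)") && mark_good "$SLOT"
```

Keeping the parsing separate from the grub-editenv call makes it possible to check the slot detection on a development machine that has no GRUB environment file.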
Error Handling
If no slot meets the conditions, GRUB waits indefinitely for manual intervention (timeout=-1). This happens, for example, when both slots have reached the MAX_TRIES limit.
With this approach, the system does not enter an infinite boot loop with problematic slots. Instead, it degrades to a state that allows user intervention. In some situations, booting into a rescue system may be a good alternative; this can be done by pointing default at a rescue menu entry instead of disabling the timeout.
Benefits and Use Cases
The primary advantage of this approach is its robustness. A system configured with this GRUB2 setup won't remain stuck in an endless boot loop. Instead, a slot that fails to boot MAX_TRIES times is skipped, and the next slot in ORDER is tried automatically, which gives the system a simple form of failover.
Modifying Variables with grub-editenv
The variables, such as MAX_TRIES, A_OK, and B_OK, can be modified using the grub-editenv utility. This is useful when you want to manually override the automatic behavior, for testing or debugging purposes. Note that the increment ladder in the configuration only counts up to 3, so a MAX_TRIES value above 3 also requires additional elif branches there.
# Example: lowering MAX_TRIES to 2
grub-editenv /boot/grub/grubenv set "MAX_TRIES=2"
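For debugging, it can also help to read the current state back. The sketch below assumes the NAME=value output format of the list subcommand of grub-editenv and the same /boot/grub/grubenv path as above; env_get is a hypothetical helper name, not part of GRUB.

```shell
#!/bin/sh
# Print the value of variable "$1" from NAME=value lines read on stdin,
# such as the output of `grub-editenv /boot/grub/grubenv list`.
# Returns non-zero if the variable is not present.
env_get() {
    while IFS= read -r LINE; do
        case "$LINE" in
            "$1="*)
                # Strip everything up to and including the first "=".
                echo "${LINE#*=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Typical use on the target:
#   grub-editenv /boot/grub/grubenv list | env_get A_TRY
```

A variable that was never saved is simply missing from the listing, in which case the defaults from the configuration file apply after load_env.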
Integration with RAUC
For those who are using the Robust Auto-Update Controller (RAUC) for managing updates on their embedded systems, this GRUB2 configuration can be plugged in seamlessly. While a full explanation is out of scope here, know that RAUC can set these GRUB environment variables as part of its update process, enhancing the reliability of your updates.
Conclusion
With a simple tweak in your GRUB2 configuration, you can make your systems more resilient and easier to maintain. So go ahead and integrate this setup into your projects.