Creating a Recovery System for Ubuntu
Every operating system inevitably runs into issues from time to time that can’t be solved from within the system itself, whether it’s a boot failure caused by some change that needs fixing, or you just want to adjust your system partitions. This is especially true for those of us who use a Linux desktop and enjoy tinkering with our computers. In times like these, you need an operating system that is independent of your main one to lend a hand.
While there are many ready-made solutions like dual-booting or using a LiveUSB, they all have their own problems and limitations. On this matter, I’m quite fond of macOS’s recovery system—a read-only system installed on the same hard drive but independent of the main OS, equipped with a suite of tools that you can switch to at boot. So, I decided to create a similar recovery system for my Ubuntu desktop, but one that’s easier to customize and more convenient to use.
The Problem
For people who use a Linux desktop as their primary system, there are two common recovery solutions:
- Dual-boot: For those who already have a dual-boot setup, simply rebooting into the other OS can help solve many problems.
- LiveUSB system: The installation images for many distributions come with a “try” feature, which includes a lot of useful tools. There are also dedicated maintenance systems like SystemRescue that can be used.
But these solutions have their own issues. For me, the problem with the first option is that I don’t have a dual-boot setup. I’ve been using a single-boot Linux desktop environment for a long time, and I just fire up a virtual machine when I have to use Windows. I find this setup simpler and more convenient, and it avoids potential issues caused by two systems interfering with each other.
LiveUSB is not bad, but it has several major drawbacks:
- Boot speed: USB drives are slow. Even among USB 3.0 drives, many have a rated read speed of only about 100 MB/s. Faster ones might reach 400 MB/s, which is still well short of internal SSDs, where SATA models read at roughly 500 MB/s and NVMe models reach several GB/s.
- Convenience: LiveUSBs are usually created by writing a prebuilt ISO file. They are designed for general-purpose use and aren’t optimized for specific devices and environments. This means you might need to make adjustments every time you boot into it, like changing display settings, configuring network connections, and so on. On top of that, some specific tools I need, like snapper and sedutil, might not be included.
- Compatibility: This is particularly an issue with dedicated maintenance systems. The kernel and tool suite versions they use are often conservative and may not be fully compatible with your environment. For example, Btrfs introduces new features with each kernel version. If you use some of these features, an outdated kernel in the maintenance system could cause compatibility problems.
I also tried using Cubic to create a custom ISO and write it to a USB drive as a recovery system. While this partially solved the convenience and compatibility issues, having to go through its entire process and write to the USB drive every time I wanted to make a change was very cumbersome. However, its approach of creating a workspace and leaving the filesystem there was a great inspiration.
After consulting many resources, I decided to build a recovery system for my own computer from scratch, based on Ubuntu.
Goals and Approach
Here are the goals I envisioned for this recovery system:
- As a recovery system, its top priority is to be stable. This means every time I boot into it, it should be the same system. If I accidentally mess it up, a simple reboot should restore it to its original state. For this reason, the system should use a read-only image with an overlay filesystem on top, similar to a LiveUSB.
- The system should be adapted to the machine’s own hardware and environment, ready to use immediately upon boot. This includes having the right drivers installed (e.g., graphics drivers) and WiFi connections configured.
- It should be easy to customize and update. If I need new tools, I can easily add them, and the packages inside can be upgraded without much hassle.
- It can be stored on the hard drive instead of being written to a USB drive. Since this system is tailored for the current device, putting it on a USB drive makes less sense, whereas storing it on the hard drive can speed up loading times.
Based on these requirements and the available tools, my general plan is:
- Create a workspace to build the recovery system. In this workspace, I can adjust and rebuild it as needed.
- On the technical side, use Casper to boot the live system and create a Unified Kernel Image as an executable that can be booted directly from EFI.
Workspace and Scripts
First, I created a new Btrfs subvolume, /recovery, to serve as the workspace:
sudo btrfs subvolume create /recovery
The location is arbitrary. Since most operations require root privileges, placing it outside the user’s home directory is more convenient. The benefit of creating a separate subvolume is that after each adjustment, I can create a new snapshot to save the current state. If a subsequent change breaks it, I can always revert to a good version using a snapshot, which is like having filesystem-level version control.
In this workspace, I placed three folders:
- scripts: To store the scripts used for maintaining this recovery system, which will be listed below.
- fs: The filesystem content used by the recovery system.
- files: Some files that might be reused.
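As a rough sketch, the three folders can be created in one go. The WORKSPACE variable is a placeholder of mine rather than part of the original setup; it defaults to a scratch directory so the snippet can be tried without root, and would be set to /recovery (and run under sudo) for the real workspace.

```shell
# WORKSPACE is a hypothetical variable; set it to /recovery for the real workspace.
WORKSPACE="${WORKSPACE:-$(mktemp -d)}"

# Create the three folders described above.
mkdir -p "$WORKSPACE/scripts" "$WORKSPACE/fs" "$WORKSPACE/files"

# List the result to confirm the layout.
ls -1 "$WORKSPACE"
```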
Base Filesystem
Next, we need to initialize the filesystem content. You can download the daily build of Ubuntu Base from the official image site. For example, if I want my recovery system to be based on Ubuntu 24.04 LTS, I would download the compressed file for my architecture from the Ubuntu Base 24.04 (Noble Numbat) Daily Build page, such as noble-base-amd64.tar.gz, to use as my base filesystem. Enter the fs directory and extract its contents there:
cd /recovery/fs
sudo tar xvf path/to/noble-base-amd64.tar.gz
The advantage of using a daily build is that the packages inside are the latest for that major version, so I don’t have to upgrade them again after extraction.
chroot Script
Ubuntu Base only includes the most essential system components. We obviously need to install more packages and make many adjustments later. For this, we need a script to “enter” the recovery system’s filesystem to modify it. I wrote a chroot script, start-chroot.sh, for this purpose:
#!/bin/sh
set -fv
# Make sure to run as root
if [ "$(id -u)" != "0" ]; then
    echo "Require root!"
    exit 1
fi
cd /recovery
# Use the host system's DNS config to ensure network availability
mv fs/etc/resolv.conf fs/etc/resolv.conf.1
cp /etc/resolv.conf fs/etc/resolv.conf
# Mount necessary system mount points
mount --bind /dev fs/dev
mount none -t proc fs/proc
mount none -t sysfs fs/sys
mount none -t devpts fs/dev/pts
# Enter chroot
chroot fs
# Clean up temporary files
rm -f fs/var/lib/dbus/machine-id
rm -f fs/root/.bash_history
rm -rf fs/tmp/*
# Unmount system mount points
umount fs/dev/pts
umount fs/dev
umount fs/proc
umount fs/sys
# Restore DNS config
mv fs/etc/resolv.conf.1 fs/etc/resolv.conf
With this script, I can enter the filesystem that the recovery system will use at any time to modify its contents.
The script temporarily replaces resolv.conf. This file is actually a symbolic link managed by systemd, and we don’t want to change this structure in the recovery system. However, when we chroot into this filesystem, the systemd within it isn’t running, so resolv.conf has no usable content. This would cause programs that require network access to fail due to a lack of DNS configuration. Therefore, we need to temporarily replace it with the host system’s file.
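The swap is done with mv rather than by overwriting because mv renames the symlink itself, so the original link survives the round trip. Here is a minimal sketch in a scratch directory. The link target shown is the one Ubuntu normally uses and is an assumption here, and the nameserver address is a made-up example.

```shell
# Scratch stand-in for fs/; nothing here touches the real system.
demo=$(mktemp -d)
mkdir -p "$demo/etc"

# Assumed link target; in the scratch tree (as in the chroot) it points at nothing.
ln -s ../run/systemd/resolve/stub-resolv.conf "$demo/etc/resolv.conf"

# Same steps as start-chroot.sh: park the symlink, drop in a regular file.
mv "$demo/etc/resolv.conf" "$demo/etc/resolv.conf.1"
printf 'nameserver 192.0.2.53\n' > "$demo/etc/resolv.conf"   # example address only

# ... the chroot session would happen here ...

# Restore: the regular file is replaced by the original symlink.
mv "$demo/etc/resolv.conf.1" "$demo/etc/resolv.conf"
restored_target=$(readlink "$demo/etc/resolv.conf")
echo "$restored_target"
```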
Build Script
The next script we need is one to package the filesystem into a read-only format and build the necessary EFI executable to boot the recovery system. Here is my build-image.sh script:
#!/bin/sh
set -efv
# Make sure to run as root
if [ "$(id -u)" != "0" ]; then
    echo "Require root!"
    exit 1
fi
cd /recovery
# Remove the old image
if [ -L image ]; then
    old_image=$(readlink image)
    if [ -d "$old_image" ]; then
        rm -rf "$old_image"
    fi
    rm image
fi
# Create a temporary directory to store build results
tmp_dir=$(mktemp -d)
ln -s "$tmp_dir" image
mkdir image/casper
# Package the filesystem
mksquashfs fs image/casper/filesystem.squashfs -comp zstd
# Build the kernel image
ukify build \
--linux=fs/boot/vmlinuz \
--initrd=fs/boot/initrd.img \
--cmdline="boot=casper noprompt libata.allow_tpm=1" \
--output=image/recovery.efi
Here, the build results are placed in a temporary folder, which is then symlinked to the image subfolder. This isn’t strictly necessary. I chose to do it this way because:
- It prevents the resulting image from being included in Btrfs snapshots.
- The generated image will be automatically cleaned up after the system shuts down.
The image file is a large, compressed binary blob, and there’s no real benefit to having it tracked by the filesystem. Besides, this script allows me to quickly build a new image from fs at any time, so there’s no need to keep the old results.
Here, ukify is used to build the unified kernel image with a few simple kernel parameters:
- boot=casper tells it to use Casper to boot a read-only live system. Afterward, we just need to place the packaged Squashfs filesystem in the casper directory of the corresponding partition.
- noprompt is a Casper parameter that prevents it from prompting to eject the CD or USB drive on reboot. Since we plan to put the recovery system on the hard drive, there’s nothing to eject, so we disable the prompt.
- libata.allow_tpm=1 is a kernel parameter required for sedutil to operate on OPAL self-encrypting drives. If you don’t use sedutil, you don’t need to add this.
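Before deploying, it can be worth confirming that the build produced both artifacts. The helper below is a hypothetical addition (check_image is not one of the original scripts); it only checks that the two files written by build-image.sh exist and are non-empty.

```shell
# check_image DIR: succeed only if DIR holds a non-empty UKI and squashfs.
check_image() {
    dir=$1
    for f in recovery.efi casper/filesystem.squashfs; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    echo "image looks complete: $dir"
}
```

It would be called as check_image /recovery/image after a build.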
Snapshots
As mentioned earlier, placing the workspace in a separate Btrfs subvolume makes it easy to create snapshots. I use snapper to manage snapshots. To do this, I first created a configuration file at /etc/snapper/configs/recovery:
SUBVOLUME="/recovery"
FSTYPE="btrfs"
QGROUP=""
NUMBER_CLEANUP="yes"
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="5"
NUMBER_LIMIT_IMPORTANT="1"
Then, I can create an initial snapshot:
sudo snapper -c recovery create
In the future, I can use the same command to create more snapshots.
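A description makes later snapshots easier to tell apart. Also, as far as I know, the NUMBER_* limits only apply to snapshots created with the number cleanup algorithm. The wrapper below is a sketch under those assumptions: snapshot_workspace and the SNAPPER override are hypothetical names, with the override there only so the command can be dry-run.

```shell
# SNAPPER is a hypothetical override; use SNAPPER="echo snapper" for a dry run.
SNAPPER=${SNAPPER:-"sudo snapper"}

# snapshot_workspace DESCRIPTION: create a described snapshot of the workspace.
snapshot_workspace() {
    # --cleanup-algorithm number opts the snapshot into the NUMBER_* limits.
    $SNAPPER -c recovery create \
        --description "$1" \
        --cleanup-algorithm number
}

# list_workspace_snapshots: show existing snapshots for the recovery config.
list_workspace_snapshots() {
    $SNAPPER -c recovery list
}
```

For example, snapshot_workspace "after base install" followed by list_workspace_snapshots.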
Configuring the System
After creating the workspace and placing the corresponding scripts, we need to configure the recovery system to make it truly usable.
Unless otherwise specified, the commands in this section are executed inside the chroot environment created by the start-chroot.sh script.
Install Base Packages
First, we need to install the base system:
apt update
apt install linux-generic
apt install --no-install-recommends ubuntu-minimal
apt install casper discover laptop-detect os-prober
apt install ubuntu-desktop
Casper performs an MD5 check on boot by default, which isn’t very useful for us, so we’ll disable it:
systemctl disable casper-md5check.service
Next, we can clean up some packages that are unlikely to be needed in a recovery system:
apt autoremove --purge \
snapd \
rhythmbox \
libreoffice-common \
totem \
gnome-calendar \
gnome-clocks \
gnome-characters \
gnome-startup-applications \
gnome-online-accounts \
transmission-gtk \
cloud-init \
unattended-upgrades \
firefox \
thunderbird \
ubuntu-docs \
ubuntu-report
This is the list of packages I removed, mainly things I don’t think are necessary in a recovery system, like media players, office document editors, and system upgrade tools. There are probably more packages that could be uninstalled without affecting its function as a recovery system, but I haven’t looked into it too closely. I also uninstalled snap because the only snap-based software I need is Firefox, and I prefer the version from Mozilla’s official repository. The next section will cover this in more detail.
Additionally, you can install any tools you need, such as editors, common command-line utilities, and maintenance tools:
apt install vim-gtk3 ripgrep curl bash-completion
apt install gparted mtools dmraid \
efibootmgr btrfs-progs \
nvme-cli smartmontools \
cryptsetup lvm2 \
snapper-gui \
systemd-ukify systemd-boot-efi
The specific tools, of course, can be adjusted based on personal habits and actual needs. For example, if you don’t have software-encrypted partitions, you might not need cryptsetup. If you’re used to Emacs, you might not need Vim, and so on.
If there are other tools not available in the APT repositories, you can also copy them into fs yourself. For instance, I need sedutil, so I downloaded it from its releases page, copied the sedutil-cli binary to /usr/sbin, and added execute permissions.
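The copy and the permission change can be done in one step with install. This is a small sketch with a hypothetical helper name (install_tool). It takes the path of the downloaded binary and the root of the target filesystem, and for the real workspace it would need to run as root.

```shell
# install_tool SRC ROOT: copy SRC into ROOT/usr/sbin with mode 0755,
# creating the directory if needed (GNU install).
install_tool() {
    install -D -m 0755 "$1" "$2/usr/sbin/$(basename "$1")"
}
```

For the case above, that would be install_tool path/to/sedutil-cli /recovery/fs.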
Install Firefox
Although I removed Firefox in the previous step, a browser is still needed in a recovery system. In this day and age, even if your computer is down, you can look up information on your phone, but it’s much more convenient to be able to do it directly on the computer.
Here, I’m following Mozilla’s official documentation, Install Firefox on Linux, to install it:
# Install the signing key
install -d -m 0755 /etc/apt/keyrings
wget -q https://packages.mozilla.org/apt/repo-signing-key.gpg -O- | \
tee /etc/apt/keyrings/packages.mozilla.org.asc > /dev/null
# Add the APT repository
cat <<EOF | tee /etc/apt/sources.list.d/mozilla.sources
Types: deb
URIs: https://packages.mozilla.org/apt
Suites: mozilla
Components: main
Signed-By: /etc/apt/keyrings/packages.mozilla.org.asc
EOF
# Configure priority
cat <<EOF | tee /etc/apt/preferences.d/mozilla
Package: *
Pin: origin packages.mozilla.org
Pin-Priority: 1000
EOF
# Install the package
apt update
apt install firefox-esr
Since the recovery system isn’t updated frequently, I chose the ESR version, which has a longer and more stable support cycle, but it’s not a critical decision.
Firefox has many features that are unnecessary for a recovery system, such as automatic update checks, Firefox Accounts, profile importing, etc. For the new user created by Casper, it also shows many tips that are irrelevant in a recovery context. We can disable all of them using a Firefox policy file. I added files/firefox-policies.json in my workspace:
{
  "policies": {
    "DisableAppUpdate": true,
    "DisableFirefoxAccounts": true,
    "DisableFirefoxStudies": true,
    "DisablePocket": true,
    "DisableProfileImport": true,
    "DisableProfileRefresh": true,
    "DisableSystemAddonUpdate": true,
    "DontCheckDefaultBrowser": true,
    "DisplayBookmarksToolbar": "never",
    "NoDefaultBookmarks": true,
    "OfferToSaveLogins": false,
    "OverrideFirstRunPage": "",
    "UserMessaging": {
      "ExtensionRecommendations": false,
      "FeatureRecommendations": false,
      "SkipOnboarding": true,
      "MoreFromMozilla": false,
      "FirefoxLabs": false
    }
  }
}
You can refer to the policy file documentation for specific options; there are many policies to choose from.
To apply this policy file to Firefox in the recovery system, we need to execute the following command outside the chroot:
mkdir -p fs/etc/firefox/policies
ln files/firefox-policies.json \
fs/etc/firefox/policies/policies.json
This hard-links the file under files in the workspace to the corresponding path in the filesystem, so it can be modified directly from outside. When the recovery system needs to be rebuilt from scratch, these configuration files can also be reused directly.
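Because ln without -s creates a hard link, both paths refer to the same inode, which only works while files and fs live on the same filesystem (here, the same subvolume). A small sketch in a scratch directory mirrors the layout and shows that an in-place edit through one path is visible through the other. The ws variable and the sample policy content are placeholders.

```shell
# Scratch stand-in for the workspace.
ws=$(mktemp -d)
mkdir -p "$ws/files" "$ws/fs/etc/firefox/policies"
echo '{ "policies": {} }' > "$ws/files/firefox-policies.json"

# Hard link, as in the command above.
ln "$ws/files/firefox-policies.json" "$ws/fs/etc/firefox/policies/policies.json"

# Rewrite the workspace copy in place (the redirection keeps the same inode).
printf '{ "policies": { "DisablePocket": true } }\n' > "$ws/files/firefox-policies.json"

# The change shows up under fs/ as well.
cat "$ws/fs/etc/firefox/policies/policies.json"
```

One caveat: tools that save by writing a new file and renaming it over the old one (sed -i, for example) break the link, in which case the ln step has to be repeated.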
Configure WiFi
Having network access in the recovery system is very important, not only for looking up information but also for downloading and installing tools that weren’t included, if needed. If you connect to the internet via WiFi, you might need to configure the WiFi password in the recovery system as well:
nmcli --offline connection add type wifi \
autoconnect yes \
ssid "<SSID>" \
wifi-sec.key-mgmt wpa-psk \
wifi-sec.psk "<password>" \
> /etc/NetworkManager/system-connections/wifi.nmconnection
chmod 0600 /etc/NetworkManager/system-connections/wifi.nmconnection
Just replace <SSID> and <password> with your actual WiFi configuration. This way, the recovery system will automatically connect to the specified WiFi upon startup, requiring no extra steps.
Configure Display
Sometimes the system doesn’t correctly detect the display configuration, especially when a monitor is rotated or if you want to use a different scaling factor.
In this case, you can copy ~/.config/monitors.xml to fs/root/monitors.xml and then create a Casper initramfs script at fs/usr/share/initramfs-tools/scripts/casper-bottom/99gnome_monitors:
#!/bin/sh
PREREQ=""
prereqs()
{
    echo "$PREREQ"
}
case $1 in
    # get pre-requisites
    prereqs)
        prereqs
        exit 0
        ;;
esac
chroot /root install \
    -o $USERNAME -g $USERNAME \
    /root/monitors.xml \
    /home/$USERNAME/.config/monitors.xml
Then execute:
chmod +x /usr/share/initramfs-tools/scripts/casper-bottom/99gnome_monitors
update-initramfs -u
This will ensure the display configuration file is automatically installed into the temporary user’s .config directory at boot, making sure the display settings are correct.
Override GNOME Desktop Settings
The default applications shown on the Ubuntu dock might not be what we need, but this can be changed.
Similar to the Firefox policy file, first create a configuration file files/gnome-settings.schema.override:
[org.gnome.shell]
favorite-apps = [ 'org.gnome.Nautilus.desktop', 'firefox-esr.desktop', 'org.gnome.Terminal.desktop', 'gparted.desktop' ]
[org.gnome.shell:ubuntu]
favorite-apps = [ 'org.gnome.Nautilus.desktop', 'firefox-esr.desktop', 'org.gnome.Terminal.desktop', 'gparted.desktop' ]
Then, execute the following outside the chroot:
ln files/gnome-settings.schema.override \
fs/usr/share/glib-2.0/schemas/99_settings.gschema.override
to link this configuration file into the filesystem. Finally, execute this inside the chroot:
cd /usr/share/glib-2.0/schemas
rm gschemas.compiled
glib-compile-schemas .
to ensure the information in GNOME’s database is correct.
Here, I’ve pinned the file manager, web browser, terminal, and GParted. The specific arrangement can be adjusted according to personal needs. As for why it needs to be written twice, once with the :ubuntu suffix, I’m not entirely sure. But you can check if the data is correct with this command:
gsettings --schemadir . get org.gnome.shell favorite-apps
Build and Deploy
After configuring the system, you can use the build script mentioned earlier to build the recovery system:
sudo ./scripts/build-image.sh
Once the packaging is complete, you will find two files in the workspace’s image folder: recovery.efi and casper/filesystem.squashfs. The next step is to make these files accessible to the EFI firmware.
I personally copy them directly into the EFI System Partition, but the common recommendation seems to be that this partition only needs to be around 200MB, whereas an Ubuntu-based recovery system can easily be over a gigabyte in size. If your EFI System Partition doesn’t have enough space, you could consider creating another FAT32 partition to store them. It’s important to note that filesystem.squashfs must be placed in a casper subdirectory in the partition’s root, but recovery.efi can be placed anywhere.
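Since size is the main constraint, a quick comparison of the image size against the free space on the target partition can save a failed copy. The helper below is a hypothetical sketch (fits_on is my own name) that relies on GNU du and df. The /boot/efi path in the usage note is Ubuntu's usual mount point for the EFI System Partition and may differ on your machine.

```shell
# fits_on DIR TARGET: succeed if the contents of DIR fit in the free space
# of the filesystem that holds TARGET.
fits_on() {
    # -D follows DIR itself if it is a symlink, as the image folder is.
    need=$(du -sbD "$1" | cut -f1)
    avail=$(df -B1 --output=avail "$2" | tail -n 1 | tr -d ' ')
    echo "need $need bytes, $avail available"
    [ "$need" -le "$avail" ]
}
```

For example: fits_on /recovery/image /boot/efi && echo "OK to copy".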
Finally, you just need to make the EFI aware of the recovery system. Assuming recovery.efi is in the root directory of the partition /dev/nvmeXnYpZ, you can execute:
sudo efibootmgr --create \
--disk /dev/nvmeXnY --part Z \
--label "Ubuntu Recovery" \
--loader recovery.efi
After that, you can use efibootmgr to see the boot number for this recovery system and then use:
sudo efibootmgr --bootnext XXXX
to boot into the recovery system on the next startup. Additionally, some BIOS interfaces support selecting and booting EFI entries directly.
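Looking up the boot number by hand gets tedious, so it can be scripted. This sketch assumes efibootmgr prints entries in its usual "BootXXXX* label" form. boot_number_for is a hypothetical helper, and the label must match the one passed to --label when the entry was created.

```shell
# boot_number_for LABEL: read efibootmgr output on stdin and print the
# four-digit hex number of the first entry whose line contains LABEL.
boot_number_for() {
    grep -F -- "$1" | sed -n 's/^Boot\([0-9A-Fa-f]\{4\}\).*/\1/p' | head -n 1
}

# Intended use (not run here, since it needs EFI variables and root):
#   num=$(efibootmgr | boot_number_for "Ubuntu Recovery")
#   sudo efibootmgr --bootnext "$num"
```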
After rebooting and testing, if everything is fine, you can use snapper to create a new snapshot of the workspace’s current state to save your work.
References
- Building an Ubuntu 22.04 Live CD from Scratch - narukeu (Chinese)
- Minimal Ubuntu Install - Northwestern MSR Hackathon (English)
- LiveCD Customization - Ubuntu Community Help Wiki (English)
- Unified kernel image - ArchWiki (English)