Xidorn's Blog

Creating a Recovery System for Ubuntu

Every operating system inevitably runs into issues from time to time that can’t be solved from within the system itself, whether it’s a boot failure caused by some change that needs fixing, or you just want to adjust your system partitions. This is especially true for those of us who use a Linux desktop and enjoy tinkering with our computers. In times like these, you need an operating system that is independent of your main one to lend a hand.

While there are many ready-made solutions like dual-booting or using a LiveUSB, they all have their own problems and limitations. On this matter, I’m quite fond of macOS’s recovery system—a read-only system installed on the same hard drive but independent of the main OS, equipped with a suite of tools that you can switch to at boot. So, I decided to create a similar recovery system for my Ubuntu desktop, but one that’s easier to customize and more convenient to use.

The Problem

For people who use a Linux desktop as their primary system, there are two common recovery solutions: booting into a second installed system in a dual-boot setup, or booting from a LiveUSB.

But these solutions have their own issues. For me, the problem with the first option is that I don’t have a dual-boot setup. I’ve been using a single-boot Linux desktop environment for a long time, and I just fire up a virtual machine when I have to use Windows. I find this setup simpler and more convenient, and it avoids potential issues caused by two systems interfering with each other.

LiveUSB is not bad, but it has several major drawbacks:

I also tried using Cubic to create a custom ISO and write it to a USB drive as a recovery system. While this partially solved the convenience and compatibility issues, having to go through its entire process and write to the USB drive every time I wanted to make a change was very cumbersome. However, its approach of creating a workspace and leaving the filesystem there was a great inspiration.

After consulting many resources, I decided to build a recovery system for my own computer from scratch, based on Ubuntu.

Goals and Approach

Here are the goals I envisioned for this recovery system:

Based on these requirements and the available tools, my general plan is:

Workspace and Scripts

First, I created a new Btrfs subvolume /recovery to serve as the workspace:

sudo btrfs subvolume create /recovery

The location is arbitrary. Since most operations require root privileges, placing it outside the user’s home directory is more convenient. The benefit of creating a separate subvolume is that after each adjustment, I can create a new snapshot to save the current state. If a subsequent change breaks it, I can always revert to a good version using a snapshot, which is like having filesystem-level version control.

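In this workspace, I placed three folders: fs for the recovery system’s filesystem, scripts for the helper scripts, and files for configuration files that are linked into fs.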

Base Filesystem

Next, we need to initialize the filesystem content. You can download the daily build of Ubuntu Base from the official image site. For example, if I want my recovery system to be based on Ubuntu 24.04 LTS, I would download the corresponding architecture’s compressed file from the Ubuntu Base 24.04 (Noble Numbat) Daily Build page, such as noble-base-amd64.tar.gz, to use as my base filesystem. Enter the fs directory and extract its contents there:

cd /recovery/fs
sudo tar xvf path/to/noble-base-amd64.tar.gz

The advantage of using a daily build is that the packages inside are the latest for that major version, so I don’t have to upgrade them again after extraction.
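Before extracting, it may be worth checking the download against the checksum list published next to it. The following is a minimal sketch, assuming a SHA256SUMS file from the same build page has been saved in the same directory as the tarball; the function name verify_tarball is hypothetical.

```shell
# verify_tarball DIR NAME
# Checks DIR/NAME against its entry in DIR/SHA256SUMS (assumed to be
# downloaded from the same build page as the tarball).
verify_tarball() {
    (
        cd "$1" || exit 1
        # SHA256SUMS lists every file of the build; keep only our line.
        # Entries look like "<hash> *<name>" or "<hash>  <name>".
        grep -E "[ *]$2\$" SHA256SUMS | sha256sum -c -
    )
}
```

For example, `verify_tarball ~/Downloads noble-base-amd64.tar.gz && echo "safe to extract"`. If no matching line is found, sha256sum has nothing to check and the function fails rather than passing silently.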

chroot Script

Ubuntu Base only includes the most essential system components. We obviously need to install more packages and make many adjustments later. For this, we need a script to “enter” the recovery system’s filesystem to modify it. I wrote a chroot script start-chroot.sh for this purpose:

#!/bin/sh

set -fv

# Make sure to run as root
if [ "$(id -u)" != "0" ]; then
    echo "Require root!"
    exit 1
fi

# Stop here if the workspace is missing, so nothing below touches the host
cd /recovery || exit 1

# Use the host system's DNS config to ensure network availability
mv fs/etc/resolv.conf fs/etc/resolv.conf.1
cp /etc/resolv.conf fs/etc/resolv.conf

# Mount necessary system mount points
mount --bind /dev fs/dev
mount none -t proc fs/proc
mount none -t sysfs fs/sys
mount none -t devpts fs/dev/pts

# Enter chroot
chroot fs

# Clean up temporary files
rm -f fs/var/lib/dbus/machine-id
rm -f fs/root/.bash_history
rm -rf fs/tmp/*

# Unmount system mount points
umount fs/dev/pts
umount fs/dev
umount fs/proc
umount fs/sys

# Restore DNS config
mv fs/etc/resolv.conf.1 fs/etc/resolv.conf

With this script, I can enter the filesystem that the recovery system will use at any time to modify its contents.

The script modifies resolv.conf. This file is actually a symbolic link managed by systemd, and we don’t want to change this structure in the recovery system. However, when we chroot into this filesystem, the systemd within it isn’t running, so resolv.conf is empty. This would cause programs that require network access to fail due to a lack of DNS configuration. Therefore, we need to temporarily replace it with the host system’s file.
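The swap can be tried in isolation to see why the script uses mv instead of copying over the file. This is a minimal sketch, assuming resolv.conf is a symlink as described above; the function names are hypothetical, and the root directory is passed as an argument so it can be run against a scratch directory.

```shell
# swap_resolv_conf ROOT HOST_CONF
# Moves ROOT/etc/resolv.conf aside and puts a copy of HOST_CONF in its place.
swap_resolv_conf() {
    # mv renames the symlink itself, so the link and its target are untouched
    mv "$1/etc/resolv.conf" "$1/etc/resolv.conf.1"
    # cp follows symlinks in the source, so the result is a regular file
    cp "$2" "$1/etc/resolv.conf"
}

# restore_resolv_conf ROOT
# Puts the original symlink back, replacing the temporary copy.
restore_resolv_conf() {
    mv "$1/etc/resolv.conf.1" "$1/etc/resolv.conf"
}
```

Copying directly onto the symlink would instead write through it to the link's target, which is not what we want inside the recovery filesystem.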

Build Script

The next script we need is one to package the filesystem into a read-only format and build the necessary EFI executable to boot the recovery system. Here is my build-image.sh script:

#!/bin/sh

set -efv

# Make sure to run as root
if [ "$(id -u)" != "0" ]; then
    echo "Require root!"
    exit 1
fi

cd /recovery

# Remove the old image
if [ -L image ]; then
  old_image=$(readlink image)
  if [ -d "$old_image" ]; then
    rm -rf "$old_image"
  fi
  rm image
fi

# Create a temporary directory to store build results
tmp_dir=$(mktemp -d)
ln -s "$tmp_dir" image
mkdir image/casper

# Package the filesystem
mksquashfs fs image/casper/filesystem.squashfs -comp zstd

# Build the kernel image
ukify build \
  --linux=fs/boot/vmlinuz \
  --initrd=fs/boot/initrd.img \
  --cmdline="boot=casper noprompt libata.allow_tpm=1" \
  --output=image/recovery.efi

Here, the build results are placed in a temporary folder, which is then symlinked to the image subfolder. This isn’t strictly necessary. I chose to do it this way because:

The image file is a large, compressed binary blob, and there’s no real benefit to having it tracked by the filesystem. Besides, this script allows me to quickly build a new image from fs at any time [1], so there’s no need to keep the old results.
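Because the build script runs on the host, it relies on mksquashfs and ukify being installed there. A small pre-flight check can fail early with a clearer message than a "command not found" halfway through. This is only a sketch; the helper name require_tools is hypothetical and not part of the original script.

```shell
# require_tools NAME...
# Reports every missing command on stderr and returns non-zero
# if at least one of them could not be found.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called near the top of build-image.sh as `require_tools mksquashfs ukify || exit 1`.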

Here, ukify is used to build the unified kernel image with a few simple kernel parameters:

Snapshots

As mentioned earlier, placing the workspace in a separate Btrfs subvolume makes it easy to create snapshots. I use snapper to manage snapshots. To do this, I first created a configuration file at /etc/snapper/configs/recovery:

SUBVOLUME="/recovery"
FSTYPE="btrfs"
QGROUP=""
NUMBER_CLEANUP="yes"
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="5"
NUMBER_LIMIT_IMPORTANT="1"

Then, I can create an initial snapshot:

sudo snapper -c recovery create

In the future, I can use the same command to create more snapshots.
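One detail that may matter here: as far as I understand snapper, the NUMBER_* settings only apply to snapshots that were created with the "number" cleanup algorithm, so a plain create is never cleaned up automatically. A minimal sketch of a wrapper that sets it, together with a description; the function name snapshot_workspace is hypothetical.

```shell
# snapshot_workspace DESCRIPTION
# Creates a snapshot of the "recovery" config that is labeled and
# eligible for number-based cleanup. Needs to run as root.
snapshot_workspace() {
    snapper -c recovery create \
        --description "$1" \
        --cleanup-algorithm number
}
```

For example, `snapshot_workspace "after installing firefox-esr"`, followed by `snapper -c recovery list` to see the result.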

Configuring the System

After creating the workspace and placing the corresponding scripts, we need to configure the recovery system to make it truly usable.

Unless otherwise specified, the commands in this section are executed inside the chroot environment created by the start-chroot.sh script.

Install Base Packages

First, we need to install the base system:

apt update
apt install linux-generic
apt install --no-install-recommends ubuntu-minimal
apt install casper discover laptop-detect os-prober
apt install ubuntu-desktop

Casper performs an MD5 check on boot by default, which isn’t very useful for us, so we’ll disable it:

systemctl disable casper-md5check.service

Next, we can clean up some packages that are unlikely to be needed in a recovery system:

apt autoremove --purge \
    snapd \
    rhythmbox \
    libreoffice-common \
    totem \
    gnome-calendar \
    gnome-clocks \
    gnome-characters \
    gnome-startup-applications \
    gnome-online-accounts \
    transmission-gtk \
    cloud-init \
    unattended-upgrades \
    firefox \
    thunderbird \
    ubuntu-docs \
    ubuntu-report

This is the list of packages I removed, mainly things I don’t think are necessary in a recovery system, like media players, office document editors, and system upgrade tools. There are probably more packages that could be uninstalled without affecting its function as a recovery system, but I haven’t looked into it too closely. I also uninstalled snap because the only snap-based software I need is Firefox, and I prefer the version from Mozilla’s official repository. The next section will cover this in more detail.

Additionally, you can install any tools you need, such as editors, common command-line utilities, and maintenance tools:

apt install vim-gtk3 ripgrep curl bash-completion
apt install gparted mtools dmraid \
    efibootmgr btrfs-progs \
    nvme-cli smartmontools \
    cryptsetup lvm2 \
    snapper-gui \
    systemd-ukify systemd-boot-efi

The specific tools, of course, can be adjusted based on personal habits and actual needs. For example, if you don’t have software-encrypted partitions, you might not need cryptsetup. If you’re used to Emacs, you might not need Vim, and so on.

If there are other tools not available in the APT repositories, you can also copy them into fs yourself. For instance, I need sedutil, so I downloaded it from its releases page, copied the sedutil-cli binary to /usr/sbin, and added execute permissions.
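Copying and setting the permissions can be done in one step with install. A minimal sketch, run from outside the chroot as root; the function name install_tool and the example path are hypothetical.

```shell
# install_tool SRC ROOT
# Copies SRC into ROOT/usr/sbin with mode 0755, creating the
# directory if it does not exist yet.
install_tool() {
    install -D -m 0755 "$1" "$2/usr/sbin/$(basename "$1")"
}
```

For example, `install_tool path/to/sedutil-cli /recovery/fs`.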

Install Firefox

Although I removed Firefox in the previous step, a browser is still needed in a recovery system. In this day and age, even if your computer is down, you can look up information on your phone, but it’s much more convenient to be able to do it directly on the computer.

Here, I’m following Mozilla’s official documentation, Install Firefox on Linux, to install it:

# Install the signing key
install -d -m 0755 /etc/apt/keyrings
wget -q https://packages.mozilla.org/apt/repo-signing-key.gpg -O- | \
	tee /etc/apt/keyrings/packages.mozilla.org.asc > /dev/null
# Add the APT repository
cat <<EOF | tee /etc/apt/sources.list.d/mozilla.sources
Types: deb
URIs: https://packages.mozilla.org/apt
Suites: mozilla
Components: main
Signed-By: /etc/apt/keyrings/packages.mozilla.org.asc
EOF
# Configure priority
cat <<EOF | tee /etc/apt/preferences.d/mozilla
Package: *
Pin: origin packages.mozilla.org
Pin-Priority: 1000
EOF
# Install the package
apt update
apt install firefox-esr

Since the recovery system isn’t updated frequently, I chose the ESR version, which has a longer and more stable support cycle, but it’s not a critical decision.

Firefox has many features that are unnecessary for a recovery system, such as automatic update checks, Firefox Accounts, profile importing, etc. For the new user created by Casper, it also shows many tips that are irrelevant in a recovery context. We can disable all of them using a Firefox policy file. I added files/firefox-policies.json in my workspace:

{
  "policies": {
    "DisableAppUpdate": true,
    "DisableFirefoxAccounts": true,
    "DisableFirefoxStudies": true,
    "DisablePocket": true,
    "DisableProfileImport": true,
    "DisableProfileRefresh": true,
    "DisableSystemAddonUpdate": true,
    "DontCheckDefaultBrowser": true,
    "DisplayBookmarksToolbar": "never",
    "NoDefaultBookmarks": true,
    "OfferToSaveLogins": false,
    "OverrideFirstRunPage": "",
    "UserMessaging": {
      "ExtensionRecommendations": false,
      "FeatureRecommendations": false,
      "SkipOnboarding": true,
      "MoreFromMozilla": false,
      "FirefoxLabs": false
    }
  }
}

You can refer to the policy file documentation for specific options; there are many policies to choose from.

To apply this policy file to Firefox in the recovery system, we need to execute the following command outside the chroot:

mkdir -p fs/etc/firefox/policies
ln files/firefox-policies.json \
	fs/etc/firefox/policies/policies.json

This hard-links the file in the workspace to the corresponding file in the filesystem, so it can be edited directly from outside. When the recovery system needs to be rebuilt from scratch, these configuration files can also be reused directly.
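One caveat with this approach: some editors save by writing a new file and renaming it over the old one, which silently separates the two names. A quick check can confirm that both paths still refer to the same file. This is a sketch using GNU stat; the function name same_file is hypothetical.

```shell
# same_file A B
# Succeeds when A and B are two names for the same inode
# on the same device, i.e. the hard link is still intact.
same_file() {
    [ "$(stat -c '%d:%i' "$1")" = "$(stat -c '%d:%i' "$2")" ]
}
```

For example, `same_file files/firefox-policies.json fs/etc/firefox/policies/policies.json || echo "link broken, re-run ln -f"`.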

Configure WiFi

Having network access in the recovery system is very important, not only for looking up information but also for downloading and installing tools that weren’t included, if needed. If you connect to the internet via WiFi, you might need to configure the WiFi password in the recovery system as well:

nmcli --offline connection add type wifi \
  autoconnect yes \
  ssid "<SSID>" \
  wifi-sec.key-mgmt wpa-psk \
  wifi-sec.psk "<password>" \
  > /etc/NetworkManager/system-connections/wifi.nmconnection
chmod 0600 /etc/NetworkManager/system-connections/wifi.nmconnection

Just replace <SSID> and <password> with your actual WiFi configuration. This way, the recovery system will automatically connect to the specified WiFi upon startup, requiring no extra steps.

Configure Display

Sometimes the system doesn’t correctly detect the display configuration, especially when a monitor is rotated or if you want to use a different scaling factor.

In this case, you can copy ~/.config/monitors.xml to fs/root/monitors.xml and then create a Casper initramfs script at fs/usr/share/initramfs-tools/scripts/casper-bottom/99gnome_monitors:

#!/bin/sh

PREREQ=""

prereqs()
{
    echo "$PREREQ"
}

case $1 in
# get pre-requisites
prereqs)
    prereqs
    exit 0
    ;;
esac

chroot /root install \
    -o $USERNAME -g $USERNAME \
    /root/monitors.xml \
    /home/$USERNAME/.config/monitors.xml

Then execute:

chmod +x /usr/share/initramfs-tools/scripts/casper-bottom/99gnome_monitors
update-initramfs -u

This will ensure the display configuration file is automatically installed into the temporary user’s .config directory at boot, making sure the display settings are correct.

Override GNOME Desktop Settings

The default applications shown on the Ubuntu dock might not be what we need, but this can be changed.

Similar to the Firefox policy file, first create a configuration file files/gnome-settings.schema.override:

[org.gnome.shell]
favorite-apps = [ 'org.gnome.Nautilus.desktop', 'firefox-esr.desktop', 'org.gnome.Terminal.desktop', 'gparted.desktop' ]

[org.gnome.shell:ubuntu]
favorite-apps = [ 'org.gnome.Nautilus.desktop', 'firefox-esr.desktop', 'org.gnome.Terminal.desktop', 'gparted.desktop' ]

Then, execute the following outside the chroot:

ln files/gnome-settings.schema.override \
	fs/usr/share/glib-2.0/schemas/99_settings.gschema.override

to link this configuration file into the filesystem. Finally, execute this inside the chroot:

cd /usr/share/glib-2.0/schemas
rm gschemas.compiled
glib-compile-schemas .

to ensure the information in GNOME’s database is correct.

Here, I’ve placed the file manager, the web browser, the terminal, and GParted. The specific arrangement can be adjusted according to personal needs. As for why it needs to be written twice, once with :ubuntu, I’m not entirely sure. But you can check whether the data is correct with this command:

gsettings --schemadir . get org.gnome.shell favorite-apps

Build and Deploy

After configuring the system, you can use the build script mentioned earlier to build the recovery system:

sudo ./scripts/build-image.sh

Once the packaging is complete, you will find two files in the workspace’s image folder: recovery.efi and casper/filesystem.squashfs. The next step is to make these files accessible to the EFI.

I personally copy them directly into the EFI System Partition [2], but the common recommendation seems to be that this partition only needs to be around 200MB, whereas an Ubuntu-based recovery system can easily be over a gigabyte in size. If your EFI System Partition doesn’t have enough space, you could consider creating another FAT32 partition to store them. It’s important to note that filesystem.squashfs must be placed in a casper subdirectory in the partition’s root, but recovery.efi can be placed anywhere.
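The copy step, including a free-space check and the required casper subdirectory, might look like the sketch below. The function names are hypothetical, and the destination is passed as an argument because the mount point of the target partition differs between setups (/boot/efi is only the usual Ubuntu default for the EFI System Partition).

```shell
# need_kb DIR: size of DIR's contents in KiB, following symlinks
need_kb() { du -skL "$1" | awk '{print $1}'; }

# free_kb DIR: space available on the filesystem holding DIR, in KiB
free_kb() { df -Pk "$1" | awk 'NR == 2 {print $4}'; }

# deploy_image IMAGE_DIR DEST
# Copies the build results to DEST, keeping the layout Casper expects.
deploy_image() {
    if [ "$(need_kb "$1")" -ge "$(free_kb "$2")" ]; then
        echo "Not enough free space on $2" >&2
        return 1
    fi
    mkdir -p "$2/casper"
    cp "$1/casper/filesystem.squashfs" "$2/casper/"
    cp "$1/recovery.efi" "$2/"
}
```

For example, `sudo sh -c '. ./deploy.sh; deploy_image /recovery/image /boot/efi'`, assuming the functions were saved to a file named deploy.sh. The check is conservative: it does not account for space that will be freed by overwriting a previous copy.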

Finally, you just need to make the EFI aware of the recovery system. Assuming recovery.efi is in the root directory of the partition /dev/nvmeXnYpZ, you can execute:

sudo efibootmgr --create \
	--disk /dev/nvmeXnY --part Z \
	--label "Ubuntu Recovery" \
	--loader recovery.efi

After that, you can use efibootmgr to see the boot number for this recovery system and then use:

sudo efibootmgr --bootnext XXXX

to boot into the recovery system on the next startup. Additionally, some firmware setup interfaces support selecting and booting EFI entries directly.
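Looking up the boot number by hand can be avoided by parsing the efibootmgr listing. A minimal sketch, assuming entries are printed as `BootXXXX* Label` (newer versions append the device path after a tab); the function name boot_num_for_label is hypothetical.

```shell
# boot_num_for_label LABEL
# Reads `efibootmgr` output on stdin and prints the four-digit
# boot number of the first entry whose label equals LABEL.
boot_num_for_label() {
    awk -v label="$1" '
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][* ] / {
            num = substr($0, 5, 4)
            rest = substr($0, 11)   # text after "BootXXXX* "
            sub(/\t.*/, "", rest)   # drop the device path, if printed
            if (rest == label) { print num; exit }
        }'
}
```

It can then be combined with the command above: `num=$(efibootmgr | boot_num_for_label "Ubuntu Recovery") && sudo efibootmgr --bootnext "$num"`.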

After rebooting and testing, if everything is fine, you can use snapper to create a new snapshot of the workspace’s current state to save your work.

Footnotes

  1. On my machine, a build takes only about half a minute, which is a worthwhile trade-off of time for space.

  2. I specifically reserved 10GB for this partition the last time I replaced my hard drive.