This repository contains my experiments and instructions on setting up fully diskless Linux VMs that make use of a central server for their boot-up and file storage
Note: This setup was built for use in a virtualization environment where each VM runs on the same machine and can make use of on-host routing for faster network transfers. However, if your physical LAN is fast enough, this setup should still provide acceptable performance for physical machines connecting to a LAN server
For a quick summary, each VM in this setup uses iPXE with HTTP for loading the kernel image rather than PXE with TFTP. The iPXE boot image used is embedded with a script that delegates to a script loaded over HTTP. The HTTP server identifies the client making the request and provides the kernel and initramfs, along with client-specific kernel boot parameters
I maintain VMs used for dev environments on a Proxmox VE host, where each VM has essentially the same OS and software installed. Previously, each VM maintained its own virtual HDD containing the root FS and all user files. Keeping each system up to date manually would have been a chore, and backups of all systems would unnecessarily include their root FS too
Tools do exist for easing some of these issues (for example Ansible, which can run a sequence of actions on multiple machines at once), but they would still require a base image for each VM, with the continued risk of the system images diverging. With that in mind, I set the following goals for this setup:
- Require no per-VM virtual disks, since those cannot be shared between VMs
- Avoid putting significant memory overhead on each VM (ex. by having the system image in a tmpfs root)
- Maintain a single location for the common system image and individual home directories
- Require very little configuration on each individual VM
- Don't require any special configuration from the network the VM host is connected to
- Stick to using packages provided by each distro, which helps maintainability
In this setup I use Proxmox VE 7 as my virtualization host, and each client VM will be running Ubuntu 22.04. I've used Fedora 34 Server Edition for running all server-side components of this project. However, these instructions should still be usable for alternatives
Since each client VM will make use of the central server for their root FS, we need a networked file system that's natively supported by the kernel. Linux provides an in-kernel NFS client, and NFS maintains POSIX permissions (except ACLs), which fit the bill perfectly
Linux also provides an NFS server module, but making use of it in LXC (for running it in a container) was unwieldy, and I didn't want to install anything on Proxmox itself, meaning the netboot server had to run in a VM. It's likely possible to run a userspace NFS server (like NFS-Ganesha) instead, but I couldn't get it working reliably
At this point, the server exports the following path, with exportfs -arv used for re-exporting in case of changes:
# Client Root FS
/srv/client_root IP-RANGE/CIDR(ro,no_root_squash)
# ...
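For reference, a rough sketch of setting this up on the Fedora server (the package names are Fedora's, and the 10.0.0.0/24 client range is a placeholder; substitute your own):
# Install and start the kernel NFS server (Fedora)
sudo dnf install -y nfs-utils
sudo systemctl enable --now nfs-server
# Export the client root read-only; no_root_squash lets the clients mount it as root
echo '/srv/client_root 10.0.0.0/24(ro,no_root_squash)' | sudo tee -a /etc/exports
# Apply and list the active exports
sudo exportfs -arv
sudo exportfs -v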
I opted to prepare an install of Ubuntu onto a virtual HDD using the official LiveCD. An alternative was to use debootstrap directly on the server NFS export to prepare an Ubuntu root FS from scratch, but that proved finicky. Once the basic install was done (choosing LVM for easier labels and resizes), I moved the disk over to the Server VM and added the following fstab entry:
/dev/mapper/vgubuntu-root /srv/client_root ext4 defaults 0 0
This has the added advantage of not burdening the Server's root FS. A similar mount could be made for the home directories too
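Before that fstab entry works, the Ubuntu volume group has to be visible on the Server. A minimal sketch, assuming the default vgubuntu volume group name from the LiveCD install (as used in the fstab line above):
# Detect and activate the Ubuntu install's LVM volume group on the Server
sudo vgscan
sudo vgchange -ay vgubuntu
# Mount it at the export path used above
sudo mkdir -p /srv/client_root
sudo mount /dev/mapper/vgubuntu-root /srv/client_root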
A pleasant surprise from Ubuntu was ready-made support for an NFS root FS in its initramfs-tools. A guide from Ubuntu's community help pages provided the following steps:
- Chroot into the Ubuntu rootfs on the Server (I've provided a simple script for this under server/chroot/client-root-chroot, which will be useful for future maintenance):
  for dir in sys dev proc; do mount --rbind /$dir /srv/client_root/$dir && mount --make-rslave /srv/client_root/$dir; done
  chroot /srv/client_root
- Open /etc/initramfs-tools/initramfs.conf on the client root FS (inside the chroot)
- Set MODULES=netboot to prepare a netboot-capable initramfs
- Set BOOT=nfs to enable NFS boot
- Set NFSROOT=<ip address of server>:/srv/client_root since we cannot adjust the DHCP server
- Run update-initramfs -uk all to regenerate /boot/initrd.img (a consolidated sketch of these edits follows below)
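For convenience, the same settings can be applied non-interactively from inside the chroot. This is only a sketch: it assumes initramfs-tools' conf.d drop-in mechanism (which is sourced after initramfs.conf), and 192.168.1.10 stands in for the server's IP:
# Inside the chroot: drop-in equivalent of the edits described above
cat > /etc/initramfs-tools/conf.d/diskless.conf <<'EOF'
MODULES=netboot
BOOT=nfs
NFSROOT=192.168.1.10:/srv/client_root
EOF
# Regenerate /boot/initrd.img for every installed kernel
update-initramfs -uk all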
The client root FS needed updates to make sure it would not attempt to mount the non-existent boot disk on each Client VM, so I removed the entries for the root FS and swap partition from the client's /etc/fstab
Since we cannot make use of client VM disks, we cannot keep a bootloader on them either. Net-booting via PXE is an option, but since it requires a custom DHCP config/server, I deemed it unusable for our purposes. In situations where changing the DHCP setup is feasible, plain PXE can be used instead
Instead, I came across the iPXE project, which:
- Can be booted from an ISO (CD images can be shared across VMs in Proxmox)
- Implements UEFI HTTP boot, and can load and execute scripts from an HTTP server
- Can have an embedded script that runs on boot
iPXE can also be built to support HTTPS, adding a layer of security to the boot process
I prepared the following embedded script in a custom iPXE ISO (this is handled automatically with the Makefile in this repository):
#!ipxe
# Get an IP address from DHCP
dhcp
# Continue executing a script loaded from the boot server
chain http://server-ip:port/${net0/mac}/boot.ipxe
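The Makefile in this repository takes care of the build, but for reference it boils down to something like the following, using the upstream iPXE sources with the script above saved as embed.ipxe (the exact flags the Makefile uses may differ):
# Build an iPXE ISO with the boot script embedded
git clone https://github.com/ipxe/ipxe.git
cd ipxe/src
make bin/ipxe.iso EMBED=/path/to/embed.ipxe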
iPXE is capable of passing the MAC address of the VM when requesting the chain script, which the server can use to send user-specific boot parameters. The returned script is as follows:
#!ipxe
# Load the kernel image
kernel /boot/vmlinuz initrd=initrd.img
# Load the init ramdisk
initrd /boot/initrd.img
# Boot into the kernel
boot
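To illustrate the layout these two scripts expect, here is a minimal static setup served with Python's built-in HTTP server (the MAC address, port, and directory layout are placeholders rather than what this repo's server uses):
# One boot.ipxe per client MAC, plus a shared /boot directory with the kernel and initrd
mkdir -p www/boot www/aa:bb:cc:dd:ee:ff
cp /srv/client_root/boot/vmlinuz /srv/client_root/boot/initrd.img www/boot/
cp boot.ipxe www/aa:bb:cc:dd:ee:ff/
# Serve it over HTTP on the port referenced by the embedded script
cd www && python3 -m http.server 8080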
Any HTTP server is usable, static or dynamic, depending on your requirements, though this repo provides a basic NodeJS-based server that also supplies user-specific boot parameters. This gets the VM booting into Ubuntu, but it fails before reaching the desktop due to a lack of write access to the root FS
One option was to add fstab entries to the client FS that mount a tmpfs over certain folders under /var, but this was too fragile. However, there is a convenient package in Ubuntu's package repos called overlayroot. It prepares a script in the initramfs that:
- Moves the currently mounted root FS (in our case, provided via NFS) into /media/root-ro
- Mounts a tmpfs on /media/root-rw
- Mounts an overlayfs on / with /media/root-ro as the base (lower) layer and /media/root-rw as the upper layer
As a bonus, it can be configured from the kernel command-line. This provides a writeable overlay over a read-only base, allowing the rest of the system to boot normally. No changes are persisted across reboots however, due to the use of tmpfs. This was a reasonable trade-off since the user's home directory will be persisted later
To set it up:
- Enter the Ubuntu root FS on the Server using chroot as mentioned before
- Edit /etc/resolv.conf to add nameserver 1.1.1.1 for DNS resolution inside the chroot. This path is actually a symlink to a runtime-generated file, so the booted system will not use the file created here
- Run apt install overlayroot
- Add overlayroot=tmpfs:recurse=0 to the kernel parameters provided by the HTTP boot server. recurse=0 ensures any further mounts (like the one for /home) made by us remain read-write (a quick way to verify the result after boot is sketched below)
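After rebooting a client with these parameters, the layering can be checked from a shell on the client; this is just a sketch using the mount points overlayroot documents above:
# / should be an overlay, with the NFS export and a tmpfs underneath it
findmnt /
findmnt /media/root-ro
findmnt /media/root-rw
# The overlayroot parameter should also appear here
cat /proc/cmdline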
This allows boot-up to proceed. systemd also prepares tmpfs mounts for us on certain folders later
Since all VMs get the same root FS, they will also all get the same home directory mounted if it is specified in /etc/fstab. This is useful for certain cases (ex. a common admin account), and can be set up by adding the following line to the client fstab:
<server-ip>:/srv/client_homes/<user> /home/<user> nfs defaults 0 0
For parameterized mounts however, a different approach is needed. Alongside defining systemd services, you can also define .mount units, which can mount a disk on boot-up in conjunction with fstab (systemd auto-generates .mount units from fstab on boot to provide a consistent system)
- First, edit the kernel command-line in the HTTP boot server to add systemd.setenv=DISKLESS_HOME_NAME=<user based on MAC address>. This tells systemd to add the provided environment variable to its environment, and allows our mount unit to remain the same while still varying per VM
- Enter the Ubuntu chroot
- Create /etc/systemd/system/home-user.mount with the following contents. This restricts the changing home directory to the same path /home/user on all VMs; as per systemd requirements, the filename for this mount unit must stay consistent with the mount path (home-user for /home/user):
  [Unit]
  Description=Load home directory from NFS
  # Wait for the NFS client to be ready
  After=nfs-client.target
  [Install]
  # Make sure the system waits for this mount to be ready before allowing other users
  WantedBy=multi-user.target
  [Mount]
  What=<server-ip>:/srv/client_homes/$(DISKLESS_HOME_NAME)
  Where=/home/user
  Type=nfs
  Options=defaults
- Run cd /etc/systemd/system/multi-user.target.wants && ln -s /etc/systemd/system/home-user.mount . This will ensure the NFS export is mounted on boot, and is done manually because systemctl enable home-user.mount does not work inside a chroot
- Create an NFS export for each home directory under /srv/client_homes/<user>, and ensure your HTTP server provides the correct user name for each Client VM MAC address (a server-side sketch follows this list)
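A sketch of the server-side half plus a quick client-side check (the user name alice, UID/GID 1000, the 10.0.0.0/24 range, and the export options are all placeholders; adjust squashing to taste):
# On the Server: create and export one user's home directory
sudo mkdir -p /srv/client_homes/alice
sudo chown 1000:1000 /srv/client_homes/alice   # match the user's UID/GID inside the client image
echo '/srv/client_homes/alice 10.0.0.0/24(rw,no_root_squash)' | sudo tee -a /etc/exports
sudo exportfs -arv
# On a booted client: confirm the unit picked up DISKLESS_HOME_NAME and mounted the share
systemctl status home-user.mount
findmnt /home/user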
Rebooting the client VM should now provide a persistent home directory. This same procedure can be used to mount other such directories on a per-user basis
At this point, the client VM is fully booted and persisted, but network access is limited, and DNS queries fail to work
This occurs because:
- On boot, the kernel obtains an IP address using DHCP automatically. This is needed for the in-kernel NFS client to connect to the server
- systemd-networkd, in charge of initializing network devices in userspace, sees that the network adapter has an existing IP address and does not make any alterations
- systemd-resolved, in charge of DNS query resolution, asks the network manager for upstream DNS servers that it should make requests to
- NetworkManager sees the existing IP address on the network adapter, and switches the connection to Manual mode to preserve it. However, in this state it fails to set any DNS servers since DHCP no longer provides them
To fix this, I create the following file at /etc/systemd/resolved.conf.d/dns.conf and assign a fixed set of DNS servers:
[Resolve]
DNS=<space separated sequence of DNS servers>
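One way to check that the override takes effect (run on a booted client, or restart systemd-resolved after dropping the file in place):
# Reload resolved and inspect the DNS servers it will use
sudo systemctl restart systemd-resolved
resolvectl status
# A test query should now succeed
resolvectl query ubuntu.com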
When multiple diskless VMs are run concurrently, NFSv4 file locking breaks even if each VM uses its own isolated share. This shows up as log spam on the client and server about lost locks, as well as certain programs (like GNOME Tracker) repeatedly crashing
As per the Kernel NFS Docs, this is due to all VMs sharing the same NFSv4 client ID. This makes it appear to the NFS server as if each VM were constantly rebooting, so it keeps clearing any obtained locks. This can be fixed by providing a custom per-user client ID via the kernel command-line argument nfs.nfs4_unique_id. The reference HTTP server in this repo incorporates the user ID into this value
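To verify that the parameter actually reached a given client, the value can be read back after boot; this is a sketch, and the sysfs path assumes the nfs module exposes the parameter there (it appears to on recent kernels):
# The unique ID passed on the kernel command line...
cat /proc/cmdline
# ...should be reflected in the nfs module parameter
cat /sys/module/nfs/parameters/nfs4_unique_id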
On boot, snap attempts to start up, but gets stopped soon after. A look at journalctl shows that AppArmor is to blame: snap needs read access to the root FS, which in our case means a network request to our NFS server, and AppArmor's default configuration denies such requests, so snap gets stopped. AppArmor can be configured to allow NFS access, but a temporary (not recommended) fix is to disable AppArmor entirely by adding apparmor=0 to the kernel's command line from the HTTP boot server
To have a clean boot screen, just add quiet splash vt.handoff=7 to the kernel command line
At this point, I now have functional, fully-diskless, persistent VMs with a common FS and boot configuration. I've prepared this repo in hopes that it may be useful to others in the same situation, and as a reference guide for future use :)
snap expects its corresponding daemon process snapd to be running before it can install any applications. This is usually handled by systemd, but systemd refuses to start any service from within a chroot, which prevents installing snaps from either client-root-chroot or overlayroot-chroot. One option I've identified is to give a specific VM direct read-write access, boot it into the full client system, and install snaps from there. This can be achieved by removing overlayroot=... from the kernel command line and adding rw to it. Make sure the root FS is exported as read/write in NFS before doing so, however, as sketched below
If you found this helpful, do consider supporting me on Ko-fi!