There once was a protocol called PXE,
Whose specification was overly tricksy.
A committee refined it
Into a big Turing tarpit,
And now you're using it to boot your PC.
Booting a Linux system over the network is quite tedious. You have to set up a TFTP server, configure your DHCP server to recognize PXE clients, and send them the right set of magical options to get them to boot, often fighting rubbish PXE ROM implementations.
Pixiecore aims to simplify this process, by packing the whole process into a single binary that can cooperate with your network's existing DHCP server.
Pixiecore can be used either as a simple "just boot into this OS image" tool, or as a building block of a machine management system with its API mode.
Run the pixiecore binary, passing it a kernel and initrd, and optionally some extra kernel commandline arguments.
Here's a couple of examples. If you feel like a screencast instead, there's a very short demo.
Tiny Core Linux is a positively tiny distro, clocking in at 24M in the configuration we'll be using (it can go lower than that). Let's set ourselves up such that any PXE booting machine on the network boots into a TinyCore ramdisk:
# Fetch the kernel and the 3 cpio files that form the filesystem.
wget http://tinycorelinux.net/6.x/x86/release/distribution_files/{vmlinuz64,modules64.gz,core.gz,rootfs.gz}
# In the real world, you would AUTHENTICATE YOUR DOWNLOADS here. TCL sadly
# only distributes images over HTTP, so it's anyone's guess what you
# just downloaded.
# Go!
pixiecore -kernel vmlinuz64 -initrd rootfs.gz,core.gz,modules64.gz
That's it. Any machine that tries to netboot on this network will now boot into a TinyCore ramdisk.
Notice that we passed multiple cpio archives to -initrd
. All
provided archives will be merged on boot to form the final
ramdisk. This is quite handy for things like providing OEM
configuration without having to respin the upstream initrd image.
Pixiecore was originally written as a component in an automated installation system for CoreOS on bare metal. For this example, let's set up a netboot for the alpha CoreOS release:
# Grab the PXE images and verify them
wget http://alpha.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz
wget http://alpha.release.core-os.net/amd64-usr/current/coreos_production_pxe_image.cpio.gz
# In the real world, you would AUTHENTICATE YOUR DOWNLOADS
# here. CoreOS distributes image signatures, but that only really
# helps if you already know the right GPG key.
# Go!
pixiecore -kernel coreos_production_pxe.vmlinuz -initrd coreos_production_pxe_image.cpio.gz --cmdline coreos.autologin
Notice that we're passing an extra commandline argument to make CoreOS automatically log in once it's booted.
Think of Pixiecore in API mode as a "PXE to HTTP" translator. Whenever Pixiecore sees a machine trying to PXE boot, it will ask a remote HTTP API (which you implement) what to do. The API server can tell Pixiecore to ignore the machine, or tell it to boot into a given kernel/initrd/commandline.
Effectively, Pixiecore in API mode lets you pretend that your machines speak a simple JSON protocol when trying to netboot. This makes it far easier to play with netbooting in your own software.
To start Pixiecore in API mode, pass it the HTTP API endpoint through
the -api
flag. The endpoint you provide must implement the Pixiecore
boot API, as described in the API spec.
You can find a sample API server implementation in the example
subdirectory. The code is not production-grade, but gives a short
illustration of how the protocol works by reimplementing a subset of
Pixiecore's static mode as an API server.
Pixiecore is available as a Docker image called
danderson/pixiecore
. It's an automatic Docker Hub build that tracks
the repository.
Because Pixiecore needs to listen for DHCP traffic, it has to run with the host network stack.
sudo docker run -v .:/image --net=host danderson/pixiecore -kernel /image/coreos_production_pxe.vmlinuz -initrd /image/coreos_production_pxe_image.cpio.gz
Pixiecore implements four different, but related protocols in one binary, which together can take a PXE ROM from nothing to booting Linux. They are: ProxyDHCP, PXE, TFTP, and HTTP. Let's walk through the boot process for a PXE ROM.
The first thing a PXE ROM does is request a configuration through DHCP, waiting for a DHCP reply that includes PXE vendor options. The normal way of providing these options is to edit your DHCP server's configuration to provide them to clients that identify themselves as PXE clients. Unfortunately, reconfiguring your network's DHCP server is tedious at best, and impossible if you DHCP server is built into a consumer router, or managed by someone else.
Pixiecore instead uses a feature of the PXE specification called ProxyDHCP. As you might guess from the name, ProxyDHCP is not a proxy at all (yeah, the PXE spec is like that), but a second DHCP server that only provides PXE configuration.
When the PXE ROM sends out a DHCPDISCOVER
, it gets two replies back:
one containing network configuration from the primary DHCP server, and
one containing only PXE DHCP options from the ProxyDHCP server. The
PXE firmware combines the two, and continues as if the primary server
had provided all the configuration.
In theory, you'd expect the ProxyDHCP server to just provide a TFTP server IP and a filename to the PXE firmware, and it would proceed to download and boot that just like the BOOTP of old.
Sadly, the average quality of PXE ROM implementations is abysmal, and many of them fail to chainload correctly if you try to do this from a ProxyDHCP server.
So, instead, we make use of the spec's "PXE menu" functionality, which lets you tell the PXE firmware to display a boot menu. Just like everything else in PXE, this is quite brittle, so nobody actually uses it to display menus - instead, they just push a more fully featured bootloader over PXE, and let that bootloader do the fancy work.
However, PXE menus seem to work reliably when combined with ProxyDHCP... And the PXE configuration can provide a timeout after which the first menu entry is booted... And that timeout can be set to zero.
So, we can just provide a single-entry menu, with a zero timeout, and chainload that way! But wait, there's more terribleness. PXE menu entries don't just list a TFTP server and file to load, because that would be too simple. Instead, each menu entry maps to a "Boot Server Type", and yet another DHCP option maps that boot server type to a set of IP addresses.
Those IP addresses aren't TFTP servers, but PXE boot servers. PXE boot servers listen on port 4011. They use the DHCP packet format, but only as a way of conveying a DHCP option that says "please tell me how to boot the following Boot Server Type". It's quite possibly the least efficient protocol encoding ever devised.
At long last, when the PXE server receives that request, it can reply with a BOOTP-ish packet that specified next-server and a filename. And those are, at long last, TFTP.
After navigating the eldritch horror of PXE, TFTP is a breath of fresh air. It is indeed a trivial protocol for transferring files. I have found some PXE ROMs that manage to add unnecessary complexity even to that, but by and large, this step is straightforward.
However, TFTP is quite slow, because it doesn't support transfer windows (well, it does, but it's an extension defined in an RFC published in 2015, so guess how many PXE ROMs implement it...). As a result, you must pay one round-trip per ~1500 bytes transferred, and even on a gigabit network, that slows things down.
Given that some netboot images are quite large (CoreOS clocks in at almost 200MB), what we really want is to switch to a more efficient protocol. That's where PXELINUX comes in.
PXELINUX is a small bootloader that knows how to boot Linux kernels, and it comes in a variant that can speak HTTP. PXELINUX is 90kB, which even over TFTP is very fast to transfer.
Thus, Pixiecore uses TFTP only to transfer PXELINUX, and from there steers it to HTTP for the rest of the loading process.
We've finally crawled our way up to the late nineties - we can speak
HTTP! Pixiecore's HTTP server is wonderfully familiar and normal. It
just serves up a support file that PXELINUX needs (ldlinux.c32
), a
trivial PXELINUX configuration telling it to boot a Linux kernel, and
the user-provided kernel and initrd files.
PXELINUX grabs all of that, and finally, Linux boots.
This is what the whole boot process looks like on the wire.
- PXE ROM, a brittle firmware burned into the network card.
- DHCP server, a plain old DHCP server providing network configuration.
- Pixieboot, the Hero and server of ProxyDHCP, PXE, TFTP and HTTP.
- PXELINUX, an open source bootloader of the Syslinux project.
- PXE ROM starts, broadcasts
DHCPDISCOVER
. - DHCP server responds with a
DHCPOFFER
containing network configs. - Pixiecore's ProxyDHCP server responds with a
DHCPOFFER
containing a PXE boot menu. - PXE ROM does a
DHCPREQUEST
/DHCPACK
exchange with the DHCP server to get a network configuration. - PXE ROM processes the PXE boot menu, decides to boot menu entry 0.
- PXE ROM sends a
DHCPREQUEST
to Pixiecore's PXE server, asking for a boot file. - Pixiecore's PXE server responds with a
DHCPACK
listing a TFTP server, a boot filename, and a PXELINUX vendor option to make it use HTTP. - PXE ROM downloads PXELINUX from Pixiecore's TFTP server, and hands off to PXELINUX.
- PXELINUX fetches its configuration from Pixiecore's HTTP server.
- PXELINUX fetches a kernel and ramdisk from Pixiecore's HTTP server, and boots Linux.
Pixiecore aims to be compliant with the relevant specifications for TFTP, DHCP, and PXE. This section lists the places where Pixiecore deliberately deviates from the spec to support buggy clients.
Some PXE ROMs don't send DHCP option 97, "Client Machine Identifier (GUID)", in their DHCP and PXE requests. According to the PXE 2.1 specification and RFC 4578, this makes the requests non-compliant:
This option MUST be present in all DHCP and PXE packets sent by PXE-compliant clients and servers.
Pixiecore's behavior implements "SHOULD" instead of "MUST": if a client request has a GUID, Pixiecore's response will respond with a GUID. If the client request has no GUID, Pixiecore omits option 97 in its response.
You can use Vagrant to quickly setup a test environment:
(HOST)$ vagrant up --provider=libvirt pxeserver
(HOST)$ vagrant ssh pxeserver
(PXESERVER)$ wget http://alpha.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz
(PXESERVER)$ wget http://alpha.release.core-os.net/amd64-usr/current/coreos_production_pxe_image.cpio.gz
(PXESERVER)$ pixiecore -debug -kernel coreos_production_pxe.vmlinuz -initrd coreos_production_pxe_image.cpio.gz --cmdline coreos.autologin
### In another terminal
(HOST)$ vagrant up --provider=libvirt pxeclient1