Many container runtime tools like systemd-nspawn, docker, etc. focus on providing infrastructure for system administrators and orchestration tools (e.g. Kubernetes) to run containers.
These tools are not suitable to give to unprivileged users, because it is trivial to turn such access into a fully privileged root shell on the host.
There is an effort in the Linux kernel called user namespaces, which attempts to allow unprivileged users to use container features. While significant progress has been made, there are still security concerns about it.
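Whether unprivileged user namespaces are actually usable depends on the kernel and on distribution policy (sysctls, seccomp filters and similar). The following is a minimal C sketch, assuming Linux and glibc, that probes for them by calling `unshare(CLONE_NEWUSER)` in a throwaway child process. The function name `userns_available` is hypothetical and is not part of bubblewrap.

```c
#define _GNU_SOURCE
#include <sched.h>      /* unshare, CLONE_NEWUSER */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork, _exit */

/* Hypothetical helper. Returns 1 if the calling user may create a new
 * user namespace, 0 if not (disabled by sysctl, blocked by seccomp,
 * kernel too old, ...), and -1 if the probe itself could not run. */
int userns_available(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* unshare() changes the caller's namespaces irreversibly, so the
         * attempt is made in a child that exits immediately afterwards. */
        _exit(unshare(CLONE_NEWUSER) == 0 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Probing in a forked child keeps the caller in its original namespaces whatever the outcome. A tool like bubblewrap, which is installed setuid, does not depend on this check succeeding for ordinary users.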
Bubblewrap is a setuid implementation of a subset of user namespaces (emphasis on subset).
It inherits code from xdg-app helper, which in turn distantly derives from linux-user-chroot.
The maintainers of this tool believe that it does not allow privilege escalation, even when used in combination with typical software installed on a distribution. It may, however, increase the ability of a logged-in user to perform denial-of-service attacks.
In particular, bubblewrap uses PR_SET_NO_NEW_PRIVS to turn off setuid binaries.
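The kernel mechanism behind this is a per-thread flag set with `prctl(2)`. Once set, it cannot be cleared, and it is inherited across `fork()`, `clone()` and `execve()`, so a setuid binary executed later runs without gaining any privileges. Below is a minimal C sketch of setting the flag, assuming Linux 3.5 or later and glibc; `enable_no_new_privs` is a hypothetical helper, not bubblewrap's actual code.

```c
#include <stdio.h>      /* perror */
#include <sys/prctl.h>  /* prctl, PR_SET_NO_NEW_PRIVS, PR_GET_NO_NEW_PRIVS */

/* Hypothetical helper. Sets no_new_privs on the calling thread; after
 * this, execve() no longer honours setuid/setgid bits or file
 * capabilities for this thread and everything it spawns.
 * Returns the flag's value as read back (1 on success), or -1 on error. */
int enable_no_new_privs(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_NO_NEW_PRIVS)");
        return -1;
    }
    /* PR_GET_NO_NEW_PRIVS returns the current value, 0 or 1. */
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Because the flag is one-way and inherited, a sandbox launcher only needs to set it once before executing the sandboxed command.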
This program can be shared by all container tools which perform non-root operation, such as:
We would also like to see this available in Kubernetes/OpenShift clusters. Giving unprivileged users access to container features would make interactive debugging scenarios and the like significantly easier.
bubblewrap works by creating a new, completely empty, filesystem namespace where the root is on a tmpfs that is invisible from the host and is automatically cleaned up when the last process exits. You can then use command-line options to construct the root filesystem, the process environment, and the command to run in the namespace.
A simple example is:
bwrap --ro-bind / / bash
This will create a read-only bind mount of the host root at the sandbox root, and then start bash.
Another simple example would be a read-write chroot operation:
bwrap --bind /some/chroot/dir / bash
A more complex example is to run a shell with a custom (read-only) /usr, but your own (tmpfs) data, running in a PID and network namespace:
bwrap --ro-bind /usr /usr \
--dir /tmp \
--proc /proc \
--dev /dev \
--ro-bind /etc/resolv.conf /etc/resolv.conf \
--symlink usr/lib /lib \
--symlink usr/lib64 /lib64 \
--symlink usr/bin /bin \
--symlink usr/sbin /sbin \
--chdir / \
--unshare-pid \
--unshare-net \
--dir /run/user/$(id -u) \
--setenv XDG_RUNTIME_DIR "/run/user/$(id -u)" \
/bin/sh
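A launcher program that wants this sandbox without going through a shell could assemble the same argument vector itself and exec bwrap directly, which sidesteps shell quoting and command substitution. The following is a minimal C sketch, assuming `bwrap` is on `PATH`; `build_bwrap_argv`, `exec_sandboxed_shell` and `MAX_ARGS` are hypothetical names, and the argument list simply mirrors the command above.

```c
#include <stdio.h>      /* snprintf, perror, NULL */
#include <sys/types.h>  /* uid_t */
#include <unistd.h>     /* execvp, getuid */

#define MAX_ARGS 64     /* hypothetical capacity, ample for this argv */

/* Fill argv with the bwrap invocation above. runtime_dir receives
 * "/run/user/<uid>" and must outlive argv. Returns the argument count
 * (argv[count] is NULL), or -1 if runtime_dir is too small. */
int build_bwrap_argv(const char *argv[MAX_ARGS], char *runtime_dir,
                     size_t runtime_dir_len, uid_t uid)
{
    int w = snprintf(runtime_dir, runtime_dir_len, "/run/user/%u",
                     (unsigned) uid);
    if (w < 0 || (size_t) w >= runtime_dir_len)
        return -1;

    int n = 0;
    argv[n++] = "bwrap";
    argv[n++] = "--ro-bind"; argv[n++] = "/usr"; argv[n++] = "/usr";
    argv[n++] = "--dir";     argv[n++] = "/tmp";
    argv[n++] = "--proc";    argv[n++] = "/proc";
    argv[n++] = "--dev";     argv[n++] = "/dev";
    argv[n++] = "--ro-bind"; argv[n++] = "/etc/resolv.conf";
                             argv[n++] = "/etc/resolv.conf";
    argv[n++] = "--symlink"; argv[n++] = "usr/lib";   argv[n++] = "/lib";
    argv[n++] = "--symlink"; argv[n++] = "usr/lib64"; argv[n++] = "/lib64";
    argv[n++] = "--symlink"; argv[n++] = "usr/bin";   argv[n++] = "/bin";
    argv[n++] = "--symlink"; argv[n++] = "usr/sbin";  argv[n++] = "/sbin";
    argv[n++] = "--chdir";   argv[n++] = "/";
    argv[n++] = "--unshare-pid";
    argv[n++] = "--unshare-net";
    argv[n++] = "--dir";     argv[n++] = runtime_dir;
    argv[n++] = "--setenv";  argv[n++] = "XDG_RUNTIME_DIR";
                             argv[n++] = runtime_dir;
    argv[n++] = "/bin/sh";
    argv[n] = NULL;          /* execvp() requires a NULL terminator */
    return n;
}

/* Replace the current process with the sandboxed shell.
 * Only returns on failure, e.g. when bwrap is not installed. */
int exec_sandboxed_shell(void)
{
    char runtime_dir[64];
    const char *argv[MAX_ARGS];
    if (build_bwrap_argv(argv, runtime_dir, sizeof runtime_dir, getuid()) < 0)
        return -1;
    execvp(argv[0], (char *const *) argv);
    perror("execvp bwrap");
    return -1;
}
```

Each option and its operands become separate argv entries, so values such as the runtime directory never pass through shell word splitting.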
The name bubblewrap was chosen to convey that this tool runs as the parent of the application (so it wraps it in some sense) and creates a protective layer (the sandbox) around it.
(Bubblewrap cat by dancing_stupidity)