Live Migration
Checkpoint/Restore In Userspace, or CRIU, is a utility to checkpoint and restore a process tree. It is commonly used with lxd and docker to provide live snapshot/restore functionality, and can go a step further to live migration of a container at run time by saving all necessary state to persistent storage.
CRIU runs mostly in user space, but it needs the following from the system to be fully functional:
- Linux >= 3.11 (>= 4.15 is recommended)
- iproute2 >= 3.5.0 for dumping network namespaces
- ptrace must be allowed
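The kernel requirement above can be checked from a script. This is a minimal sketch, assuming the release string starts with the usual major.minor numbers (as in "4.15.0-20-generic"); the helper name kernel_at_least is hypothetical and not part of CRIU:

```python
import re


def kernel_at_least(release, minimum):
    """Return True if a kernel release string is >= minimum (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    found = (int(match.group(1)), int(match.group(2)))
    # Tuples compare element by element, so (4, 15) >= (3, 11) is True.
    return found >= tuple(minimum)
```

On a real host the release string can be taken from platform.release(), for example kernel_at_least(platform.release(), (3, 11)) for the hard minimum and (4, 15) for the recommended version.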
The software is packaged in both Debian Sid and Ubuntu 18.04. With either of the two distributions we can install the utility with a single command: apt update && apt install criu
After the installation has finished, check whether it works:
criu check
It should say Looks OK when the check passes; warnings are shown when there is something to mention.
Here we take subutai-nginx.service as an example.
- Get the PID of the process to dump; it must be a process group leader:
root@debian:~# systemctl status subutai-nginx
● subutai-nginx.service - nginx instance for subutai
Loaded: loaded (/lib/systemd/system/subutai-nginx.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2018-06-02 03:53:20 CST; 37min ago
Docs: man:nginx(8)
Process: 8019 ExecReload=/usr/sbin/nginx -c /etc/subutai/nginx/nginx.conf -g daemon on; master_process on; -s reload (code=exited, status=0/SUCCESS)
Process: 822 ExecStart=/usr/sbin/nginx -c /etc/subutai/nginx/nginx.conf -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
Process: 820 ExecStartPre=/usr/sbin/nginx -c /etc/subutai/nginx/nginx.conf -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
Main PID: 824 (nginx)
Tasks: 3 (limit: 4915)
Memory: 13.0M
CPU: 144ms
CGroup: /system.slice/subutai-nginx.service
├─ 824 nginx: master process /usr/sbin/nginx -c /etc/subutai/nginx/nginx.conf -g daemon on; master_process on;
├─8132 nginx: worker process
└─8143 nginx: cache manager process
So we know the Main PID is 824, which will be our target for this dump and restore experiment.
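Reading the Main PID by eye works for an experiment, but a script needs to extract it. This is a minimal sketch that parses the status text shown above; parse_main_pid is a hypothetical helper name, and the sample is a shortened copy of that output:

```python
import re


def parse_main_pid(status_text):
    """Extract the integer after 'Main PID:' from systemctl status output."""
    match = re.search(r"^\s*Main PID:\s*(\d+)", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Main PID:' line found")
    return int(match.group(1))


# Shortened copy of the status output above.
SAMPLE = """\
Active: active (running) since Sat 2018-06-02 03:53:20 CST; 37min ago
Main PID: 824 (nginx)
Tasks: 3 (limit: 4915)
"""

if __name__ == "__main__":
    print(parse_main_pid(SAMPLE))  # prints 824
```

For scripting, systemctl show -p MainPID subutai-nginx prints a single MainPID=824 style line, which is likely easier to parse than the human-oriented status output.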
- Pre-dump the process, to shorten the freeze time later:
mkdir -p /root/dump/nginx && criu pre-dump -t 824 -D /root/dump/nginx
The resulting folder has the following files:
root@debian:~# ls /root/dump/nginx/
irmap-cache pagemap-8143.img pagemap-shmem-15577.img pagemap-shmem-15581.img pages-2.img pages-4.img pages-6.img
pagemap-8132.img pagemap-824.img pagemap-shmem-15578.img pages-1.img pages-3.img pages-5.img stats-dump
- Actually checkpoint the process:
criu dump -t 824 -D /root/dump/nginx/
The resulting folder now looks like this:
root@debian:~# ls /root/dump/nginx/
cgroup.img fdinfo-2.img fs-8132.img ids-8143.img mm-8132.img pagemap-8143.img pagemap-shmem-15581.img pages-4.img stats-dump
core-8132.img fdinfo-3.img fs-8143.img ids-824.img mm-8143.img pagemap-824.img pages-1.img pages-5.img
core-8143.img fdinfo-4.img fs-824.img inventory.img mm-824.img pagemap-shmem-15577.img pages-2.img pages-6.img
core-824.img files.img ids-8132.img irmap-cache pagemap-8132.img pagemap-shmem-15578.img pages-3.img pstree.img
At this point systemctl status subutai-nginx shows the service in failed status, because the main process has been killed.
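The pre-dump and dump commands above differ only in the subcommand, so they can be built by one helper. This is a minimal sketch that only constructs the argument lists and does not run anything. The name criu_dump_argv is hypothetical. The optional prev_images_dir parameter maps to CRIU's --prev-images-dir option, which the commands above do not use; it is included here as an assumption about how a later dump would be pointed at an earlier pre-dump:

```python
def criu_dump_argv(action, pid, images_dir, prev_images_dir=None):
    """Build the argv for 'criu pre-dump' or 'criu dump' of one process tree."""
    if action not in ("pre-dump", "dump"):
        raise ValueError("action must be 'pre-dump' or 'dump'")
    argv = ["criu", action, "-t", str(pid), "-D", images_dir]
    if prev_images_dir is not None:
        # Not used in the walkthrough above; CRIU documents this path as
        # relative to the images directory.
        argv += ["--prev-images-dir", prev_images_dir]
    return argv


if __name__ == "__main__":
    print(" ".join(criu_dump_argv("pre-dump", 824, "/root/dump/nginx")))
    print(" ".join(criu_dump_argv("dump", 824, "/root/dump/nginx/")))
```

The resulting list can be handed to subprocess.run(argv, check=True) as root. Keeping the construction separate from the execution makes it possible to log or review the exact command before freezing a process.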
- Create a new PID namespace and a new mount namespace, and mount the /proc filesystem, before the processes are run:
unshare -p -m --fork --mount-proc
- Restore the image and detach from the process once the restore has finished:
criu restore -d -D /root/dump/nginx/
- Verify that the process group is back and that it was not started by systemd: systemctl status subutai-nginx is still in failed status.
- Verify that the process group has all its children ready with ps aux | grep nginx:
root@debian:~# ps aux | grep nginx
root 824 0.0 0.0 184448 1928 ? Ss 04:32 0:00 nginx: master process /usr/sbin/nginx -c /etc/subutai/nginx/nginx.conf -g daemon on; master_process on;
daemon 8132 0.0 0.1 184864 2348 ? S 04:32 0:00 nginx: worker process
daemon 8143 0.0 0.0 184648 2040 ? S 04:32 0:00 nginx: cache manager process
root 8226 0.0 0.0 12784 940 pts/0 S+ 04:32 0:00 grep nginx
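The ps listing shows the same PIDs (824, 8132, 8143) that systemctl reported before the dump, which is what a successful restore should look like. This is a minimal sketch of that check for a script, assuming the standard 11-column ps aux layout; pids_matching is a hypothetical helper name:

```python
def pids_matching(ps_output, needle):
    """Return PIDs from 'ps aux' output whose command contains needle,
    skipping the grep process itself."""
    pids = []
    for line in ps_output.splitlines():
        # ps aux has 10 fixed columns followed by the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header, shell prompt, or malformed line
        command = fields[10]
        if needle in command and not command.startswith("grep "):
            pids.append(int(fields[1]))
    return pids


# Copy of the listing above.
SAMPLE = """\
root 824 0.0 0.0 184448 1928 ? Ss 04:32 0:00 nginx: master process /usr/sbin/nginx -c /etc/subutai/nginx/nginx.conf -g daemon on; master_process on;
daemon 8132 0.0 0.1 184864 2348 ? S 04:32 0:00 nginx: worker process
daemon 8143 0.0 0.0 184648 2040 ? S 04:32 0:00 nginx: cache manager process
root 8226 0.0 0.0 12784 940 pts/0 S+ 04:32 0:00 grep nginx
"""

if __name__ == "__main__":
    print(pids_matching(SAMPLE, "nginx"))  # prints [824, 8132, 8143]
```

Comparing this list with the PIDs recorded before the dump gives a simple pass/fail signal for the restore.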
There is also a helper script called criu-ns, which can assist in restoring into a pseudo-container.
- Create a container by cloning the debian-stretch template:
subutai clone debian-stretch test
- Add lxc.tty = 0 to its config:
echo "lxc.tty = 0" >> /var/lib/lxc/test/config
- Start the container:
subutai start test
- Find out PID of the container:
root@debian:~# lxc-ls --active -f -F PID test
PID
20807
- Create the folder for dumping:
mkdir -p /root/dumps/test
- Find out the tty number using python:
root@debian:~# python
Python 2.7.13 (default, Nov 24 2017, 17:33:09)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> st = os.stat("/proc/20807/root/dev/console")
>>> print "tty[%x:%x]" % (st.st_rdev, st.st_dev)
tty[8801:11]
- Find out the veth MAC address:
root@debian:~# grep lxc.network.veth.pair /var/lib/lxc/test/config | cut -f3 -d' '
00163ec24665
- Dump the container:
/usr/sbin/criu dump --tcp-established --file-locks --link-remap --manage-cgroups=full \
--ext-mount-map auto --enable-external-sharing --enable-external-masters \
--enable-fs hugetlbfs --enable-fs tracefs \
-D /root/dumps/test -o /root/dumps/test/dump.log \
--cgroup-root name=systemd:/lxc/test \
--cgroup-root devices:/lxc/test \
--cgroup-root freezer:/lxc/test \
--cgroup-root cpu,cpuacct:/lxc/test \
--cgroup-root pids:/lxc/test \
--cgroup-root blkio:/lxc/test \
--cgroup-root cpuset:/lxc/test \
--cgroup-root net_cls,net_prio:/lxc/test \
--cgroup-root perf_event:/lxc/test \
--cgroup-root memory:/lxc/test \
--ext-mount-map /sys/fs/fuse/connections:sys/fs/fuse/connections \
--ext-mount-map /home:home \
--ext-mount-map /opt:opt \
--ext-mount-map /var:var \
-t 20807 \
--skip-in-flight \
--freeze-cgroup /sys/fs/cgroup/freezer///lxc/test \
--ext-mount-map /dev/console:console --external tty[8801:11] \
--force-irmap \
--leave-running
--leave-running is not needed in a real deployment, and it can be dangerous because the still-running processes may modify various system state after the dump. We use it here because the agent will restart the container anyway when it is not aware of a stop request, so we stop the container manually in the next step.
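Two parts of the dump command are mechanical: the tty identifier from the earlier step and the repeated --cgroup-root flags. This is a minimal Python 3 sketch of both, assuming the controller list and the /lxc/<name> cgroup path used in the command above; the function names are hypothetical:

```python
import os

# Controllers copied from the dump command above.
CONTROLLERS = [
    "name=systemd", "devices", "freezer", "cpu,cpuacct", "pids", "blkio",
    "cpuset", "net_cls,net_prio", "perf_event", "memory",
]


def tty_id(rdev, dev):
    """Format the identifier as tty[<st_rdev hex>:<st_dev hex>]."""
    return "tty[%x:%x]" % (rdev, dev)


def console_tty_id(container_pid):
    """Stat the container's /dev/console through /proc, as in the tty step."""
    st = os.stat("/proc/%d/root/dev/console" % container_pid)
    return tty_id(st.st_rdev, st.st_dev)


def cgroup_root_args(container, controllers=CONTROLLERS):
    """Build one '--cgroup-root <controller>:/lxc/<container>' pair per controller."""
    args = []
    for controller in controllers:
        args += ["--cgroup-root", "%s:/lxc/%s" % (controller, container)]
    return args


if __name__ == "__main__":
    print(tty_id(0x8801, 0x11))  # prints tty[8801:11]
    print(" ".join(cgroup_root_args("test")))
```

console_tty_id(20807) would reproduce the interactive session above on the host where the container runs. It needs root and a live container, so only the pure formatting helper is exercised in the demo.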
- Make sure the container is stopped:
root@debian:~# subutai stop test
INFO[2018-06-04 18:53:39] test stopped
root@debian:~# subutai list -i
NAME STATE IP Interface
---- ----- -- ---------
management STOPPED eth0
test STOPPED eth0
- Restore the container:
/usr/sbin/criu restore --tcp-established --file-locks --link-remap --manage-cgroups=full \
--ext-mount-map auto --enable-external-sharing --enable-external-masters \
--enable-fs hugetlbfs --enable-fs tracefs \
-D /root/dumps/test -o /root/dumps/test/restore.log \
--cgroup-root name=systemd:/lxc/test \
--cgroup-root devices:/lxc/test \
--cgroup-root freezer:/lxc/test \
--cgroup-root cpu,cpuacct:/lxc/test \
--cgroup-root pids:/lxc/test \
--cgroup-root blkio:/lxc/test \
--cgroup-root cpuset:/lxc/test \
--cgroup-root net_cls,net_prio:/lxc/test \
--cgroup-root perf_event:/lxc/test \
--cgroup-root memory:/lxc/test \
--ext-mount-map sys/fs/fuse/connections:/sys/fs/fuse/connections \
--ext-mount-map home:/var/lib/lxc/test/home \
--ext-mount-map opt:/var/lib/lxc/test/opt \
--ext-mount-map var:/var/lib/lxc/test/var \
--root /usr/lib/x86_64-linux-gnu/lxc/rootfs \
--restore-detached --restore-sibling --inherit-fd fd[1]:tty[8801:11] \
--ext-mount-map console:/dev/pts/0 \
--external veth[eth0]:00163ec24665
Here we are using fd[1] for convenience of demonstration; creating a new fd and giving it to criu would be better.
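The dump and restore commands mirror each other for external mounts: dump maps each container mountpoint to a key (/home:home), and restore maps the same key to a path on the host (home:/var/lib/lxc/test/home). This is a minimal sketch of that pairing, assuming the key is simply the mountpoint without its leading slash, as in the commands above; the function names are hypothetical:

```python
def dump_mount_args(mountpoints):
    """Dump side: --ext-mount-map <mountpoint>:<key>."""
    args = []
    for mountpoint in mountpoints:
        key = mountpoint.lstrip("/")
        args += ["--ext-mount-map", "%s:%s" % (mountpoint, key)]
    return args


def restore_mount_args(mountpoints, host_root):
    """Restore side: --ext-mount-map <key>:<host_root>/<key>."""
    args = []
    for mountpoint in mountpoints:
        key = mountpoint.lstrip("/")
        args += ["--ext-mount-map", "%s:%s/%s" % (key, host_root.rstrip("/"), key)]
    return args


def veth_external_arg(inner_name, outer_name):
    """Restore side: --external veth[<name in container>]:<name on host>."""
    return ["--external", "veth[%s]:%s" % (inner_name, outer_name)]


if __name__ == "__main__":
    mounts = ["/home", "/opt", "/var"]
    print(" ".join(dump_mount_args(mounts)))
    print(" ".join(restore_mount_args(mounts, "/var/lib/lxc/test")))
    print(" ".join(veth_external_arg("eth0", "00163ec24665")))
```

The fuse connections and console mappings in the commands above do not follow the host_root pattern, so they would still need to be added by hand.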
- Check the container's running state:
root@debian:~# subutai list -i
NAME STATE IP Interface
---- ----- -- ---------
management STOPPED eth0
test RUNNING 10.10.10.32 eth0
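The final check can also be scripted. This is a minimal sketch that turns the listing above into a name-to-state mapping, assuming whitespace-separated columns with NAME first and STATE second, and that only the command's output (without the shell prompt line) is passed in; container_states is a hypothetical helper name:

```python
def container_states(listing):
    """Map container name to state from 'subutai list -i' output."""
    states = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        if fields[0] == "NAME" or set(fields[0]) == {"-"}:
            continue  # header row or dashed separator row
        states[fields[0]] = fields[1]
    return states


# Copy of the listing above, without the prompt line.
SAMPLE = """\
NAME STATE IP Interface
---- ----- -- ---------
management STOPPED eth0
test RUNNING 10.10.10.32 eth0
"""

if __name__ == "__main__":
    print(container_states(SAMPLE)["test"])  # prints RUNNING
```

A restore script could then fail fast when the state of the restored container is anything other than RUNNING.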