Nowadays lightweight virtualization is a tool used by many. Microservice-based architectures are increasingly common, and containers are their backbone.
Software like Docker allows us to manage containers in a simple way, but how are they made? What features does a process really need in order to be a container manager? It's time to get our hands dirty and build a homemade container.
The questions we're going to try to answer are:
- How does a container engine really work?
- How are containers created by "facilitators" like Docker?
First of all, we need a few tools. Below is a brief summary of the Linux features that we will use to build a container:
Namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources. Thanks to this feature, we can limit what a process can see. We will set up namespaces using Linux kernel syscalls. The namespaces man page tells us there are 3 system calls that make up the API: clone, unshare and setns.
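As a rough illustration of how a namespace request can be expressed for `unshare` (or `clone`), here is a minimal sketch. The helper names `ns_flags` and `enter_namespaces` are hypothetical and do not come from this project, and the set of namespaces chosen for "all" is an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes unshare() and CLONE_NEW* in <sched.h> */
#endif
#include <sched.h>

/* Hypothetical helper: build the namespace flag mask.
 * all_ns  -> every namespace except the user namespace (assumed set)
 * user_ns -> additionally request a user namespace */
int ns_flags(int all_ns, int user_ns)
{
    int flags = 0;

    if (all_ns)
        flags |= CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUTS |
                 CLONE_NEWIPC | CLONE_NEWNET | CLONE_NEWCGROUP;
    if (user_ns)
        flags |= CLONE_NEWUSER;
    return flags;
}

/* Detach the calling process from its current namespaces.
 * With CLONE_NEWPID only children created afterwards land in the
 * new PID namespace. Returns 0 on success, -1 with errno set
 * (typically EPERM when the caller lacks privileges). */
int enter_namespaces(int flags)
{
    return unshare(flags);
}
```

A sufficiently privileged caller could run `enter_namespaces(ns_flags(1, 0))` and then fork and exec the entrypoint; the same mask could instead be passed to `clone` to create the child directly inside the new namespaces.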
pivot_root is a Linux system call that changes the root mount in the mount namespace of the calling process.
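The usual call sequence around pivot_root can be sketched as follows. This is an assumption-laden sketch, not the project's implementation: `switch_root` is a hypothetical name, the caller is assumed to already be inside a fresh mount namespace with CAP_SYS_ADMIN, and the `"."`, `"."` form follows the pattern described in the pivot_root(2) man page notes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: make new_root the root mount of the current
 * mount namespace. Returns 0 on success, -1 with errno set. */
int switch_root(const char *new_root)
{
    if (new_root == NULL || new_root[0] != '/') {
        errno = EINVAL;         /* require an absolute path */
        return -1;
    }
    /* Keep mount events from propagating back to the host. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return -1;
    /* pivot_root needs new_root to be a mount point: bind it onto itself. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) == -1)
        return -1;
    if (chdir(new_root) == -1)
        return -1;
    /* glibc has no wrapper, so go through syscall(); using "." twice
     * stacks the old root on top of the new one. */
    if (syscall(SYS_pivot_root, ".", ".") == -1)
        return -1;
    /* Detach the old root so that only the new one stays visible. */
    if (umount2(".", MNT_DETACH) == -1)
        return -1;
    return chdir("/");
}
```

Calling this outside a new mount namespace would affect the host's mount tree, so it only makes sense after the mount namespace has been unshared.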
Cgroups (control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.
There are two different versions of cgroups:
- version 1
Originally written by Paul Menage and Rohit Seth, it is based on a set of hierarchies. Each hierarchy is composed of a set of cgroups arranged in a tree, has an instance of the cgroup virtual filesystem associated with it, and is a partition of all tasks in the system.
- version 2
Based on a single process hierarchy where cgroups form a tree structure and every process in the system belongs to one and only one cgroup. All threads of a process belong to the same cgroup.
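In both versions, limits are applied by writing values into control files inside a cgroup directory. The sketch below shows that idea for the `pids.max` and `cgroup.procs` files; the helper names are hypothetical, and the cgroup directory (for example somewhere under `/sys/fs/cgroup`) is assumed to have been created by the caller.

```c
#include <stdio.h>

/* Hypothetical helper: write `value` into the control file `name`
 * inside the cgroup directory `cgroup_dir`.
 * Returns 0 on success, -1 on failure. */
int cgroup_write(const char *cgroup_dir, const char *name, const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", cgroup_dir, name);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;              /* path too long */

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)         /* write errors may only surface on flush */
        ok = 0;
    return ok ? 0 : -1;
}

/* Cap the group at `max_pids` tasks, then move process `pid` into it. */
int cgroup_limit_pids(const char *cgroup_dir, int max_pids, long pid)
{
    char buf[32];

    snprintf(buf, sizeof buf, "%d", max_pids);
    if (cgroup_write(cgroup_dir, "pids.max", buf) == -1)
        return -1;
    snprintf(buf, sizeof buf, "%ld", pid);
    return cgroup_write(cgroup_dir, "cgroup.procs", buf);
}
```

Other limits follow the same pattern with different file names, which depend on the cgroup version in use.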
To compile Understanding-containers, you need to install the following libraries:
libcap-dev
seccomp-dev
iptables-dev
~$ sudo apt install libcap-dev seccomp-dev iptables-dev -y
To create your homemade container you will need to compile the source code in the "src" directory. I personally recommend using cmake to do this. Move to the Understanding-containers directory and then run the following commands:
~$ mkdir build && cd build
~$ cmake .. && make -j $(getconf _NPROCESSORS_ONLN)
Now in the build folder you'll have the executable ready to run! Here is the tool's help output:
Usage: sudo ./MyDocker <options> <entrypoint>
<options> should be:
-a run all namespaces without the user namespace
-U run a user namespace using unprivileged container
-c cgroups used to limit resources.
This option must be combined with at least one of:
-M <memory_limit> [1-4294967296] default: 1073741824 (1GB)
-C <percentage_of_cpu_shares> [1-100] default: 25
-P <max_pids> [10-32768] default: 64
-I <io_weight> [10-1000] default: 10
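To make the option rules concrete, here is a sketch of how such a command line could be parsed and validated with `getopt`. It is not the tool's actual parser: `struct options`, `parse_range` and `parse_options` are hypothetical names, and the ranges and defaults are simply copied from the help text. The leading `+` in the option string is a GNU extension that stops parsing at the entrypoint.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the parsed command line. */
struct options {
    int all_ns, user_ns, cgroups;   /* -a, -U, -c */
    long long memory;               /* -M, in bytes */
    long cpu, pids, io;             /* -C, -P, -I */
};

/* Parse a bounded integer; returns 0 on success, -1 otherwise. */
static int parse_range(const char *s, long long lo, long long hi, long long *out)
{
    char *end;
    long long v = strtoll(s, &end, 10);

    if (end == s || *end != '\0' || v < lo || v > hi)
        return -1;
    *out = v;
    return 0;
}

/* Returns the index of the entrypoint in argv, or -1 on a usage error. */
int parse_options(int argc, char **argv, struct options *o)
{
    struct options defaults = {0, 0, 0, 1073741824LL, 25, 64, 10};
    long long v;
    int c, limit_given = 0;

    *o = defaults;
    optind = 0;                     /* force getopt to restart its scan */
    while ((c = getopt(argc, argv, "+aUcM:C:P:I:")) != -1) {
        switch (c) {
        case 'a': o->all_ns = 1; break;
        case 'U': o->user_ns = 1; break;
        case 'c': o->cgroups = 1; break;
        case 'M':
            if (parse_range(optarg, 1, 4294967296LL, &v)) return -1;
            o->memory = v; limit_given = 1; break;
        case 'C':
            if (parse_range(optarg, 1, 100, &v)) return -1;
            o->cpu = (long)v; limit_given = 1; break;
        case 'P':
            if (parse_range(optarg, 10, 32768, &v)) return -1;
            o->pids = (long)v; limit_given = 1; break;
        case 'I':
            if (parse_range(optarg, 10, 1000, &v)) return -1;
            o->io = (long)v; limit_given = 1; break;
        default:
            return -1;              /* unknown option or missing argument */
        }
    }
    if (o->cgroups && !limit_given)
        return -1;                  /* -c needs at least one limit */
    if (optind >= argc)
        return -1;                  /* no entrypoint given */
    return optind;
}
```

Limits that are not given on the command line keep their defaults, so passing only `-C`, `-I` and `-P` leaves the memory limit at 1GB.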
Feel the thrill of your new container by running it now. An example command:
~$ sudo ./MyDocker -aUc -C 50 -I 20 -P 333 /bin/bash
In this case the following cgroup resource limits are applied:
Resource | Applied value |
---|---|
memory_limit | 1GB |
cpu_shares | 50 |
max_pids | 333 |
io_weight | 20 |
Now you'll be running bash inside your container. You can, for example, check which processes are active inside it and notice how they differ from those on the host machine.
When you are done, you can stop your container by ending its bash process with exit.
The folders in this repository are:
├── cmake
├── root_fs
├── src
│   ├── capabilities
│   ├── helpers
│   ├── namespaces
│   │   ├── cgroup
│   │   ├── mount
│   │   ├── network
│   │   └── user
│   └── seccomp
└── tools
- root_fs [the root filesystem where your container will run]
- src [the source folder]
  - capabilities [capabilities dropped for the new namespace]
  - helpers [helper files]
  - namespaces [support for various namespaces]
    - cgroup [control group support]
    - mount [mount namespace reference folder]
    - network [network namespace reference folder]
    - user [user namespace reference folder]
  - seccomp [seccomp configuration to block some syscalls]
- tools [tools folder]
Each folder (except root_fs) contains a more detailed instruction file called README.md.
The tools folder contains a binary file and scripts that allow your namespace to connect to the internet. To support this you need a CNI; one of the provided scripts lets you use Polycube for this, so you can also take advantage of eBPF technology!
If you want to better understand how namespaces work, explore the src folder. The README.md files inside the various folders will guide you through how the containers are implemented.
I also suggest taking a look at the source files, because they are full of useful comments that will help you understand how virtualization works.
Please report any issues, corrections or ideas on GitHub
- Creating Containers by Michael Crosby
- container by Hoanh An
- Containers From Scratch by Liz Rice
- nsroot & nsroot-paper by Inge Alexander Raknes, Bjørn Fjukstad, Lars Ailo Bongo - UiT The Arctic University of Norway
- Container Specification - runc
- Linux Namespaces - GitHub repository by Ed King
- Docker and Go: why did we decide to write Docker in Go? from the Google Developers Group meetup at Google West Campus 2 (Michael Crosby)
- USER_NAMESPACES from the Linux Programmer's Manual
- What UID and GID are
- pivot_root new documentation from LWN.net
- Introducing Linux Network Namespaces (04-09-2013)
- Virtual Ethernet Device from the Linux Programmer's Manual
- A deep dive into Linux namespaces
- netsetgo by Ed King
- ns-process by Ed King
- Linux containers in 500 lines of code by Lizzie Dixon