host-containers: allow mount propagations from privileged containers #1601

Merged 1 commit on Jun 16, 2021

@arnaldo2792 (Contributor) commented Jun 3, 2021

Issue number:
#1209

Description of changes:

e8936fc5 host-containers: allow mount propagations from privileged containers

This commit adds support for propagating mount points created in bootstrap and superpowered containers across mount peer groups.

The root filesystem of bootstrap and superpowered containers is set up with the rshared configuration to allow mount propagations across peer groups. All mount points attached to the containers are configured as rprivate (except for the mnt mount). This prevents bootstrap and superpowered containers from remounting directories in the host's root filesystem.

The /.bottlerocket/rootfs/mnt mount point was added to bootstrap and superpowered containers. This mount point is a bind mount that points to /mnt in the host, which itself is a bind mount of /local/mnt. This is required to let users create mount points underneath /mnt. This mount point is set up with the rshared configuration to allow propagations across peer groups. It is the only mount point from which propagations are allowed across peer groups.

With this change, bootstrap containers now have access to all the devices in the host. They also now have the CAP_SYS_ADMIN capability to let users manage ephemeral disks. The logic that builds the container specs was refactored to make it clearer which options are set in each container's spec.
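
The propagation rule described above (every mount is rprivate unless it explicitly names another propagation mode) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in host-ctr: `Mount` is a simplified stand-in for an OCI runtime-spec mount entry, and `withPrivateDefault`, the source paths, and the option lists are hypothetical names and values chosen for the example.

```go
package main

import "fmt"

// Mount is a simplified, hypothetical stand-in for an OCI runtime-spec mount entry.
type Mount struct {
	Destination string
	Source      string
	Options     []string
}

// hasPropagation reports whether the mount already names a propagation mode.
func hasPropagation(m Mount) bool {
	for _, option := range m.Options {
		switch option {
		case "shared", "rshared", "private", "rprivate", "slave", "rslave":
			return true
		}
	}
	return false
}

// withPrivateDefault returns a copy of mounts in which every mount that does
// not name a propagation mode gets "rprivate" appended to its options.
func withPrivateDefault(mounts []Mount) []Mount {
	out := make([]Mount, 0, len(mounts))
	for _, m := range mounts {
		// Copy the options so the caller's slice is never modified.
		opts := append([]string{}, m.Options...)
		if !hasPropagation(m) {
			opts = append(opts, "rprivate")
		}
		out = append(out, Mount{Destination: m.Destination, Source: m.Source, Options: opts})
	}
	return out
}

func main() {
	// Illustrative mounts only; the real option lists are an assumption.
	mounts := withPrivateDefault([]Mount{
		{Destination: "/.bottlerocket/rootfs/mnt", Source: "/mnt", Options: []string{"rbind", "rshared"}},
		{Destination: "/.bottlerocket/rootfs", Source: "/", Options: []string{"rbind"}},
	})
	for _, m := range mounts {
		fmt.Println(m.Destination, m.Options)
	}
}
```

In this sketch only the mnt mount keeps its shared propagation, while the second mount falls back to rprivate, which matches the intent that propagations are allowed from a single mount point.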

Testing done:

  • Admin and control containers are still working as expected and can make calls using the apiclient
  • Non-superpowered containers don't have access to /dev, /.bottlerocket/rootfs/mnt or the host's root filesystem
  • I launched a c5d.2xlarge instance, which has one ephemeral disk attached to it. I created a bootstrap container with the following settings:
[settings.bootstrap-containers.ephemeral]
mode = "once"
source = "<SOURCE>"
essential = false

Where the setup-ephemeral-disk image is defined as:

FROM alpine
RUN apk add e2fsprogs bash parted
ADD script ./
RUN chmod +x ./script
ENTRYPOINT ["sh", "script"]

And with script as:

#!/usr/bin/env bash
set -ex

DISK=/.bottlerocket/rootfs/dev/nvme2n1
PERSISTENT_DIR=/.bottlerocket/bootstrap-containers/current
PARTITIONS_CREATED=/.bottlerocket/bootstrap-containers/current/created
BASE_MOUNT_POINT=/.bottlerocket/rootfs/mnt

if [ ! -f $PARTITIONS_CREATED ]; then
  parted -s $DISK mklabel gpt 1>/dev/null
  parted -s $DISK mkpart primary ext4 0% 50% 1>/dev/null
  parted -s $DISK mkpart primary ext4 50% 100% 1>/dev/null
  mkfs.ext4 -F ${DISK}p1
  mkfs.ext4 -F ${DISK}p2
  touch $PARTITIONS_CREATED
fi

mkdir -p $BASE_MOUNT_POINT/part1
mkdir -p $BASE_MOUNT_POINT/part2

mount ${DISK}p1 $BASE_MOUNT_POINT/part1
mount ${DISK}p2 $BASE_MOUNT_POINT/part2

I confirmed that the partitions were mounted and propagated:

# From the admin container
[ec2-user@ip-172-31-5-108 ~]$ lsblk
# ...
├─nvme2n1p1  259:14   0  93.1G  0 part /.bottlerocket/rootfs/mnt/part1
└─nvme2n1p2  259:15   0  93.1G  0 part /.bottlerocket/rootfs/mnt/part2

# From the host
bash-5.0# lsblk
# ...
|-nvme2n1p1  259:14   0  93.1G  0 part /local/mnt/part1
|-nvme2n1p2  259:15   0  93.1G  0 part /local/mnt/part2

From the admin container I ran:

[ec2-user@ip-172-31-12-31 ~]$ sudo su
bash-4.2# umount /.bottlerocket/rootfs/mnt/part1
bash-4.2# mount /dev/nvme2n1p1 /.bottlerocket/rootfs/etc/
bash-4.2# ls /.bottlerocket/rootfs/etc/
lost+found

I then confirmed that this mount did not propagate to the host:

# From the host
bash-5.0# lsblk
# ...
nvme2n1      259:0    0 186.3G  0 disk
|-nvme2n1p1  259:14   0  93.1G  0 part
|-nvme2n1p2  259:15   0  93.1G  0 part /local/mnt/part2

bash-5.0# ls /etc/
bootstrap-containers  ...
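
The checks above rely on lsblk. As a complementary sketch, the propagation state of a mount can also be read from the optional fields of /proc/self/mountinfo (for example "shared:N"). The Go snippet below parses one such line; `parseMountinfoLine` and `isShared` are hypothetical helper names, and the sample line is made up for illustration rather than captured from the test instance.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMountinfoLine extracts the mount point and the optional propagation
// fields (for example "shared:12" or "master:3") from one mountinfo line.
func parseMountinfoLine(line string) (mountPoint string, propagation []string, ok bool) {
	fields := strings.Fields(line)
	// The first six fields are fixed; optional fields follow until a lone "-".
	if len(fields) < 7 {
		return "", nil, false
	}
	for i := 6; i < len(fields); i++ {
		if fields[i] == "-" {
			return fields[4], fields[6:i], true
		}
	}
	return "", nil, false
}

// isShared reports whether any optional field marks the mount as a member
// of a shared peer group.
func isShared(propagation []string) bool {
	for _, p := range propagation {
		if strings.HasPrefix(p, "shared:") {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical sample line, not taken from the instance used in testing.
	line := "100 50 259:14 / /local/mnt/part1 rw,relatime shared:12 - ext4 /dev/nvme2n1p1 rw"
	if mountPoint, propagation, ok := parseMountinfoLine(line); ok {
		fmt.Println(mountPoint, propagation, isShared(propagation))
	}
}
```

A mount with no optional fields before the "-" separator is private, so a check like this could distinguish the propagated mounts under /mnt from the one made on /etc.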

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@arnaldo2792 (Contributor Author)

Forced push includes:

  • Fixes in main README
  • Refactor how container specs are created as suggested in feedback
  • Mount /.bottlerocket/rootfs/mnt instead of /mnt inside privileged containers

@arnaldo2792 (Contributor Author)

Forced push to fix commit message

@samuelkarp (Contributor) left a comment

The reorganized code is significantly easier to read; thanks for doing that.

README.md Outdated
@@ -493,7 +493,10 @@ Bootstrap containers are host containers that can be used to "bootstrap" the hos

 Bootstrap containers are very similar to normal host containers; they come with persistent storage and with optional user data.
 Unlike normal host containers, bootstrap containers can't be treated as `superpowered` containers.
-However, these containers have access to the underlying root filesystem on `/.bottlerocket/rootfs`.
+However, bootstrap containers do have additional permissions that normal host containers do not have.
+Bootstrap containers have access to the underlying root filesystem on `/.bottlerocket/rootfs` as well as to all the devices in the host, and they are setup with the `CAP_SYS_ADMIN` capability.
Review comment (Contributor):

nit:

Suggested change
-Bootstrap containers have access to the underlying root filesystem on `/.bottlerocket/rootfs` as well as to all the devices in the host, and they are setup with the `CAP_SYS_ADMIN` capability.
+Bootstrap containers have access to the underlying root filesystem on `/.bottlerocket/rootfs` as well as to all the devices in the host, and they are set up with the `CAP_SYS_ADMIN` capability.

README.md Outdated
##### Mount propagations in bootstrap and superpowered containers
Both bootstrap and superpowered host containers are configured with the `/.bottlerocket/rootfs/mnt` bind mount that points to `/mnt` in the host, which itself is a bind mount of `/local/mnt`.
This bind mount is set up with shared propagations, so any new mount point created underneath `/.bottlerocket/rootfs/mnt` in any bootstrap or superpowered host container will propagate across mount namespaces.
You can use this feature to configure ephemeral disks attached to your hosts, that you may want to use on your workloads.
Review comment (Contributor):

nit:

Suggested change
-You can use this feature to configure ephemeral disks attached to your hosts, that you may want to use on your workloads.
+You can use this feature to configure ephemeral disks attached to your hosts that you may want to use on your workloads.

@arnaldo2792 (Contributor Author)

Forced push includes fixes for feedback in the main README

@arnaldo2792 requested a review from @samuelkarp on June 5, 2021 00:45
@jhaynes requested a review from @tjkirch on June 7, 2021 21:18
@arnaldo2792 (Contributor Author)

Forced push includes:

  • All mounts, except for /.bottlerocket/rootfs/mnt, are rprivate to prevent propagations from them
  • Removed /dev from mounts since the host container already has access to it from /.bottlerocket/rootfs/dev

Comment on lines 763 to 775
var hasPropagation = false
// Propagations can be shared, rshared, private, rprivate, slave, rslave
re := regexp.MustCompile(`r?(shared|private|slave)`)

for _, option := range mount.Options {
	hasPropagation = re.FindString(option) != ""

	if hasPropagation {
		break
	}
}

return hasPropagation
Review comment (Contributor):

I'd prefer to avoid the use of a regular expression here. There are only six values we need to test for.

Suggested change
-var hasPropagation = false
-// Propagations can be shared, rshared, private, rprivate, slave, rslave
-re := regexp.MustCompile(`r?(shared|private|slave)`)
-for _, option := range mount.Options {
-	hasPropagation = re.FindString(option) != ""
-	if hasPropagation {
-		break
-	}
-}
-return hasPropagation
+// Propagations can be shared, rshared, private, rprivate, slave, rslave
+for _, option := range mount.Options {
+	switch option {
+	case "shared", "rshared", "private", "rprivate", "slave", "rslave":
+		return true
+	}
+}
+return false
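
To make the difference between the two versions concrete, here is a small self-contained Go sketch that runs both checks side by side. The function names `hasPropagationRegex` and `hasPropagationSwitch` and the option "x-unshared-example" are hypothetical and exist only for this comparison. One side effect worth noting, beyond the readability point raised in the review: the unanchored regular expression matches any option that merely contains one of the keywords, while the switch only accepts exact values.

```go
package main

import (
	"fmt"
	"regexp"
)

// propagationRe mirrors the pattern from the original version of the code.
var propagationRe = regexp.MustCompile(`r?(shared|private|slave)`)

// hasPropagationRegex follows the original approach: any option containing
// a propagation keyword counts as a match.
func hasPropagationRegex(options []string) bool {
	for _, option := range options {
		if propagationRe.FindString(option) != "" {
			return true
		}
	}
	return false
}

// hasPropagationSwitch follows the suggested approach: only the six exact
// propagation values count as a match.
func hasPropagationSwitch(options []string) bool {
	for _, option := range options {
		switch option {
		case "shared", "rshared", "private", "rprivate", "slave", "rslave":
			return true
		}
	}
	return false
}

func main() {
	cases := [][]string{
		{"rbind", "rshared"},
		{"rbind", "ro"},
		{"x-unshared-example"}, // hypothetical option containing "shared"
	}
	for _, opts := range cases {
		fmt.Println(opts, hasPropagationRegex(opts), hasPropagationSwitch(opts))
	}
}
```

Both functions agree on the six real propagation values; they differ only on strings that embed a keyword, where the exact-match switch is the stricter of the two.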

@arnaldo2792 (Contributor Author)

Forced push includes refactor in hasPropagations method as suggested by @samuelkarp

@arnaldo2792 (Contributor Author)

Forced push due to conflicts in rebase

This commit adds support to propagate mount points created in
bootstrap and superpowered containers, across mount peer groups.

The root filesystem of bootstrap and superpowered containers is set up
with the `rshared` configuration to allow mount propagations across
peer groups. All mount points attached to the containers are configured
as `rprivate` (except for the `mnt` mount). This prevents bootstrap and
superpowered containers from remounting directories in the host's root
filesystem.

The `/.bottlerocket/rootfs/mnt` mount point was added to bootstrap and
superpowered containers. This mount point is a bind mount that points to
`/mnt` in the host, which itself is a bind mount of `/local/mnt`. This
is required to let users create mount points underneath `/mnt`. This
mount point is set up with the `rshared` configuration to allow
propagations across peer groups. This is the only mount point from which
propagations are allowed across peer groups.

With this change, bootstrap containers now have access to all the
devices in the host. Also, they now have the `CAP_SYS_ADMIN` capability
to let users manage ephemeral disks. The logic to build the container
specs was refactored to provide a better understanding of what options
are set for the containers' spec.

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
@arnaldo2792 (Contributor Author)

Forced push includes file missing in previous commit to fix rebase conflicts

@arnaldo2792 merged commit 357c2a8 into bottlerocket-os:develop on Jun 16, 2021
@arnaldo2792 deleted the ephimeral-disks branch on June 16, 2021 22:48
@mjgp2 mentioned this pull request on Jan 18, 2022
3 participants