
Implement rootless workloads #2481

Merged: 5 commits, Oct 29, 2021
Conversation


@smlx smlx commented Jan 25, 2021

Checklist

  • Affected Issues have been mentioned in the Closing issues section
  • Documentation has been written/updated. See Document new feature flags #2541.
  • PR title is ready for changelog and subsystem label(s) applied

Description

This PR implements rootless workloads on Lagoon, disabled by default, and controlled via feature flags as documented in #2541.

For OpenShift 3.11 compatibility we run with a non-root uid, root gid, and non-root fsGroup. This also means that no changes are required in lagoon-images.
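
In pod spec terms, that looks roughly like the following (a minimal sketch using the values from the provisioner tests later in this thread; the exact ids in the final templates may differ):

# Sketch of the non-root pod securityContext; values as tested below.
securityContext:
  runAsUser: 10000   # non-root uid
  runAsGroup: 0      # root gid, for OpenShift 3.11 compatibility
  fsGroup: 10001     # non-root fsGroup, applied to mounted volumes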

The changes made include:

  • add a non-root securityContext to all pods.
  • add an initContainer to the shared storage nginx pod. This updates the ownership and group permissions of the shared storage volume so that existing files remain accessible going forward. See this comment below for details.

This depends on uselagoon/lagoon-charts#235. Once that lands, we can remove 4a4fa5d from this PR.

Closing issues

Partially addresses https://github.com/amazeeio/lagoon/issues/2059.

@smlx smlx added the 0-kubernetes Vanilla kubernetes support label Jan 25, 2021
@smlx smlx marked this pull request as draft January 25, 2021 15:47

smlx commented Jan 25, 2021

Not sure what milestone you want to stick this into @tobybellwood?

@tobybellwood tobybellwood added this to the v2.0.0 milestone Jan 25, 2021
@tobybellwood commented:

I think it's the last component of the 2.0 release?

@Schnitzel commented:

Discussed in the AIO security meeting:

  • Implementing this will have a big impact on all running workloads today, so we don't think it will make it into 2.1, as we have to do a lot of testing first.
  • Looking at OpenShift: they leave the user running with the root group, and we should probably do the same, since in some cases the uid cannot be changed on the PVC during mounting. Therefore we don't want to add another group, and should leave it in the root group.


smlx commented Jan 27, 2021

Looking at this a bit more, I think we might be able to keep the group set to non-root, and add root as a supplementary group on OpenShift. I'll have to do some more research and testing.

@smlx smlx force-pushed the rootless branch 2 times, most recently from e5af09a to 55dee24 on February 4, 2021 14:37
@smlx smlx marked this pull request as ready for review February 4, 2021 15:41
@smlx smlx marked this pull request as draft February 5, 2021 07:24

smlx commented Feb 5, 2021

Migrating permissions of RWX volumes from some common storage volume provisioners

Given the need to migrate existing workloads, I tried to understand how our current RWX storage provisioners matched permissions between the pod and the storage volume to enable access.

Here's what you get when mounting a PVC for an RWX volume with the new non-root pod securityContext from this PR:

fsGroup: 10001
runAsGroup: 0
runAsUser: 10000
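
The outputs below were gathered with a small throwaway test deployment, roughly like the following sketch (hypothetical; the image, labels, and PVC name are placeholders, not part of this PR):

# Hypothetical test deployment; image and claim names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nonroot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nonroot
  template:
    metadata:
      labels:
        app: nonroot
    spec:
      securityContext:
        runAsUser: 10000
        runAsGroup: 0
        fsGroup: 10001
      containers:
        - name: shell
          image: busybox            # placeholder image
          command: ["sh", "-c", "sleep 2147483647"]
          volumeMounts:
            - name: storage
              mountPath: /storage
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: nonroot-rwx  # placeholder: an RWX-capable PVC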

nfs-provisioner

On kind running locally with nfs-provisioner (now deprecated) in its default configuration, RWX volumes are given 777 permissions, with setgid.

$  kubectl exec -it nonroot-7f977db68d-hlx2j -- sh -c 'id && stat -c "%A %u %g" /storage'
uid=10000 gid=0(root) groups=10001
drwxrwsrwx 0 0

AWS efs-provisioner

On AWS with efs-provisioner (also now deprecated!) and our current configuration, RWX volumes are assigned a unique group ID.
This is added to the pod as a supplementary group ID, and increments to a unique value for each provisioned PV. See here for details.

$ kubectl exec -it nonroot-5f5ff9fd8c-cql88 -- sh -c 'id && stat -c "%A %u %g" /storage'
uid=10000 gid=0(root) groups=10001,40003
drwxrws--x 0 40003

Azure file

On AKS using the azure-file storageclass, RWX volumes are given 777 permissions, with group ownership set to the group ID given by fsGroup.

$ kubectl exec -it nonroot-5f5ff9fd8c-fgg5g -- sh -c 'id && stat -c "%A %u %g" /storage'
uid=10000 gid=0(root) groups=10001
drwxrwxrwx 0 10001

GKE nfs-client-provisioner

On GKE using the nfs-client-provisioner, RWX volumes are simply given 777 permissions.

$ kubectl exec -it nonroot-5fc7554c49-kks8r -- sh -c 'id && stat -c "%A %u %g" /storage'
uid=10000 gid=0(root) groups=10001
drwxrwxrwx 0 0

Discussion

Amazingly, every RWX provisioner I checked uses a different mechanism to give the pod access to the storage.

Since existing workloads run as root, we have the problem that tools such as rsync have created files inside the storage volumes with permissions like:

$ stat -c "%A %u %g(%G)" example0.jpg
-rw-rw-r-- 0 82(www-data)
$ stat -c "%A %u %g(%G)" example1.jpg
-rw-rw---- 0 82(www-data)

This is a problem: to the new non-root pod securityContext, example0.jpg will be read-only and example1.jpg will not be accessible at all.

So the solution I've come up with is to recursively set group ownership to match the mountpoint, and to give group access permissions. This is basically the same thing the fix-permissions script does.

This will:

  1. Match the permissions set by all the above StorageClass drivers on the root of the volume.
  2. Give the new non-root pod securityContext g+rw access to all files, and g+x on directories.

There is one circumstance I can see where this logic will break, and that is where the volume has been chgrp'd to something other than the StorageClass default. This is pretty unlikely though IMO and we can deal with it on a case-by-case basis.

The other problem with this approach is that the initContainer will run on every pod start, which is a lot of overhead for a process which only needs to run once. To mitigate this I'm creating a sentinel file inside the storage volume, which is checked before doing anything in the initContainer. This should mean that the permission change is only done approximately once.

set -e
# Sentinel file marks that the one-time permissions migration is complete.
SENTINEL="/storage/.lagoon-rootless-migration-complete"
if ! [ -f "$SENTINEL" ]; then
  # Recursively match group ownership to the volume mountpoint, and grant
  # group read/write on everything plus execute (traversal) on directories.
  find /storage -mindepth 1 -exec chgrp "$(stat -c "%g" /storage)" {} +
  find /storage -mindepth 1 -exec chmod g+rw {} +
  find /storage -mindepth 1 -type d -exec chmod g+x {} +
fi
touch "$SENTINEL"
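
Wired into the deployment, the initContainer looks roughly like this (a hypothetical sketch rather than the actual template from this PR; the image and volume names are placeholders):

# Hypothetical initContainer wiring; image and volume names are placeholders.
initContainers:
  - name: fix-storage-permissions
    image: busybox                   # placeholder image
    securityContext:
      runAsUser: 0                   # must run as root to chgrp/chmod existing files
    command:
      - sh
      - -c
      - |
        set -e
        SENTINEL="/storage/.lagoon-rootless-migration-complete"
        if ! [ -f "$SENTINEL" ]; then
          find /storage -mindepth 1 -exec chgrp "$(stat -c "%g" /storage)" {} +
          find /storage -mindepth 1 -exec chmod g+rw {} +
          find /storage -mindepth 1 -type d -exec chmod g+x {} +
        fi
        touch "$SENTINEL"
    volumeMounts:
      - name: storage                # placeholder: the shared RWX volume
        mountPath: /storage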

I've manually tested this on EKS and GKE by creating files with broken permissions and letting the initContainer fix them.

The Azure file provisioner enforces permissions by ignoring all chown/chmod requests, so the initContainer isn't strictly required:

# mkdir -p /storage/blah; echo blah > /storage/blah/blah; chown -hR 0:82 /storage/blah; chmod -R o= /storage/blah; ls -lahR /storage
/storage:
total 4K     
drwxrwxrwx    2 root     10001          0 Feb  5 10:54 .
drwxr-xr-x    1 root     root        4.0K Feb  5 13:52 ..
drwxrwxrwx    2 root     10001          0 Feb  5 13:52 blah

/storage/blah:
total 1K     
drwxrwxrwx    2 root     10001          0 Feb  5 13:52 .
drwxrwxrwx    2 root     10001          0 Feb  5 10:54 ..
-rwxrwxrwx    1 root     10001          5 Feb  5 13:52 blah

Future work

Once all workloads are running with the new pod securityContext, the initContainer can be removed.

@smlx smlx marked this pull request as ready for review February 5, 2021 14:40
@smlx smlx force-pushed the rootless branch 2 times, most recently from ae10bc8 to 8d9f8ab on February 8, 2021 13:05
@smlx smlx requested a review from Schnitzel February 11, 2021 13:08

smlx commented Feb 16, 2021

After some discussion, this needs a feature flag via an environment variable for the initial rollout.

@smlx smlx marked this pull request as ready for review March 12, 2021 07:10

smlx commented Mar 12, 2021

As this feature can be controlled at the cluster, project, and environment level, I'm setting this as ready for review.

Automated testing for projects/environments to assess if they can run rootless with no changes will be left for a future PR.
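
For reference, enabling the feature is then a matter of setting environment variables at the appropriate level; a sketch of what that looks like (the exact flag names follow the feature-flag documentation referenced in #2541 and should be treated as assumptions here, not as part of this diff):

# Assumed flag names per the docs in #2541; verify against the merged documentation.
# Cluster-wide default, set in the build-deploy environment:
LAGOON_FEATURE_FLAG_DEFAULT_ROOTLESS_WORKLOAD: enabled
# Per-project or per-environment override, set as a global-scope Lagoon API
# variable (this PR only reads global-scope variables):
LAGOON_FEATURE_FLAG_ROOTLESS_WORKLOAD: enabled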

@Schnitzel Schnitzel modified the milestones: v2.0.0, v2.1.0 May 16, 2021
@tobybellwood tobybellwood modified the milestones: v2.1.0, v2.x Oct 14, 2021
@smlx smlx force-pushed the rootless branch 3 times, most recently from e8fe650 to f35e775 on October 20, 2021 13:54
@Schnitzel commented:

Just for some reference for the future, here is the code change from k8s 1.12 which changed the behavior to NOT change the persistent storage permissions if it's an RWX mount:
https://github.com/kubernetes/kubernetes/pull/67280/files

@smlx smlx force-pushed the rootless branch 2 times, most recently from b490b3a to 597077b on October 28, 2021 08:50

smlx commented Oct 28, 2021

@Schnitzel @tobybellwood I cleaned this up a lot and simplified it to reduce the changes and hopefully make the logic much clearer. Let me know what you think.

smlx added 4 commits October 28, 2021 20:57
The initContainer on the nginx-php-persistent deployment ensures RWX
storage has permissions compatible with the pod securityContext. This
will:

  1. Match the permissions set by all the above StorageClass drivers on
     the root of the volume.

  2. Give the new non-root pod `securityContext` `g+rw` access to all
     files, and `g+x` on directories.

When Pod Security Standards are enabled in Lagoon in future, this
initContainer will have to be disabled or removed, since it runs as
root.
* update comment to be more descriptive.
* only look for global scope Lagoon variables.
* update documentation