
[k3s-upgrade] k3s binary missing container_runtime_exec_t context type after upgrade on selinux systems #379

Closed
dweomer opened this issue Sep 16, 2020 · 4 comments
Labels: kind/bug (Something isn't working)

@dweomer (Contributor) commented Sep 16, 2020

Environmental Info:
K3s Version:

  • v1.18.x
  • v1.19.x

Node(s) CPU architecture, OS, and Version:

  • CentOS 7, any architecture

Cluster Configuration:

  • any

Describe the bug:
After upgrading k3s via rancher/k3s-upgrade, the type portion of the SELinux context label on /usr/local/bin/k3s reverts to bin_t (or something other than container_runtime_exec_t). When the k3s process is restarted by the supervisor (typically systemd), the incorrect label cascades via domain/file transitions, causing new/recreated pods to fail to operate correctly.
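
A quick way to see the regression is to inspect the label directly (a diagnostic sketch; the user/role fields may differ per system, but the type field is what matters):

ls -Z /usr/local/bin/k3s
# healthy: system_u:object_r:container_runtime_exec_t:s0
# after the faulty upgrade the type reverts to bin_t (or similar)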

Steps To Reproduce:

  • Install K3s (single-node example, but the problem is easier to see in multi-node clusters where the upgrade incurs eviction; the full sequence is collected as a script after this list):
    • curl -fsSL https://get.k3s.io | sh with the following environment already exported:
      • INSTALL_K3S_CHANNEL=stable
      • K3S_KUBECONFIG_MODE=0644
      • K3S_SELINUX=true
      • K3S_TOKEN=centos/7
  • Install SUC:
    • curl -fsSL https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml | kubectl apply -f-
  • Install the example k3s-upgrade plans:
    • curl -fsSL https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/examples/k3s-upgrade.yaml | kubectl apply -f-
  • Start the upgrade(s):
    • kubectl label node --all k3s-upgrade=true
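
For convenience, here are the reproduction steps above as a single shell session (a sketch; it assumes root on the node and kubectl pointed at the cluster):

# Install K3s with SELinux support enabled
export INSTALL_K3S_CHANNEL=stable
export K3S_KUBECONFIG_MODE=0644
export K3S_SELINUX=true
export K3S_TOKEN=centos/7
curl -fsSL https://get.k3s.io | sh

# Install the system-upgrade-controller and the example k3s-upgrade plans
curl -fsSL https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml | kubectl apply -f-
curl -fsSL https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/examples/k3s-upgrade.yaml | kubectl apply -f-

# Label the nodes so the plans select them and the upgrade starts
kubectl label node --all k3s-upgrade=true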

Expected behavior:
Post-reboot, new/recreated pods start correctly.

Actual behavior:
Post-reboot, new/recreated pods are in CrashLoopBackOff. Pods that were in place prior to the upgrade remain unaffected because they were started with the correct labels/transitions by the previous k3s/containerd process.

Additional context / logs:

@dweomer (Contributor, Author) commented Sep 16, 2020

The best current workaround for systemd-based systems is to add a drop-in that performs restorecon on the binary just prior to start, e.g.

[Service]
ExecStartPre=-/sbin/restorecon /usr/local/bin/k3s

at /etc/systemd/system/k3s.service.d/restorecon.conf and /etc/systemd/system/k3s-agent.service.d/restorecon.conf for servers and agents, respectively.
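
Creating the drop-ins from a shell might look like the following (a sketch; systemctl daemon-reload is needed so systemd picks up the new unit fragments, and the leading '-' on ExecStartPre tells systemd to ignore a non-zero exit from restorecon):

# server node
mkdir -p /etc/systemd/system/k3s.service.d
cat > /etc/systemd/system/k3s.service.d/restorecon.conf <<'EOF'
[Service]
ExecStartPre=-/sbin/restorecon /usr/local/bin/k3s
EOF

# agent node
mkdir -p /etc/systemd/system/k3s-agent.service.d
cat > /etc/systemd/system/k3s-agent.service.d/restorecon.conf <<'EOF'
[Service]
ExecStartPre=-/sbin/restorecon /usr/local/bin/k3s
EOF

systemctl daemon-reload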

@dweomer dweomer changed the title [k3s-upgrade] k3s binary missing container_runtime_exe_t context type after upgrade on selinux systems [k3s-upgrade] k3s binary missing container_runtime_exec_t context type after upgrade on selinux systems Sep 17, 2020
@dweomer (Contributor, Author) commented Sep 25, 2020

Based on our design discussion this morning, we should try to match the context label of the file that we install to the context label of the file that it replaces. Standard SELinux tooling is very contextual, which means that when it runs in a container it needs some amount of bind-mounting/config from the host to work correctly. But given that SELinux context labels are stored as extended attributes on the filesystem, we just need tooling that can read and set the correct attributes.
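
To illustrate the extended-attribute point (an illustrative sketch using getfattr/setfattr from the attr package, which is not necessarily what the upgrade image would ship):

# The SELinux label is stored in the security.selinux xattr:
getfattr -n security.selinux /usr/local/bin/k3s
# security.selinux="system_u:object_r:container_runtime_exec_t:s0"

# Any tool that can write that xattr can restore the label:
setfattr -n security.selinux -v "system_u:object_r:container_runtime_exec_t:s0" /usr/local/bin/k3s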

The rancher/k3s-upgrade (and by extension, rancher/rke2-upgrade) containers are based on Alpine to keep them small, and Alpine provides the libselinux-utils package to help with this task. The fix to the upgrade script in these containers should be something like:

# replace the "copy" line with something like these lines
# ($NF keeps only the last field, in case getfilecon echoes the path alongside the context)
K3S_CONTEXT=$(getfilecon "${K3S_HOST_BINARY}" 2>/dev/null | awk '{print $NF}')
cp -vf /opt/k3s "${K3S_HOST_BINARY}"
if [ -n "${K3S_CONTEXT}" ]; then
    # re-apply the captured label to the freshly copied binary
    setfilecon "${K3S_CONTEXT}" "${K3S_HOST_BINARY}"
fi
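
Once the fix is in place, the result can be verified on the host without changing anything (a sketch; restorecon's -n flag makes it a dry run that only reports files whose on-disk label differs from policy):

# the type field should still be container_runtime_exec_t after the upgrade
ls -Z /usr/local/bin/k3s

# dry-run relabel: prints output only if the label does not match policy
restorecon -n -v /usr/local/bin/k3s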

@davidnuzik (Contributor) commented Sep 25, 2020

When this is done and ready for QA to test, Shylaja should test it. This is for our system-upgrade-controller testing story in RHEL/CentOS environments where SELinux is enabled. It is needed for RKE2 GA and should be tested first via RKE2, then k3s.

@ShylajaDevadiga (Contributor) commented:
Validated that the correct context labels are set before and after the upgrade on both rke2 and k3s with SELinux in Enforcing mode.
rke2 version: v1.18.9-beta21+rke2 (upgraded to v1.18.9-beta22+rke2)
Red Hat Enterprise Linux Server release 7.8 (Maipo)

Before

kubectl get nodes
NAME                                          STATUS   ROLES         AGE     VERSION
ip-172-31-16-222.us-east-2.compute.internal   Ready    etcd,master   9m52s   v1.18.9-beta21+rke2
ip-172-31-25-212.us-east-2.compute.internal   Ready    <none>        57s     v1.18.9-beta21+rke2

ps -eZ | grep -E 'rke2|containerd|shim'
system_u:system_r:container_runtime_t:s0 17330 ? 00:00:12 rke2
system_u:system_r:container_runtime_t:s0 17343 ? 00:00:47 containerd
system_u:system_r:container_runtime_t:s0 17544 ? 00:00:00 containerd-shim

After

kubectl get nodes
NAME                                          STATUS   ROLES         AGE   VERSION
ip-172-31-16-222.us-east-2.compute.internal   Ready    etcd,master   19m   v1.18.9-beta22+rke2
ip-172-31-25-212.us-east-2.compute.internal   Ready    <none>        10m   v1.18.9-beta22+rke2

system_u:system_r:container_runtime_t:s0 26644 ? 00:00:09 rke2
system_u:system_r:container_runtime_t:s0 26696 ? 00:00:09 containerd
system_u:system_r:container_runtime_t:s0 26968 ? 00:00:00 containerd-shim

K3S Upgrade from v1.18.9+k3s1 to v1.19.1+k3s1
Using the dev repo.
Before

kubectl get nodes
NAME                                          STATUS   ROLES    AGE     VERSION
ip-172-31-30-151.us-east-2.compute.internal   Ready    master   3m33s   v1.18.9+k3s1
ip-172-31-17-56.us-east-2.compute.internal    Ready    <none>   13s     v1.18.9+k3s1

cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml |grep selinux
  enable_selinux = true

[root@ip-172-31-30-151 ~]# ps -eZ | grep -E 'k3s|containerd|shim'
system_u:system_r:container_runtime_t:s0 11178 ? 00:00:24 k3s-server
system_u:system_r:container_runtime_t:s0 11193 ? 00:00:05 containerd
system_u:system_r:container_runtime_t:s0 11588 ? 00:00:00 containerd-shim

After

kubectl get nodes
NAME                                          STATUS   ROLES    AGE   VERSION
ip-172-31-30-151.us-east-2.compute.internal   Ready    master   33m   v1.19.1+k3s1
ip-172-31-17-56.us-east-2.compute.internal    Ready    <none>   29m   v1.19.1+k3s1

cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml |grep selinux
  enable_selinux = true

system_u:system_r:container_runtime_t:s0 13916 ? 00:01:38 k3s-server
system_u:system_r:container_runtime_t:s0 13926 ? 00:00:06 containerd
system_u:system_r:container_runtime_t:s0 14810 ? 00:00:01 containerd-shim
