Add retrospective KEP for ProcMount and target at Beta #4266
Conversation
haircommander commented on Oct 2, 2023
- One-line PR description: Bump ProcMount feature to beta
- Issue link: add ProcMount option #4265
- Other comments: successor to add ProcMount option community#1934
Force-pushed from 9a06887 to 8b3f5b7
/label lead-opted-in
unprivileged containers with user namespaces as it will allow more information
than is necessary to the program running in the container spawned by kubernetes.

The main use case for this option is to run
May want to give other examples such as nested podman.
I moved some stuff around to make the use cases clearer
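To make the nested-container use case concrete, here is a minimal sketch of a pod that runs an inner container engine unprivileged. It is illustrative only, not text from the KEP: the pod name and image are hypothetical, `procMount: Unmasked` is the field this KEP covers, and `hostUsers: false` assumes the separate user-namespaces feature is available.

```yaml
# Hypothetical pod for nested containers (e.g. podman-in-kubernetes):
# a user namespace plus an unmasked /proc lets the inner runtime mount its own
# procfs without making the pod privileged.
apiVersion: v1
kind: Pod
metadata:
  name: nested-podman              # hypothetical name
spec:
  hostUsers: false                 # run the pod in a user namespace
  containers:
  - name: builder
    image: quay.io/podman/stable   # illustrative image
    command: ["sleep", "infinity"]
    securityContext:
      runAsUser: 1000
      runAsNonRoot: true
      procMount: Unmasked          # requires the ProcMountType feature gate on kube-apiserver
```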
Force-pushed from 8b3f5b7 to 292eb80
2018-05-07: k/community update opened
2018-05-27: k/kubernetes PR merged with support.
2023-10-02: KEP opened and targeted at Beta
What was the story in 2018-2023?
the enhancement was largely abandoned during that time
#### Story 1

As a cluster admin, I would like a way to nest containers within containers and get access to proc entries as if the top level container was running
AFAICS you are not interested in getting access to those masked procfs entries.
The actual reason you want to unmask them is because the kernel doesn't allow mounting a new procfs without unmasking them, no?
yeah I think that's more accurate
#### Story 3

As a kubernetes user, I may want to build containers from within a kubernetes container.
See [this article for more information](https://github.com/jessfraz/blog/blob/master/content/post/building-container-images-securely-on-kubernetes.md).
Story 3 seems to overlap with Story 2
yeah I dropped 2
It should be noted that this is different than the host /proc. It is still a newly mounted /proc; the container runtimes just will not mask the paths.

Since the only use case for this option is to run unprivileged nested containers, this option should only be allowed or used if the user in the container is not `root`.
should or must? If must, will it be enforced at API validation?
actually I've just checked--only privileged users can have access to this feature. I have left a note on the dangers of allowing users to access this field. I think a future enhancement could be to allow non root users access to an unmasked proc in the baseline policy (especially if they're confined by a user namespace), but more research has to happen for that.
so it's actually the exact opposite of what this sentence said this morning?
yeah I misread the code
Since the only use case for this option is to run unprivileged nested containers, this option should only be allowed or used if the user in the container is not `root`.
This can be easily enforced with `MustRunAs`.

Since the user inside is still unprivileged, doing things to `/proc` would be off limits regardless, since linux user support already prevents this.
does "doing things" mean reads or just writes?
just writes
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

No, though since it's a fairly contained feature I would be surprised if there are issues.
Please commit to a manual check anyway. Particularly cases where kubelets are restarted without interrupting existing containers.
- Condition name:
- Other field:
- [x] Other (treat as last resort)
  - Details: successfully create a pod with a container that uses ProcMountType
Behavior silently drops fields, so it won't actually fail. What's actually needed? Create of procmounttype and then an initContainer that actually checks the filesystem?
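One possible shape for that check, sketched as a hypothetical manifest rather than anything committed in-tree: it assumes runc-style masking, where masked paths such as `/proc/kcore` are covered by bind mounts that show up in `/proc/mounts`, so the absence of those mounts indicates the unmasked /proc actually took effect.

```yaml
# Hypothetical check: the init container exits non-zero if /proc still looks masked,
# so the pod only becomes healthy when procMount: Unmasked was honored end to end.
apiVersion: v1
kind: Pod
metadata:
  name: procmount-check            # hypothetical name
spec:
  initContainers:
  - name: verify-unmasked-proc
    image: busybox                 # illustrative image
    securityContext:
      procMount: Unmasked
    command:
    - sh
    - -c
    # With masking, the runtime bind-mounts over /proc/kcore, which appears in /proc/mounts.
    - |
      if grep -q ' /proc/kcore ' /proc/mounts; then
        echo '/proc is still masked'
        exit 1
      fi
  containers:
  - name: workload
    image: busybox                 # illustrative image
    command: ["sleep", "3600"]
    securityContext:
      procMount: Unmasked
```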
- [Optional] Aggregation method:
- Components exposing the metric:
- [x] Other (treat as last resort)
  - Details: general health of pods that use this feature.
name a specific metric to watch for a change before/after.
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

Potentially a malicious user given access and running with a root container in the host context could mess with the host processes, but PSA has already
Fix how? There's a "should" above that isn't clear about should or must. And the consequences here are not obvious to me. And if should isn't enforced today in beta and really ought to be must, then additional consideration/work is required. We could discuss breaking options since it's only alpha today.
is the feature being restricted to users in a privileged PSA sufficient to mitigate? I feel like it is
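For reference, restricting this to the privileged PSA level means in practice that the pod must be admitted into a namespace labeled for that level, since the baseline and restricted profiles only allow the default procMount. A hedged sketch (the namespace name is hypothetical):

```yaml
# Hypothetical namespace where Pod Security Admission enforces the "privileged" level,
# the only built-in level that permits a non-default procMount.
apiVersion: v1
kind: Namespace
metadata:
  name: nested-builds              # hypothetical name
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
```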
Force-pushed from 292eb80 to 039a858
updated @deads2k thanks for the review--PTAL
/sig auth
/assign @ibihim
Force-pushed from 55878fa to 3021f4f
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, though a reboot or drain will be necessary to limit any containers that originally got access to Unmasked ProcMountType.
how will a node reboot change the pod fields that are present?
Oh, I think I see. Could you enumerate the steps more clearly?
- update featuregate on all kube-apiservers
- restart all kube-apiservers
- restart all nodes (to read the new pods, which will now have the fields stripped)
added a bit more detail
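To make step one of that rollback concrete, the kube-apiserver side might look like the fragment below. This is a sketch that assumes a kubeadm-style static pod manifest; the file path, image tag, and surrounding flags are illustrative, and `ProcMountType` is the gate this KEP covers.

```yaml
# Illustrative fragment of /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm layout).
# Editing the static pod manifest restarts kube-apiserver with the gate disabled;
# per the steps above, nodes are then restarted or drained so containers that already
# received an unmasked /proc are recreated with the field stripped.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.29.0   # illustrative tag
    command:
    - kube-apiserver
    - --feature-gates=ProcMountType=false           # roll the feature back
    # ...remaining flags unchanged...
```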
###### What happens if we reenable the feature if it was previously rolled back?

The API server will need to recreate the pods that had the request scrubbed of the ProcMountType field (so the original intent can be fulfilled).
kube-apiserver doesn't create pods. Which actor could/should do this? Or perhaps nothing does?
Do workload resources also get their procMountType stripped when the featuregate is off?
yeah I think this was wrong. I don't think anything will happen, as KAS cleans any non-default proc mount type from the request
launching a cluster now to double check though
Force-pushed from 3021f4f to 7aec569
#prr-shadow
#### Alpha

- Feature implemented behind a feature flag
- Add e2e tests for the feature (must be done before beta)
If tests exist, can you link them in the appropriate section above?
none yet; I will implement them before moving this to beta
The feature gate is only processed by the API server--Kubelet has no awareness of it. API server will scrub the ProcMount field from the request
if it doesn't support the feature gate. Since all supported Kubelet versions support the ProcMountType field, there's no version skew worry.
API server can have the feature gate toggled without worrying about doing the same for Kubelets.
This is a bit confusing: the feature gate applies to both kube-apiserver and kubelet, in which case you should answer the question of what happens when you turn the feature on in one and the other has it disabled.
the feature gate only applies to KAS, kubelet doesn't do any feature gate checks, it just accepts from KAS and processes it, regardless of whether the feature is enabled in the kubelet or not
I'm guessing that's from a time when feature gates were not a thing. In which case I'd put explicit information about how this is currently implemented and also update kep.yaml,
since it's misleadingly pointing to both kube-apiserver and kubelet as having that feature gate.
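If the gate really is consumed only by kube-apiserver, the corresponding kep.yaml stanza would presumably shrink to something like the following. This is a sketch of the usual kep.yaml shape, not the actual contents of the file in this PR:

```yaml
# Hypothetical kep.yaml excerpt: declare the gate only for the component that checks it.
feature-gates:
  - name: ProcMountType
    components:
      - kube-apiserver
disable-supported: true
```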
###### Are there any tests for feature enablement/disablement?

No
I'd say those are required for the beta promotion. We need to make sure that turning it off works as expected.
I have left a comment describing my position, PTAL
###### How can a rollout or rollback fail? Can it impact already running workloads?

Inconsistent feature gate enablement is the only way. All supported versions of CRIs (CRI-O and containerd) have support for this feature.
This is the type of problem I've mentioned above where you should prepare the feature to work when kube-apiserver or kubelet has it off.
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

N/A
Don't we have a reasonable SLO for that kubelet_pod_start_sli_duration_seconds metric? Something like a few seconds? At least you can describe that the metric should maintain values from before the upgrade, for example.
updated!
Force-pushed from 7aec569 to 0d328f6
thanks @soltysh! I have updated based on your questions, with some additional research I did
Force-pushed from 0d328f6 to ee6cb03
#prr-shadow
prr lgtm
Force-pushed from ee6cb03 to 00cfe37
###### Are there any tests for feature enablement/disablement?

There are not. I'm unsure it will be useful, as the toggling of enablement and the functional enablement of old containers are not related.
This must be manually tested to ensure that what is documented in https://github.com/kubernetes/enhancements/pull/4266/files#diff-1d1320b7f2cd6bb7e0cb6ebf236babce8b5bb99eaac25bd7fea57d1c17f21b52R348 functions as expected if it must be used in the field.
updated to describe the results of a manual test, and describe plans to add a test
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Force-pushed from 00cfe37 to bba9221
PRR looks good /approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, haircommander, mrunalp, SergeyKanzhelev The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
###### Are there any tests for feature enablement/disablement?

Yes. I have manually tested feature enablement and disablement on kube-apiserver, and verified that pods are not recreated without
a drain. There will be an e2e test to verify this as well.
if this ends up being impractical, a unit test showing the field stripping along with a manual test of disablement will be sufficient.
/lgtm