Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add restrospetive KEP for ProcMount and target at Beta #4266

Merged
merged 1 commit into from
Feb 8, 2024

Conversation

haircommander
Copy link
Contributor

  • One-line PR description: Bump ProcMount feature to beta

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 2, 2023
@k8s-ci-robot k8s-ci-robot requested a review from dchen1107 October 2, 2023 19:34
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Oct 2, 2023
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 2, 2023
@haircommander haircommander force-pushed the proc-mount-beta branch 3 times, most recently from 9a06887 to 8b3f5b7 Compare October 2, 2023 19:51
@mrunalp
Copy link
Contributor

mrunalp commented Oct 2, 2023

/label lead-opted-in

unprivileged containers with user namespaces as it will allow more information
than is necessary to the program running in the container spawned by kubernetes.

The main use case for this option is to run
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want to give other examples such as nested podman.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved some stuff around to make the use cases clearer

@k8s-ci-robot k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label Oct 2, 2023
@npolshakova
Copy link

/label lead-opted-in

Hi @mrunalp, can you add this label to the KEP issue as well for enhancement tracking if this is targeting 1.29? Thank you!


2018-05-07: k/community update opened
2018-05-27: k/kubernetes PR merged with support.
2023-10-02: KEP opened and targeted at Beta
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the story in 2018-2023?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the enhancement has largely been abandoned between that time

@AnaMMedina21 AnaMMedina21 mentioned this pull request Oct 3, 2023
8 tasks

#### Story 1

As a cluster admin, I would like a way to nest containers within containers and get access to proc entries as if the top level container was running
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS you are not interested in getting access to those masked procfs entries.
The actual reason you want to unmask them is because the kernel doesn't allow mounting a new procfs without unmasking them, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think that's more accurate

#### Story 3

As a kubernetes user, I may want to build containers from within a kubernetes container.
See [this article for more information](https://github.com/jessfraz/blog/blob/master/content/post/building-container-images-securely-on-kubernetes.md).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Story 3 seems to overlap with Story 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I dropped 2


It should be noted that this is different that the host /proc. It is still a newly mounted /proc just the container runtimes will not mask the paths.

Since the only use case for this option is to run unprivileged nested containers, this option should only be allowed or used if the user in the container is not `root`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should or must? If must, it will be enforced at API validation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I've just checked--only privileged users can have access to this feature. I have left a note on the dangers of allowing users to access this field. I think a future enhancement could be to allow non root users access to an unmasked proc in the baseline policy (especially if they're confined by a user namespace), but more research has to happen for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it's actually the exact opposite of what this sentence said this morning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I misread the code

Since the only use case for this option is to run unprivileged nested containers, this option should only be allowed or used if the user in the container is not `root`.
This can be easily enforced with `MustRunAs`.

Since the user inside is still unprivileged, doing things to `/proc` would be off limits regardless, since linux user support already prevents this.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "doing things" reads or just writes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just writes


###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

No, though since its a fairly contained feature I would be surprised if there are issues.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please commit to a manual check anyway. Particularly cases where kubelets are restarted without interrupting existing containers.

- Condition name:
- Other field:
- [x] Other (treat as last resort)
- Details: successfully create a pod with a container that uses ProcMountType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior silently drops fields, so it won't actually fail. What's actually needed? Create of procmounttype and then an initContainer that actually checks the filesystem?

- [Optional] Aggregation method:
- Components exposing the metric:
- [x] Other (treat as last resort)
- Details: general health of pods that use this feature.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name a specific metric to watch for a change before/after.


###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

Potentially a malicious user given access and running with a root container in the host context could mess with the host processes, but PSA has already
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix how? There's a "should" above that isn't clear about should or must. And the consequences here are not obvious to me. And if should isn't enforced today in beta and really ought to be must, then additional consideration/work is required. We could discuss breaking options since its only alpha today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the feature being restricted to users in a privileged PSA sufficient to mitigate? I feel like it is

@haircommander
Copy link
Contributor Author

updated @deads2k thanks for the review--PTAL

@stlaz
Copy link
Member

stlaz commented Nov 20, 2023

/sig auth
/triage accepted

@k8s-ci-robot k8s-ci-robot added sig/auth Categorizes an issue or PR as relevant to SIG Auth. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Nov 20, 2023
@ibihim
Copy link

ibihim commented Jan 22, 2024

/assign @ibihim

@haircommander haircommander changed the title Add restrospetive KEP for ProcMount, targeted at alpha Add restrospetive KEP for ProcMount and target at Beta Feb 1, 2024

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, though a reboot or drain will be necessary to limit any containers that originally got access to Unmasked ProcMountType.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how will a node reboot change the pod fields that are present?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, i think I see. Could you enumerate the steps more clearly

  1. update featuregate on all kube-apiservers
  2. restart all kube-apiservers
  3. restart all nodes (to read the new pods, which will now have the fields stripped)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a bit more detail


###### What happens if we reenable the feature if it was previously rolled back?

The API server will need to recreate the pods that had the request scrubbed of the ProcMountType field (so the original intent can be fulfilled).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kube-apiserver doesn't create pods. Which actor could/should do this? Or perhaps nothing does?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do workload resources also get their procMountType stripped when the featuregate is off?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think this was wrong. I don't think anything will happen as KAS cleans any non default proc mount type from the request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

launching a cluster now to double check though

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#prr-shadow

#### Alpha

- Feature implemented behind a feature flag
- Add e2e tests for the feature (must be done before beta)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there exists tests, can you link them in the appropriate section above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none yet I will implement them before moving this to beta


The feature gate is only processed by the API server--Kubelet has no awareness of it. API server will scrub the ProcMount field from the request
if it doesn't support the feature gate. Since all supported Kubelet versions support ProcMountType field, there's no version skew worry.
API server can have the feature gate toggled without worrying about doing the same for Kubelets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing, the feature gate applies to both kube-apiserver and kubelet, in which case you should answer the question what happens when you turn the feature on one, and the other has it disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the feature gate only applies to KAS, kubelet doesn't do any feature gate checks, it just accepts from KAS and processes it, regardless of whether the feature is enabled in the kubelet or not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that's from a time when feature gates were not a thing. In which case I'd put an explicit information how this is currently implemented and update also kep.yaml since it's misleadingly pointing to both kube-apiserver and kubelet as having that feature gate.


###### Are there any tests for feature enablement/disablement?

No
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say those are required for the beta promotion. We need to make sure that turning it off works as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left a comment describing my position, PTAL


###### How can a rollout or rollback fail? Can it impact already running workloads?

Inconsistent feature gate enablement is the only way. All supported versions of CRIs (CRI-O and containerd) have support for this feature.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the type of problem I've mentioned above where you should prepare the feature to work when kube-apiserver or kubelet has it off.

keps/sig-node/4265-proc-mount/README.md Outdated Show resolved Hide resolved

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

N/A
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have a reasonable SLO for that kubelet_pod_start_sli_duration_seconds metric? Something like a few seconds? At least you can describe that the metric should maintain values from before the upgrade, for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

keps/sig-node/4265-proc-mount/README.md Show resolved Hide resolved
keps/sig-node/4265-proc-mount/README.md Outdated Show resolved Hide resolved
@haircommander
Copy link
Contributor Author

thanks @soltysh! I have updated for your questions with some additional research I did

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#prr-shadow
prr lgtm

keps/sig-node/4265-proc-mount/kep.yaml Outdated Show resolved Hide resolved

###### Are there any tests for feature enablement/disablement?

There are not. I'm unsure it will be useful, as the toggling of enablement and the functional enablement of old containers are not related.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be manually tested to ensure that what is documented in https://github.com/kubernetes/enhancements/pull/4266/files#diff-1d1320b7f2cd6bb7e0cb6ebf236babce8b5bb99eaac25bd7fea57d1c17f21b52R348 functions as expected if it must be used in the field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to describe the results of a manual test, and describe plans to add a test

Signed-off-by: Peter Hunt <pehunt@redhat.com>
@deads2k
Copy link
Contributor

deads2k commented Feb 8, 2024

PRR looks good

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, haircommander, mrunalp, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 8, 2024
###### Are there any tests for feature enablement/disablement?

Yes. I have manually tested feature enablement and disablement on kube-apiserver, and verified that pods are not recreated without
a drain. There will be an e2e test to verify this as well.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this ends up impractical in practice a unit test showing the field stripping will be sufficient along with a manual test of disablement.

@mrunalp
Copy link
Contributor

mrunalp commented Feb 8, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2024
@k8s-ci-robot k8s-ci-robot merged commit 89521e8 into kubernetes:master Feb 8, 2024
3 of 4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lead-opted-in Denotes that an issue has been opted in to a release lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.