
WRKLDS-1060: Prevent User Workloads from being scheduled on Control Plane nodes #1583

Open · wants to merge 21 commits into master from WRKLDS-1060-prevent-workloads-control-plane-nodes

Conversation


@knelasevero knelasevero commented Mar 5, 2024

This proposal introduces a mechanism to prevent user workloads from being scheduled on control plane nodes in OpenShift clusters, enhancing cluster security and stability. By default, control plane nodes are protected with a NoSchedule taint to avoid user workload allocation. However, there is currently no safeguard against users who explicitly tolerate this taint or who bypass the kube-scheduler by setting .spec.nodeName directly. This enhancement proposes an admission controller to enforce the specified policies, along with configurable policy management for exceptions.

As an alternative, the proposal also considers using the NoExecute taint for control plane nodes, which is recognized by the kubelet and would reject any pod that does not tolerate the taint (rejecting new pods and evicting pods already there). This approach requires no scheduler awareness and provides a stricter enforcement mechanism. However, it requires that all control plane pods be updated to tolerate the NoExecute taint before it can be applied to control plane nodes, to prevent unintended disruptions (this requires coordination with additional teams and maintaining artifacts across multiple repositories).
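
To make the gap concrete, here is a minimal sketch (the pod name, namespace, node name, and image are hypothetical) of a pod that slips past the default protection today, either by tolerating the NoSchedule taint or by pinning itself with `.spec.nodeName` and skipping the kube-scheduler entirely:

```yaml
# Hypothetical example of the bypass this proposal guards against.
apiVersion: v1
kind: Pod
metadata:
  name: user-workload          # hypothetical name
  namespace: user-project      # hypothetical namespace
spec:
  # Pinning the node skips the kube-scheduler, so the NoSchedule taint is never evaluated.
  nodeName: master-0           # hypothetical control plane node name
  # Alternatively, an explicit toleration lets the scheduler place the pod on a tainted node.
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: registry.example.com/app:latest   # hypothetical image
```

With the NoExecute alternative, the kubelet itself would refuse to run (or would evict) such a pod unless it also tolerated that taint, which is why all control plane pods would first need the toleration.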

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 5, 2024

openshift-ci-robot commented Mar 5, 2024

@knelasevero: This pull request references WRKLDS-1060 which is a valid jira issue.


@knelasevero (Author)

Gonna transfer some of the comments we had in our private doc here

@openshift-ci openshift-ci bot requested review from enxebre and tkashem March 5, 2024 14:07

@knelasevero knelasevero marked this pull request as draft March 5, 2024 14:13
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 5, 2024
Co-authored-by: Flavian Missi <fmissi@redhat.com>
Co-authored-by: Jan Chaloupka <jchaloup@redhat.com>
@knelasevero knelasevero force-pushed the WRKLDS-1060-prevent-workloads-control-plane-nodes branch from 27e92a2 to 9c3deed Compare March 5, 2024 14:25

knelasevero and others added 2 commits March 6, 2024 08:12
Co-authored-by: Jan Chaloupka <jchaloup@redhat.com>
@knelasevero knelasevero force-pushed the WRKLDS-1060-prevent-workloads-control-plane-nodes branch 2 times, most recently from bc0650b to b2dd240 Compare March 14, 2024 14:34
@ingvagabund ingvagabund (Member) left a comment

Mostly just nits

@knelasevero knelasevero force-pushed the WRKLDS-1060-prevent-workloads-control-plane-nodes branch from b2dd240 to 68d168b Compare March 27, 2024 09:48
@knelasevero knelasevero marked this pull request as ready for review March 27, 2024 12:27
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 27, 2024

The necessary feature for this solution (ValidatingAdmissionPolicy) is already available, even though it is in Tech Preview (it is in beta but disabled by default upstream). The evolution of this solution is tied to the evolution of [ValidatingAdmissionPolicy](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/3488-cel-admission-control/README.md) and the decisions to graduate it downstream in OpenShift.

### Tech Preview -> GA
Contributor

There needs to be a discussion of how to know when all OCP payload namespaces have proper permissions for tolerations.

  1. A test to ensure all OCP payload namespaces have proper permissions.
  2. A test to ensure all future OCP payload namespaces have proper permissions.
  3. A test to ensure enablement works appropriately.
  4. A test to ensure enablement evicts pods without tolerations.


Contributor

Should include information about what information must-gather collects and a test that ensures it's there. Don't forget the case where a cluster-admin creates their own VAPB and you need to remember to collect it.


Contributor

Documentation (this would be better as a tool in oc) that can inspect an OCP cluster (or an offline audit log) and produce a list of the permissions needed to avoid disruption, with an option to auto-generate a directory of manifests that would apply those permissions.

Contributor

openshift-ci bot commented Jun 19, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign stbenjam for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented Jun 19, 2024

@knelasevero: all tests passed!



- As an administrator, I want to restrict non control plane workloads from being scheduled on control plane nodes (even those using `.spec.nodeName` in their pods), so that the control plane components are not at risk of running out of resources.
- As an administrator, I want only certain workloads to be allowed to schedule pods on tainted nodes.
- As an administrator, I want to restrict workloads from being scheduled on special node groups (even those using `.spec.nodeName` in their pods), so that those node groups can only run specialized workloads (e.g., GPU-enabled nodes) without obstruction.
Member

I don't think that's a valid use-case for this particular problem. With this UC you're entering a territory of deciding which nodes are special, how to manage these, etc. It's rather something that should be a non-goal, since you're targeting only control plane nodes.

@knelasevero knelasevero (Author) Jul 2, 2024

We had this previously as a non-goal. We decided this is now a goal(ish), as the mechanism is flexible enough to allow admins to do whatever they want (and instructions on how to do this were explicitly requested by customers). We are providing the VAP focused on control-plane node protection (and I agree it is the focus of this proposal), but it is trivial to adapt this workflow to protect any type of special node, with the creation of VAP+VAPB, NoExecute taint and necessary RBAC permissions with custom verbs.

Member

Either way, I'd still advise against it, even if the mechanism will allow it. The main reason for starting small is that it's easier to ensure the core use cases are handled properly. You're mentioning this being goal(ish), which means you're not fully certain about all possible usage patterns, so you're opening a possible can of worms. Keep it simple, and start small, with the possibility to extend the mechanism with minor modifications if/when necessary.

@knelasevero knelasevero (Author) Jul 15, 2024

That's ok, I can do that

- As an OpenShift developer, I want to understand the impact of the new scheduling restrictions on existing and future workloads, so that I can design applications that comply with cluster policies and make informed decisions about resource requests and deployment strategies.
- As an end user, I want to receive informative feedback when my workloads are rejected due to being repelled from control plane nodes or special node groups, so that I can make the necessary adjustments without needing extensive support from cluster administrators.
- As a security professional, I want to ensure that this pod rejection mechanism is enforceable and auditable, so that I can verify compliance with internal and external regulations regarding resource access and control plane integrity.
- As an administrator, I want the ability to temporarily override scheduling restrictions for emergency or maintenance tasks without compromising the overall security posture of the cluster, ensuring that critical operations can be performed when necessary.
Member

Similarly, this particular UC is out-of-scope for this proposal. That's what oc adm cordon and oc adm drain are for.

@knelasevero knelasevero (Author) Jul 2, 2024

same as #1583 (comment)

Member

I'll quote here an excerpt from The Zen of Python: There should be one-- and preferably only one --obvious way to do it. - I'm a fan of that kind of approach. Having two distinct ways puts you in the position that you'll need to address differences between the two approaches. Again, simple is the key 😉

@knelasevero knelasevero (Author) Jul 15, 2024

I also like that excerpt, but I don't think it applies here. Maybe I need to make the text simpler, but not the mechanism we are providing. We are not providing anything similar to cordon or drain here.

I come back to that other comment of mine: maybe I can phrase this better. And I'm sure of it now. I'll do that.

My point is that

temporarily override scheduling restrictions for emergency or maintenance tasks without compromising the overall security posture of the cluster

is just saying that the admin can disable the enforcement of the VAP if they need to. So it has nothing to do with cordon or drain. The VAP + VAPB being enforced is the restriction being mentioned in this context. Overriding the approach here just means that folks can disable the VAP enforcement if needed.

I can phrase this better though.


- Protect Specialized Resources: Restrict scheduling of regular user workloads on nodes with specialized resources (e.g., GPU-enabled nodes) to ensure these costly resources are reserved for appropriate workloads.

- Emergency Override Capability: Allow administrators to temporarily override scheduling restrictions for emergency or maintenance tasks, ensuring critical operations can be performed when necessary without compromising the cluster's overall security posture.
Member

The last two are non-goals, and should rather be handled separately from this task.

@knelasevero knelasevero (Author) Jul 2, 2024

Maybe I can phrase this better, but I only mean admins can disable the very thing we are proposing here, in case they actually need to schedule stuff on nodes that would be protected if they had this enabled before. This is a goal.

The previous one is the same as #1583 (comment)

Member

See my other two answers again.


### Implementation Strategies

Update OpenShift Components for Taint Tolerance: Modify OpenShift operators and control plane components to include tolerations for the NoExecute taint, which admins can then apply to control plane nodes to achieve the goals described here.
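
As an illustration of that strategy, a hedged sketch of the pod-template fragment each control plane component would need before the taint could safely be applied (the taint key shown is the conventional control plane key; whether the proposal uses exactly this key is an assumption here):

```yaml
# Fragment of a control plane component's workload (e.g. a Deployment) pod template.
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoExecute
```
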
Member

How will adding a NoExecute taint be different from the NoSchedule taint, which you've mentioned earlier is already added when mastersSchedulable=true?

@knelasevero knelasevero (Author) Jul 2, 2024

See this discussion: https://redhat-internal.slack.com/archives/CKJR6200N/p1709049600594239?thread_ts=1709048784.350439&cid=CKJR6200N

TL;DR The kubelet already natively enforces "if no toleration, the no-execute taint kills pod processes" for us.

So we don't need a new admission plugin or a new controller. We can just rely on existing functionality from the scheduler, kubelet, and API server, together with ValidatingAdmissionPolicies, NoExecute taints, and RBAC rules with custom verbs.
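
For context, a minimal sketch of what such a ValidatingAdmissionPolicy could look like, assuming the admissionregistration.k8s.io/v1 API; the policy name, the exact CEL expression, and the custom RBAC verb consulted through the authorizer are illustrative assumptions, not the manifests this proposal would ship:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: restrict-control-plane-tolerations    # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["pods"]
  validations:
  # Allow the pod if it does not tolerate the control plane taint, or if the
  # requester is authorized via a hypothetical custom verb on the namespace.
  # A real policy would also need to handle blanket tolerations (operator: Exists with no key).
  - expression: >-
      !(has(object.spec.tolerations) && object.spec.tolerations.exists(t,
      has(t.key) && t.key == 'node-role.kubernetes.io/master' &&
      (!has(t.effect) || t.effect == 'NoExecute'))) ||
      authorizer.group('').resource('namespaces').name(object.metadata.namespace).check('tolerate-control-plane-taint').allowed()
    message: Pods may not tolerate the control plane taint unless the requester is explicitly permitted to.
```

In this sketch the policy only stops pods from acquiring the toleration; the eviction of non-tolerating pods still comes from the kubelet honoring the NoExecute taint, as described above.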

Member

Can you add that explanation in the implementation details? I think that's an important detail which I was missing, hence my question.

Member

Don't forget to explain the difference between this new approach and the current one.

@knelasevero knelasevero (Author) Jul 15, 2024

I think it is clear with the alternatives section? We had to change the whole proposal from that (NoSchedule with a new admission plugin) to the new mechanism (NoExecute, VAP + RBAC) as the new one is simpler. I can add a note here that points to that and explains the difference and the justification for the discarded approach.

Would that be ok?


3. Validating Admission Policy Binding:

- As part of this approach a ValidatingAdmissionPolicyBinding is also provided. The Cluster Administrator updates the `validationActions` field to flip the switch and enable actual enforcement. This is the only field where we allow changes, but admins can also create their own VAP/VAPB process if they need to. The binding shown here affects all namespaces.
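
A hedged sketch of what that binding could look like (all names and the initial validationActions values are assumptions; the default shipped by OpenShift may differ):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: restrict-control-plane-tolerations-binding   # hypothetical name
spec:
  policyName: restrict-control-plane-tolerations     # hypothetical; must match the VAP
  # Could ship in a non-enforcing mode; the cluster administrator edits this field
  # (e.g. to ["Deny"]) to flip the switch and enable actual enforcement.
  validationActions: ["Audit", "Warn"]
  # No matchResources narrowing here, so the binding affects all namespaces.
```
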
Member

Why would an administrator fiddle with any of the resources created by the operator? Why not expose a high-level API which would allow the administrator to turn the feature on or off?

Author

See discussion: #1583 (comment)

We can revisit this, of course. The idea here was to keep it simple. Customers can actually already achieve everything described in this EP themselves out of the box, by doing the steps manually. We are just providing a standardized way of enabling this, to make supporting it easier.

Also this discussion is relevant: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1713811003793789?thread_ts=1713794008.195589&cid=CC3CZCQHM

Member

I've skimmed both places and neither explicitly moves the responsibility to the cluster admin. I think the intention is to have a high-level API to express these options, while the underlying resources are managed and configured by the operator. The majority of our API, if not all of it, is designed with that in mind, to hide the complexity from cluster-admins.

@knelasevero knelasevero (Author) Jul 15, 2024

It's not totally explicit, but we moved away from the previous approach to avoid API changes and a new controller or new admission plugin. With that in mind, we don't have a place to add a new field or a new API to control this, since the whole idea is to take advantage of existing APIs and existing functionality of the scheduler, API server, and kubelet (only providing VAP + VAPBs + RBAC to end users and admins).

That's why VAP with CEL expressions was created, right? So we can let admins implement their own rules with CEL expressions, and at most we can provide default or standard VAPs when needed (with no new controllers, plugins, or new APIs).
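
To illustrate the RBAC side mentioned here, a hedged sketch of how an exception could be granted with a custom verb; the verb, resource, and all names are hypothetical and only need to line up with whatever the VAP's authorizer check looks for:

```yaml
# Hypothetical ClusterRole granting a custom verb that a VAP authorizer check could consult.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tolerate-control-plane-taint          # hypothetical name
rules:
- apiGroups: [""]
  resources: ["namespaces"]
  resourceNames: ["privileged-project"]       # hypothetical namespace to exempt
  verbs: ["tolerate-control-plane-taint"]     # hypothetical custom verb
---
# Bind it to whoever should be allowed to run workloads on the protected nodes.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tolerate-control-plane-taint-ops      # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tolerate-control-plane-taint
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: cluster-operations                    # hypothetical group
```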

Author

@deads2k would you please chime in? Please let me know if I had the wrong interpretation here.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 12, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 20, 2024