
add KEP: Node Maintenance Lease #1411

Conversation

michaelgugino

No description provided.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 13, 2019
@k8s-ci-robot (Contributor)

Welcome @michaelgugino!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 13, 2019
@k8s-ci-robot (Contributor)

Hi @michaelgugino. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Dec 13, 2019
@michaelgugino (Author)

cc @smarterclayton @timothysc

@michaelgugino (Author)

Some notes from recent sig-node meeting.

Sig-node members suggested that the ownership of this KEP belongs to sig-cluster-lifecycle. I think that's a good approach.

Also, TODO: add some notes about not impacting the kubelet or scheduling. The proposed lease has no direct effect on the kubelet; it only serves as an informational signal to other components that someone or something is requesting exclusive access to disrupt the kubelet.

@sftim (Contributor) left a comment

@michaelgugino here's some feedback. It's only nits.

keps/sig-node/20191211-node-maintenance-lease.md: 4 resolved review comments (outdated)
@sftim (Contributor) commented Dec 18, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 18, 2019
@michaelgugino force-pushed the node-maintenance-lease branch 2 times, most recently from 910461d to 0fe63a5 on December 19, 2019 at 14:59
@sftim (Contributor) left a comment

One more nit. Not a biggie.

@michaelgugino (Author)

One more nit. Not a biggie.

@sftim Your new suggestions didn't come through here on github.

@sftim (Contributor) commented Dec 19, 2019

D'oh! Thanks for explaining.

The Lease object should be created automatically, and the ownerRef should
be the corresponding node so it is removed when the node is deleted.

### User Stories [optional]
Contributor

Suggested change
### User Stories [optional]
### User Stories

was what I meant @michaelgugino

@mattjmcnaughton (Contributor) left a comment

Thanks for your thoughts here @michaelgugino :)

I added some high-level questions and thoughts.

Utilize the existing `Lease` built-in API in the API group `coordination.k8s.io`.
Create a new Lease object per-node with Name equal to Node name in a newly
created dedicated namespace
That namespace should be created automatically (similarly to "default" and
Contributor

I'm reading the above as we're proposing the addition of a third "automatically" created namespace, which we will name "kube-maintenance-leases" (or something similar). Am I understanding correctly?

Out of curiosity, do you know if there's policy or precedent around when we add new default namespaces? I'm not sure if this concern is justified, but the thought of each new feature creating its own default namespace kind of concerns me.

Author

I think creating a new default namespace is probably the cleanest implementation. I'm unsure if there is a formal namespace review process. I'm relying on the reasoning here: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md#proposal

I don't really have a hard preference for which namespace it lives in; putting it in the same namespace as the heartbeat lease would cause a name collision, and that's undesirable from a UX position (if node names are already at max length, then we'd need some kind of hashing or other mechanism, which sacrifices readability).

Contributor

I'm also unsure if there's a formal namespace review process :) Hopefully some other folks will be able to weigh in with insights :)

Member

An end-to-end flow would help improve understanding.
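
To give a rough end-to-end picture of the proposal quoted above, here is a minimal client-go sketch of the get-or-create flow for a per-node Lease. The namespace name (`kube-node-maintenance`), the holder identity, and the 30-minute duration are placeholders chosen for illustration; the KEP does not fix any of them.

```go
package maintenance

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// maintenanceNamespace is a placeholder; the KEP leaves the namespace choice open.
const maintenanceNamespace = "kube-node-maintenance"

// getOrCreateMaintenanceLease returns the per-node maintenance Lease,
// creating one (and claiming it) if it does not exist yet.
func getOrCreateMaintenanceLease(ctx context.Context, cs kubernetes.Interface, nodeName string) (*coordinationv1.Lease, error) {
	leases := cs.CoordinationV1().Leases(maintenanceNamespace)

	lease, err := leases.Get(ctx, nodeName, metav1.GetOptions{})
	if err == nil {
		return lease, nil
	}
	if !apierrors.IsNotFound(err) {
		return nil, err
	}

	// Claim the lease at creation time; holder and duration are example values.
	holder := "example-maintenance-controller"
	duration := int32((30 * time.Minute).Seconds())
	now := metav1.NewMicroTime(time.Now())

	return leases.Create(ctx, &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{
			Name:      nodeName, // Lease name equals the Node name
			Namespace: maintenanceNamespace,
		},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       &holder,
			LeaseDurationSeconds: &duration,
			AcquireTime:          &now,
			RenewTime:            &now,
		},
	}, metav1.CreateOptions{})
}
```

A maintenance controller would renew `renewTime` while it works and clear the holder when it finishes; nothing in this flow touches the kubelet or the scheduler.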


## Motivation

Currently, there is no centralized way to inform a controller or user that
Contributor

Out of curiosity - do you know how folks are addressing this issue now? I.e. these days, if a cluster administrator needs to perform a disruptive operation against a node, what will they do?

Author

It depends on what components they have running. Those components might use some annotation or other mechanism to pause operations, but I don't know of any specific component doing that today.

Some non-trivial conflicting components that exist today:

https://github.com/kubevirt/node-maintenance-operator

NMO is for assisting in scheduling maintenance of bare metal hardware.

https://github.com/openshift/machine-config-operator/

MCO is for upgrading node configurations.

You don't want those two components to compete.

In addition, say you've powered off a bare-metal machine that was provisioned via cluster-api. The Machine Health Checker (kubernetes-sigs/cluster-api#1684) would delete that node today without some kind of intervention.

Use a designated annotation on the node object, or possibly
elsewhere. Drawbacks to this include excess updates being applied to the node
object directly, which might have performance implications. Having such an
annotation natively seems like an anti-pattern and could easily be disrupted
Contributor

I'm reading this statement as saying that we can't use annotations as a locking mechanism, as that is not what they were designed for. Am I understanding correctly?

Author

We could use annotations for this, but I'm not sure this type of operation fits the mold of annotations; they don't seem to be meant for this kind of coordination. It also doesn't seem like there should be a system-wide annotation; I don't think I've seen one.

In any case, if other people decide an annotation is the best way/place to do this, that's perfectly fine by me, but the drawback of many things reading and writing the node object makes it less than ideal IMO.


If you used annotations for this, what would that look like, a key with a value which is the lease end time? What prevents someone from overwriting it? (Is that misbehaving?)

Author

Annotations would probably look pretty ugly. I haven't sat down and actually drawn out how it would work, but it would probably take more than one annotation, possibly three or more: one for acquire time, one for duration/release time, and one for ownership. These values could be coerced into a CSV string, but that's probably even worse.

I think if anyone has a solid implementation alternative they want specified here, I can include it, but I don't feel inclined to specify in such detail something that's not really being considered.

with a specific name. Clients could use a Get-or-Create method to check the
lease. This will make installing components that support this
functionality more difficult as RBAC will not be deterministic at install time
of those components.
Contributor

Another alternative to consider... (I think - I apologize if it doesn't actually meet the use case...) :)

Could we implement this entire workflow outside of core Kubernetes using CRD and custom controllers? I.e. for those who wished to use MaintenanceLease, they could deploy the custom controller and create the CRD. I think that within the controller implementation, the proper locking behavior could be implemented.

Author

Certainly it's an alternative, but it presents more work for everyone. If you want to use the CRD-based method, you still need to agree on a namespace (or make it cluster-wide) and on RBAC, and then you introduce a hard dependency for your project (this CRD needs to be installed first).

I think utilizing the built-in coordination API rather than a CRD would be the better option. We could do things like add an option to kubectl to check for a lease before drain/cordon.
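
As a rough illustration of that kubectl idea (an assumption on my part, not text from the KEP), a drain or cordon path could treat the per-node Lease as held only when a holder is recorded and the lease has not expired:

```go
package maintenance

import (
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
)

// maintenanceLeaseHeld reports whether a maintenance Lease appears to be held:
// a holder identity is recorded and, if renewal data is present, it has not
// expired yet. Real tooling would also need to think about clock skew.
func maintenanceLeaseHeld(lease *coordinationv1.Lease, now time.Time) bool {
	spec := lease.Spec
	if spec.HolderIdentity == nil || *spec.HolderIdentity == "" {
		return false
	}
	if spec.RenewTime == nil || spec.LeaseDurationSeconds == nil {
		// No expiry information recorded; stay conservative and treat it as held.
		return true
	}
	expiry := spec.RenewTime.Add(time.Duration(*spec.LeaseDurationSeconds) * time.Second)
	return now.Before(expiry)
}
```

kubectl could then warn or refuse to drain while this returns true, unless the caller is the recorded holder.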

Contributor

I like the idea of an add-on, whether that's using a CRD or an annotation on Node resources. If the core of Kubernetes doesn't provide enough to let the add-on hook in the way this KEP intends, how about providing just enough extension to make the add-on possible?

kubectl could be aware of the add-on and react differently depending on whether a maintenance lease was set.

Author

I'm against using a CRD for this. There is already the coordination API, which seems specifically designed for this type of thing. I would be okay with an annotation on the node.

I'm unsure what you mean by 'add-on' in this context.

Contributor

I'm suggesting that this node maintenance lease not become part of the Kubernetes core, and instead it becomes a component that a cluster operator can add if they like it: an add-on.

Another KEP, #1308, covers a SIG Cluster Lifecycle mechanism for a cluster operator to configure which add-ons are added into a new cluster, and covers selection of the default configuration as well.

Author

@sftim

I'm suggesting that this node maintenance lease not become part of the Kubernetes core

Why are you suggesting this? Why would an add-on be preferable here to making it part of the core?

@michaelgugino (Author) left a comment

Some good feedback questions, thanks.

For me, I think the primary discussion points are:

  1. Do we want to introduce a new namespace for this?
  2. Do we want said namespace to be an automatically created system namespace?
  3. If we introduce a new system namespace, do we want its scope to be single-purpose, or do we need room for a kube-system-2 type namespace with a more general purpose?
  4. Should we just declare a kubernetes-wide annotation that can be applied to the node, and skip a bunch of hassle? E.g. node-maintenance-lease.k8s.io: "{'acquired-time': xxx, 'expire-time': xxx}" or similar (a rough sketch follows this list).
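
For point 4, a sketch of what writing such an annotation could look like; the key and the JSON value layout simply mirror the example in that bullet and are not a settled convention:

```go
package maintenance

import (
	"context"
	"encoding/json"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// annotateNodeMaintenance records a hypothetical maintenance "lease" directly
// on the Node object via a single annotation, mirroring option 4 above.
func annotateNodeMaintenance(ctx context.Context, cs kubernetes.Interface, nodeName string, d time.Duration) error {
	now := time.Now().UTC()
	value, err := json.Marshal(map[string]string{
		"acquired-time": now.Format(time.RFC3339),
		"expire-time":   now.Add(d).Format(time.RFC3339),
	})
	if err != nil {
		return err
	}

	patch, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				// Hypothetical key taken from the example in point 4.
				"node-maintenance-lease.k8s.io": string(value),
			},
		},
	})
	if err != nil {
		return err
	}

	// Strategic merge patch keeps the Node's other annotations intact.
	_, err = cs.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```

This also makes the earlier drawback concrete: every acquire, renew, and release becomes another write to the Node object itself.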

@sftim (Contributor) commented Dec 20, 2019

Is there any interest in providing a more generic mechanism? For example, maybe I can mark maintenance against a StatefulSet, Service, or PersistentVolume. Maintenance doesn't just happen to Nodes.

@mattjmcnaughton (Contributor) left a comment

Great summary @michaelgugino !

Yeah, I think at the end of the day, my main question (which I don't know the answer to) is what is the cost of adding a new system-wide namespace? My high level fear is that, if the trend of creating new namespaces continued and if we fast forward a year, an engineer new to k8s would launch a cluster and see a ton of new namespaces they didn't understand. Whether that's an actual problem is definitely up for debate though.

One thought: is there a way we could create the system-wide namespace "on demand"? I.e. it does not exist until the first time someone tries to take a maintenance lease? With ad hoc creation, the only folks experiencing "namespace pollution" are those who have knowingly opted into this feature.

I agree with you @michaelgugino that the Lease API does feel like the right mechanism for this, so I agree with your hesitancy to create something else from scratch.

@sftim (Contributor) commented Dec 22, 2019

BTW add-ons are things like https://kubernetes.io/docs/concepts/cluster-administration/addons/

@sftim (Contributor) commented Dec 22, 2019

if the trend of creating new namespaces continued and if we fast forward a year, an engineer new to k8s would launch a cluster and see a ton of new namespaces they didn't understand

Seems valid to call that out (it's a concern for me too).

@michaelgugino (Author)

if the trend of creating new namespaces continued and if we fast forward a year, an engineer new to k8s would launch a cluster and see a ton of new namespaces they didn't understand

Seems valid to call that out (it's a concern for me too).

We could skip adding a new namespace and use an existing one instead. I'm open to suggestions here. I went with a new one due to the reasoning here: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md#proposal

we considered using kube-system namespace but decided that it's already too overloaded

We could definitely use the kube-system namespace, or we could create a new, more generic namespace that has a variety of uses (which sounds like kube-system).

@wojtek-t (Member)

There is non-negligible overlap between #1116 and this KEP. I think we should try to merge those two together somehow.

@yastij

@michaelgugino (Author)

There is non-negligible overlap between #1116 and this KEP. I think we should try to merge those two together somehow.

@yastij

@wojtek-t I disagree; there is not much overlap. That KEP is about a node that goes down, while this one is about maintenance. The proposed component(s) in the shutdown KEP could respond to the proposed lease in some way, or not. Ideally, if you hold the maintenance lease, you drain the node before doing things to it, so the shutdown KEP wouldn't really have any meaningful effect.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2021
@michaelgugino (Author)

Hi @michaelgugino,

Would you please open a tracking issue in this repo for your KEP and convert this to the new template? Thanks!

It's pretty onerous to require existing open KEPs to use some new format. I'm fine with new KEPs using the new template, but we shouldn't require it of all of them prior to merging.

@ehashman (Member)

@michaelgugino as this currently stands, this KEP can't be tracked by the enhancements team because it doesn't have an associated issue in the repo. This is not a new requirement. You don't need to fill out a new template; I'm just asking you to rename your file and split out the metadata.

Comment on lines +103 to +106
The Lease object will be created automatically by the NodeLifecycleController,
and the ownerRef will be the corresponding node so it is removed when the node
is deleted. If the Lease object is removed before the node is deleted, the
NodeLifecycleController will recreate it.


Is this the typical pattern for leases? Would it make sense for the lease to be created by the first thing trying to grab it, and then released when the holder is finished by removing it? IIRC the lease libraries normally support this. Or is the advantage here that the NodeLifecycleController can make sure the owner references are set correctly?
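
For reference, the creation-with-ownerRef flow described in the quoted KEP text would look roughly like the sketch below (illustrative only, not actual NodeLifecycleController code):

```go
package maintenance

import (
	"context"

	coordinationv1 "k8s.io/api/coordination/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createLeaseOwnedByNode creates an initially unheld maintenance Lease whose
// ownerRef points at the Node, so the garbage collector removes the Lease
// automatically when the Node object is deleted.
func createLeaseOwnedByNode(ctx context.Context, cs kubernetes.Interface, node *corev1.Node, namespace string) (*coordinationv1.Lease, error) {
	lease := &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{
			Name:      node.Name,
			Namespace: namespace,
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "v1",
				Kind:       "Node",
				Name:       node.Name,
				UID:        node.UID,
			}},
		},
		// Spec is left empty: no holder until a maintenance actor acquires it.
	}
	return cs.CoordinationV1().Leases(namespace).Create(ctx, lease, metav1.CreateOptions{})
}
```

Creating the Lease lazily from whichever client grabs it first, as suggested in the question above, would avoid the controller dependency, but every client would then have to set this ownerRef itself.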


### Use existing system namespace for lease object

We could utilize an existing system namespace for the lease object. The primary


The original client-go locking used an annotation on a resource (e.g. a ConfigMap); I think this is what Tim refers to. The implementation underneath would attempt to patch the annotation value with the information about the lease (holder, duration, etc.) and rely on etcd to ensure conflicts were handled appropriately (i.e. if you tried to acquire after someone else had updated, you got a 409).
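
A rough sketch of that annotation-based locking pattern (illustrative; not the actual client-go resourcelock code, and the annotation key is made up): read the object, write the lock annotation, and let the apiserver's resourceVersion check turn a concurrent write into a 409 Conflict.

```go
package maintenance

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// tryAcquireViaAnnotation attempts to claim a lock annotation on a ConfigMap.
// The Update carries the resourceVersion from the preceding Get, so if another
// client modified the object in between, the apiserver rejects the write with
// 409 Conflict and we know we lost the race.
func tryAcquireViaAnnotation(ctx context.Context, cs kubernetes.Interface, namespace, name, holder string) (bool, error) {
	cms := cs.CoreV1().ConfigMaps(namespace)

	cm, err := cms.Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	if cm.Annotations == nil {
		cm.Annotations = map[string]string{}
	}
	// Hypothetical annotation key; real leader election stores a JSON record here.
	const key = "example.com/lock-holder"
	if current, ok := cm.Annotations[key]; ok && current != "" && current != holder {
		return false, nil // someone else already holds the lock
	}
	cm.Annotations[key] = holder

	if _, err := cms.Update(ctx, cm, metav1.UpdateOptions{}); err != nil {
		if apierrors.IsConflict(err) {
			return false, nil // concurrent writer won the race; apiserver returned 409
		}
		return false, err
	}
	return true, nil
}
```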

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 24, 2021
@michaelgugino (Author)

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 1, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 29, 2021
@michaelgugino (Author)

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 15, 2021
Comment on lines +5 to +7
owning-sig: sig-cluster-lifecycle
participating-sigs:
- sig-node
Member

Reiterating comments from before: IMO this should be in reverse.
KEP folder / code owner (NodeLifecycleController) -> KEP owner
other SIGs -> participants

participating-sigs:
- sig-node
reviewers:
- "@neolit123"
Member

You can update the list with participants in this KEP review.

reviewers:
- "@neolit123"
approvers:
- TBD
Member

IMO, ideally, someone who maintains NodeLifecycleController.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 13, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 12, 2021
@ehashman ehashman added this to Waiting on Author in SIG Node PR Triage Dec 6, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

SIG Node PR Triage automation moved this from Waiting on Author to Done Dec 12, 2021
@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels

  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.)
  • kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory.)
  • lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • sig/architecture (Categorizes an issue or PR as relevant to SIG Architecture.)
  • sig/cluster-lifecycle (Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.)
  • sig/node (Categorizes an issue or PR as relevant to SIG Node.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)