
add KEP: Node Maintenance Lease #1411

Conversation

michaelgugino

No description provided.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 13, 2019
@k8s-ci-robot (Contributor)

Welcome @michaelgugino!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 13, 2019
@k8s-ci-robot (Contributor)

Hi @michaelgugino. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Dec 13, 2019
@michaelgugino (Author)

cc @smarterclayton @timothysc

@michaelgugino (Author)

Some notes from recent sig-node meeting.

Sig-node members suggested that the ownership of this KEP belongs to sig-cluster-lifecycle. I think that's a good approach.

Also, TODO: add some notes about not impacting the kubelet or scheduling. The proposed lease has no direct effect on the kubelet; it only serves as an informational signal to other components that someone or something is requesting exclusive access to disrupt the kubelet.

@sftim (Contributor) left a comment

@michaelgugino here's some feedback. It's only nits.

keps/sig-node/20191211-node-maintenance-lease.md: 4 resolved review comments (outdated)
@sftim (Contributor) commented Dec 18, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 18, 2019
@michaelgugino force-pushed the node-maintenance-lease branch 2 times, most recently from 910461d to 0fe63a5 on December 19, 2019 at 14:59
@sftim (Contributor) left a comment

One more nit. Not a biggie.

@michaelgugino (Author)

One more nit. Not a biggie.

@sftim Your new suggestions didn't come through here on github.

@sftim (Contributor) commented Dec 19, 2019

D'oh! Thanks for explaining.

The Lease object should be created automatically, and the ownerRef should
be the corresponding node so it is removed when the node is deleted.

### User Stories [optional]
Contributor

Suggested change
### User Stories [optional]
### User Stories

was what I meant @michaelgugino

@mattjmcnaughton (Contributor) left a comment

Thanks for your thoughts here @michaelgugino :)

I added some high-level questions and thoughts.

Utilize the existing `Lease` built-in API in the API group `coordination.k8s.io`.
Create a new Lease object per-node with Name equal to Node name in a newly
created dedicated namespace
That namespace should be created automatically (similarly to "default" and
Contributor

I'm reading the above as we're proposing the addition of a third "automatically" created namespace, which we will name "kube-maintenance-leases" (or something similar). Am I understanding correctly?

Out of curiosity, do you know if there's policy or precedent around when we add new default namespaces? I'm not sure if this concern is justified, but the thought of each new feature creating its own default namespace kind of concerns me.

Author

I think creating a new default namespace is probably the cleanest implementation. I'm unsure if there is a formal namespace review process. I'm relying on the reasoning here: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md#proposal

I don't really have a hard preference for which namespace it lives in; putting it in the same namespace as the heartbeat lease would cause a name collision, and that's undesirable from a UX position (if node names are already at max length, then we'd need some kind of hashing or other mechanism, which sacrifices readability).

Contributor

I'm also unsure if there's a formal namespace review process :) Hopefully some other folks will be able to weigh in with insights :)

Member

An end-to-end flow would help improve understanding.
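
To give a rough end-to-end picture of the proposal quoted above, here is a minimal client-go sketch of the get-or-create flow for a per-node Lease. The namespace name (`kube-node-maintenance`), the holder identity, and the 30-minute duration are placeholders chosen for illustration; the KEP does not fix any of them.

```go
package maintenance

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// maintenanceNamespace is a placeholder; the KEP leaves the namespace choice open.
const maintenanceNamespace = "kube-node-maintenance"

// getOrCreateMaintenanceLease returns the per-node maintenance Lease,
// creating one (and claiming it) if it does not exist yet.
func getOrCreateMaintenanceLease(ctx context.Context, cs kubernetes.Interface, nodeName string) (*coordinationv1.Lease, error) {
	leases := cs.CoordinationV1().Leases(maintenanceNamespace)

	lease, err := leases.Get(ctx, nodeName, metav1.GetOptions{})
	if err == nil {
		return lease, nil
	}
	if !apierrors.IsNotFound(err) {
		return nil, err
	}

	// Claim the lease at creation time; holder and duration are example values.
	holder := "example-maintenance-controller"
	duration := int32((30 * time.Minute).Seconds())
	now := metav1.NewMicroTime(time.Now())

	return leases.Create(ctx, &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{
			Name:      nodeName, // Lease name equals the Node name
			Namespace: maintenanceNamespace,
		},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       &holder,
			LeaseDurationSeconds: &duration,
			AcquireTime:          &now,
			RenewTime:            &now,
		},
	}, metav1.CreateOptions{})
}
```

A maintenance controller would renew `renewTime` while it works and clear the holder when it finishes; nothing in this flow touches the kubelet or the scheduler.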


## Motivation

Currently, there is no centralized way to inform a controller or user that
Contributor

Out of curiosity - do you know how folks are addressing this issue now? I.e. these days, if a cluster administrator needs to perform a disruptive operation against a node, what will they do?

Author

It depends on what components they have running. Those components might use some annotation or other mechanism to pause operations, but I don't know of any specific component doing that today.

Some non-trivial conflicting components that exist today:

https://github.com/kubevirt/node-maintenance-operator

NMO is for assisting in scheduling maintenance of bare metal hardware.

https://github.com/openshift/machine-config-operator/

MCO is for upgrading node configurations.

You don't want those two components to compete.

In addition, say you've powered off a bare-metal machine that was provisioned via cluster-api. The Machine Health Checker (kubernetes-sigs/cluster-api#1684) would delete that node today without some kind of intervention.

Use a designated annotation on the node object, or possibly
elsewhere. Drawbacks to this include excess updates being applied to the node
object directly, which might have performance implications. Having such an
annotation natively seems like an anti-pattern and could easily be disrupted
Contributor

I'm reading this statement as saying that we can't use annotations as a locking mechanism, as that is not what they were designed for. Am I understanding correctly?

Author

We could use annotations for this, but I'm not sure this type of operation fits the mold of annotations; they don't seem to be meant for this kind of coordination. It also doesn't seem like there should be a system-wide annotation; I don't think I've seen one.

In any case, if other people decide an annotation is the best way/place to do this, that's perfectly fine by me, but the drawback of many things reading and writing the node object makes it less than ideal IMO.


If you used annotations for this, what would that look like, a key with a value which is the lease end time? What prevents someone from overwriting it? (Is that misbehaving?)

Author

Annotations would probably look pretty ugly. I haven't sat down and actually drawn out how it would work, but it would probably take more than one annotation, possibly three or more: one for acquire time, one for duration/release time, and one for ownership. These values could be coerced into a CSV string, but that's probably even worse.

I think if anyone has a solid implementation alternative they want specified here, I can include it, but I don't feel inclined to specify in such detail something that's not really being considered.

with a specific name. Clients could use a Get-or-Create method to check the
lease. This will make installing components that support this
functionality more difficult as RBAC will not be deterministic at install time
of those components.
Contributor

Another alternative to consider... (I think - I apologize if it doesn't actually meet the use case...) :)

Could we implement this entire workflow outside of core Kubernetes using CRD and custom controllers? I.e. for those who wished to use MaintenanceLease, they could deploy the custom controller and create the CRD. I think that within the controller implementation, the proper locking behavior could be implemented.

Author

Certainly it's an alternative, but it presents more work for everyone. If you want to use the CRD-based method, you still need to agree on a namespace (or make it cluster-wide) and on RBAC, and then you introduce a hard dependency for your project (this CRD needs to be installed first).

I think utilizing the built-in coordination API rather than a CRD would be the better option. We could do things like add an option to kubectl to check for a lease before drain/cordon.
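
As a rough illustration of that kubectl idea (an assumption on my part, not text from the KEP), a drain or cordon path could treat the per-node Lease as held only when a holder is recorded and the lease has not expired:

```go
package maintenance

import (
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
)

// maintenanceLeaseHeld reports whether a maintenance Lease appears to be held:
// a holder identity is recorded and, if renewal data is present, it has not
// expired yet. Real tooling would also need to think about clock skew.
func maintenanceLeaseHeld(lease *coordinationv1.Lease, now time.Time) bool {
	spec := lease.Spec
	if spec.HolderIdentity == nil || *spec.HolderIdentity == "" {
		return false
	}
	if spec.RenewTime == nil || spec.LeaseDurationSeconds == nil {
		// No expiry information recorded; stay conservative and treat it as held.
		return true
	}
	expiry := spec.RenewTime.Add(time.Duration(*spec.LeaseDurationSeconds) * time.Second)
	return now.Before(expiry)
}
```

kubectl could then warn or refuse to drain while this returns true, unless the caller is the recorded holder.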

Contributor

I like the idea of an add-on, whether that's using a CRD or an annotation on Node resources. If the core of Kubernetes doesn't provide enough to let the add-on hook in the way this KEP intends, how about providing just enough extension to make the add-on possible?

kubectl could be aware of the add-on and react differently depending on whether a maintenance lease was set.

Author

I'm against using a CRD for this. There is already the coordination API, which seems specifically designed for this type of thing. I would be okay with an annotation on the node.

I'm unsure what you mean by 'add-on' in this context.

Contributor

I'm suggesting that this node maintenance lease not become part of the Kubernetes core, and instead it becomes a component that a cluster operator can add if they like it: an add-on.

Another KEP, #1308, covers a SIG Cluster Lifecycle mechanism for a cluster operator to configure which add-ons are added into a new cluster, and covers selection of the default configuration as well.

Author

@sftim

I'm suggesting that this node maintenance lease not become part of the Kubernetes core

Why are you suggesting this? Why would an add-on be preferable here to making it part of the core?

@michaelgugino (Author) left a comment

Some good feedback questions, thanks.

For me, I think the primary discussion points are:

  1. Do we want to introduce a new namespace for this?
  2. Do we want said namespace to be an automatically created system namespace?
  3. If we introduce a new system namespace, do we want its scope to be single-purpose, or do we need room for a kube-system-2 type namespace with a more general purpose?
  4. Should we just declare a kubernetes-wide annotation that can be applied to the node, and skip a bunch of hassle? E.g. node-maintenance-lease.k8s.io: "{'acquired-time': xxx, 'expire-time': xxx}" or similar (a rough sketch follows this list).
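
For point 4, a sketch of what writing such an annotation could look like; the key and the JSON value layout simply mirror the example in that bullet and are not a settled convention:

```go
package maintenance

import (
	"context"
	"encoding/json"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// annotateNodeMaintenance records a hypothetical maintenance "lease" directly
// on the Node object via a single annotation, mirroring option 4 above.
func annotateNodeMaintenance(ctx context.Context, cs kubernetes.Interface, nodeName string, d time.Duration) error {
	now := time.Now().UTC()
	value, err := json.Marshal(map[string]string{
		"acquired-time": now.Format(time.RFC3339),
		"expire-time":   now.Add(d).Format(time.RFC3339),
	})
	if err != nil {
		return err
	}

	patch, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				// Hypothetical key taken from the example in point 4.
				"node-maintenance-lease.k8s.io": string(value),
			},
		},
	})
	if err != nil {
		return err
	}

	// Strategic merge patch keeps the Node's other annotations intact.
	_, err = cs.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```

This also makes the earlier drawback concrete: every acquire, renew, and release becomes another write to the Node object itself.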

@sftim (Contributor) commented Dec 20, 2019

Is there any interest in providing a more generic mechanism? For example, maybe I can mark maintenance against a StatefulSet, Service, or PersistentVolume. Maintenance doesn't just happen to Nodes.

@mattjmcnaughton (Contributor) left a comment

Great summary @michaelgugino !

Yeah, I think at the end of the day, my main question (which I don't know the answer to) is what is the cost of adding a new system-wide namespace? My high level fear is that, if the trend of creating new namespaces continued and if we fast forward a year, an engineer new to k8s would launch a cluster and see a ton of new namespaces they didn't understand. Whether that's an actual problem is definitely up for debate though.

One thought: is there a way we could create the system-wide namespace "on demand"? I.e. it does not exist until the first time someone tries to take a maintenance lease? With ad hoc creation, the only folks experiencing "namespace pollution" are those who have knowingly opted into this feature.

I agree with you @michaelgugino that the Lease API does feel like the right mechanism for this, so I agree with your hesitancy to create something else from scratch.

@sftim (Contributor) commented Dec 22, 2019

BTW add-ons are things like https://kubernetes.io/docs/concepts/cluster-administration/addons/

@sftim (Contributor) commented Dec 22, 2019

if the trend of creating new namespaces continued and if we fast forward a year, an engineer new to k8s would launch a cluster and see a ton of new namespaces they didn't understand

Seems valid to call that out (it's a concern for me too).

@michaelgugino (Author)

if the trend of creating new namespaces continued and if we fast forward a year, an engineer new to k8s would launch a cluster and see a ton of new namespaces they didn't understand

Seems valid to call that out (it's a concern for me too).

We could skip adding a new namespace and use an existing one instead. I'm open to suggestions here. I went with a new one due to the reasoning here: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md#proposal

we considered using kube-system namespace but decided that it's already too overloaded

We could definitely use the kube-system namespace, or we could create a new, more generic namespace that has a variety of uses (which sounds like kube-system).

@wojtek-t (Member)

There is non-negligible overlap between #1116 and this KEP. I think we should try to merge those two together somehow.

@yastij

@michaelgugino (Author)

There is non-negligible overlap between #1116 and this KEP. I think we should try to merge those two together somehow.

@yastij

@wojtek-t I disagree; there is not much overlap. That KEP is about a node that goes down, while this one is about maintenance. The proposed component(s) in the shutdown KEP could respond to the proposed lease in some way, or not. Ideally, if you hold the maintenance lease, you drain the node before doing things to it, so the shutdown KEP wouldn't really have any meaningful effect.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2021
@michaelgugino (Author)

Hi @michaelgugino,

Would you please open a tracking issue in this repo for your KEP and convert this to the new template? Thanks!

It's pretty onerous to require existing open KEPs to use some new format. I'm fine with new KEPs using the new template, but we shouldn't require it of all of them prior to merging.

@ehashman (Member)

@michaelgugino as this currently stands, this KEP can't be tracked by the enhancements team because it doesn't have an associated issue in the repo. This is not a new requirement. You don't need to fill out a new template; I'm just asking you to rename your file and split out the metadata.

Comment on lines +103 to +106
The Lease object will be created automatically by the NodeLifecycleController,
and the ownerRef will be the corresponding node so it is removed when the node
is deleted. If the Lease object is removed before the node is deleted, the
NodeLifecycleController will recreate it.


Is this the typical pattern for leases? Would it make sense for the lease to be created by the first thing trying to grab it, and then released when the holder is finished by removing it? IIRC the lease libraries normally support this. Or is the advantage here that the NodeLifecycleController can make sure the owner references are set correctly?
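
For reference, the creation-with-ownerRef flow described in the quoted KEP text would look roughly like the sketch below (illustrative only, not actual NodeLifecycleController code):

```go
package maintenance

import (
	"context"

	coordinationv1 "k8s.io/api/coordination/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createLeaseOwnedByNode creates an initially unheld maintenance Lease whose
// ownerRef points at the Node, so the garbage collector removes the Lease
// automatically when the Node object is deleted.
func createLeaseOwnedByNode(ctx context.Context, cs kubernetes.Interface, node *corev1.Node, namespace string) (*coordinationv1.Lease, error) {
	lease := &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{
			Name:      node.Name,
			Namespace: namespace,
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "v1",
				Kind:       "Node",
				Name:       node.Name,
				UID:        node.UID,
			}},
		},
		// Spec is left empty: no holder until a maintenance actor acquires it.
	}
	return cs.CoordinationV1().Leases(namespace).Create(ctx, lease, metav1.CreateOptions{})
}
```

Creating the Lease lazily from whichever client grabs it first, as suggested in the question above, would avoid the controller dependency, but every client would then have to set this ownerRef itself.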


### Use existing system namespace for lease object

We could utilize an existing system namespace for the lease object. The primary


The original client-go locking used an annotation on a resource (e.g. a ConfigMap); I think this is what Tim refers to. The implementation underneath would attempt to patch the annotation value with the information about the lease (holder, duration, etc.) and rely on etcd to ensure conflicts were handled appropriately (i.e. if you tried to acquire after someone else had updated, you got a 409).
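
A rough sketch of that annotation-based locking pattern (illustrative; not the actual client-go resourcelock code, and the annotation key is made up): read the object, write the lock annotation, and let the apiserver's resourceVersion check turn a concurrent write into a 409 Conflict.

```go
package maintenance

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// tryAcquireViaAnnotation attempts to claim a lock annotation on a ConfigMap.
// The Update carries the resourceVersion from the preceding Get, so if another
// client modified the object in between, the apiserver rejects the write with
// 409 Conflict and we know we lost the race.
func tryAcquireViaAnnotation(ctx context.Context, cs kubernetes.Interface, namespace, name, holder string) (bool, error) {
	cms := cs.CoreV1().ConfigMaps(namespace)

	cm, err := cms.Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	if cm.Annotations == nil {
		cm.Annotations = map[string]string{}
	}
	// Hypothetical annotation key; real leader election stores a JSON record here.
	const key = "example.com/lock-holder"
	if current, ok := cm.Annotations[key]; ok && current != "" && current != holder {
		return false, nil // someone else already holds the lock
	}
	cm.Annotations[key] = holder

	if _, err := cms.Update(ctx, cm, metav1.UpdateOptions{}); err != nil {
		if apierrors.IsConflict(err) {
			return false, nil // concurrent writer won the race; apiserver returned 409
		}
		return false, err
	}
	return true, nil
}
```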

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 24, 2021
@michaelgugino (Author)

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 1, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 29, 2021
@michaelgugino (Author)

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 15, 2021
Comment on lines +5 to +7
owning-sig: sig-cluster-lifecycle
participating-sigs:
- sig-node
Member

Reiterating comments from before: IMO this should be in reverse.
KEP folder / code owner (NodeLifecycleController) -> KEP owner
other SIGs -> participants

participating-sigs:
- sig-node
reviewers:
- "@neolit123"
Member

You can update the list with participants in this KEP review.

reviewers:
- "@neolit123"
approvers:
- TBD
Member

IMO, ideally, someone who maintains NodeLifecycleController.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 13, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 12, 2021
@ehashman ehashman added this to Waiting on Author in SIG Node PR Triage Dec 6, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

SIG Node PR Triage automation moved this from Waiting on Author to Done Dec 12, 2021
@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels

  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.)
  • kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory.)
  • lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • sig/architecture (Categorizes an issue or PR as relevant to SIG Architecture.)
  • sig/cluster-lifecycle (Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.)
  • sig/node (Categorizes an issue or PR as relevant to SIG Node.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)