
use leader election API if available #460

Closed
DirectXMan12 opened this issue May 31, 2019 · 19 comments · Fixed by #1147
Assignees
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.
Milestone

Comments

@DirectXMan12
Contributor

We should use the new leader election API when available. We can check whether it's around via discovery, then use it if available, falling back to ConfigMaps if not.

/kind feature
/priority backlog
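The availability check described above can be sketched as a simple decision function (a minimal, illustrative pure-Go sketch: the group name matches the real coordination.k8s.io API group, but in an actual controller the server's group list would come from a client-go discovery client rather than a plain slice):

```go
package main

import "fmt"

// chooseResourceLock picks a leader-election lock type based on the API
// groups the server advertises. In a real controller the group list would
// come from discovery against the API server, not a plain slice.
func chooseResourceLock(serverGroups []string) string {
	for _, group := range serverGroups {
		if group == "coordination.k8s.io" {
			// Lease-based lock (client-go's "leases" identifier).
			return "leases"
		}
	}
	// Fallback for clusters without the coordination API.
	return "configmaps"
}

func main() {
	fmt.Println(chooseResourceLock([]string{"apps", "coordination.k8s.io"})) // leases
	fmt.Println(chooseResourceLock([]string{"apps", "batch"}))               // configmaps
}
```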

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence. labels May 31, 2019
@adohe-zz

adohe-zz commented Jun 3, 2019

Does "leader election API" here mean the coordination.k8s.io API? @DirectXMan12

@DirectXMan12
Contributor Author

yeah

@mszostok
Contributor

mszostok commented Aug 28, 2019

Hi @DirectXMan12, I can provide a PR for that 🙂
but are you still interested in having it?

@mszostok
Contributor

mszostok commented Sep 1, 2019

/assign

@DirectXMan12
Contributor Author

Yeah, still interested. We'll have to coordinate with #444.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2019
@DirectXMan12
Contributor Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 2, 2019
@vincepri vincepri added this to the Next milestone Feb 21, 2020
@vincepri
Member

/help

@k8s-ci-robot
Contributor

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Feb 21, 2020
@mszostok mszostok removed their assignment Mar 24, 2020
@timebertt
Contributor

timebertt commented Aug 4, 2020

What's the status on this issue?
I see there were some attempts at solving this, but none of them made it in, and the related PR #444 is still open.

If using the Lease API based on whether it's available or not is not an option, I would still be interested in making the LE resource lock configurable in leaderelection.Options.
We could add Options.LeaderElectionResourceLock, which could be set to one of the lock types supported by client-go and would be passed directly to resourcelock.New here.
If migration is a concern (as it was in #600), this approach would allow controllers/users to use multilocks (endpointsleases, configmapsleases) for migration.
WDYT?

This would allow controllers to make the LE lock configurable, just like in kcm, ccm, etc. (ref --leader-elect-resource-lock).
It would very much help in our use case (Project Gardener): we have scalability issues when running hundreds of controllers in one cluster, all of which use ConfigMap locks by default (this results in many PUT configmap requests, which in turn flood the API server's watch cache).

If you like the proposed approach, I would definitely be willing to try to solve it in a PR :)
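The defaulting rule implied by the proposal above can be sketched as follows (a minimal illustrative sketch, not controller-runtime code: the field name follows the proposed Options.LeaderElectionResourceLock, and the string constants mirror client-go's resourcelock identifiers):

```go
package main

import "fmt"

// Lock-type identifiers mirroring client-go's resourcelock package.
const (
	configMapsResourceLock       = "configmaps"
	leasesResourceLock           = "leases"
	configMapsLeasesResourceLock = "configmapsleases" // multilock for migration
)

// options stands in for the proposed leaderelection.Options field; the
// field name is the proposal's, not an existing controller-runtime API.
type options struct {
	LeaderElectionResourceLock string
}

// resolveLockType keeps today's ConfigMap default when the caller sets
// nothing, so existing deployments keep a single leader across upgrades.
func resolveLockType(o options) string {
	if o.LeaderElectionResourceLock != "" {
		return o.LeaderElectionResourceLock
	}
	return configMapsResourceLock
}

func main() {
	fmt.Println(resolveLockType(options{}))                                               // configmaps
	fmt.Println(resolveLockType(options{LeaderElectionResourceLock: leasesResourceLock})) // leases
}
```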

@timebertt
Contributor

@DirectXMan12 @vincepri
WDYT about making the lock resource configurable via leaderelection.Options?

@vincepri
Member

cc @alvaroaleman, given that there was some discussion on #600.

How we approach this problem depends on what we want our compatibility story to look like. I'm fine discussing the best way forward that makes sense for most of our users, although I want to make sure we don't break the world at the same time.

If we can get a pluggable leader election strategy that'd be the best way to keep backward compatible behaviors.

@alvaroaleman
Member

alvaroaleman commented Aug 18, 2020

If we can get a pluggable leader election strategy that'd be the best way to keep backward compatible behaviors.

Yeah, maybe just expose the strategy as a config option? It's possible to e.g. require both a ConfigMap and a Lease for a transition. I was first thinking that we should maybe do that for a couple of releases, but if people skip those releases, they might end up with two parallel leaders due to the changed strategy. Making it configurable would avoid that.

Only drawback is that we will forever have a default that isn't great.

@vincepri
Member

The other solution would be to reduce the scope of Kubernetes versions we support, or rather just increase it.

@alvaroaleman
Member

The other solution would be to reduce the scope of Kubernetes versions we support, or rather just increase it.

This doesn't really have anything to do with Kubernetes versions. This was added in 1.11 or so, IIRC, so presumably everyone has it. The issue is that if we change the strategy, the next time something gets deployed there might be two active leaders: one with the old strategy, one with the new. Furthermore, changing the default won't result in a compile error or anything, just in a runtime "we broke a certain guarantee".

@vincepri
Member

Ah I see what you mean, I misunderstood the earlier comment. Let's discuss it at the community meeting?

@alvaroaleman
Member

Ah I see what you mean, I misunderstood the earlier comment. Let's discuss it at the community meeting?

sgtm

@alvaroaleman
Member

We discussed this on the community meeting on Aug 27 and concluded that:

  • We want to change the default from ConfigMapsResourceLock to ConfigMapsLeasesResourceLock and keep this new default for a long time, to make sure that everyone who upgrades controller-runtime passes through it rather than jumping over all the controller-runtime versions that have it.
  • We want to make the strategy configurable, so that people can start using the LeasesLock once they know all users of their controllers have transitioned.
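The migration property of the multilock chosen above can be sketched as follows (an illustrative pure-Go model of the invariant, not client-go's actual implementation: a candidate only becomes leader if it can take both the old-style and the new-style lock):

```go
package main

import "fmt"

// lock is a toy single-holder lock; client-go's real resource locks are
// backed by ConfigMap/Lease objects and renew timestamps, not in-memory state.
type lock interface{ tryAcquire(id string) bool }

type simpleLock struct{ holder string }

func (l *simpleLock) tryAcquire(id string) bool {
	if l.holder == "" || l.holder == id {
		l.holder = id
		return true
	}
	return false
}

// multiLock models the migration invariant: a candidate is leader only if
// it holds both the old-style (ConfigMap) and new-style (Lease) lock, so
// old and new controllers each observe a single leader during transition.
type multiLock struct{ primary, secondary lock }

func (m *multiLock) tryAcquire(id string) bool {
	// Take the primary first; only try the secondary if that succeeded.
	return m.primary.tryAcquire(id) && m.secondary.tryAcquire(id)
}

func main() {
	m := &multiLock{&simpleLock{}, &simpleLock{}}
	fmt.Println(m.tryAcquire("controller-a")) // true: both locks were free
	fmt.Println(m.tryAcquire("controller-b")) // false: controller-a holds them
}
```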

@timebertt
Contributor

Sounds great!
/assign @timebertt
for the second part.
