
Enable HNC leader election in controllers #94

Closed

Conversation

joe2far
Contributor

@joe2far joe2far commented Oct 12, 2021

  • Updates the HNC reconcilers to be leader-aware and only perform writes when this instance is the leader (see the sketch below)
  • Non-leader instances keep updating their forest tree cache, but do not perform any writes
  • All HNC instances can respond to validating/mutating webhooks
  • When a leader election occurs, objects are re-queued
  • Adds lease permissions to the cluster role
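
For illustration, here is a minimal sketch of the leader-aware pattern described above (the isLeader flag, the Reconciler type, and the helper methods are placeholders, not the actual HNC code):

package example

import (
	"context"
	"sync/atomic"

	ctrl "sigs.k8s.io/controller-runtime"
)

// isLeader is an assumed process-wide flag, flipped when this replica wins the lease.
var isLeader atomic.Bool

// Reconciler stands in for an HNC reconciler; the real types differ.
type Reconciler struct{}

// Reconcile always refreshes the in-memory forest so followers stay warm, but only
// the leader writes anything back to the apiserver.
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	r.syncForest(req) // placeholder: update the local forest cache from the request

	if !isLeader.Load() {
		// Follower: the cache is up to date, but skip all writes.
		return ctrl.Result{}, nil
	}
	return ctrl.Result{}, r.writeChanges(ctx, req) // placeholder write path
}

func (r *Reconciler) syncForest(req ctrl.Request) {}

func (r *Reconciler) writeChanges(ctx context.Context, req ctrl.Request) error { return nil }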

Manual testing was completed with 3 instances and --enable-leader-election:

  • Manually killed the leader instance at different times while running tests that create many parent/child namespace hierarchies
  • Validated that the new leader took over and handled updating the child namespace resources
  • Also validated that all 3 instances were successfully handling validation webhook calls

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 12, 2021
@k8s-ci-robot
Contributor

Hi @joe2far. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 12, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: joe2far
To complete the pull request process, please assign tashimi after the PR has been reviewed.
You can assign the PR to them by writing /assign @tashimi in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@adrianludwin
Contributor

Hey I'll review this shortly, but FYI I didn't know anyone had any big changes under way and moved a lot of files around this weekend :( Sorry about that! See #92 for details, but the summary is that instead of arranging the code by function (reconcilers, mutators, and validators), I've arranged the code by feature (anchors, hierarchyconfig, hncconfig, namespace, objects). Most of the old files still exist but have been renamed, e.g.:

internal/reconcilers/anchor.go -> internal/anchor/reconciler.go
internal/reconcilers/hierarchy_config.go -> internal/hierarchyconfig/reconciler.go
internal/reconcilers/hnc_config.go -> internal/hncconfig/reconciler.go
internal/reconcilers/object.go -> internal/objects/reconciler.go
internal/reconcilers/setup.go -> internal/setup/reconcilers.go

Hopefully most of the changes should be easy to port.

Contributor

@adrianludwin adrianludwin left a comment


Good start, thanks!

internal/reconcilers/anchor.go (outdated review thread, resolved)
internal/reconcilers/anchor.go (outdated review thread, resolved)
internal/reconcilers/hnc_config.go (outdated review thread, resolved)
internal/reconcilers/object.go (outdated review thread, resolved)
internal/forest/forest.go (outdated review thread, resolved)
@adrianludwin
Contributor

Hey @joe2far , I don't want to rush you or anything but I'm thinking about releasing HNC v0.9 shortly (e.g. in the next week or so). Do you think you'll be working on this over the next few days - i.e., do you want me to wait for this to go in? Or is this a lower-priority project for you? Thanks!

@joe2far
Contributor Author

joe2far commented Oct 15, 2021

@adrianludwin sorry, I haven't had time to progress this further. It would be nice to have this in the upcoming release, but it won't be ready in time. I hope to get this PR into a better state for review by the middle of next week.

@adrianludwin
Contributor

adrianludwin commented Oct 15, 2021 via email

@joe2far
Contributor Author

joe2far commented Oct 18, 2021

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2021
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 18, 2021
@joe2far joe2far force-pushed the leader-election-support2 branch 2 times, most recently from 3ac4bb9 to 6a6250c on October 18, 2021 14:00
cmd/manager/main.go (outdated review thread, resolved)
cmd/manager/main.go (outdated review thread, resolved)
internal/anchor/reconciler.go (outdated review thread, resolved)
internal/hncconfig/reconciler.go (outdated review thread, resolved)
internal/forest/forest.go (outdated review thread, resolved)
internal/setup/reconcilers.go (outdated review thread, resolved)
@joe2far
Contributor Author

joe2far commented Oct 19, 2021

@adrianludwin I have addressed your comments, but while testing I have hit a more fundamental problem with the approach of having reconcilers and webhooks active/running on non-leader HNC instances.

The reconciler for the webhook cert rotator, along with the other HNC reconcilers, is added as a leaderElectionRunnable (ref: https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/internal.go#L69).

The sequence in the controller manager for starting runnables is:

go startNonLeaderElectionRunnables
  -> waitForCache
  -> start nonLeaderElectionRunnables
[After leader election]
go startLeaderElectionRunnables
  -> waitForCache
  -> start leaderElectionRunnables

Because of the above, the reconciler for the webhook cert rotator is only ever started after the instance has successfully been elected leader,
so the webhooks, even though they are nonLeaderElectionRunnables, are blocked on cert setup and will only ever start after the instance is elected leader.

From the logic in the manager, it determines leader vs non-leader runnables via the NeedLeaderElection() method:
https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/internal.go#L209
I had started adding NeedLeaderElection()=false to the HNC reconcilers, but the cert rotator reconciler is set up in the external library open-policy-agent/cert-controller and blocks webhook + reconciler setup.

I am now not sure what the best approach is, and would like your input on direction (a rough sketch of option 2 follows below):
1.) Should we implement our own HNC leader-election lease logic to set an IsLeader config flag on each instance (allowing the controller manager to start all runnables independently of this toggle)?
2.) Or should we use the leader election logic in the controller-runtime manager, making all webhooks and reconcilers nonLeaderElectionRunnables (and watching the mgr.Elected() channel to toggle their behaviour via an IsLeader config flag)?
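
Rough sketch of option 2 (the isLeader flag and runnable type are illustrative, not from this PR; mgr.Elected() and manager.LeaderElectionRunnable are existing controller-runtime APIs):

package example

import (
	"context"
	"sync/atomic"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

var isLeader atomic.Bool

// runnable stands in for an HNC reconciler or webhook wrapper.
type runnable struct{}

func (runnable) Start(ctx context.Context) error { return nil }

// NeedLeaderElection=false makes the manager start this runnable on every replica
// instead of waiting for the lease.
func (runnable) NeedLeaderElection() bool { return false }

// Compile-time check: this is the interface controller-runtime inspects when
// grouping runnables into leader vs non-leader sets.
var _ manager.LeaderElectionRunnable = runnable{}

// watchLeadership flips the shared flag once this replica wins the election.
// mgr.Elected() is closed when the manager becomes leader (or when it starts,
// if leader election is disabled).
func watchLeadership(mgr ctrl.Manager) {
	go func() {
		<-mgr.Elected()
		isLeader.Store(true)
	}()
}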

@adrianludwin
Contributor

adrianludwin commented Oct 19, 2021

Hmm. Is it a problem that the cert rotator isn't running? The cert rotator's output is dumped into the webhook config and secret on the apiserver, so it shouldn't matter if it's only running in one of the three replicas, right? The only problem I might see is that after the secret is updated, it can take up to 90s for all the pods to get updated with the new secret, so if the leader dies within 90s of first starting, the other pods might be in a bad state until the 90s are up. But this seems like a bit of a corner case.

So functionally, we shouldn't care (IIUC). Are you saying that something in the framework is stopping the webhooks from running at all?

@joe2far
Contributor Author

joe2far commented Oct 19, 2021

> Hmm. Is it a problem that the cert rotator isn't running? The cert rotator's output is dumped into the webhook config and secret on the apiserver, so it shouldn't matter if it's only running in one of the three replicas, right? The only problem I might see is that after the secret is updated, it can take up to 90s for all the pods to get updated with the new secret, so if the leader dies within 90s of first starting, the other pods might be in a bad state until the 90s are up. But this seems like a bit of a corner case.
>
> So functionally, we shouldn't care (IIUC). Are you saying that something in the framework is stopping the webhooks from running at all?

At the following point:
https://github.com/kubernetes-sigs/hierarchical-namespaces/blob/master/cmd/manager/main.go#L203
we currently block registration of the webhooks until the certs are created, so the webhooks will only be added/set up after the cert rotator setup has run, which (as a leaderElectionRunnable) only happens once the current instance has been elected leader. Roughly, the ordering looks like the sketch below.
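
Simplified sketch of that ordering (not the actual main.go; the channel and helper names are made up):

package example

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

// setupWithCerts sketches the current startup ordering: webhook registration waits
// on a channel that the cert rotator is assumed to close once the certs exist.
// Because the cert rotator is a leaderElectionRunnable, the channel is only ever
// closed on the elected leader, so the webhooks never come up on follower replicas.
func setupWithCerts(mgr ctrl.Manager) {
	certsReady := make(chan struct{})

	// Placeholder for the cert rotator setup, which runs only on the leader.
	addCertRotator(mgr, certsReady)

	go func() {
		<-certsReady          // blocks forever on followers
		registerWebhooks(mgr) // placeholder for the webhook setup in main.go
	}()
}

func addCertRotator(mgr ctrl.Manager, ready chan struct{}) {}

func registerWebhooks(mgr ctrl.Manager) {}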

@adrianludwin
Contributor

adrianludwin commented Oct 19, 2021 via email

@joe2far
Contributor Author

joe2far commented Oct 19, 2021

I haven't looked into the cert-rotator logic, but as you said, it already knows not to touch anything if the certs are all good.

I have done a test where I updated the logic in the controller manager (at https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/internal.go#L211) to force all HNC webhooks and controllers/reconcilers (including the cert rotator) into the manager's nonLeaderElectionRunnables.

From this test, I got the expected behaviour, with all controllers and webhooks starting up on all instances (not just the leader), and I was able to see mutating/validating webhooks being hit on all instances (not just the leader). So it looks like the cert controller might work, but there are likely to be some race-condition issues (which I may have just been lucky to miss...).

I will do some more testing and look further into the cert-controller library... but getting all HNC runnables into the manager as nonLeaderElectionRunnables may give us the leader-follower behaviour we are looking for, provided they can all function in leader-follower mode (or possibly with multiple instances running...).

@erikgb
Contributor

erikgb commented Jan 30, 2022

@joe2far What is the status on this? I think enabling HNC leader election would be excellent to get into the next release (GA, 1.0?). Please let me know if I can assist in any way to get this over the finish line. 😄

@erikgb
Contributor

erikgb commented Jan 30, 2022

I am not sure if introducing an additional dependency would be a good idea, but we might have a look at the "Leader for life" approach implemented as part of operator-sdk/operator-lib:

https://github.com/operator-framework/operator-lib/blob/0449c0c509287014f1095739b06279e9113105da/leader/leader.go#L88

Docs: https://sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#leader-for-life

Relevant issue: operator-framework/operator-lib#48

@joe2far
Contributor Author

joe2far commented Jan 30, 2022

Hi @erikgb, I have not looked at this for some time, so the PR is stale, but the leader election behaviour was manually tested when I was last looking into this...

The solution here uses the leader election from controller-runtime, which is Kubernetes lease-based. However, in order for the HNC reconcilers to work in leader-follower mode with it, they need to be updated with NeedLeaderElection=false so that they start as nonLeaderElectionRunnables, and they then become the leader instance based on the mgr.Elected() channel. Only the leader instance performs writes; follower instances don't do any Kubernetes writes but still respond to webhooks and perform other actions to keep their forest cache in sync, so they are warm instances that can take over when elected.

I feel that using controller-runtime leader election but setting up all of the manager's reconcilers as non-leader-election runnables is a bit hacky, and I would agree that a solution similar to the leader election in https://github.com/operator-framework/operator-lib/
may be better than working around controller-runtime, which is really designed for just a single active leader instance.
@adrianludwin what are your thoughts on swapping from controller-runtime leader election to the operator-framework/operator-lib leader, or a similar implementation?
Better yet might be to use https://github.com/kubernetes/client-go/blob/master/tools/leaderelection/leaderelection.go (similar to what controller-runtime uses) directly in HNC, so we have better control over it; a rough sketch of that follows below.
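
For reference, a rough sketch of driving client-go's leaderelection directly (the lease name, namespace, and isLeader flag are assumptions, not part of this PR):

package example

import (
	"context"
	"sync/atomic"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

var isLeader atomic.Bool

// runLeaderElection blocks, competing for a Lease and toggling the shared flag as
// leadership is gained or lost.
func runLeaderElection(ctx context.Context, client kubernetes.Interface, id string) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "hnc-leader", Namespace: "hnc-system"}, // assumed names
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) { isLeader.Store(true) },
			OnStoppedLeading: func() { isLeader.Store(false) },
		},
	})
}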

I will have some free cycles in the coming week and will look into reviving this PR (and @erikgb, I would welcome any assistance on offer), and I will probably swap over the leader election implementation used in this change.

@joe2far
Contributor Author

joe2far commented Jan 31, 2022

For example, here is a leader-follower setup using the operator-framework/operator-lib leader, as used by Vault:
see: https://github.com/hashicorp/vault-k8s/blob/c02054a37dd6c141c02975679a08cfa1053d245d/leader/leader.go

@linux-foundation-easycla

linux-foundation-easycla bot commented Feb 1, 2022

CLA Signed

The committers are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 1, 2022
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 1, 2022
@joe2far
Contributor Author

joe2far commented Feb 1, 2022

I have now rebased this PR onto the latest HNC master, added some logging around the reconcilers' leader election, and again tested the basic logic locally.
@erikgb it would be good to get additional review/testing to verify the logic.

...
While my comments above still stand, I feel that moving the leader election logic into the hierarchical namespace controller is on one hand cleaner (as controller-runtime leader election is not really meant to support leader-follower mode), but on the other hand it would likely mean duplicating similar logic to wrap client-go/tools/leaderelection and disabling the leader election in controller-runtime, which may be equally confusing...

@erikgb
Contributor

erikgb commented Feb 1, 2022

One thing I am not able to understand: why do we need a leader-follower leader election? I would prefer if we could use the standard single-leader approach provided by default by controller-runtime. We use that for all our in-house controllers, and can scale the number of replicas solely based on webhook requirements. The webhooks must work at all times, as this is extremely important for general cluster availability, and I have heard of people setting up 9 replicas just to reduce the chance of having no webhook endpoint available.

I tested the standard controller-runtime leader election locally with HNC, and it does not work at all. I guess that is the reason this PR suggests building some kind of custom leader election mechanism. Why wouldn't it be possible to use the standard leader election with some refactoring/changes? I think the webhooks should be able to work independently of the reconciler manager. So if you spin up 10 replicas, you would have 10 fully functional webhooks, but just a single manager performing reconciles. AFAIK most in-tree reconcilers in K8s run with a single worker, e.g. ReplicaSet->Pod.

Maybe @adrianludwin can explain why HNC cannot do this? IMO the webhooks should have a read-only cache of the forest, while the reconcilers should have a read-write cache. Would that be possible to achieve? Where is the potential blocker? The webhook cert-issuer?

@adrianludwin
Contributor

adrianludwin commented Feb 1, 2022 via email

@erikgb
Contributor

erikgb commented Feb 1, 2022

I am not worried about webhook performance, but webhook availability is essential IMO. A failing webhook endpoint can affect all API calls in the cluster, so you typically want to run at least two replicas serving webhook endpoints, to be able to handle smooth updates and possible cluster issues like losing a node. Several controllers I am aware of run their webhooks in dedicated workloads for exactly this reason; one example is cert-manager. This was also noted/suggested in this comment: #68 (comment)

Why can't the webhooks run with their own dedicated (read-only) forest, based on code shared with the reconcilers? I understand that this could end up being a huge change in HNC, but HA for webhooks is important, and as a bonus we might be able to use the standard leader election in controller-runtime.

When is the next community meeting? Maybe this subject is better addressed there, than as comments in this PR. 😉

@adrianludwin
Contributor

adrianludwin commented Feb 1, 2022 via email

@erikgb
Contributor

erikgb commented Feb 1, 2022

> Good point about availability. Although at least the object webhook fails open IIRC so your cluster doesn't come to a crashing halt. But better if we don't need to rely on that.

I would consider any webhook failing for "technical reasons" something that can be improved (bug). 😉

> I'm not sure how you'd share code with the webhooks without basically reimplementing almost the entire reconciler. Most of the reconcilers' logic is in fact about updating the forest to match what's on the server, so it seems much easier to me to modify the reconcilers to run in read-only mode (i.e., exactly what's necessary to create a read-only forest) than to try to factor out the common code, and then set up our own custom replacement for the controller-manager, just so that we can use standard leader election. Something's going to have to be nonstandard :)

As I wrote in my previous comment, I understand that changing this into something reusable will be a BIG refactoring. That said, I think it could make sense to decouple the forest cache and the reconcilers in the long term. I imagine that the forest cache could include an informer component that would be responsible for dispatching events to the reconcilers, and that would function as an adapter around the standard write-through cache + informer in controller-runtime.

> We don't usually discuss issues like this in our regular meetings; if needed we could have a sync in the next couple of days, but otherwise I'm ok to keep this on PRs.

Yes, I would be interested in a sync - if you are available. Just ping me on Slack.

To summarize my stand on short-term goals, I think enabling HA on webhooks is very important. And to get it, it seems like we need some kind of leader election.

@adrianludwin
Contributor

adrianludwin commented Feb 2, 2022 via email

@erikgb
Contributor

erikgb commented Feb 2, 2022

> (leader election is a pretty simple feature)

I disagree 😉, and I think I have some support for my stand from controller-runtime and operator-lib/operator-sdk: https://sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#leader-election and https://github.com/kubernetes/client-go/blob/30b06a83d67458700a5378239df6b96948cb9160/tools/leaderelection/leaderelection.go#L21-L24.

> I'll try to give you a call on Friday morning Toronto time

Ack! 👍

@adrianludwin
Contributor

adrianludwin commented Feb 2, 2022 via email

@adrianludwin
Contributor

Oh man I forgot to hit "submit" on a review I wrote in early November. Sorry! Will re-review now.

Contributor

@adrianludwin adrianludwin left a comment


Left a few comments. @joe2far , thanks so much for reviving this.

I'll try to summarize the conversation I had with @erikgb on Friday. Basically, his top priority is making the webhooks highly available, and not the reconciler. Technically, this means we don't actually need LE for this one purpose; instead, we could:

  • Have a deployment of a single pod for the reconciler which doesn't serve webhooks
  • Have another deployment of 3 pods that only serve the webhooks

These might be exactly the same image, just running with a different command-line flag. And it would probably work just fine. The only problem would be that if the reconciler pod ever died, it could be a few minutes (on a large cluster) before it became active again. And in that time, the webhooks could show some undefined behaviour if someone was busy modifying the hierarchy.

However, why should we wait for a few minutes for the reconciler pod to restart, if all the webhook pods already have all the information needed to reconcile as well? Also, this could use up a lot of the apiserver's bandwidth for a while. Instead, we can use LE to just turn one of the replicas into the leader, since it already has all its information loaded into memory and ready to go. Even re-syncing all its reconcilers is probably very fast since all the API objects will be in the controller-runtime cache. There might still be a short period where the webhooks are not fully in sync with what's on the apiserver but that will at least resolve itself far more quickly.

@erikgb , just to your specific comments:

  • What Joe's implemented is essentially the same as the "leader for life" approach you suggested, at least as I understand it. I've even suggested using the atomic.Value approach from that code here as well.
  • You mentioned that you "tested the standard controller-runtime leader election locally with HNC, and that does not work at all." Were you testing what's currently built into HNC (which is known not to work) or this change that Joe's proposing? I wouldn't expect anything to work until this change is merged.

@erikgb , if you're happy with this I'm ready to merge this change (after a few nits), especially if @joe2far can manually test it one last time. @rjbez17 , can you please have a look at this as well?

Thanks everyone and sorry for the delays here!

@@ -56,6 +58,11 @@ type Reconciler struct {
// Reconcile sets up some basic variables and then calls the business logic. It currently
// only handles the creation of the namespaces but no deletion or state reporting yet.
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
// If not the leader, exit without performing any write operations
Contributor

It's not obvious that we can safely skip this entire reconciler. Maybe a better comment would be: this reconciler never updates the forest, so it's fine to skip it entirely if it's not the leader.

@@ -415,9 +422,72 @@ func (r *Reconciler) SetupWithManager(mgr ctrl.Manager) error {
}},
}
}
return ctrl.NewControllerManagedBy(mgr).
Contributor

Rather than changing this initialization logic, can't we just add the NeedLeaderElection() method to all the reconcilers? That seems easier and less disruptive to me.

Contributor

Oh never mind, I forgot about the distinction between Controller and Reconciler. Sorry.


// enqueueAllObjects enqueues all the current anchor objects in all namespaces.
func (r *Reconciler) enqueueAllObjects(ctx context.Context, log logr.Logger) error {
keys := r.Forest.GetNamespaceNames()
Contributor

Need r.Forest.Lock() and defer r.Forest.Unlock() here
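
A sketch of the requested change, with the forest simplified to a mutex-guarded list (not the real HNC type; the logging stands in for the actual enqueue logic):

package example

import (
	"context"
	"sync"

	"github.com/go-logr/logr"
)

// forest is a simplified stand-in for HNC's in-memory forest.
type forest struct {
	sync.Mutex
	names []string
}

func (f *forest) GetNamespaceNames() []string { return f.names }

type Reconciler struct {
	Forest *forest
}

// enqueueAllObjects holds the forest lock while reading the namespace list, so a
// concurrent reconcile can't mutate the forest underneath it.
func (r *Reconciler) enqueueAllObjects(ctx context.Context, log logr.Logger) error {
	r.Forest.Lock()
	defer r.Forest.Unlock()
	for _, ns := range r.Forest.GetNamespaceNames() {
		log.V(1).Info("re-enqueuing namespace", "namespace", ns) // placeholder for the actual enqueue
	}
	return nil
}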

Comment on lines +11 to +13

// IsLeader is a global which represents whether this HNC instance is the leader
var IsLeader bool
Contributor

Consider using atomic.Value()? I'd guess that adds a memory fence which might be useful if the value has just changed.
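
For example, a minimal version of that suggestion (shown with atomic.Bool here rather than atomic.Value, purely as an illustration):

package example

import "sync/atomic"

// isLeader replaces the plain bool; Load/Store provide the memory ordering the
// review comment mentions, so a leadership flip is safely visible to all goroutines.
var isLeader atomic.Bool

// SetLeader is called when this instance gains or loses leadership.
func SetLeader(leader bool) { isLeader.Store(leader) }

// IsLeader is checked by the reconcilers before performing any writes.
func IsLeader() bool { return isLeader.Load() }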


// enqueueAllObjects enqueues all the current objects in all namespaces.
func (r *Reconciler) enqueueAllObjects(ctx context.Context, log logr.Logger) error {
keys := r.Forest.GetNamespaceNames()
Contributor

Lock/unlock forest here

Comment on lines +166 to +170
// NeedLeaderElection is false as this runnable can run in standby mode without leader lock
func (cr *CertRotator) NeedLeaderElection() bool {
return false
}

Contributor

This seems a bit weird - what happens if multiple instances try to create conflicting certs for the same webhook at the same time? I wonder if this will normally work but not always. Do you know?

Otherwise, an alternative option here might be to leave the runnables as requiring leader election, but then provide a function for the followers to determine when it's safe to start. E.g. maybe modify ensureCertsMounted() so that it's a standalone function that you can call (or just factor it out so as to preserve the existing interface), and it simply blocks until the certs exist. wdyt?
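
A small sketch of that alternative (the function name and cert file layout are assumptions; cert-controller's actual interface may differ):

package example

import (
	"context"
	"os"
	"path/filepath"
	"time"
)

// waitForCerts blocks until the webhook cert files exist on disk, so follower
// replicas can register their webhooks without ever running the rotator themselves.
func waitForCerts(ctx context.Context, certDir string) error {
	for {
		if _, err := os.Stat(filepath.Join(certDir, "tls.crt")); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
			// poll again
		}
	}
}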

Either way, we should add this to the main cert-controller as well. Can you please file an issue to do so? Thanks!

@k8s-ci-robot
Contributor

@joe2far: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2022
@adrianludwin
Contributor

I think that #163 will be our solution for HA support, not leader election. We'll also focus on greatly accelerating startup time in v1.1, further reducing the need for active/passive replicas. Since there's not a lot of activity on this PR, let's close this and continue work on #163 and its derivatives.

/close

@k8s-ci-robot
Contributor

@adrianludwin: Closed this PR.

In response to this:

I think that #163 will be our solution for HA support, not leader election. We'll also focus on greatly accelerating startup time in v1.1, further reducing the need for active/passive replicas. Since there's not a lot of activity on this PR, let's close this and continue work on #163 and its derivatives.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
