Enabled HNC leader election #88

Closed

Conversation

@joe2far joe2far commented Oct 6, 2021

  • block reconciler controller setup until instance becomes the leader using mgr.Elected() channel
  • adds leases permissions to cluster role
  • tunes leader election default configurations
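For context, controller-runtime's Manager exposes an `Elected()` channel that is closed once the instance wins leader election. A minimal stdlib-only sketch of the gating pattern described in the first bullet (hypothetical names, not HNC's actual code):

```go
package main

import (
	"fmt"
	"time"
)

// startControllers blocks until elected is closed (the pattern
// controller-runtime exposes via mgr.Elected()), then starts work.
func startControllers(elected <-chan struct{}) {
	<-elected // block until this instance wins leader election
	fmt.Println("became leader, starting reconcilers")
}

func main() {
	elected := make(chan struct{})
	go func() {
		// Simulate winning the election after a short delay.
		time.Sleep(10 * time.Millisecond)
		close(elected)
	}()
	startControllers(elected)
}
```

In the real setup, the reconciler registration would run after this wait, so followers hold off on controller startup until they are elected.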

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 6, 2021
@k8s-ci-robot

Hi @joe2far. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 6, 2021
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: joe2far
To complete the pull request process, please assign tashimi after the PR has been reviewed.
You can assign the PR to them by writing /assign @tashimi in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@adrianludwin adrianludwin left a comment


Thanks for this @joe2far ! I've left some small comments in the review but I have some deeper ones.

Today, HNC runs as a single pod, and if it's killed, it'll be restarted. This typically happens quite quickly AIUI, but on any cluster of reasonable size, it then takes a while for the controllers to sync the state of every namespace and object in the system. This PR doesn't change that: it will (IIUC) wait for the prior leader to fail for at least 10s, and only then start the controllers. This is only an improvement in availability if it previously took more than 10s to restart the pod, which I don't think is the case. By contrast, on larger clusters, it can take minutes for the controllers to fully start. So this doesn't really add much in the way of availability.

Moreover, the followers in this deployment can't actually respond to webhook requests without the controllers running, but the webhook Service will select these pods and try to send them traffic. We would need to prevent this using readiness probes.

I think there's a better alternative, but it takes more work. Every controller in HNC follows a strict loop: read info from apiserver; make decisions; update in-memory data structures; apply changes to the apiserver. In HA mode, if an instance is a follower, only that last step needs to be suppressed. So you could have all three pods fully up and running (and responding to webhooks), but only enable the final step if the pod is the leader.
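A minimal sketch of that leader-gated final step, using a stdlib atomic flag in place of the real election callbacks (all names here are hypothetical, not HNC's actual code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// isLeader would be flipped by the leader-election callbacks.
var isLeader atomic.Bool

// reconcile runs the full loop on every instance, but only the
// leader performs the final write back to the apiserver.
func reconcile(ns string) string {
	desired := "desired-state-for-" + ns // read + decide + update in-memory state
	if !isLeader.Load() {
		return "skipped apply (follower)"
	}
	// apply changes to the apiserver (elided in this sketch)
	return "applied " + desired
}

func main() {
	fmt.Println(reconcile("team-a")) // follower: decision made, write suppressed
	isLeader.Store(true)
	fmt.Println(reconcile("team-a")) // leader: write proceeds
}
```

The point of the design is that followers keep their in-memory state warm and can serve webhooks, so failover only has to enable the write step rather than rebuild everything.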

In this scenario, there will be a race condition between the old leader dying and the new one taking over, so it's possible that none of the pods will commit a decision that needs to be committed, because they'll assume that the (now dead) leader did it. So when a pod takes over as leader, it will need to re-synchronize the entire cluster, in order to catch anything it missed. Usually, it won't have missed anything, so this will be a no-op, but it will still take time and use quota. So we'd probably want to manually throttle it, e.g. have a goroutine that tried to sync, say, one namespace every 200ms or so (we'd have to tune this). This would leave HNC responsive enough to deal with any new user actions, while still ensuring that anything that had fallen through the cracks would be fixed in a reasonable amount of time.

Does all this make sense?

Review comments on cmd/manager/main.go (outdated, resolved).
@joe2far

joe2far commented Oct 11, 2021

> [quotes @adrianludwin's comment above in full]

Yes, the above makes sense. The intention here was to improve availability using the existing leader-election logic. During upgrades, where there is a lot of namespace activity coupled with node updates/scaling, we have seen issues where HNC could be taken offline by a node upgrade, and during that time certain namespace operations were blocked from completing.

@joe2far

joe2far commented Oct 12, 2021

While I had made updates to this PR, given that there will be a race condition between the old leader dying and the new one taking over (so it's possible that none of the pods will commit a decision that needs to be committed), is there much point in adding this basic leader election to HNC? Or is the recommendation to do the more involved work of making the reconcilers aware of leader/follower mode and re-queueing changes when a new leader is elected?

To this end, I have created a new PR which starts updating the controllers to be leader-aware, rather than running with one leader and standby instances waiting to become the leader.
ref: #94

@adrianludwin

My intuition tells me that this "basic" leader election won't help availability, so unless you've tried this out locally and seen an improvement, I think you're right. Let's close this one off and continue on #94.

@adrianludwin

/close

@k8s-ci-robot

@adrianludwin: Closed this PR.

In response to this:

/close

