
doc/user-guide: show how to enable both leader election options #1052

Merged
hasbro17 merged 8 commits into operator-framework:master on Feb 18, 2019

Conversation


@hasbro17 hasbro17 commented Feb 4, 2019

Description of the change:
Added a brief overview of the two leader election implementations that can be used, their respective tradeoffs, and example code snippets showing how to use them.

Motivation for the change:

Fixes #665 #784

I'm aiming to keep the explanations as concise as possible without being too vague, and to refer readers to the pkg godocs for more details on the implementations themselves.

I also need a second opinion on whether this is bloating the user guide, and whether we would rather keep this in a separate doc and just link to it from the user guide.

@hasbro17 hasbro17 added the docs label Feb 4, 2019
@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 4, 2019

@lilic lilic left a comment


Few nits, overall makes sense to me 👍

lilic and others added 2 commits February 4, 2019 10:57
@hasbro17 hasbro17 removed the request for review from mhrivnak February 4, 2019 18:58

@joelanford joelanford left a comment


LGTM. Just one nit


@estroz estroz left a comment


LGTM, a few nits and a question.


When the operator is not running in a cluster, the Manager will return an error when starting, since it can't detect the operator's namespace, which it needs in order to create the ConfigMap for leader election. You can override this namespace by setting the Manager's `LeaderElectionNamespace` option.
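As a minimal sketch of that override (the lock ID and namespace values are just examples, and this assumes leader election is done through the Manager as described under Leader-with-lease below):

```Go
import (
	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// newManager is a hypothetical helper showing the override; the ID and
// namespace values are examples only.
func newManager() (manager.Manager, error) {
	cfg, err := config.GetConfig()
	if err != nil {
		return nil, err
	}
	return manager.New(cfg, manager.Options{
		LeaderElection:   true,
		LeaderElectionID: "memcached-operator-lock",
		// Needed when running outside the cluster, where the operator's
		// namespace can't be detected automatically.
		LeaderElectionNamespace: "memcached",
	})
}
```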

[tools_leaderelection]: https://godoc.org/github.com/kubernetes/client-go/tools/leaderelection
Member


Is this link supposed to be used anywhere?

Contributor Author


Good catch, no, this got left in from an earlier change. I'll remove it.


@mhrivnak mhrivnak left a comment


Thanks for writing this! I made a few suggestions.

@@ -400,6 +400,70 @@ func main() {

After adding new import paths to your operator project, run `dep ensure` in the root of your project directory to fulfill these dependencies.

## Leader election

For improved reliability and quick failover, it may be necessary to run multiple replicas of the operator with leader election, so that one leader instance handles the reconciliation while the other instances are inactive but ready to take over when the leader fails.
Member


Just to give people the right guidance, do we know of any controllers or operators that actually do this? I think there's a risk that people will read this as advice on something good to do, when in reality it is uncommon. For most people, we don't want to double their number of operator pods, and brief operator downtime is completely acceptable (eventual consistency being the goal).

As I understand it, leader election is more about avoiding contention during rollout of a newer operator, and when an unhealthy operator gets replaced.

Contributor Author


As far as I know it's mostly the control plane controllers (like the scheduler) for which high availability is an important requirement.
But I agree that improved reliability is not the dominant use case for using leader election and we probably shouldn't give that impression. I'll update this to point out that the focus is to avoid contention between multiple instances during rollouts.


There are two different leader election implementations to choose from, each with its own tradeoff.

- [Leader-for-life][leader_for_life]: The leader pod only gives up leadership (via garbage collection) when it is deleted. This avoids the possibility of two instances mistakenly running as leaders (split brain). However, this method can be slow to elect a new leader when the leader pod is on an unresponsive (partitioned) node, where it takes a long time for the leader pod to be deleted.
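As a rough sketch (not the exact text added in this PR), enabling Leader-for-life amounts to calling `leader.Become` from `github.com/operator-framework/operator-sdk/pkg/leader` before starting the Manager; the lock name below is only an example:

```Go
import (
	"context"
	"fmt"
	"os"

	"github.com/operator-framework/operator-sdk/pkg/leader"
)

func main() {
	// Block until this instance holds the leader lock ConfigMap.
	// Leadership is only released when the pod that owns the lock
	// is deleted.
	if err := leader.Become(context.TODO(), "memcached-operator-lock"); err != nil {
		fmt.Fprintln(os.Stderr, "failed to become leader:", err)
		os.Exit(1)
	}

	// ... set up and start the Manager as usual.
}
```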
Member


Rather than use words like "slow", which is relative and subjective, I suggest referencing the node failure timings. I don't remember the name of the setting off the top of my head, but there is a configurable time frame (default 5 minutes) after which workloads on an unresponsive node get deleted. Referencing that setting here will also give people an opportunity to consider modifying it. I'll try to look it up later on.

I'm learning more about node lifecycle, and my suspicion is that in practice the period of uncertainty will normally not last for 5 minutes. Using the cloud-controller-manager, the status of the underlying machine can be queried with the cloud environment (AWS, azure, openstack, whatever), and a dead machine can have its node deleted more promptly. So I suspect in real life this won't be a common scenario.

Contributor Author


I believe that's the pod-eviction-timeout. From the node architecture docs:

If the Status of the Ready condition remains Unknown or False for longer than the pod-eviction-timeout, an argument is passed to the kube-controller-manager and all the Pods on the node are scheduled for deletion by the Node Controller. The default eviction timeout duration is five minutes

I'll make this more specific by pointing out the default time for pod eviction that can result in an unresponsive leader for the given period of time.

- [Leader-with-lease][leader_with_lease]: The leader pod periodically renews the leader lease and gives up leadership when it can't renew the lease. This method allows for a faster transition to a new leader when the existing leader is isolated, but there is a possibility of split brain in [certain situations][lease_split_brain].
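A comparable sketch for Leader-with-lease, which is enabled through the Manager's options rather than a separate call (again, the lock ID is an example; when running outside the cluster, also set `LeaderElectionNamespace` as shown in the earlier hunk):

```Go
import (
	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/manager/signals"
)

func main() {
	cfg, err := config.GetConfig()
	if err != nil {
		panic(err)
	}
	// The Manager periodically renews the leader lease and exits the
	// process if the lease is lost, at which point another replica can
	// acquire it.
	mgr, err := manager.New(cfg, manager.Options{
		LeaderElection:   true,
		LeaderElectionID: "memcached-operator-lock",
	})
	if err != nil {
		panic(err)
	}
	// ... register controllers with mgr, then:
	if err := mgr.Start(signals.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```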
Member


I would add that it assumes the leader is capable of stopping what it is doing immediately upon lease expiry. Depending on how a controller is written, it may or may not be able to make that guarantee.

Contributor Author


@mhrivnak Aren't all leaders capable of stopping abruptly when using leader election via the manager? That will just exit the program when the leader's lease expires and it stops leading.
https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/internal.go#L279-L283
https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/internal.go#L230-L233
https://github.com/operator-framework/operator-sdk-samples/blob/master/memcached-operator/cmd/manager/main.go#L85-L88

In my understanding, this should exit all background goroutines abruptly as well, without blocking on any defer statements.

Member


Good point, it does look like they've guarded it well. It may be too late to cancel API requests that are in-progress, but that's quite an edge case.

estroz and others added 4 commits February 11, 2019 13:38
@hasbro17

@mhrivnak PTAL. Updated per your comments.

@hasbro17 hasbro17 merged commit 7e578b4 into operator-framework:master Feb 18, 2019