
operator leader election fails when the node of the current leader shuts down unexpectedly #2797

Closed
thesunnysky opened this issue Apr 7, 2020 · 1 comment



thesunnysky commented Apr 7, 2020

Bug Report

What did you do?
My operator runs in the cluster with 3 replicas. When the node hosting the current leader shut down unexpectedly, I expected one of the other two operator pods to take over the leadership and become the new leader. Instead, neither of the other two pods ever became the new leader.

Environment

  • operator-sdk version:
    operator-sdk version: v0.10.0, commit: ff80b17
  • go version:
    golang 1.12
  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.12", GitCommit:"a8b52209ee172232b6db7a6e0ce2adc77458829f", GitTreeState:"clean", BuildDate:"2019-10-15T12:12:15Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.12", GitCommit:"a8b52209ee172232b6db7a6e0ce2adc77458829f", GitTreeState:"clean", BuildDate:"2019-10-15T12:04:30Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:
An internal Docker-based Kubernetes cluster

  • Are you writing your operator in ansible, helm, or go?
Go

Possible Solution

Additional context
//internal/scaffold/cmd.go

	ctx := context.TODO()
	// Become the leader before proceeding
	err = leader.Become(ctx, "{{ .ProjectName }}-lock")
	if err != nil {
		log.Error(err, "")
		os.Exit(1)
	}

The operator must successfully create the ConfigMap resource lock ({{ .ProjectName }}-lock) before it can become the leader. But when the node running the old leader shuts down unexpectedly, the old leader's ConfigMap lock is not deleted from the cluster. As a result, the remaining operators can never create a new resource lock and never become the leader, so leader election fails entirely.
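To make the failure mode concrete, here is a minimal, hypothetical Go sketch of the leader-for-life pattern described above: acquisition succeeds only when no lock exists, and the lock is released only by explicit deletion. The names (`lockStore`, `tryAcquire`) are illustrative and not the SDK's actual API.

```go
package main

import "fmt"

// lockStore stands in for the cluster's ConfigMap store (hypothetical).
type lockStore struct {
	holder string // empty means no lock ConfigMap exists
}

// tryAcquire mimics leader-for-life election: it succeeds only when no
// lock exists. Nothing here ever expires or releases the lock, which is
// the crux of the reported bug.
func (s *lockStore) tryAcquire(candidate string) bool {
	if s.holder != "" {
		return false // lock already held; candidates wait forever
	}
	s.holder = candidate
	return true
}

func main() {
	s := &lockStore{}
	fmt.Println(s.tryAcquire("pod-a")) // pod-a becomes leader
	// pod-a's node dies; the lock ConfigMap is never deleted.
	fmt.Println(s.tryAcquire("pod-b")) // fails
	fmt.Println(s.tryAcquire("pod-c")) // fails: election is stuck
}
```

If the holder's node dies without the ConfigMap being garbage-collected, every later `tryAcquire` returns false indefinitely, matching the observed behavior.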

@thesunnysky thesunnysky changed the title the leader of operator can't failover when the node of current leader is down operator leader election failed when the node of current leader shutdowns unexpectedly Apr 7, 2020
thesunnysky (Author) commented:

Using leader-with-lease election may fix this problem:
https://docs.openshift.com/container-platform/4.1/applications/operator_sdk/osdk-leader-election.html
