
operator leader election fails when the node of the current leader shuts down unexpectedly #2797

Closed
thesunnysky opened this issue Apr 7, 2020 · 1 comment



thesunnysky commented Apr 7, 2020

Bug Report

What did you do?
My operator runs in the cluster with 3 replicas. When the node hosting the current leader shut down unexpectedly, I expected one of the other two operator pods to take over the leadership and become the new leader. Instead, neither of the other two pods ever became the new leader.

Environment

  • operator-sdk version:
    operator-sdk version: v0.10.0, commit: ff80b17
  • go version:
    golang 1.12
  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.12", GitCommit:"a8b52209ee172232b6db7a6e0ce2adc77458829f", GitTreeState:"clean", BuildDate:"2019-10-15T12:12:15Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.12", GitCommit:"a8b52209ee172232b6db7a6e0ce2adc77458829f", GitTreeState:"clean", BuildDate:"2019-10-15T12:04:30Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:
An internal Docker-based Kubernetes cluster

  • Are you writing your operator in ansible, helm, or go?
Go

Possible Solution

Additional context
//internal/scaffold/cmd.go

	ctx := context.TODO()
	// Become the leader before proceeding
	err = leader.Become(ctx, "{{ .ProjectName }}-lock")
	if err != nil {
		log.Error(err, "")
		os.Exit(1)
	}

The operator must successfully create the ConfigMap resource lock ({{ .ProjectName }}-lock) before it can become the leader. But when the node running the old leader shuts down unexpectedly, the old leader's ConfigMap lock is not deleted from the cluster. As a result, the remaining operators can never create a new resource lock and never become the leader, so leader election fails entirely.
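To make the failure mode concrete, here is a minimal, hypothetical Go sketch of the leader-for-life pattern described above: acquisition succeeds only when no lock exists, and the lock is released only by explicit deletion. The names (`lockStore`, `tryAcquire`) are illustrative and not the SDK's actual API.

```go
package main

import "fmt"

// lockStore stands in for the cluster's ConfigMap store (hypothetical).
type lockStore struct {
	holder string // empty means no lock ConfigMap exists
}

// tryAcquire mimics leader-for-life election: it succeeds only when no
// lock exists. Nothing here ever expires or releases the lock, which is
// the crux of the reported bug.
func (s *lockStore) tryAcquire(candidate string) bool {
	if s.holder != "" {
		return false // lock already held; candidates wait forever
	}
	s.holder = candidate
	return true
}

func main() {
	s := &lockStore{}
	fmt.Println(s.tryAcquire("pod-a")) // pod-a becomes leader
	// pod-a's node dies; the lock ConfigMap is never deleted.
	fmt.Println(s.tryAcquire("pod-b")) // fails
	fmt.Println(s.tryAcquire("pod-c")) // fails: election is stuck
}
```

If the holder's node dies without the ConfigMap being garbage-collected, every later `tryAcquire` returns false indefinitely, matching the observed behavior.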

@thesunnysky thesunnysky changed the title the leader of operator can't failover when the node of current leader is down operator leader election failed when the node of current leader shutdowns unexpectedly Apr 7, 2020
thesunnysky (Author) commented:

Using leader-with-lease election may fix this problem:
https://docs.openshift.com/container-platform/4.1/applications/operator_sdk/osdk-leader-election.html
