
keda operator restarted at the time of start.(error retrieving resource lock keda/operator.keda.sh) #2836

Closed
crisp2u opened this issue Mar 28, 2022 Discussed in #2722 · 7 comments
Labels
stale All issues that are marked as stale due to inactivity

Comments

@crisp2u

crisp2u commented Mar 28, 2022

Discussed in #2722

Originally posted by vkamlesh March 7, 2022
The KEDA operator fails to elect a leader after the keda-operator pod restarts. These restarts are not frequent, but they recur every few days (roughly 6-day intervals).

KEDA Version: 2.6.1
Git Commit: efca71d

Kubernetes version: v1.20.9
Kubernetes Cluster : AKS


bash-3.2$ k get po -n keda
NAME                                      READY   STATUS    RESTARTS   AGE
keda-metrics-apiserver-649f4ddbbd-v4pjp   1/1     Running   0          12d
keda-operator-68ddbdcc8f-6h767            1/1     Running   3          12d
bash-3.2$ 


bash-3.2$ kubectl get --raw "/apis/coordination.k8s.io/v1/namespaces/keda/leases/operator.keda.sh"
{"kind":"Lease","apiVersion":"coordination.k8s.io/v1","metadata":{"name":"operator.keda.sh","namespace":"keda","uid":"edb18fd7-b95e-463f-81cf-6a1010073409","resourceVersion":"135421212","creationTimestamp":"2022-02-23T14:33:32Z","managedFields":[{"manager":"keda","operation":"Update","apiVersion":"coordination.k8s.io/v1","time":"2022-02-23T14:33:32Z","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:acquireTime":{},"f:holderIdentity":{},"f:leaseDurationSeconds":{},"f:leaseTransitions":{},"f:renewTime":{}}}}]},"spec":{"holderIdentity":"keda-operator-68ddbdcc8f-6h767_53e840b6-5466-484f-a6f6-16978b7ee12c","leaseDurationSeconds":15,"acquireTime":"2022-03-07T12:07:54.000000Z","renewTime":"2022-03-07T17:36:42.260717Z","leaseTransitions":142}}




bash-3.2$ k logs keda-operator-68ddbdcc8f-6h767 -n keda -f -p


1.6466548397264059e+09	INFO	controller.scaledobject	Reconciling ScaledObject	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "name": "observationsprocessor-func", "namespace": "platform-api"}
E0307 12:07:33.812275       1 leaderelection.go:330] error retrieving resource lock keda/operator.keda.sh: Get "https://10.0.0.1:443/apis/coordination.k8s.io/v1/namespaces/keda/leases/operator.keda.sh": context deadline exceeded
I0307 12:07:33.812329       1 leaderelection.go:283] failed to renew lease keda/operator.keda.sh: timed out waiting for the condition
1.6466548538123553e+09	ERROR	setup	problem running manager	{"error": "leader election lost"}
@zroubalik
Member

This is most likely a problem in sigs.k8s.io/controller-runtime as it is responsible for leader election. We should investigate.

@crisp2u
Author

crisp2u commented Mar 30, 2022

I've found this. What puzzles me is that I saw the same error message ("failed to renew lease") on other controllers in the cluster that presumably also use controller-runtime, but they managed to recover. Maybe the default options are too optimistic in KEDA?

@zroubalik
Member

Hard to say. Could you please try to tweak those settings in your setup?

@vkamlesh

@crisp2u @zroubalik Where exactly do we need to tweak values?

@wsugarman
Contributor

wsugarman commented Jun 8, 2022

I'm also seeing this issue, and it's leading to noisy pod-restart alerts in our AKS cluster. We are running only one replica of the KEDA operator, but as of now we're seeing container restarts roughly 3-8 times a day due to "leader election lost":

leaderelection.go:367] Failed to update lock: Put ".../api/v1/namespaces/keda/configmaps/operator.keda.sh": context deadline exceeded
leaderelection.go:283] failed to renew lease keda/operator.keda.sh: timed out waiting for the condition
ERROR setup problem running manager {"error": "leader election lost"}

@zroubalik - Presumably you were referring earlier to tweaking the lease-related settings? Perhaps there should be a hook in the Helm chart for configuring the leasing options:

keda/main.go

Lines 87 to 95 in dcb9c1e

mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
	Scheme:                 scheme,
	MetricsBindAddress:     metricsAddr,
	Port:                   9443,
	HealthProbeBindAddress: probeAddr,
	LeaderElection:         enableLeaderElection,
	LeaderElectionID:       "operator.keda.sh",
	Namespace:              namespace,
})

@stale

stale bot commented Aug 7, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Aug 7, 2022
@stale

stale bot commented Aug 14, 2022

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Aug 14, 2022