Lost leader election may be leading to restarts #810
How do we know this actually is a bug on our side, and not instability of the cluster? On a side note, when we upgrade controller-runtime, leader election will start using "leases" instead of ConfigMaps: https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.12.0
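As a sketch of what that upgrade implies (this is not source-controller's actual wiring; the election ID below is a placeholder): controller-runtime lets a manager opt into the Lease-based lock explicitly via `LeaderElectionResourceLock`, which becomes the default in v0.12.0.

```go
package main

import (
	"os"

	"k8s.io/client-go/tools/leaderelection/resourcelock"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Hypothetical wiring sketch: request the "leases" resource lock
	// instead of the legacy "configmaps" lock. On controller-runtime
	// >= v0.12.0 this is already the default.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:             true,
		LeaderElectionID:           "source-controller-leader-election", // placeholder ID
		LeaderElectionResourceLock: resourcelock.LeasesResourceLock,     // "leases"
	})
	if err != nil {
		os.Exit(1)
	}
	_ = mgr // start the manager with mgr.Start(ctrl.SetupSignalHandler()) in real code
}
```

Lease objects are cheaper to watch than ConfigMaps and avoid the extra load the ConfigMap-based lock puts on the API server, which matters when leadership renewals are already timing out.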
@hiddeco we don't, therefore I removed the
@mfamador I noticed that the staging cluster has a 10-fold occurrence rate for this issue. Have you noticed any other metric that could be correlated and indicate why that is? I am assuming that, from an API server perspective, the only difference between the two environments would be the availability Azure provides. Or are both environments configured with the same API server availability? Can you also confirm whether this is happening only in North Europe, or across all regions?
@pjbgf we're only using the The other regions we're using
I'll leave the latest logs from the most problematic region here:
Source controller seems to be restarting abruptly, potentially due to lost leader election.
Logs before last restart:
In this specific case, the AKS clusters are syncing with the same Azure DevOps Git repositories (gitImplementation: libgit2).
The deployment has clusters in multiple regions, most of which work as expected, apart from the clusters based in North Europe, which get restarted every so often.
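For context, a GitRepository of the kind described above would look roughly like this; the names and URL are placeholders, not taken from the report (at the time, libgit2 was the implementation required for Azure DevOps repositories):

```yaml
# Hypothetical sketch of a source-controller GitRepository using libgit2.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: app-repo            # placeholder name
  namespace: flux-system
spec:
  interval: 1m
  url: https://dev.azure.com/org/project/_git/repo   # placeholder URL
  gitImplementation: libgit2
  secretRef:
    name: azure-devops-auth # placeholder secret
```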
First reported by @mfamador at #402 (comment).