-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Configure leader election timeout #1443
chore: Configure leader election timeout #1443
Conversation
Note for the reviewer: Testing that is hard. You have to make the API Server unavailable for some time. I found a simple way to test it locally, but it requires a code change so it's not suitable for automated test. How to test locally: Add the following to the
Example run:
Note that the time difference between the first and the last log entry in the example above is not exactly 20 seconds, but rather around 18 seconds. This is because the controller creates the first error log message about 2 seconds after the API Server connectivity is lost |
b20e74b
to
b5cac34
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the good description, I can replicate the behavior as you described.
Two general things I am wondering about:
a) when does the leader election actually get used? I looked at Argo for stage and prod and can see that we only have one pod there. Are we only using it when a new version of KLM is deployed?
b) in the ticket you described that API server outages are about 2 mins. Why are we setting the renew deadline to 90 seconds then and not something slightly above 2 mins?
A quote from some k8s book:
Short answer: I think shorter is better, but we decided to test how it behaves for longer time-outs... Edit: Despite my preferences for shorter values, I decided to raise the deadline to 2 minutes and lease time to 3 minutes (to keep their ratio as in library's defaults) |
b5cac34
to
548ea6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanations! Good for me, just one thing I just remembered. We once agreed to add all flag defaults to the tests file which has not been done yet: https://github.com/kyma-project/lifecycle-manager/blob/main/internal/pkg/flags/flags_test.go
854e8f9
to
01838f1
Compare
01838f1
to
dc0385d
Compare
Description
Fixes #1351