Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Leader Election Resource Lock Options #6426

Merged
merged 11 commits into from
Jun 21, 2023
Merged

Add Leader Election Resource Lock Options #6426

merged 11 commits into from
Jun 21, 2023

Conversation

IzzyMusa
Copy link
Contributor

@IzzyMusa IzzyMusa commented May 14, 2023

Description of the change:

Background:

Currently, the operator SDK for the Ansible operator lacks options to customize the leader election resource lock behavior. This limits the flexibility of the operator and prevents users from adjusting lease duration and renew deadline according to their specific needs.

Changes:

I made enhancements to the operator SDK to include additional flags for configuring leader election resource lock options. These changes provide users with the ability to specify the type of resource object used for locking during leader election and customize the lease duration and renew deadline.

Introduced --leader-elect-lease-duration flag to allow users to define the duration that non-leader candidates will wait to force acquire leadership. The default duration is set to 15 seconds.

Introduced --leader-elect-renew-deadline flag, enabling users to set the renew deadline, which determines the duration that the acting control plane will retry refreshing leadership before giving up. The default duration is set to 10 seconds.

These changes provide more flexibility and control over leader election behavior, allowing operators to adapt to various deployment scenarios and specific requirements.

Reason for the Change:

The motivation behind these enhancements is to address frequent restarts observed in the operator instances. It was found that the leader election was often lost because the instance couldn't update the resource lock in time, resulting in the assumption that it crashed. By allowing users to adjust the lease duration and renew deadline, operators can mitigate the impact of resource lock updates, reducing unnecessary restarts and ensuring uninterrupted operation.

Expected Impact:

By incorporating these enhancements, operators built with the Ansible operator SDK will allow users to fine-tune the operators leader election resource locking.

Checklist

If the pull request includes user-facing changes, extra documentation is required:

@IzzyMusa IzzyMusa temporarily deployed to deploy May 15, 2023 17:03 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy May 15, 2023 17:03 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy May 15, 2023 17:03 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy May 15, 2023 17:03 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy May 15, 2023 17:03 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy May 15, 2023 17:03 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy May 15, 2023 17:03 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy May 15, 2023 17:03 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy May 15, 2023 17:03 — with GitHub Actions Inactive
Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IzzyMusa Thanks for this contribution! I think the change makes sense and looks good to me. Only change needed for my approval is fixing the changelog

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a template file that should not be changed. Instead, would you mind copying the contents of this file into a new file and modifying it to reflect the changes you are making in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@everettraven Gotcha! Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@everettraven Do you know which version of OpenShift Operator-SDK will support in v1.29.0? If 4.13 I might as well change the resource lock to be lease here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IzzyMusa I think the version of operator-sdk that got pulled down for 4.13 was v1.28.? (not sure if a patch version went in or not)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@everettraven I updated the change log. Yup, based on this here. I can't find the exact place where I saw the migration plan from configmap to leases but I also added a flag for the type of resources lock. I can default it to configmapLeases if you think folks with older clusters will still depend on the configmap as a resource lock obj.

Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit in the changelog, but other than that I think this looks good. Thanks @IzzyMusa !

Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2023
@IzzyMusa IzzyMusa temporarily deployed to deploy June 2, 2023 13:11 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 2, 2023 13:11 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 2, 2023 13:11 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 2, 2023 13:11 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 2, 2023 13:11 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 2, 2023 13:11 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 2, 2023 13:11 — with GitHub Actions Inactive
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2023
oceanc80 and others added 5 commits June 2, 2023 09:31
Signed-off-by: Catherine Chan-Tse <cchantse@redhat.com>
Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
…erator

Signed-off-by: Izzy Musa <izzymusa@Izzys-MacBook-Pro.local>
Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
Signed-off-by: Izzy Musa <izzymusa@Izzys-MacBook-Pro.local>
Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
Izzy Musa and others added 6 commits June 2, 2023 09:31
This reverts commit 443be19.

Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
This reverts commit 29f8b9e.

Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
…yaml

Co-authored-by: Bryce Palmer <everettraven@gmail.com>
Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
Signed-off-by: Izzy Musa <izzy.musa@ibm.com>
everettraven

This comment was marked as resolved.

@IzzyMusa IzzyMusa temporarily deployed to deploy June 6, 2023 18:13 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 6, 2023 18:13 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 6, 2023 18:13 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 6, 2023 18:13 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 6, 2023 18:13 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 6, 2023 18:13 — with GitHub Actions Inactive
@IzzyMusa IzzyMusa temporarily deployed to deploy June 6, 2023 18:13 — with GitHub Actions Inactive
Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 6, 2023
@IzzyMusa
Copy link
Contributor Author

IzzyMusa commented Jun 7, 2023

@everettraven There is a test that failed due to a timeout. Is it just a build error or do I need to change something?

@everettraven
Copy link
Contributor

@IzzyMusa I'll re-run it but if it fails again I'll verify nothing is actually broken and override if everything checks out

@everettraven everettraven merged commit 2f54287 into operator-framework:master Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants