🐛 Wait and requeue if LB + its ports not deleted #2122

Open, wants to merge 2 commits into base: main

EmilienM
Contributor

@EmilienM EmilienM commented Jun 11, 2024

What this PR does / why we need it:

Wait and requeue if LB is in PENDING_DELETE

If the LB is being deleted when a cluster is deleted, it goes through
the PENDING_DELETE state, and at this stage there is nothing we can do
but wait for the LB to actually be deleted.

If the LB is in that state during the deletion, let's just return no
error but request a reconcile after some time.
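
A minimal sketch of the behaviour described above, not the actual CAPO code: the `loadBalancer` and `result` types, the `deleteLoadBalancer` helper, and the 10-second `requeueDelay` are all hypothetical stand-ins for illustration. `result` only mimics the shape of controller-runtime's `ctrl.Result`, so the example stays self-contained.

```go
package main

import (
	"fmt"
	"time"
)

// loadBalancer is a hypothetical stand-in for the Octavia LB object.
type loadBalancer struct {
	ID                 string
	ProvisioningStatus string
}

// result mimics the shape of controller-runtime's ctrl.Result.
type result struct {
	RequeueAfter time.Duration
}

const (
	statusPendingDelete = "PENDING_DELETE"
	// requeueDelay is an assumed value, not taken from the PR.
	requeueDelay = 10 * time.Second
)

// deleteLoadBalancer asks for a later reconcile, without an error, when the
// LB is already being deleted, instead of failing on the PENDING_DELETE state.
func deleteLoadBalancer(lb *loadBalancer, deleteFn func(id string) error) (result, error) {
	if lb == nil {
		// Nothing left to delete.
		return result{}, nil
	}
	if lb.ProvisioningStatus == statusPendingDelete {
		// Octavia is already deleting the LB: wait and check again later.
		return result{RequeueAfter: requeueDelay}, nil
	}
	if err := deleteFn(lb.ID); err != nil {
		return result{}, fmt.Errorf("deleting load balancer %s: %w", lb.ID, err)
	}
	// Deletion was requested: come back later to confirm the LB is gone.
	return result{RequeueAfter: requeueDelay}, nil
}

func main() {
	noop := func(string) error { return nil }
	lb := &loadBalancer{ID: "lb-1", ProvisioningStatus: statusPendingDelete}
	res, err := deleteLoadBalancer(lb, noop)
	fmt.Println(res.RequeueAfter, err)
}
```

Returning a requeue rather than an error keeps the PENDING_DELETE case out of the error path, so the controller simply looks again later.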

Wait for Octavia-managed ports to be removed

In best-effort mode, when cleaning up a load balancer, wait for the ports
with a device ID (mapped to the LB ID) and a certain prefix to be deleted
(by Octavia itself; they are not managed by CAPO) before claiming the LB
is really deleted.

This prevents the reconcile from failing later when trying to remove the
network while some ports are still attached.
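
The port check described above could look roughly like the following sketch. The `port` type, the `"lb-"` device ID mapping, and the `octavia-lb-` name prefix are assumptions made for illustration; the PR text only says the ports have a device ID mapped to the LB ID and "a certain prefix". Only the name `lbPortsExist` comes from the diff, where it appears as a variable rather than a function.

```go
package main

import (
	"fmt"
	"strings"
)

// port is a hypothetical, trimmed-down view of a Neutron port.
type port struct {
	Name     string
	DeviceID string
}

// octaviaPortNamePrefix is an assumed prefix for ports created by Octavia.
const octaviaPortNamePrefix = "octavia-lb-"

// lbPortsExist reports whether any Octavia-created port is still attached
// to the given load balancer.
func lbPortsExist(ports []port, lbID string) bool {
	// Assumed mapping between the LB ID and the port's device ID.
	deviceID := "lb-" + lbID
	for _, p := range ports {
		if p.DeviceID == deviceID && strings.HasPrefix(p.Name, octaviaPortNamePrefix) {
			return true
		}
	}
	return false
}

func main() {
	ports := []port{
		{Name: "octavia-lb-1234", DeviceID: "lb-1234"},
		{Name: "worker-0", DeviceID: "server-1"},
	}
	fmt.Println(lbPortsExist(ports, "1234"))     // Octavia port still present
	fmt.Println(lbPortsExist(ports[1:], "1234")) // only unrelated ports remain
}
```

While the check returns true, the caller would requeue instead of reporting the LB as deleted, so the network is not removed while ports are still attached.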

Which issue(s) this PR fixes:
Fixes #2124
Fixes #2121

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from emilienm. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 11, 2024

netlify bot commented Jun 11, 2024

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit e91c9a6
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/6669ff2d7abc8e0008d9a8ad
😎 Deploy Preview https://deploy-preview-2122--kubernetes-sigs-cluster-api-openstack.netlify.app

@EmilienM EmilienM changed the title Reconcile with wait after LB cleanup 🐛 Reconcile with wait after LB cleanup Jun 11, 2024
@EmilienM
Contributor Author

/hold
I want to confirm we don't see the error in the logs.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2024
@EmilienM EmilienM self-assigned this Jun 11, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2024
@jichenjc
Contributor

/test pull-cluster-api-provider-openstack-e2e-test

@jichenjc
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 12, 2024
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2024
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 12, 2024
@EmilienM EmilienM changed the title 🐛 Reconcile with wait after LB cleanup Wait and requeue if LB is in PENDING_DELETE Jun 12, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 12, 2024
@EmilienM EmilienM changed the title Wait and requeue if LB is in PENDING_DELETE Wait and requeue if LB + its ports not deleted Jun 12, 2024
@EmilienM EmilienM changed the title Wait and requeue if LB + its ports not deleted 🐛 Wait and requeue if LB + its ports not deleted Jun 12, 2024
}

if lb == nil {
	return nil
if lbPortsExist {
	s.scope.Logger().Info("Load balancer ports still exist, waiting for them to be deleted", "name", loadBalancerName)
Contributor Author

Looking at the logs, it seems we never hit this branch, so I wonder whether the requeue at the end of the function already gives Octavia enough time to clean up the ports.

@EmilienM EmilienM requested a review from mdbooth June 13, 2024 01:04
@EmilienM
Contributor Author

@mdbooth ready for review when time permits. It may be over-engineered, but it apparently does the job. Feedback is welcome.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 1, 2024
@k8s-ci-robot
Contributor

PR needs rebase.


Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Inbox
Development

Successfully merging this pull request may close these issues.

Invalid state PENDING_DELETE of loadbalancer resource
cluster delete with LB: network is removed too soon
4 participants