Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only KeystoneAPI finalizer from sub-resources when actually needed #401

Merged

Conversation

abays
Copy link
Contributor

@abays abays commented Apr 11, 2024

We are seeing the cleanup of certain KUTTL tests hang. This happens because KeystoneAPI fails to fully delete when multiple service CRs (Nova, Glance, Placement, etc) are being deleted at the same time. This is more likely to occur in operators that have a lot of other operator dependencies in their KUTTL testing (for instance, Octavia). KeystoneService is used in the enumeration below, but this also applies for KeystoneEndpoints.

  1. The problem is that the deleted KeystoneService resource for all the OSP service CRs is being requeued one more time after it has removed its own self-finalizer and is then waiting to be fully deleted.
  2. And why is it requeued? Because it's watching itself and the removal of its self-finalizer will queue it for one more read (because it's an update to its metadata).
  3. It's a race though. Since the self-finalizer is gone, k8s will be pinged to remove the KeystoneService from the cache and etcd at the same time as that object has been requeued for one more read.
  4. If k8s removes it first, no problem. But if the reconcile happens first, then what happens is that the KeystoneService adds a finalizer back onto KeystoneAPI (
    //
    // Add a finalizer to the KeystoneAPI for this service instance, as we do not want
    // the KeystoneAPI to disappear before this service in the case where this service
    // is deleted
    //
    if controllerutil.AddFinalizer(keystoneAPI, fmt.Sprintf("%s-%s", helper.GetFinalizer(), instance.Name)) {
    err := r.Update(ctx, keystoneAPI)
    if err != nil {
    return ctrl.Result{}, err
    }
    }
    ).
  5. And then, later in that same reconcile, where the deletion logic tries to remove that same finalizer from the KeystoneAPI, you can hit the "object has been modified" error if other threads for other KeystoneServices are also modifying the KeystoneAPI.
  6. And if you hit that error, you're now in a state where the KeystoneService is going to be completely gone at the end of the reconcile (because its self-finalizer was removed in the last reconcile and won't be removed in this one, hence it won't cause a requeue and k8s will completely wipe it from etcd) and yet it has left a finalizer on KeystoneAPI, forever preventing the deletion of the KeystoneAPI.

To prevent this problem, we are moving the Update() to add a finalizer to the KeystoneAPI in the KeystoneService and KeystoneEndpoint reconcile loops to a location that only executes when that finalizer is actually needed. Beforehand we were potentially always adding the finalizer to KeystoneAPI even if the KeystoneService or KeystoneEndpoint had no need of it anymore (i.e. during deletion, after user/service/endpoint had been removed from the OSP Keystone database).

@abays abays requested a review from stuggi April 11, 2024 14:35
@openshift-ci openshift-ci bot requested a review from viroel April 11, 2024 14:35
@stuggi
Copy link
Contributor

stuggi commented Apr 12, 2024

/test keystone-operator-build-deploy-kuttl

Copy link
Contributor

@stuggi stuggi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

openshift-ci bot commented Apr 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abays, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stuggi
Copy link
Contributor

stuggi commented Apr 12, 2024

/test keystone-operator-build-deploy-kuttl

1 similar comment
@abays
Copy link
Contributor Author

abays commented Apr 12, 2024

/test keystone-operator-build-deploy-kuttl

@openshift-merge-bot openshift-merge-bot bot merged commit 850399d into openstack-k8s-operators:main Apr 12, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants