Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent readiness status updates #583

Open
TerjeLafton opened this issue Oct 24, 2023 · 9 comments · May be fixed by #654
Open

Consistent readiness status updates #583

TerjeLafton opened this issue Oct 24, 2023 · 9 comments · May be fixed by #654
Assignees
Labels
enhancement New feature or request roadmap

Comments

@TerjeLafton
Copy link

What problem are you facing?

The Ready status of a resource is reflected nicely when the resource is being created, but not so when updating or doing other things.

For example, when a resource needs to be updated, there is no status indicating this to the user, except for an event and log saying "Successfully requested update of external resource". The Ready status will just be true while this is happening. Successfully updating a resource also doesn't return reconcile.Result{Requeue: true}..., which means it won't trigger an observe right after to confirm that the update did its job.

When creating a resource, the Ready status is set to false with Creating as the reason both when it errors and succeeds, which is the correct way to do it. It also returns reconcile.Result{Requeu: true}... to make sure the resource is ready for use.

How could Crossplane help solve your problem?

I don't see why Create should be different than Update, so I would prefer if they were consistent.

I would personally add a ConditionReason for this as well as a helper function to set the Ready status for you, like its done with Create and Delete:

const ReasonUpdating ConditionReason = "Updating"

// Updating returns a condition that indicates the resource is currently
// being updated.
func Updating() Condition {
    return Condition{
	Type:               TypeReady,
	Status:             corev1.ConditionFalse,
	LastTransitionTime: metav1.Now(),
	Reason:             ReasonUpdating,
    }
}

The code that executes external.Update should be updated to return this reconcile.Result:
return reconcile.Result{Requeue: true}, errors.Wrap(r.client.Status().Update(ctx, managed), errUpdateManagedStatus)

All the returns related to the update should also add xpv1.Updating() to its SetCondition:
managed.SetConditions(xpv1.Updating(), xpv1.ReconcileSuccess())

@TerjeLafton TerjeLafton added the enhancement New feature or request label Oct 24, 2023
@turkenh
Copy link
Member

turkenh commented Oct 24, 2023

This makes sense to me.
I think the Create case was more obvious since it usually impacts time to readiness. For resources that do not require a provisioning time, e.g. a bucket, we want to mark it as Ready as soon as possible after the Create call, so, we re-queued immediately instead of waiting for the next reconcile (usually after the poll interval, assuming we don't update the CR in Create).

Reflecting on the Updates feels less of an issue compared to Create, but it is still valid.
We need to make sure it doesn't introduce loops (for whatever reason I can think of) or extra loads, at least by testing with some of the providers/resources out there.

@mbbush
Copy link

mbbush commented Oct 26, 2023

Would this also mean that a failed update results in the resource no longer being ready? I think that would be really valuable; the fact that this doesn't happen (at least not always) today makes troubleshooting much harder.

@TerjeLafton
Copy link
Author

Yes, it would work like that.

When the external.Update function returns an error, we can just add a xpv1.Updating() condition to the SetConditions call. This would set the resource as Ready: false.

update, err := external.Update(externalCtx, managed)
if err != nil {
    log.Debug("Cannot update external resource")
    record.Event(managed, event.Warning(reasonCannotUpdate, err))
    managed.SetConditions(xpv1.Updating(), xpv1.ReconcileError(errors.Wrap(err, errReconcileUpdate)))
    return reconcile.Result{Requeue: true}, errors.Wrap(r.client.Status().Update(ctx, managed), errUpdateManagedStatus)
}

I was thinking of opening a PR for this, but haven't had the time yet. It would require some testing too, but I am not that up to date on the code here.

@TerjeLafton TerjeLafton linked a pull request Jan 29, 2024 that will close this issue
2 tasks
@negz
Copy link
Member

negz commented Feb 21, 2024

Does a resource being updated mean it's not ready? I don't think so, at least not usually.

The Ready status condition indicates that the resource in the external system that the MR represents is ready to use.

When the provider requests that a resource be created, it typically takes a little time for the creation process to complete. During this time the resource isn't usable. The provider confirms the creation has succeeded before marking the resource ready.

When the provider requests that a resource be updated, it doesn't necessarily prevent anyone from using the external resource. Some resources (for example a GKE cluster) may enter a state like 'Repairing' and be degraded or unavailable while the external system processes the update. This isn't always the case though - in fact anecdotally I think it's less common.

@vibe
Copy link

vibe commented May 2, 2024

Does a resource being updated mean it's not ready? I don't think so, at least not usually.

The Ready status condition indicates that the resource in the external system that the MR represents is ready to use.

When the provider requests that a resource be created, it typically takes a little time for the creation process to complete. During this time the resource isn't usable. The provider confirms the creation has succeeded before marking the resource ready.

When the provider requests that a resource be updated, it doesn't necessarily prevent anyone from using the external resource. Some resources (for example a GKE cluster) may enter a state like 'Repairing' and be degraded or unavailable while the external system processes the update. This isn't always the case though - in fact anecdotally I think it's less common.

I totally agree with this, we currently use the ready status on a resource to determine whether or not we can access status properties on the resource, which usually includes some form of ID / ARN.

From our perspective, a resource that is updating is still "ready" even if the update fails because its existing status properties remain valid.

For example we may request to expand a disk on a VM which fails, but other resources consuming that VM's ID should consider the VM ready.

Synced being set to false, might make sense, or an addition of a new state property.

@mbbush
Copy link

mbbush commented May 2, 2024

More robust status conditions like it sounds like are being discussed here would be a better solution to crossplane/upjet#327 than what I proposed in that issue.

@jbw976 jbw976 removed this from the v1.16 milestone May 15, 2024
@chlunde
Copy link

chlunde commented Sep 2, 2024

Also related: #198

If we basically merge synced/upToDate into Ready as Reason, I think we should try to get reason as a field in additionalPrinterColumns so the same information in kubectl as today with synced.

@chlunde
Copy link

chlunde commented Sep 2, 2024

And this? :) #39

@chlunde
Copy link

chlunde commented Sep 2, 2024

And: #363 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request roadmap
Projects
Status: In Design
Development

Successfully merging a pull request may close this issue.

7 participants