-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not retry updates on conflict #217
Do not retry updates on conflict #217
Conversation
548739d
to
238b674
Compare
Does this change result in conflict errors consistently showing up in the logs during uninstall? In general, I think that's reasonable because it is what is happening, but I've seen that conflict error make many, many users nervous in lots of different operators. It seems like if it is showing up, we would want to quickly see if its possible to eliminate the conflict issue altogether. |
Yes, we've seen it happen pretty much consistently. These changes should fix that, basically by not performing an update on a |
@joelanford let me know if there's anything I can do in terms of tweaking/changing stuff ! |
I'm being pulled in a bunch of different directions, so I'll volunteer @everettraven (if he's willing) to take a look as well. |
I'll add it to my queue 😄 |
1627069
to
d6ae7c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me and seem reasonable. Thanks @ludydoo for taking the time to make this contribution!
@joelanford would this be good to go ? |
// In case of a Conflict error, the update cannot be retried, | ||
// and the reconciliation loop needs to be restarted anew. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit awkward, because these two new comment lines applies to both retry.OnError
calls. However the above 3 lines only apply to the first retry.OnError
call.
I can suggest two solutions:
- move these two lines to a separate comment above this one. However this takes it further away from the second
OnError
call, which is not great. - create a helper function such as:
func retryOnRetryableUpdateError(backoff wait.Backoff, fn func() error) {
return retry.OnError(backoff, isRetryableUpdateError, fn)
}
and then we could put this comment on this function, which would be the most appropriate place. We could optionally also inline isRetryableUpdateError
in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good :)
pkg/reconciler/reconciler.go
Outdated
@@ -521,8 +521,12 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (res ctrl. | |||
return ctrl.Result{}, err | |||
} | |||
|
|||
shouldUpdate := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introducing a new variable here that is ready by the defer closure (and not easy to grok straight away), I think it might be cleaner to introduce a method on the updater such as SkipAllUpdates()
or DoNotApply()
that would:
- cause
Apply(...)
to justreturn nil
without doing anything, and - (optional) cause the updater to
panic()
on furtherEnsure...
calls to make sure thatSkipUpdates
is closely followed by areturn
fromReconcile()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot 💯
pkg/reconciler/reconciler.go
Outdated
return err | ||
} | ||
} else { | ||
log.Info("Resource is already terminated, skipping deletion") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Info("Resource is already terminated, skipping deletion") | |
log.Info("Resource is already terminated, skipping deletion.") |
pkg/reconciler/reconciler.go
Outdated
if err := r.handleDeletion(ctx, actionClient, obj, log); err != nil { | ||
return ctrl.Result{}, err | ||
} | ||
shouldUpdate = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly different from the case described in the PR description, right?
I'm guessing the problem here was that on deletions both the uninstallUpdater
and the regular u
updater would fire, and the latter was both useless and would cause errors since the resource would typically be already gone after deleting the finalizer?
If I got this right, please mention this in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice, but please add tests for CancelUpdates
to the updater package for completeness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
The changes look good to me. Agree on adding a test for CancelUpdates()
to keep the code coverage same. We can merge this for now to move fwd, and will create an issue for followup as its straightforward.
When the reconciliation loops tries to update a CR, and the result is a Conflict, the reconciliation loop should not retry because this error will never succeed. The reconciliation loop needs to be restarted.
Additionally, there was a situation where the resource would be deleted, but the Updater would still try to update the deleted resource, resulting in errors.