reconciler/managed: avoid requeuing if an update event is pending #527

sttts · 2023-08-24T07:42:35Z

Description of your changes

We use this:

return reconcile.Result{Requeue: true}, errors.Wrap(r.client.Status().Update(ctx, managed), errUpdateManagedStatus)

This pattern means that the object is reconciled again immediately (modulo queue delay). We use cached clients (controller-runtime's default), so if that Update changed the object, chances are high that the client.Get call at the start of the next reconcile returns a stale object. We then run all the complex reconcile logic a second time and create noise (events, for example), only to find out that the world has moved on (through our own update): the update in the second reconcile iteration fails with a conflict error, which produces yet another event.

The right pattern would be to only requeue if the update was a noop (= resource version didn't change), or even just set Requeue: false. If the update did something, we will see a watch event anyway (for watched resources at least).

I have:

Read and followed Crossplane's contribution process.
Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@upbound.io>

negz

I like the intention here, but I'd prefer not to have to call a function to assess whether we need to requeue at every return point.

I'm wondering how bad it would be to just return reconcile.Result{} (i.e. implicit Requeue: false). If we did that, we'd still get a reconcile queued immediately if any of the following were true:

We updated the MR during this reconcile.
We hit a different error than we did last time (i.e. we updated the Synced condition).
We failed to update the MR status (i.e. we returned a non-nil error).

So the edge case we'd be susceptible to would be hitting the same error twice in a row. In that case nothing would trigger a reconcile to be queued until the poll interval expired (or a watch event happened).

negz · 2023-08-24T20:51:04Z

pkg/reconciler/managed/reconciler.go

+		// the object has changed. We get a watch event and do not need to
+		// requeue. This helps to avoid a reconcile on a stale read when the
+		// informer has not caught up.


I was under the impression that controller-runtime would deduplicate multiple events for the same object in the queue, i.e. if we return Requeue: true and a watch event is triggered we'd only reconcile once, not twice. See https://github.com/kubernetes-sigs/controller-runtime/blob/c20ea143/pkg/doc.go#L184

Why isn't this working? Is the issue that the watch-triggered reconcile has potentially already been popped from the queue and processed by another goroutine before we return Requeue: true?

> and a watch event is triggered we'd only reconcile once, not twice

The requeue is instant. The informer has delay.

> Is the issue that the watch-triggered reconcile has potentially already been popped from the queue and processed by another goroutine before we return Requeue: true

No. There is always just one reconcile in flight per key. Only once we return the key to the queue can another goroutine pick it up immediately. But this PR is about the case where there is no event yet, and the key is popped from the queue again right away. That work is both unnecessary and will very likely run into another conflict error.
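To illustrate why deduplication does not help here, a toy model of a deduplicating queue follows. The queue type and its methods are hypothetical and heavily simplified; this is not client-go's workqueue implementation. Deduplication only merges keys that are waiting in the queue at the same time, so a requeue that is popped before the watch event arrives still yields two reconciles.

```go
package main

import "fmt"

// queue is a toy deduplicating work queue (a simplified stand-in).
type queue struct {
	order  []string
	queued map[string]bool
}

func newQueue() *queue { return &queue{queued: map[string]bool{}} }

// add enqueues key unless it is already waiting in the queue.
func (q *queue) add(key string) {
	if q.queued[key] {
		return
	}
	q.queued[key] = true
	q.order = append(q.order, key)
}

// pop removes and returns the oldest key, or false if the queue is empty.
func (q *queue) pop() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	delete(q.queued, key)
	return key, true
}

// drain pops until the queue is empty and returns the number of reconciles.
func (q *queue) drain() int {
	n := 0
	for {
		if _, ok := q.pop(); !ok {
			return n
		}
		n++
	}
}

func main() {
	// Requeue and watch event are both queued before a worker pops the key.
	q := newQueue()
	q.add("mr") // Requeue: true
	q.add("mr") // watch event
	fmt.Println("both queued first:", q.drain(), "reconcile(s)")

	// The requeued key is popped before the watch event arrives.
	q = newQueue()
	q.add("mr") // Requeue: true
	n := q.drain()
	q.add("mr") // delayed watch event
	n += q.drain()
	fmt.Println("requeue popped first:", n, "reconcile(s)")
}
```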

> The requeue is instant. The informer has delay.

I think I understand now. You're saying the issue is that we'll requeue so fast (i.e. instantly) that we'll just read the same stale resource from the informer on the next reconcile, and essentially keep doing that until the informer cache is updated.
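The stale-read loop described here can be simulated with a small toy model. The server and cache types are hypothetical stand-ins for the API server's optimistic concurrency check and the informer cache; the sketch only shows the ordering problem, not real client behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: object has been modified")

// server is a toy stand-in for the API server. It tracks the latest
// resource version as an integer.
type server struct{ rv int }

// update rejects writes that are based on an outdated resource version.
func (s *server) update(basedOn int) error {
	if basedOn != s.rv {
		return errConflict
	}
	s.rv++
	return nil
}

// cache is a toy stand-in for the informer cache. It only catches up
// when sync is called, i.e. when the watch event has been delivered.
type cache struct{ rv int }

func (c *cache) sync(s *server) { c.rv = s.rv }

// reconcile reads the object from the cache and writes it to the server.
func reconcile(c *cache, s *server) error {
	return s.update(c.rv)
}

func main() {
	s := &server{rv: 1}
	c := &cache{rv: 1}

	// The first reconcile updates the object on the server.
	fmt.Println("first reconcile:", reconcile(c, s))
	// An instant requeue reads the stale version from the cache and conflicts.
	fmt.Println("instant requeue:", reconcile(c, s))
	// Once the watch event has updated the cache, the reconcile succeeds.
	c.sync(s)
	fmt.Println("after watch event:", reconcile(c, s))
}
```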

negz · 2023-08-24T21:24:26Z

> So the edge case we'd be susceptible to would be hitting the same error twice in a row. In that case nothing would trigger a reconcile to be queued until the poll interval expired (or a watch event happened).

Actually, the poll interval only applies if we hit a case where we return reconcile.Result{RequeueAfter: r.pollInterval}. So I think in theory, if we returned reconcile.Result{} in error scenarios, there'd be a risk that we'd sit waiting for a watch event (or the sync interval) before being requeued.

turkenh · 2023-11-02T11:35:37Z

How different is this from #372?

reconciler/managed: do not requeue when update event expected

036a84c

Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@upbound.io>

sttts requested review from a team as code owners August 24, 2023 07:42

sttts requested review from negz and lsviben August 24, 2023 07:42

sttts changed the title ~~reconciler/managed: avoid temporary data loss to managed on annotation update~~ reconciler/managed: avoid requeuing if an update event is pending Aug 24, 2023

negz reviewed Aug 24, 2023


sttts commented Aug 24, 2023

negz left a comment

negz Aug 24, 2023

sttts Aug 30, 2023

sttts Aug 30, 2023

negz Aug 31, 2023

negz commented Aug 24, 2023

turkenh commented Nov 2, 2023 (edited)
