
Applier: Improve reconciler reschedule context to avoid deadlocking on full channel #932

Merged
merged 7 commits into kube-rs:master from bugfix/issue-926 on Jun 9, 2022

Conversation

nightkr
Member

@nightkr nightkr commented Jun 8, 2022

Fixes #926/#925, and adds a regression test.

This is slightly breaking, since it changes error_policy from a FnMut to Fn. We could work around that by running it inside a Mutex, but I'd rather let clients decide whether that overhead is worth it.
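For illustration, a minimal self-contained sketch (made-up names, not the kube-rs API surface) of how a caller that previously mutated captured state in an FnMut error_policy can keep doing so under an Fn bound by putting the state behind a Mutex:

```rust
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct ErrorStats {
    consecutive_failures: u32,
}

fn main() {
    let stats = Arc::new(Mutex::new(ErrorStats::default()));
    // The closure only captures an Arc clone, so it satisfies `Fn`;
    // mutation happens through the Mutex (interior mutability) instead of `FnMut`.
    let error_policy = {
        let stats = Arc::clone(&stats);
        move |_err: &str| {
            stats.lock().unwrap().consecutive_failures += 1;
        }
    };
    error_policy("reconcile failed");
    error_policy("reconcile failed again");
    assert_eq!(stats.lock().unwrap().consecutive_failures, 2);
}
```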

/cc @moustafab

@nightkr nightkr added the runtime (controller runtime related) and changelog-fix (changelog fix category for prs) labels Jun 8, 2022
@nightkr nightkr requested review from clux and a team June 8, 2022 14:12
@nightkr nightkr self-assigned this Jun 8, 2022
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
This fixes kube-rs#926, since we already run multiple reconcilers in parallel.

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
@codecov-commenter

Codecov Report

Merging #932 (48e603c) into master (98cf1c8) will increase coverage by 1.37%.
The diff coverage is 74.07%.

❗ Current head 48e603c differs from pull request most recent head 89d19ff. Consider uploading reports for the commit 89d19ff to get more accurate results

@@            Coverage Diff             @@
##           master     #932      +/-   ##
==========================================
+ Coverage   70.77%   72.15%   +1.37%     
==========================================
  Files          64       64              
  Lines        4332     4356      +24     
==========================================
+ Hits         3066     3143      +77     
+ Misses       1266     1213      -53     
| Impacted Files | Coverage Δ | |
|---|---|---|
| kube-runtime/src/utils/mod.rs | 56.89% <0.00%> (+38.89%) | ⬆️ |
| kube-runtime/src/controller/mod.rs | 35.18% <79.72%> (+35.18%) | ⬆️ |
| kube-derive/src/custom_resource.rs | 11.76% <100.00%> (ø) | |
| kube-runtime/src/utils/backoff_reset_timer.rs | 82.14% <0.00%> (-0.62%) | ⬇️ |
| kube-client/src/client/middleware/mod.rs | 93.10% <0.00%> (-0.45%) | ⬇️ |
| kube-runtime/src/utils/stream_backoff.rs | 87.05% <0.00%> (-0.45%) | ⬇️ |
| kube-runtime/src/controller/future_hash_map.rs | 95.00% <0.00%> (-0.24%) | ⬇️ |
| kube-client/src/client/mod.rs | 67.94% <0.00%> (-0.21%) | ⬇️ |
| kube-runtime/src/utils/event_flatten.rs | 92.10% <0.00%> (-0.21%) | ⬇️ |
| kube-runtime/src/controller/runner.rs | 95.00% <0.00%> (-0.17%) | ⬇️ |

... and 6 more

```diff
@@ -599,6 +599,6 @@ mod tests {
             struct FooSpec { foo: String }
         };
         let input = syn::parse2(input).unwrap();
-        let kube_attrs = KubeAttrs::from_derive_input(&input).unwrap();
+        let _kube_attrs = KubeAttrs::from_derive_input(&input).unwrap();
```
Member

i think this is fixed in master

Member Author
@nightkr nightkr Jun 8, 2022

Doesn't look like it; the only PR that wasn't included in this branch was #931, which didn't touch this.

Member
@clux clux Jun 8, 2022

oh, my bad. turns out i had a different fix for it (by testing more) in https://github.com/kube-rs/kube-rs/pull/924/files but it hasn't been reviewed and thus hasn't made it into master

Member Author

Ahh

nightkr and others added 2 commits June 8, 2022 20:55
```diff
         let err_context = context.clone();
-        let (scheduler_tx, scheduler_rx) = channel::mpsc::unbounded::<ScheduleRequest<ReconcileRequest<K>>>();
+        let (scheduler_tx, scheduler_rx) =
+            channel::mpsc::channel::<ScheduleRequest<ReconcileRequest<K>>>(APPLIER_REQUEUE_BUF_SIZE);
```
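As a standalone illustration of why this matters (not applier's actual code): with a bounded futures channel, a send on a full channel only completes once the receiver frees a slot, so a task that sends while nothing drains the receiver waits forever.

```rust
use futures::{channel::mpsc, SinkExt, StreamExt};

#[tokio::main]
async fn main() {
    // With futures' bounded mpsc, capacity is `buffer + number of senders`,
    // so buffer 0 with one sender allows a single in-flight message.
    let (mut tx, mut rx) = mpsc::channel::<u32>(0);

    tx.send(1).await.unwrap(); // fills the only slot

    let producer = async {
        // This send stays pending until the receiver frees the slot;
        // if nothing ever drained `rx`, it would wait here forever.
        tx.send(2).await.unwrap();
    };
    let consumer = async {
        assert_eq!(rx.next().await, Some(1));
        assert_eq!(rx.next().await, Some(2));
    };
    futures::join!(producer, consumer);
}
```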
Member
@clux clux Jun 8, 2022

some points i wanted to raise regarding the unbounded channels, because i'm not sure about the full reasoning here. it feels partly defensible to have unbounded queues:

  • there's generally a limit to how many items people put inside clusters because of how impractical it is to have large-scale one-to-many relations with common items that have realistic cluster bounds (pods generally do not grow forever, and people often make new clusters at some point after 10k or more)
  • large numbers of objects make global reconcilers an O(n^2) problem. constant requeuing and retriggering in such cases are also likely to waste resources (node-affecting controllers in controller-manager are huge IO hogs in particular)
  • if we are in such a large-scale case where we have 10k+ items in our reflector cache, and we want to reconcile all of them, we first need enough memory to house 10k+ of these specs, and an unbounded queue would at most double resources

but on the other hand: if i understand this queue correctly, it also serves as a limiter on the amount of parallelism in a controller? in that case, limiting it actually makes a lot of sense, because 10k objects being reconciled at once might DoS a third-party service. is that a correct understanding of this?

Member Author

Not quite, there are actually four different "queues" going on in applier:

  1. The executor itself, this has no hard limit but is de-duped (so there will only be one task running at any one moment per K8s object)
  2. The "pending reconciliations" queue (for objects that are already reconciling but should be retried as soon as their current job is done), this is also unbounded but deduped
  3. The scheduler (for objects that are scheduled to be reconciled again at some point in the future because they requested it via Action), this is also unbounded but deduped
  4. The "pending scheduling requests" queue, for objects that haven't been organized into one of the previous three queues yet

This PR is only concerned with queue 4, where we have no practical way to implement deduping.
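To make the dedup distinction concrete, an illustrative sketch (not the applier's actual data structures): a queue keyed per object can collapse duplicate requests, whereas queue 4 is a plain FIFO channel in which the same object may occupy several slots.

```rust
use std::collections::{HashSet, VecDeque};

#[derive(Hash, PartialEq, Eq, Clone, Debug)]
struct ObjectRef(String);

#[derive(Default)]
struct DedupedQueue {
    order: VecDeque<ObjectRef>,
    pending: HashSet<ObjectRef>,
}

impl DedupedQueue {
    /// Enqueue at most one pending entry per object.
    fn push(&mut self, obj: ObjectRef) {
        if self.pending.insert(obj.clone()) {
            self.order.push_back(obj);
        }
    }
    fn pop(&mut self) -> Option<ObjectRef> {
        let obj = self.order.pop_front()?;
        self.pending.remove(&obj);
        Some(obj)
    }
}

fn main() {
    let mut q = DedupedQueue::default();
    q.push(ObjectRef("foo".into()));
    q.push(ObjectRef("foo".into())); // deduped: still only one entry for "foo"
    q.push(ObjectRef("bar".into()));
    assert_eq!(q.pop(), Some(ObjectRef("foo".into())));
    assert_eq!(q.pop(), Some(ObjectRef("bar".into())));
    assert_eq!(q.pop(), None);
}
```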

Member
@clux clux Jun 9, 2022

So if I understand it right:

  • the pending reconciliations queue is pending in Scheduler (i.e. 2)
  • the internal DelayQueue on Scheduler is 3
  • 4 is effectively the queuestream merged with unclassified requeues (scheduler_tx)

and the executor is going to work at its regular pace... so this means it is actually possible to run 1000s of reconciles at the same time on re-lists currently? Is that something that is viable to bound at some point, somehow? In the Runner?

Member Author

> So if I understand it right:

Yes.

> and the executor is going to work at its regular pace... so this means it is actually possible to run 1000s of reconciles at the same time on re-lists currently? 😬

Yes.

> Is that something that is viable to bound at some point, somehow? In the Runner?

Well, we could add additional constraints for when we actually start running a pending reconciliation. That's not implemented at the moment, on the assumption that you could "just" use a semaphore in your reconciler function.
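A minimal sketch of that "just use a semaphore" workaround, assuming a tokio runtime (the wrapper and the limit are illustrative, not part of kube-rs):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// At most `MAX_CONCURRENT` reconciliations do real work at once;
// the rest wait at the permit acquisition point.
const MAX_CONCURRENT: usize = 16;

async fn reconcile(obj_name: &str, limiter: Arc<Semaphore>) {
    // Excess reconciliations queue up on this await.
    let _permit = limiter.acquire().await.expect("semaphore closed");
    // ... actual reconciliation logic would go here ...
    println!("reconciling {obj_name}");
}

#[tokio::main]
async fn main() {
    let limiter = Arc::new(Semaphore::new(MAX_CONCURRENT));
    let handles: Vec<_> = (0..100)
        .map(|i| {
            let limiter = Arc::clone(&limiter);
            tokio::spawn(async move { reconcile(&format!("object-{i}"), limiter).await })
        })
        .collect();
    for h in handles {
        h.await.unwrap();
    }
}
```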

Member Author

Doing the semaphoring in the executor (which has other benefits, like not having to allocate the task before it actually has a semaphore permit) shouldn't be too difficult either; the main problem there would be that applier's configuration is already getting pretty unwieldy as it is.

Member

Maybe it makes sense to refactor a lot of applier's configuration into some kind of ApplierParams struct that can be heavily defaulted.
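A purely hypothetical sketch of that idea (no such struct exists in kube-rs) showing the heavily-defaulted params-struct pattern:

```rust
// Hypothetical type for illustration only; field names and defaults are invented.
#[derive(Debug, Clone)]
pub struct ApplierParams {
    /// Size of the requeue buffer.
    pub requeue_buffer_size: usize,
    /// Optional cap on concurrently running reconciliations.
    pub max_concurrent_reconciles: Option<usize>,
}

impl Default for ApplierParams {
    fn default() -> Self {
        Self {
            requeue_buffer_size: 100,
            max_concurrent_reconciles: None,
        }
    }
}

fn main() {
    // Callers override only what they care about via struct update syntax.
    let params = ApplierParams {
        max_concurrent_reconciles: Some(16),
        ..ApplierParams::default()
    };
    println!("{params:?}");
}
```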

Member Author

Probably, yeah..

@clux
Member

clux commented Jun 8, 2022

  • Coverage: 70.77% → 72.15% (+1.37%)

really nice job getting the first applier test in!

As suggested by @clux

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
@clux
Member

clux commented Jun 9, 2022

Ok, looking good to me now, will approve. Btw, can we maybe change the title of the PR slightly? It's not clear at a glance how this title is a fix for the buffer block unless you really dig into it.

@nightkr
Member Author

nightkr commented Jun 9, 2022

How about "Reschedule reconciliations inside the reconciler tasks"?

Either way this is a fairly insidery change; #925 already "fixed" the issue from the immediate user's perspective.

@clux
Member

clux commented Jun 9, 2022

I would at least try to add something like "to avoid deadlocking on full channel"

"Improve reconciler reschedule context to avoid deadlocking on full channel" ?

@nightkr
Member Author

nightkr commented Jun 9, 2022

Sure.

@nightkr nightkr changed the title from "Applier: Run post-reconciliation tasks (such as rescheduling the object) in the reconciler's context" to "Applier: Improve reconciler reschedule context to avoid deadlocking on full channel" on Jun 9, 2022
@clux clux added this to the 0.74.0 milestone Jun 9, 2022
@nightkr nightkr merged commit 12218ed into kube-rs:master Jun 9, 2022
@nightkr nightkr deleted the bugfix/issue-926 branch June 9, 2022 14:19
Labels
changelog-fix (changelog fix category for prs), runtime (controller runtime related)
Development

Successfully merging this pull request may close these issues.

Applier hangs if schedule request buffer is full
3 participants