
Applier: Improve reconciler reschedule context to avoid deadlocking on full channel #932

Merged
merged 7 commits into kube-rs:master from bugfix/issue-926 on Jun 9, 2022

Conversation

nightkr
Member

@nightkr nightkr commented Jun 8, 2022

Fixes #926/#925, and adds a regression test.

This is slightly breaking, since it changes error_policy from a FnMut to Fn. We could work around that by running it inside a Mutex, but I'd rather let clients decide whether that overhead is worth it.
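For illustration, a minimal self-contained sketch (made-up names, not the kube-rs API surface) of how a caller that previously mutated captured state in an FnMut error_policy can keep doing so under an Fn bound by putting the state behind a Mutex:

```rust
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct ErrorStats {
    consecutive_failures: u32,
}

fn main() {
    let stats = Arc::new(Mutex::new(ErrorStats::default()));
    // The closure only captures an Arc clone, so it satisfies `Fn`;
    // mutation happens through the Mutex (interior mutability) instead of `FnMut`.
    let error_policy = {
        let stats = Arc::clone(&stats);
        move |_err: &str| {
            stats.lock().unwrap().consecutive_failures += 1;
        }
    };
    error_policy("reconcile failed");
    error_policy("reconcile failed again");
    assert_eq!(stats.lock().unwrap().consecutive_failures, 2);
}
```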

/cc @moustafab

@nightkr nightkr added the runtime (controller runtime related) and changelog-fix (changelog fix category for prs) labels Jun 8, 2022
@nightkr nightkr requested review from clux and a team June 8, 2022 14:12
@nightkr nightkr self-assigned this Jun 8, 2022
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
This fixes kube-rs#926, since we already run multiple reconcilers in parallel.

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
@codecov-commenter

Codecov Report

Merging #932 (48e603c) into master (98cf1c8) will increase coverage by 1.37%.
The diff coverage is 74.07%.

❗ Current head 48e603c differs from pull request most recent head 89d19ff. Consider uploading reports for the commit 89d19ff to get more accurate results

@@            Coverage Diff             @@
##           master     #932      +/-   ##
==========================================
+ Coverage   70.77%   72.15%   +1.37%     
==========================================
  Files          64       64              
  Lines        4332     4356      +24     
==========================================
+ Hits         3066     3143      +77     
+ Misses       1266     1213      -53     
| Impacted Files | Coverage Δ | |
|---|---|---|
| kube-runtime/src/utils/mod.rs | 56.89% <0.00%> (+38.89%) | ⬆️ |
| kube-runtime/src/controller/mod.rs | 35.18% <79.72%> (+35.18%) | ⬆️ |
| kube-derive/src/custom_resource.rs | 11.76% <100.00%> (ø) | |
| kube-runtime/src/utils/backoff_reset_timer.rs | 82.14% <0.00%> (-0.62%) | ⬇️ |
| kube-client/src/client/middleware/mod.rs | 93.10% <0.00%> (-0.45%) | ⬇️ |
| kube-runtime/src/utils/stream_backoff.rs | 87.05% <0.00%> (-0.45%) | ⬇️ |
| kube-runtime/src/controller/future_hash_map.rs | 95.00% <0.00%> (-0.24%) | ⬇️ |
| kube-client/src/client/mod.rs | 67.94% <0.00%> (-0.21%) | ⬇️ |
| kube-runtime/src/utils/event_flatten.rs | 92.10% <0.00%> (-0.21%) | ⬇️ |
| kube-runtime/src/controller/runner.rs | 95.00% <0.00%> (-0.17%) | ⬇️ |

... and 6 more

```diff
@@ -599,6 +599,6 @@ mod tests {
             struct FooSpec { foo: String }
         };
         let input = syn::parse2(input).unwrap();
-        let kube_attrs = KubeAttrs::from_derive_input(&input).unwrap();
+        let _kube_attrs = KubeAttrs::from_derive_input(&input).unwrap();
```
Member

i think this is fixed in master

Member Author
@nightkr nightkr Jun 8, 2022

Doesn't look like it; the only PR that wasn't included in this branch was #931, which didn't touch this.

Member
@clux clux Jun 8, 2022

oh, my bad. turns out i had a different fix for it (by testing more) in https://github.com/kube-rs/kube-rs/pull/924/files but it hasn't been reviewed and thus hasn't made it into master

Member Author

Ahh

nightkr and others added 2 commits June 8, 2022 20:55
```diff
         let err_context = context.clone();
-        let (scheduler_tx, scheduler_rx) = channel::mpsc::unbounded::<ScheduleRequest<ReconcileRequest<K>>>();
+        let (scheduler_tx, scheduler_rx) =
+            channel::mpsc::channel::<ScheduleRequest<ReconcileRequest<K>>>(APPLIER_REQUEUE_BUF_SIZE);
```
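As a standalone illustration of why this matters (not applier's actual code): with a bounded futures channel, a send on a full channel only completes once the receiver frees a slot, so a task that sends while nothing drains the receiver waits forever.

```rust
use futures::{channel::mpsc, SinkExt, StreamExt};

#[tokio::main]
async fn main() {
    // With futures' bounded mpsc, capacity is `buffer + number of senders`,
    // so buffer 0 with one sender allows a single in-flight message.
    let (mut tx, mut rx) = mpsc::channel::<u32>(0);

    tx.send(1).await.unwrap(); // fills the only slot

    let producer = async {
        // This send stays pending until the receiver frees the slot;
        // if nothing ever drained `rx`, it would wait here forever.
        tx.send(2).await.unwrap();
    };
    let consumer = async {
        assert_eq!(rx.next().await, Some(1));
        assert_eq!(rx.next().await, Some(2));
    };
    futures::join!(producer, consumer);
}
```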
Member
@clux clux Jun 8, 2022

some points i wanted to raise regarding the unbounded channels, because i'm not sure about the full reasoning here. it feels partly defensible to have unbounded queues:

  • there's generally a limit to how many items people put inside clusters because of how impractical it is to have large-scale one-to-many relations with common items that have realistic cluster bounds (pods generally do not grow forever, and people often make new clusters at some point after 10k or more)
  • large numbers of objects make global reconcilers an O(n^2) problem. constant requeuing and retriggering in such cases are also likely to waste resources (node-affecting controllers in controller-manager are huge IO hogs in particular)
  • if we are in such a large-scale case where we have 10k+ items in our reflector cache, and we want to reconcile all of them, we first need enough memory to house 10k+ of these specs, and an unbounded queue would at most double resources

but on the other hand: if i understand this queue correctly, it also serves as a limiter on the amount of parallelism in a controller? in that case, limiting it actually makes a lot of sense, because 10k objects being reconciled at once might DoS a third-party service. is that a correct understanding of this?

Member Author

Not quite, there are actually four different "queues" going on in applier:

  1. The executor itself, this has no hard limit but is de-duped (so there will only be one task running at any one moment per K8s object)
  2. The "pending reconciliations" queue (for objects that are already reconciling but should be retried as soon as their current job is done), this is also unbounded but deduped
  3. The scheduler (for objects that are scheduled to be reconciled again at some point in the future because they requested it via Action), this is also unbounded but deduped
  4. The "pending scheduling requests" queue, for objects that haven't been organized into one of the previous three queues yet

This PR is only concerned with queue 4, where we have no practical way to implement deduping.
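To make the dedup distinction concrete, an illustrative sketch (not the applier's actual data structures): a queue keyed per object can collapse duplicate requests, whereas queue 4 is a plain FIFO channel in which the same object may occupy several slots.

```rust
use std::collections::{HashSet, VecDeque};

#[derive(Hash, PartialEq, Eq, Clone, Debug)]
struct ObjectRef(String);

#[derive(Default)]
struct DedupedQueue {
    order: VecDeque<ObjectRef>,
    pending: HashSet<ObjectRef>,
}

impl DedupedQueue {
    /// Enqueue at most one pending entry per object.
    fn push(&mut self, obj: ObjectRef) {
        if self.pending.insert(obj.clone()) {
            self.order.push_back(obj);
        }
    }
    fn pop(&mut self) -> Option<ObjectRef> {
        let obj = self.order.pop_front()?;
        self.pending.remove(&obj);
        Some(obj)
    }
}

fn main() {
    let mut q = DedupedQueue::default();
    q.push(ObjectRef("foo".into()));
    q.push(ObjectRef("foo".into())); // deduped: still only one entry for "foo"
    q.push(ObjectRef("bar".into()));
    assert_eq!(q.pop(), Some(ObjectRef("foo".into())));
    assert_eq!(q.pop(), Some(ObjectRef("bar".into())));
    assert_eq!(q.pop(), None);
}
```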

Member
@clux clux Jun 9, 2022

So if I understand it right:

  • the pending reconciliations queue is pending in Scheduler (i.e. 2)
  • the internal DelayQueue on Scheduler is 3
  • 4 is effectively the queuestream merged with unclassified requeues (scheduler_tx)

and the executor is going to work at its regular pace... so this means it is actually possible to run 1000s of reconciles at the same time on re-lists currently? Is that something that is viable to bound at some point, somehow? In the Runner?

Member Author

> So if I understand it right:

Yes.

> and the executor is going to work at its regular pace... so this means it is actually possible to run 1000s of reconciles at the same time on re-lists currently? 😬

Yes.

> Is that something that is viable to bound at some point, somehow? In the Runner?

Well, we could add additional constraints for when we actually start running a pending reconciliation. That's not implemented at the moment, on the assumption that you could "just" use a semaphore in your reconciler function.
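A minimal sketch of that "just use a semaphore" workaround, assuming a tokio runtime (the wrapper and the limit are illustrative, not part of kube-rs):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// At most `MAX_CONCURRENT` reconciliations do real work at once;
// the rest wait at the permit acquisition point.
const MAX_CONCURRENT: usize = 16;

async fn reconcile(obj_name: &str, limiter: Arc<Semaphore>) {
    // Excess reconciliations queue up on this await.
    let _permit = limiter.acquire().await.expect("semaphore closed");
    // ... actual reconciliation logic would go here ...
    println!("reconciling {obj_name}");
}

#[tokio::main]
async fn main() {
    let limiter = Arc::new(Semaphore::new(MAX_CONCURRENT));
    let handles: Vec<_> = (0..100)
        .map(|i| {
            let limiter = Arc::clone(&limiter);
            tokio::spawn(async move { reconcile(&format!("object-{i}"), limiter).await })
        })
        .collect();
    for h in handles {
        h.await.unwrap();
    }
}
```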

Member Author

Doing the semaphoring in the executor (which has other benefits, like not having to allocate the task before it actually has a semaphore permit) shouldn't be too difficult either; the main problem there would be that applier's configuration is already getting pretty unwieldy as it is.

Member

Maybe it makes sense to refactor a lot of applier's configuration into some kind of ApplierParams struct that can be heavily defaulted.
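A purely hypothetical sketch of that idea (no such struct exists in kube-rs) showing the heavily-defaulted params-struct pattern:

```rust
// Hypothetical type for illustration only; field names and defaults are invented.
#[derive(Debug, Clone)]
pub struct ApplierParams {
    /// Size of the requeue buffer.
    pub requeue_buffer_size: usize,
    /// Optional cap on concurrently running reconciliations.
    pub max_concurrent_reconciles: Option<usize>,
}

impl Default for ApplierParams {
    fn default() -> Self {
        Self {
            requeue_buffer_size: 100,
            max_concurrent_reconciles: None,
        }
    }
}

fn main() {
    // Callers override only what they care about via struct update syntax.
    let params = ApplierParams {
        max_concurrent_reconciles: Some(16),
        ..ApplierParams::default()
    };
    println!("{params:?}");
}
```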

Member Author

Probably, yeah..

@clux
Member

clux commented Jun 8, 2022

  • Coverage: 70.77% → 72.15% (+1.37%)

really nice job getting the first applier test in!

As suggested by @clux

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>
@clux
Member

clux commented Jun 9, 2022

Ok, looking good to me now, will approve. Btw, can we maybe change the title of the PR slightly? It's not clear at a glance how this title is a fix for the buffer block unless you really dig into it.

@nightkr
Member Author

nightkr commented Jun 9, 2022

How about "Reschedule reconciliations inside the reconciler tasks"?

Either way this is a fairly insidery change; #925 already "fixed" the issue from the immediate user's perspective.

@clux
Member

clux commented Jun 9, 2022

I would at least try to add something like "to avoid deadlocking on full channel"

"Improve reconciler reschedule context to avoid deadlocking on full channel" ?

@nightkr
Member Author

nightkr commented Jun 9, 2022

Sure.

@nightkr nightkr changed the title from "Applier: Run post-reconciliation tasks (such as rescheduling the object) in the reconciler's context" to "Applier: Improve reconciler reschedule context to avoid deadlocking on full channel" on Jun 9, 2022
@clux clux added this to the 0.74.0 milestone Jun 9, 2022
@nightkr nightkr merged commit 12218ed into kube-rs:master Jun 9, 2022
@nightkr nightkr deleted the bugfix/issue-926 branch June 9, 2022 14:19
Labels
changelog-fix (changelog fix category for prs), runtime (controller runtime related)
Development

Successfully merging this pull request may close these issues.

Applier hangs if schedule request buffer is full
3 participants