Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract and fix draining #1176

Merged
merged 7 commits into from
Jul 5, 2024

Conversation

howardjohn
Copy link
Member

  • Centralize draining logic in one helper function
  • Fix inbound draining (HBONE). Before, we did not shut down the
    listener upon draining. This meant new connections would go to the old
    ztunnel on a ztunnel restart.
  • Simplify inbound draining; do not re-create the force shutdown logic,
    and instead let the common abstraction do it (which does it slightly
    better)
  • socsk5: add propery draining with force shutdown. Remove double-spawn,
    which adds some complexity around the proxy_to_cancellable.

This is primarily tested in istio/istio#51710,
which sends a large stream of requests and restarts ztunnel and the
backend app (2 different tests). With this change, these tests pass.

It would be good to get more isolated tests in this repo in the future
as well

@howardjohn howardjohn requested a review from a team as a code owner June 25, 2024 17:12
@howardjohn howardjohn added the release-notes-none Indicates a PR that does not require release notes. label Jun 25, 2024
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 25, 2024
Copy link
Contributor

@bleggett bleggett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much of #899 does this resolve? Specifically allowing drain logic to distinguish between (ztunnel shutdown) || (workload shutdown)?

src/proxy/util.rs Outdated Show resolved Hide resolved
@istio-testing istio-testing added needs-rebase Indicates a PR needs to be rebased before being merged size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 26, 2024
@howardjohn howardjohn added the do-not-merge/hold Block automatic merging of a PR. label Jun 26, 2024
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 26, 2024
src/drain.rs Outdated
Comment on lines 20 to 58
impl DrainWatcher {
/// wait_for_drain will return once a drain has been initiated.
/// The drain will not complete until the returned DrainBlocker is dropped
pub async fn wait_for_drain(self) -> DrainBlocker {
DrainBlocker(self.0.signaled().await)
}
}

#[allow(dead_code)]
/// DrainBlocker provides a token that must be dropped to unblock the drain.
pub struct DrainBlocker(drain_internal::ReleaseShutdown);

/// New constructs a new pair for draining
/// * DrainTrigger can be used to start a draining sequence and wait for it to complete.
/// * DrainWatcher should be held by anything that wants to participate in the draining. This can be cloned,
/// and a drain will not complete until all outstanding DrainWatchers are dropped.
pub fn new() -> (DrainTrigger, DrainWatcher) {
let (tx, rx) = drain_internal::channel();
(DrainTrigger(tx), DrainWatcher(rx))
}

Copy link
Contributor

@bleggett bleggett Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need DrainBlocker and friends? tokio::sync::watch supports all the same semantics and has closed().await which already blocks until all listeners are dropped? We can just use that, or am I missing something?

I think this wrapper can literally just create and return both ends of a tokio::sync::watch and probably doesn't need to do much else.

The other nice thing is that makes it much easier to pair with an mpsc or other channel for bidirectional messaging, if we ever need it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my thinking:

  • Having nicely named things, with clear semantics (i.e. the ONLY thing you can do with a DrainTrigger is start_drain_and_wait) is a good idea, regardless of the underlying implementation. If I just have a Watch, its not clear what its for, and how to use it.
  • If we abstract it ,we can change the underlying implementation to directly use tokio::* instead of drain::*, if there is benefit to do so

Copy link
Contributor

@bleggett bleggett Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we abstract it ,we can change the underlying implementation to directly use tokio::* instead of drain::*, if there is benefit to do so

Yeah my point is I think the abstraction doesn't actually provide any benefit (other than the type name alias) if we just use tokio::sync::watch. Additionally, doing that lets us drop a Cargo dep we really don't need, that (IIRC) we are now only using here - and dropping deps is always good, especially in ztunnel.

I am fine wrapping tokio::sync::watch with this wrapper if we want to keep the type alias, but I don't want to keep drain AND the type alias if regular old tokio::sync::watch (which is already one of our deps) will do the job, and I am pretty sure it can.

(if I am wrong about that, feel free to ignore)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is what I want:

  • A wrapper with reasonable names and exactly one method. Watch has bad names (since they are general) and too many methods (since its general) making it hard to use
  • To not spend time right now migrating off drain since it works fine.

If someone wants to swap out the underlying implement of drain, now its trivial to do so -- edit the tiny new package. I am not actually convinced its better (maybe it is, just haven't looked into it), though. I do not think we can SIMPLY stick a watch in there, probably we will end up with 2, and then we have just re-inventing drain.Which is fine if we have reasons to, but its super low priority, and we need this PR anyways.

@howardjohn howardjohn force-pushed the drain/perfect-draining branch 2 times, most recently from c0c1732 to 1f25727 Compare June 26, 2024 21:34
@howardjohn howardjohn removed the do-not-merge/hold Block automatic merging of a PR. label Jun 26, 2024
@howardjohn howardjohn requested a review from bleggett June 26, 2024 23:06
@howardjohn howardjohn added do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. do-not-merge/hold Block automatic merging of a PR. labels Jun 29, 2024
@howardjohn
Copy link
Member Author

I think most of this is good to go regardless but in light of istio/istio#51805 I might change do some work to not drain on pod shutdown, only ztunnel. or I can do that in a followup, either one works forme

* Centralize draining logic in one helper function
* Fix inbound draining (HBONE). Before, we did not shut down the
  listener upon draining. This meant new connections would go to the old
ztunnel on a ztunnel restart.
* Simplify inbound draining; do not re-create the force shutdown logic,
  and instead let the common abstraction do it (which does it slightly
better)
* socsk5: add propery draining with force shutdown. Remove double-spawn,
  which adds some complexity around the proxy_to_cancellable.

This is primarily tested in istio/istio#51710,
which sends a large stream of requests and restarts ztunnel and the
backend app (2 different tests). With this change, these tests pass.

It would be good to get more isolated tests in this repo in the future
as well
@bleggett
Copy link
Contributor

bleggett commented Jul 3, 2024

I think most of this is good to go regardless but in light of istio/istio#51805 I might change do some work to not drain on pod shutdown, only ztunnel. or I can do that in a followup, either one works forme

Fair enough! I still think we don't need drain at all or the Cargo naming kludges that go with it, but if we are doing followups anyway I'm not gonna fuss too much here.

@istio-testing istio-testing removed the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 3, 2024
@istio-testing istio-testing added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 3, 2024
@howardjohn howardjohn removed do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. do-not-merge/hold Block automatic merging of a PR. labels Jul 5, 2024
@howardjohn
Copy link
Member Author

/test all

@istio-testing istio-testing merged commit c68a919 into istio:master Jul 5, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes-none Indicates a PR that does not require release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants