Abstract and fix draining #1176

howardjohn · 2024-06-25T17:12:54Z

* Centralize draining logic in one helper function.
* Fix inbound draining (HBONE). Before, we did not shut down the listener upon draining. This meant new connections would go to the old ztunnel on a ztunnel restart.
* Simplify inbound draining; do not re-create the force shutdown logic, and instead let the common abstraction do it (which does it slightly better).
* socks5: add proper draining with force shutdown. Remove double-spawn, which adds some complexity around the proxy_to_cancellable.
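The listener and force-shutdown behavior described above can be sketched with std threads. This is a minimal illustration only, assuming blocking threads rather than ztunnel's async tokio tasks; `ConnTracker`, `try_accept`, `finish`, `drain`, and `DrainOutcome` are hypothetical names, not the helper added in this PR. Draining first stops admitting new connections (so they can land on the new ztunnel), then waits up to a grace period for in-flight ones before reporting that a forced shutdown is needed.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Result of a drain attempt (hypothetical).
#[derive(Debug, PartialEq)]
pub enum DrainOutcome {
    /// Every in-flight connection finished within the grace period.
    Graceful,
    /// The grace period expired; remaining connections must be force-closed.
    Forced,
}

struct State {
    accepting: bool,
    active: usize,
}

/// Hypothetical stand-in for a listener plus its live connections.
pub struct ConnTracker {
    state: Mutex<State>,
    cond: Condvar,
}

impl ConnTracker {
    pub fn new() -> Self {
        ConnTracker {
            state: Mutex::new(State { accepting: true, active: 0 }),
            cond: Condvar::new(),
        }
    }

    /// Admit a new connection; refused once draining has started
    /// (the equivalent of shutting down the listener).
    pub fn try_accept(&self) -> bool {
        let mut st = self.state.lock().unwrap();
        if st.accepting {
            st.active += 1;
        }
        st.accepting
    }

    /// Mark one in-flight connection as complete and wake any drainer.
    pub fn finish(&self) {
        let mut st = self.state.lock().unwrap();
        st.active -= 1;
        self.cond.notify_all();
    }

    /// Stop accepting, then wait up to `grace` for in-flight connections.
    pub fn drain(&self, grace: Duration) -> DrainOutcome {
        let mut st = self.state.lock().unwrap();
        st.accepting = false;
        // Sleeps while connections remain, re-checking on every notify.
        let (_st, timeout) = self
            .cond
            .wait_timeout_while(st, grace, |s| s.active > 0)
            .unwrap();
        if timeout.timed_out() {
            DrainOutcome::Forced
        } else {
            DrainOutcome::Graceful
        }
    }
}
```

In a real proxy the `Forced` branch is where the remaining connections would be cancelled; this sketch only reports it.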

This is primarily tested in istio/istio#51710,
which sends a large stream of requests and restarts ztunnel and the
backend app (2 different tests). With this change, these tests pass.

It would be good to add more isolated tests in this repo in the future as well.

bleggett

How much of #899 does this resolve? Specifically allowing drain logic to distinguish between (ztunnel shutdown) || (workload shutdown)?

src/proxy/util.rs

bleggett · 2024-06-26T16:00:55Z

src/drain.rs

+impl DrainWatcher {
+    /// wait_for_drain will return once a drain has been initiated.
+    /// The drain will not complete until the returned DrainBlocker is dropped
+    pub async fn wait_for_drain(self) -> DrainBlocker {
+        DrainBlocker(self.0.signaled().await)
+    }
+}
+
+#[allow(dead_code)]
+/// DrainBlocker provides a token that must be dropped to unblock the drain.
+pub struct DrainBlocker(drain_internal::ReleaseShutdown);
+
+/// New constructs a new pair for draining
+/// * DrainTrigger can be used to start a draining sequence and wait for it to complete.
+/// * DrainWatcher should be held by anything that wants to participate in the draining. This can be cloned,
+///   and a drain will not complete until all outstanding DrainWatchers are dropped.
+pub fn new() -> (DrainTrigger, DrainWatcher) {
+    let (tx, rx) = drain_internal::channel();
+    (DrainTrigger(tx), DrainWatcher(rx))
+}
+
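To make the semantics of this API concrete, here is a std-only analog of the same shape, using a `Mutex`/`Condvar` pair instead of the async `drain` crate. It is a sketch under stated assumptions: blocking threads instead of tokio, a hypothetical `Token` counter, and a `start_drain_and_wait` method on `DrainTrigger` (the name comes from the discussion in this thread; the body here is illustrative). A drain completes only when every `DrainWatcher` and every `DrainBlocker` has been dropped.

```rust
use std::sync::{Arc, Condvar, Mutex};

#[derive(Default)]
struct State {
    draining: bool,
    outstanding: usize, // live watchers + blockers
}

#[derive(Default)]
struct Shared {
    state: Mutex<State>,
    cond: Condvar,
}

/// Hypothetical counted handle; every live watcher or blocker owns one.
struct Token(Arc<Shared>);

impl Token {
    fn new(shared: Arc<Shared>) -> Token {
        shared.state.lock().unwrap().outstanding += 1;
        Token(shared)
    }
}

impl Clone for Token {
    fn clone(&self) -> Token {
        Token::new(self.0.clone())
    }
}

impl Drop for Token {
    fn drop(&mut self) {
        let mut st = self.0.state.lock().unwrap();
        st.outstanding -= 1;
        // Wake the trigger so it can re-check the count.
        self.0.cond.notify_all();
    }
}

pub struct DrainTrigger(Arc<Shared>);

#[derive(Clone)]
pub struct DrainWatcher(Token);

#[allow(dead_code)]
/// Holding this keeps the drain from completing.
pub struct DrainBlocker(Token);

pub fn new() -> (DrainTrigger, DrainWatcher) {
    let shared = Arc::new(Shared::default());
    (DrainTrigger(shared.clone()), DrainWatcher(Token::new(shared)))
}

impl DrainWatcher {
    /// Blocks until a drain starts; the drain cannot finish until the
    /// returned DrainBlocker is dropped.
    pub fn wait_for_drain(self) -> DrainBlocker {
        {
            let shared = &(self.0).0;
            let mut st = shared.state.lock().unwrap();
            while !st.draining {
                st = shared.cond.wait(st).unwrap();
            }
        }
        // Move the token (and its count) from the watcher into the blocker.
        DrainBlocker(self.0)
    }
}

impl DrainTrigger {
    /// Signals every watcher, then blocks until all watchers and blockers are gone.
    pub fn start_drain_and_wait(self) {
        let mut st = self.0.state.lock().unwrap();
        st.draining = true;
        self.0.cond.notify_all();
        while st.outstanding > 0 {
            st = self.0.cond.wait(st).unwrap();
        }
    }
}
```

Moving the `Token` out of the watcher in `wait_for_drain` transfers the participant's count from "watching" to "blocking" without it ever reaching zero in between.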


Why do we need DrainBlocker and friends? tokio::sync::watch supports all the same semantics, and its Sender::closed().await already blocks until all receivers are dropped. We can just use that, or am I missing something?

I think this wrapper can literally just create and return both ends of a tokio::sync::watch and probably doesn't need to do much else.

The other nice thing is that it makes it much easier to pair with an mpsc or other channel for bidirectional messaging, if we ever need it.

howardjohn · 2024-06-26

Here is my thinking:

Having nicely named things with clear semantics (i.e., the ONLY thing you can do with a DrainTrigger is start_drain_and_wait) is a good idea, regardless of the underlying implementation. If I just have a Watch, it's not clear what it's for or how to use it.

If we abstract it, we can change the underlying implementation to directly use tokio::* instead of drain::*, if there is a benefit to doing so.

bleggett · 2024-06-28

> If we abstract it, we can change the underlying implementation to directly use tokio::* instead of drain::*, if there is a benefit to doing so.

Yeah, my point is that I think the abstraction doesn't actually provide any benefit (other than the type name alias) if we just use tokio::sync::watch. Additionally, doing that lets us drop a Cargo dep we really don't need, which (IIRC) we are now only using here, and dropping deps is always good, especially in ztunnel.

I am fine wrapping tokio::sync::watch with this wrapper if we want to keep the type alias, but I don't want to keep drain AND the type alias if regular old tokio::sync::watch (which is already one of our deps) will do the job, and I am pretty sure it can.

(if I am wrong about that, feel free to ignore)

howardjohn · 2024-06-28

Here is what I want:

A wrapper with reasonable names and exactly one method. Watch has bad names (since they are general) and too many methods (since it's general), making it hard to use.

To not spend time right now migrating off drain since it works fine.

If someone wants to swap out the underlying implementation of drain, it's now trivial to do so: edit the tiny new package. I am not actually convinced it's better, though (maybe it is, I just haven't looked into it). I do not think we can SIMPLY stick a watch in there; probably we will end up with 2, and then we have just reinvented drain. Which is fine if we have reasons to, but it's super low priority, and we need this PR anyway.
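The expectation that a plain channel would turn into two can be illustrated with std's `mpsc`, which has no broadcast or `closed()` method. In this hypothetical sketch (`Trigger`, `Watcher`, and `watcher` are illustrative names, not ztunnel code), one set of channels announces the start of a drain by dropping per-watcher senders, and a second, shared channel detects completion because `Receiver::recv` returns `Err` once every `Sender` clone is gone.

```rust
use std::sync::mpsc::{self, Receiver, Sender};

/// Hypothetical two-channel drain coordinator.
pub struct Trigger {
    start_txs: Vec<Sender<()>>,   // channel 1: one per watcher, dropped to signal
    done_tx: Option<Sender<()>>,  // channel 2: cloned into every watcher
    done_rx: Receiver<()>,
}

pub struct Watcher {
    start_rx: Receiver<()>,
    _done: Sender<()>, // dropping this is how the watcher reports completion
}

impl Trigger {
    pub fn new() -> Trigger {
        let (done_tx, done_rx) = mpsc::channel();
        Trigger { start_txs: Vec::new(), done_tx: Some(done_tx), done_rx }
    }

    /// Registers a new participant in the drain.
    pub fn watcher(&mut self) -> Watcher {
        let (start_tx, start_rx) = mpsc::channel();
        self.start_txs.push(start_tx);
        Watcher { start_rx, _done: self.done_tx.as_ref().unwrap().clone() }
    }

    /// Announce the drain on channel 1, then wait on channel 2.
    pub fn start_drain_and_wait(mut self) {
        // Dropping every start sender makes each watcher's recv() return Err.
        self.start_txs.clear();
        // Drop our own done sender so only watchers keep channel 2 open.
        drop(self.done_tx.take());
        // Nothing is ever sent: recv() returns Err once all Sender clones are gone.
        let _ = self.done_rx.recv();
    }
}

impl Watcher {
    /// Blocks until a drain starts; the drain completes when this Watcher is dropped.
    pub fn wait_for_drain(&self) {
        let _ = self.start_rx.recv();
    }
}
```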

howardjohn · 2024-06-29T00:49:52Z

I think most of this is good to go regardless, but in light of istio/istio#51805 I might do some work to not drain on pod shutdown, only on ztunnel shutdown. Or I can do that in a followup; either one works for me.


bleggett · 2024-07-03T22:01:24Z

> I think most of this is good to go regardless, but in light of istio/istio#51805 I might do some work to not drain on pod shutdown, only on ztunnel shutdown. Or I can do that in a followup; either one works for me.

Fair enough! I still think we don't need drain at all or the Cargo naming kludges that go with it, but if we are doing followups anyway I'm not gonna fuss too much here.

howardjohn · 2024-07-05T17:01:50Z

/test all

howardjohn requested a review from a team as a code owner June 25, 2024 17:12

howardjohn added the release-notes-none label Jun 25, 2024

istio-testing added the size/L label Jun 25, 2024

howardjohn force-pushed the drain/perfect-draining branch from 4d7feac to 185d838 June 25, 2024 17:13

howardjohn mentioned this pull request Jun 25, 2024

ambient: add tests for components restarting istio/istio#51710

Merged

bleggett reviewed Jun 25, 2024


src/proxy/util.rs (outdated)

istio-testing added the needs-rebase and size/XL labels and removed the size/L label Jun 26, 2024

howardjohn added the do-not-merge/hold label Jun 26, 2024

howardjohn force-pushed the drain/perfect-draining branch from 315efe7 to 00a2c29 June 26, 2024 00:56

istio-testing removed the needs-rebase label Jun 26, 2024

bleggett reviewed Jun 26, 2024


howardjohn force-pushed the drain/perfect-draining branch 2 times, most recently from c0c1732 to 1f25727 June 26, 2024 21:34

howardjohn removed the do-not-merge/hold label Jun 26, 2024

howardjohn requested a review from bleggett June 26, 2024 23:06

howardjohn assigned bleggett Jun 27, 2024

howardjohn added the do-not-merge/work-in-progress and do-not-merge/hold labels Jun 29, 2024

howardjohn mentioned this pull request Jul 2, 2024

Implement improved draining #1191

Open

howardjohn added 3 commits July 3, 2024 14:19

Refactor our into own package

67df417

Add tests for draining

12558f2

bleggett approved these changes Jul 3, 2024


unclean but forceful shutdown

a2b20c6

howardjohn force-pushed the drain/perfect-draining branch from 033febc to a2b20c6 July 3, 2024 22:13

istio-testing removed the size/XL label Jul 3, 2024

istio-testing added the size/XXL label Jul 3, 2024

howardjohn removed the do-not-merge/work-in-progress and do-not-merge/hold labels Jul 5, 2024

fmt

caffa42

howardjohn force-pushed the drain/perfect-draining branch from 77b41e2 to caffa42 July 5, 2024 17:56

howardjohn added 2 commits July 5, 2024 11:45

Fix flakes

1d809d7

fix flake

12d3957

howardjohn force-pushed the drain/perfect-draining branch from 54468b8 to 12d3957 July 5, 2024 19:32

istio-testing merged commit c68a919 into istio:master Jul 5, 2024
3 checks passed
