Abstract and fix draining #1176
Conversation
Force-pushed from 4d7feac to 185d838
How much of #899 does this resolve? Specifically, does it allow the drain logic to distinguish between (ztunnel shutdown) || (workload shutdown)?
Force-pushed from 315efe7 to 00a2c29
src/drain.rs (Outdated)
impl DrainWatcher {
    /// wait_for_drain will return once a drain has been initiated.
    /// The drain will not complete until the returned DrainBlocker is dropped
    pub async fn wait_for_drain(self) -> DrainBlocker {
        DrainBlocker(self.0.signaled().await)
    }
}

#[allow(dead_code)]
/// DrainBlocker provides a token that must be dropped to unblock the drain.
pub struct DrainBlocker(drain_internal::ReleaseShutdown);

/// New constructs a new pair for draining
/// * DrainTrigger can be used to start a draining sequence and wait for it to complete.
/// * DrainWatcher should be held by anything that wants to participate in the draining. This can be cloned,
///   and a drain will not complete until all outstanding DrainWatchers are dropped.
pub fn new() -> (DrainTrigger, DrainWatcher) {
    let (tx, rx) = drain_internal::channel();
    (DrainTrigger(tx), DrainWatcher(rx))
}
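As a usage sketch (not part of this diff): start_drain_and_wait is the DrainTrigger method name referenced later in this thread, and the module path and exact signatures here are assumptions.

async fn shutdown_example() {
    // Assumed module path; the real crate layout may differ.
    let (trigger, watcher) = crate::drain::new();

    tokio::spawn(async move {
        // A participant holds a DrainWatcher and waits for a drain to start.
        let blocker = watcher.wait_for_drain().await;
        // ... finish any in-flight work ...
        drop(blocker); // only now can the drain complete
    });

    // Shutdown path: begin draining and wait for every DrainBlocker to drop.
    trigger.start_drain_and_wait().await;
}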
Why do we need DrainBlocker and friends? tokio::sync::watch supports all the same semantics and has closed().await, which already blocks until all listeners are dropped. Can we just use that, or am I missing something?
I think this wrapper can literally just create and return both ends of a tokio::sync::watch and probably doesn't need to do much else.
The other nice thing is that it makes it much easier to pair with an mpsc or other channel for bidirectional messaging, if we ever need it.
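For reference, a minimal sketch of that suggestion, built directly on tokio::sync::watch; the names mirror the PR's wrapper, but the bodies are illustrative rather than ztunnel's actual code:

use tokio::sync::watch;

pub struct DrainTrigger(watch::Sender<()>);

#[derive(Clone)]
pub struct DrainWatcher(watch::Receiver<()>);

pub fn new() -> (DrainTrigger, DrainWatcher) {
    let (tx, rx) = watch::channel(());
    (DrainTrigger(tx), DrainWatcher(rx))
}

impl DrainTrigger {
    /// Signal a drain, then wait for every watcher to be dropped.
    pub async fn start_drain_and_wait(self) {
        let _ = self.0.send(()); // wake everyone waiting on the watch
        self.0.closed().await; // closed() resolves once all receivers are gone
    }
}

impl DrainWatcher {
    /// Resolves once a drain has been signaled; the returned watcher acts as
    /// the blocker -- dropping it is what releases the trigger.
    pub async fn wait_for_drain(mut self) -> DrainWatcher {
        let _ = self.0.changed().await;
        self
    }
}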
Here is my thinking:
- Having nicely named things, with clear semantics (i.e. the ONLY thing you can do with a DrainTrigger is start_drain_and_wait) is a good idea, regardless of the underlying implementation. If I just have a Watch, it's not clear what it's for and how to use it.
- If we abstract it, we can change the underlying implementation to directly use tokio::* instead of drain::*, if there is benefit to do so.
> If we abstract it, we can change the underlying implementation to directly use tokio::* instead of drain::*, if there is benefit to do so
Yeah, my point is that I think the abstraction doesn't actually provide any benefit (other than the type name alias) if we just use tokio::sync::watch. Additionally, doing that lets us drop a Cargo dep we really don't need, which (IIRC) we are now only using here - and dropping deps is always good, especially in ztunnel.
I am fine wrapping tokio::sync::watch with this wrapper if we want to keep the type alias, but I don't want to keep drain AND the type alias if regular old tokio::sync::watch (which is already one of our deps) will do the job, and I am pretty sure it can.
(If I am wrong about that, feel free to ignore.)
Here is what I want:
- A wrapper with reasonable names and exactly one method. Watch has bad names (since they are general) and too many methods (since it's general), making it hard to use.
- To not spend time right now migrating off drain, since it works fine.
If someone wants to swap out the underlying implementation of drain, it is now trivial to do so -- edit the tiny new package. I am not actually convinced it's better (maybe it is, I just haven't looked into it), though. I do not think we can SIMPLY stick a watch in there; probably we will end up with 2, and then we have just re-invented drain. Which is fine if we have reasons to, but it's super low priority, and we need this PR anyway.
Force-pushed from c0c1732 to 1f25727
I think most of this is good to go regardless, but in light of istio/istio#51805 I might do some work to not drain on pod shutdown, only on ztunnel shutdown. Or I can do that in a followup; either one works for me.
* Centralize draining logic in one helper function (a rough sketch of that shape follows below)
* Fix inbound draining (HBONE). Before, we did not shut down the listener upon draining. This meant new connections would go to the old ztunnel on a ztunnel restart.
* Simplify inbound draining; do not re-create the force shutdown logic, and instead let the common abstraction do it (which does it slightly better)
* socks5: add proper draining with force shutdown. Remove double-spawn, which adds some complexity around the proxy_to_cancellable.

This is primarily tested in istio/istio#51710, which sends a large stream of requests and restarts ztunnel and the backend app (2 different tests). With this change, these tests pass. It would be good to get more isolated tests in this repo in the future as well.
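For illustration only, here is a minimal sketch of the shape such a centralized helper can take: race the connection against the drain signal, then apply a force-shutdown grace period. It assumes the DrainWatcher/DrainBlocker API from the src/drain.rs snippet above; the function name and the 10s deadline are made up for the example, not taken from the PR.

use std::{future::Future, time::Duration};

async fn run_until_drained<F: Future<Output = ()>>(conn: F, drain: DrainWatcher) {
    tokio::pin!(conn);
    // Race the connection against the drain signal.
    let blocker = tokio::select! {
        _ = &mut conn => return, // connection finished on its own
        blocker = drain.wait_for_drain() => blocker,
    };
    // A drain has started: give the connection a grace period to finish,
    // after which it is dropped (force-closed).
    let _ = tokio::time::timeout(Duration::from_secs(10), conn).await;
    // Dropping the blocker tells the trigger this participant is done draining.
    drop(blocker);
}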
Fair enough! I still think we don't need drain.
Force-pushed from 033febc to a2b20c6
/test all
Force-pushed from 77b41e2 to caffa42
Force-pushed from 54468b8 to 12d3957