chore(sinks): New `StreamingSink` trait and file sink spike #1945

Conversation
Signed-off-by: Lucio Franco <luciofranco14@gmail.com>
```rust
pub struct LazyStreamingSink<T> {
    sink: Option<(Receiver<Event>, T)>,
    inner: CompatSink<Sender<Event>, Event>,
}

impl<T: StreamingSink> Sink for LazyStreamingSink<T> {
    type SinkItem = Event;
    type SinkError = ();

    fn start_send(&mut self, item: Self::SinkItem) -> StartSend<Self::SinkItem, Self::SinkError> {
        if let Some((rx, mut sink)) = self.sink.take() {
            tokio02::spawn(async move {
                if let Err(error) = sink.run(rx).await {
                    error!(message = "Unexpected sink failure.", %error);
                }
            });
        }

        self.inner.start_send(item).map_err(drop)
    }

    fn poll_complete(&mut self) -> Poll<(), Self::SinkError> {
        self.inner.poll_complete().map_err(drop)
    }
}
```
This could be dropped if we had access to an executor from `build` via `SinkContext`. This will just lazily spawn the bg task.
I don't understand why we need to spawn a background task. There's only one task doing work here, right?
Because we want to represent the actual logic of the sink as its own task that gets fed items. It's just impossible to mix an `async fn` and the `Sink` trait anyway. If this style works, we may just want to feed a receiver to the sinks instead of using the `Sink` trait. That would avoid the extra spawn, but in the end I don't think this hurts too much anyway.
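As a rough illustration of the "feed a receiver to the sink" shape described above, here is a synchronous sketch using `std::sync::mpsc` as a stand-in for the async tokio 0.2 channel; the type and method names (`CollectingSink`, `run`) are illustrative, not from the PR:

```rust
use std::sync::mpsc::{channel, Receiver};

// Stand-in event type; the real PR uses vector's `Event`.
struct Event(String);

// The sink's whole lifetime is one `run` call that drains a receiver,
// instead of implementing the poll-based `Sink` trait. In the async
// version, `run` would be an `async fn` awaiting the channel.
trait StreamingSink {
    fn run(&mut self, rx: Receiver<Event>) -> Result<(), String>;
}

// Toy sink that just collects what it receives.
struct CollectingSink {
    seen: Vec<Event>,
}

impl StreamingSink for CollectingSink {
    fn run(&mut self, rx: Receiver<Event>) -> Result<(), String> {
        // This loop body is where batching, retries, flushing, etc. would live.
        for event in rx {
            self.seen.push(event);
        }
        Ok(()) // channel closed: the sink's task is done
    }
}

fn main() {
    let (tx, rx) = channel();
    tx.send(Event("a".into())).unwrap();
    tx.send(Event("b".into())).unwrap();
    drop(tx); // closing the channel ends the sink's run loop

    let mut sink = CollectingSink { seen: Vec::new() };
    sink.run(rx).unwrap();
    println!("{}", sink.seen.len());
}
```

The upside of this shape is that the sink owns a plain receive loop rather than a state machine split across poll methods.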
I'm thinking that the

@MOZGIII agreed, but I think this is to show that we can always model things in terms of sinks, but have a different trait for what our sinks will actually implement.

I don't get what you mean. However, I can say that since I wrote my comment my mind has changed again, and I'm not so against the

@MOZGIII right, what I mean is that we can always use

Well, that was my initial point: I don't see why we would. But yeah, we can. So, what are the benefits of actually using a sink there, or anywhere even? Compared to a simple future that can take the value and process it (like

@MOZGIII sink is the most flexible API by far, allowing more fine-grained control, so I don't think it makes sense to give that up just yet.
We need a way to communicate completion for the processing of a single event. Otherwise we can have race conditions, where a user calls `sink.send_all(...).await` and expects the processing of all the events to be complete when control returns.
That's effectively what I hit at #1988 (comment).
This is similar to what I've said before: I'm curious if you see a simple way to make that happen.
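One simple way to get per-event completion is to pair every event with an acknowledgement channel, so the producer blocks until that event is fully processed instead of racing ahead of the spawned sink task. This is a hedged, synchronous sketch (threads and `std::sync::mpsc` stand in for spawned async tasks; `send_and_wait` is a hypothetical name, not from the PR):

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

struct Event(String);

// Sends each event along with a one-shot ack channel and waits for the
// ack before sending the next one, so "send returned" really means
// "this event was processed".
fn send_and_wait(events: Vec<Event>) -> usize {
    let (tx, rx) = channel::<(Event, Sender<()>)>();

    // Stand-in for the spawned sink task.
    let worker = thread::spawn(move || {
        for (_event, ack) in rx {
            // ... actually deliver the event here ...
            let _ = ack.send(()); // completion signal for this one event
        }
    });

    let mut acknowledged = 0;
    for event in events {
        let (ack_tx, ack_rx) = channel();
        tx.send((event, ack_tx)).unwrap();
        ack_rx.recv().unwrap(); // blocks until the worker finished it
        acknowledged += 1;
    }
    drop(tx); // close the channel so the worker exits
    worker.join().unwrap();
    acknowledged
}

fn main() {
    let n = send_and_wait(vec![Event("a".into()), Event("b".into())]);
    println!("{}", n);
}
```

In the async version the ack would be a oneshot future awaited by the caller, which restores the "`send_all(...).await` means done" invariant discussed above.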
I'm working on a prototype for an `#[async_trait]` version:

```rust
#[async_trait]
pub trait AsyncSink<Item> {
    type Error;
    async fn ready(&mut self) -> Result<(), Self::Error>;
    fn start_send(self: Pin<&mut Self>, item: Item) -> Result<(), Self::Error>;
    async fn flush(&mut self) -> Result<(), Self::Error>;
    async fn close(&mut self) -> Result<(), Self::Error>;
}
```

So, pretty much like the
There are some fundamental difficulties, so I might bail on the idea of having the async trait I outlined above. I'd be glad if we could create a task force to discuss the async APIs. Also, looking forward to the presentation that @LucioFranco suggested! I checked #1113, and what we have here is different. In short, our sink implementations from this PR and #1988 are problematic, and do not uphold the invariants the API users (including myself) expect. I think there's a simple fix to that.
@MOZGIII so an issue with going with that API is that it doesn't model very well the stateless poll fns that `Sink` has. I think we don't want to follow the async fn style sink, but instead have a single future that consumes some sort of stream.
What if we do it even simpler? Instead of

This has good properties: per-event completion, composable, easy to implement. We should even be able to properly implement a `Sink` atop this interface - if we always buffer things at

Now, the `Sink` interface actually works with a single item at a time. It has no support for batching per se - only internal buffering, but the progress on the internal buffer is not communicated to the users of the API per item - the only option users have there is to

Our

So, to summarise all of the above, I think a simple

Alternatively, we could try and build our own more-sink-like interface, but I doubt we really have a need for that. We can start simple!

And

P.S. I want to give some more context to my previous comment:
I concluded we'd need to pin multiple futures somewhere - meaning we either need multiple stacks or a way of swapping futures in and out. Either I'm missing something, or it's currently not possible to implement without
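A synchronous stand-in can make the per-event `process` idea from the comment above concrete. This is only a sketch (the proposed version is `async fn process`, and `Counting` is an illustrative name, not from the discussion):

```rust
// Synchronous stand-in for the proposed per-event interface: `process`
// returns only once the event is fully handled, so per-event completion
// is implicit in the return value.
struct Event(&'static str);

trait EventProcessor {
    type Error;
    fn process(&mut self, event: Event) -> Result<(), Self::Error>;
}

// Toy processor that just counts events it has completed.
struct Counting {
    completed: usize,
}

impl EventProcessor for Counting {
    type Error = ();
    fn process(&mut self, _event: Event) -> Result<(), ()> {
        self.completed += 1; // returning Ok signals this event is done
        Ok(())
    }
}

fn main() {
    let mut p = Counting { completed: 0 };
    for _ in 0..3 {
        p.process(Event("x")).unwrap();
    }
    println!("{}", p.completed);
}
```

The composability claim follows directly: a caller that loops `process` over its input gets backpressure and completion for free, with no extra spawn.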
@MOZGIII one thing that sticks out to me here is that if we use a single event-processing fn, it doesn't really need to be async, because if it were, that'd mean we have no choice but to either spawn off the event-sending future or block the next event. This also means not having full control over the full sink. Say I want to expire a cache: with this process model, how do we do that? If we have a stream coming in, we can select against it. To me that model fits much better with what we will realistically use it for.
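The "select against the incoming stream" argument can be sketched synchronously: when the sink drives its own receive loop, it can interleave housekeeping such as cache expiry with event handling. Here `recv_timeout` stands in for an async `select!` between the input stream and a timer; `run_sink` and the counters are illustrative names, not from the PR:

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::Duration;

// Consumes the input stream while periodically getting a chance to do
// housekeeping (e.g. expiring a cache) whenever no event is ready.
fn run_sink(rx: Receiver<String>) -> (usize, usize) {
    let mut processed = 0;
    let mut expiries = 0;
    loop {
        match rx.recv_timeout(Duration::from_millis(5)) {
            Ok(_event) => processed += 1, // handle the event
            Err(RecvTimeoutError::Timeout) => {
                expiries += 1; // housekeeping slot: expire_cache() would go here
                if expiries > 3 {
                    break; // bail out for the demo
                }
            }
            Err(RecvTimeoutError::Disconnected) => break, // input stream ended
        }
    }
    (processed, expiries)
}

fn main() {
    let (tx, rx) = channel();
    tx.send("event-1".to_string()).unwrap();
    tx.send("event-2".to_string()).unwrap();
    drop(tx); // end of the input stream

    let (processed, _expiries) = run_sink(rx);
    println!("{}", processed);
}
```

With a single externally-driven `process(event)` fn, there is no equivalent place to hang this timer logic, which is the control-flow point being made above.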
Idk, having a non-async fn is as good as always blocking on the next event. And really, imo, we shouldn't do spawning to run the topology loops.
Let's see. So, an example. We want to expire the cache. With the proposed async process fn version, we only have two options: either do it during the

We can do this elegantly. Consider this:

```rust
trait EventProcessor {
    type Error;
    async fn process(&mut self, event: Event) -> Result<(), Self::Error>;
}

enum ControlCommandProcessingError { /* ... */ }

trait ControlCommandProcessor {
    type ControlCommand;
    async fn control(&mut self, cmd: Self::ControlCommand) -> Result<(), ControlCommandProcessingError>;
}

trait TopologyUnit: ControlCommandProcessor + EventProcessor {}

async fn topology_loop(units: &[Box<dyn TopologyUnit>]) {
    // connect topology
    // error handling omitted for simplicity
    for (unit, input) in connected_units {
        // create and expose the control_tx and control_rx somehow
        spawn(async {
            loop {
                // I'm lazy, so this is more pseudocode than real Rust:
                select! {
                    control = control_rx.next() => unit.control(control),
                    event = input.next() => unit.process(event),
                }
            }
        })
    }
}
```
Why is that? I don't think we do enough spawning currently.

I am not following how having the topology send a signal to evict would work. To me this seems more complicated and less isolated than consuming/dropping a stream.
Closing, superseded by #1988.
This is an initial spike of what a `StreamingSink` trait would look like with tokio 0.2 and async/await. This is a proposal that could possibly get adopted for our file sink rewrite, which is due since the file sink is not compatible with our current runtime upgrade work.

Mainly opening this to get some feedback and thoughts; this tries to implement the previous logic in a very intuitive and easy-to-follow way.
cc @lukesteensen @MOZGIII
Signed-off-by: Lucio Franco luciofranco14@gmail.com