bug: proxy::call appears to deadlock #350
Sounds like an issue with the rest of your runtime/async code, but of course it can always be a regression. A minimal reproducer would indeed be very helpful in identifying the issue, whether or not it's in zbus. 🙏
For the record, we have a completely separate queue for handling incoming signals and one for receiving replies.
Ahh, I hadn't considered that. I am using:

```rust
async fn reload_unit_state(&mut self) -> Result<(), zbus::Error> {
    let units = self.manager.list_units().await?;
    self.units = units
        .into_iter()
        .map(|unit| (unit.name.clone().into(), unit))
        .collect();
    Ok(())
}
```

Are you saying that if I have another task/call in parallel that blocks, I could lock up the whole connection?
I've reviewed my code and I cannot see how it is possible that my application code is somehow sabotaging this async call. Some additional context: I have confirmed via a timeout that the tokio executor is not blocked; the timeout fires fine, indicating my event loop is turning over correctly:

```rust
// This timeout will trigger in the scenario where `list_units` fails to return (presumed deadlocked).
tokio::time::timeout(RELOAD_STATE_TIMEOUT, self.reload_unit_state()).await??;
```

Of interest is that I take out multiple signal streams, and re-reading the docs I am reminded of the documented note that signal streams need to be continuously consumed. Is it the case that because I block waiting for the reply inside my loop, the signal streams stop being drained and the reply never gets delivered?

If it's relevant, this is my core run loop (not expecting you to read/debug my code, only read this if it helps):

```rust
loop {
    tokio::select! {
        // SOCKETS
        Some(update) = self.nexus_updates_rx.recv() => {
            debug!("Received update from nexus");
            // ...
        }
        // ...
    }
}
```
Yes, that's very possible (and likely causing your issue) if there are many messages coming in. I'd suggest re-organising your code to make use of separate tasks and communicating between them using channels, instead of one big select statement. Or you could just launch the specific calls in question in a throw-away task. If you need to know the results of the calls in the loop, you can use channels and then read from the channel in one of the arms of your select!.

Since your hang is most likely caused by a well-documented limitation with solutions/workarounds, I'll close this now. Feel free to re-open this if you have a reason to believe that's not the case.
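To make that concrete, here is a minimal sketch of the suggested restructuring. It assumes a tokio runtime and the `ManagerProxy`/`Unit` types from the systemd-zbus crate used in this thread; `run_loop`, `reload_rx` and the channel size are purely illustrative:

```rust
use tokio::sync::mpsc;
// Assumed imports from the systemd-zbus crate used in this thread.
use systemd_zbus::{ManagerProxy, Unit};

async fn run_loop(manager: ManagerProxy<'static>, mut reload_rx: mpsc::Receiver<()>) {
    // Channel carrying results of in-flight list_units() calls back into the loop.
    let (results_tx, mut results_rx) = mpsc::channel::<zbus::Result<Vec<Unit>>>(16);

    loop {
        tokio::select! {
            // ... other arms: signal streams, sockets, etc. ...

            // A reload was requested: fire the D-Bus call off in a throw-away
            // task instead of awaiting it here, so the loop keeps turning and
            // signal streams keep being drained while the reply is in flight.
            Some(()) = reload_rx.recv() => {
                let manager = manager.clone();
                let results_tx = results_tx.clone();
                tokio::spawn(async move {
                    let _ = results_tx.send(manager.list_units().await).await;
                });
            }

            // A previously launched call has finished; handle its result here.
            Some(result) = results_rx.recv() => {
                match result {
                    Ok(_units) => { /* update cached unit state here */ }
                    Err(e) => eprintln!("list_units failed: {e}"),
                }
            }
        }
    }
}
```

The point is that the select! loop itself never awaits the D-Bus reply, so the other arms (including signal handling) keep running while the call is in flight.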
Actually, thinking more about this, I think with the recent separation of queues/channels, it should be possible to remove this limitation. For whoever will solve this (most likely me): each proxy would get its own queue.
Very sorry to get your hopes up, but upon thinking more about this solution, I realized that it simply makes the queue bigger and adds an indirection. More importantly, it will not completely eliminate the possibility of deadlocks.

This won't solve the underlying issue completely either, though; it only helps make it less likely to happen. I'm afraid the real solution has to be in the client code, making use of tasks, select and join APIs to avoid getting into such a situation.
For those coming across this issue in the future, here is a simple/common way I deal with keeping the event loop unblocked:

```rust
use std::{future::Future, pin::Pin};

use futures::stream::FuturesUnordered;

type ReloadUnitStateFut =
    Pin<Box<dyn Future<Output = zbus::Result<Vec<Unit>>> + Send + Sync + 'static>>;

struct MyStruct {
    manager: ManagerProxy<'static>,
    reload_unit_files: FuturesUnordered<ReloadUnitStateFut>,
}
```

Then, when a reload is needed (inside a method of `MyStruct`):

```rust
// Clone the manager and move it into the future to queue the work asynchronously.
let manager = self.manager.clone();
let work = Box::pin(async move { manager.list_units().await });

// Push the workload onto the FuturesUnordered so we don't block the event loop.
// We need to continue processing signals, lest we risk a deadlock with zbus.
self.reload_unit_files.push(work as ReloadUnitStateFut);
```

The proxy being clonable (presumably through an `Arc` internally) is what makes this cheap to do.

Happy for this issue to be closed out, I think it's already documented - it's just quite surprising if you're in a rush ;P
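The other half of this pattern is draining the `FuturesUnordered` from the run loop. A minimal sketch, assuming `futures::StreamExt`, the `MyStruct` fields above, and a tokio `select!` loop like the one earlier in this thread; the error handling is illustrative:

```rust
use futures::stream::StreamExt;

loop {
    tokio::select! {
        // ... other arms: signal streams, sockets, reload requests ...

        // Poll the queued list_units() calls without blocking the loop; the
        // `if` guard disables this arm while nothing is queued.
        Some(result) = self.reload_unit_files.next(), if !self.reload_unit_files.is_empty() => {
            match result {
                Ok(units) => {
                    // Update the cached unit state from the completed call.
                    self.units = units
                        .into_iter()
                        .map(|unit| (unit.name.clone().into(), unit))
                        .collect();
                }
                Err(e) => eprintln!("list_units failed: {e}"),
            }
        }
    }
}
```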
Add getter and setter for message queue capacity. Related: dbus2#350.
That assumption is correct:

```rust
#[derive(Clone, Debug)]
pub struct Proxy<'a> {
    pub(crate) inner: Arc<ProxyInner<'a>>,
}
```

And this also means that the cloning is very cheap.
Looking for guidance on how best to create a minimal repro or provide useful info here. I have an application that is using systemd-zbus, which should just be a zbus proxy with nicer types. I am running zbus v3.12, and I periodically get deadlocks where the application will wait on ManagerProxy::list_units forever.

Looking into the generated code, list_units dispatches a call and then awaits the reply. Is there any possibility the reply is never getting delivered because a different signal is in the pipe? For reference, I subscribe to unit and job status changes, so it seems feasible those events are occurring at the same time I'm trying to push a call_method through the pipe. I don't have a mental model of the zbus queuing mechanism/async runtime.
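For context, here is a hand-written sketch of roughly what such a generated proxy method boils down to; it assumes zbus 3.x's `zbus::Proxy::call` and that the `Unit` type from systemd-zbus derives the needed serde/zvariant traits (the real macro output differs in its details):

```rust
use systemd_zbus::Unit; // assumed import; the real type lives in systemd-zbus

// Dispatch the method call on the underlying zbus::Proxy and await the reply.
async fn list_units(proxy: &zbus::Proxy<'_>) -> zbus::Result<Vec<Unit>> {
    proxy.call("ListUnits", &()).await
}
```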
Let me know if the best step forward is for me to pull out my systemd interaction code into a minimal repro, or if this is enough info to give you some ideas.