Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

next_complete waiting forever although the completion callback has been emitted #89

Closed
hrouault opened this issue Nov 8, 2024 · 7 comments

Comments

@hrouault
Copy link

hrouault commented Nov 8, 2024

Hi,

I am trying to use nusb with the cameleon library for usb3 cameras. Sometimes, the program hangs because the next_complete method never returns.

Here is how I read the packets:

    async fn read_leader(&mut self) -> StreamResult<()> {
        let leader_buf = self.queue.next_complete().await.into_result()?;
        self.leader_buf = Some(leader_buf);
        Ok(())
    }

I checked that no error is returned and that the program hangs on the await.

Here is a log where the packets are well received for a transfer and then it hangs for the next one.

2024-11-08T13:04:40.279223Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d2a520, status=0, len=52
2024-11-08T13:04:40.279280Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d24520, status=0, len=65536
2024-11-08T13:04:40.279371Z DEBUG cameleon::u3v::stream_handle: Received the leader
2024-11-08T13:04:40.279400Z DEBUG cameleon::u3v::stream_handle: Waiting for completion: 0, 7 pending transfers
2024-11-08T13:04:40.279451Z DEBUG cameleon::u3v::stream_handle: Waiting for completion: 65536, 6 pending transfers
2024-11-08T13:04:40.279584Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d24ba0, status=0, len=65536
2024-11-08T13:04:40.279675Z DEBUG cameleon::u3v::stream_handle: Waiting for completion: 131072, 5 pending transfers
2024-11-08T13:04:40.279923Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d24a80, status=0, len=65536
2024-11-08T13:04:40.279991Z DEBUG cameleon::u3v::stream_handle: Waiting for completion: 196608, 4 pending transfers
2024-11-08T13:04:40.280272Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d243c0, status=0, len=65536
2024-11-08T13:04:40.280339Z DEBUG cameleon::u3v::stream_handle: Waiting for completion: 262144, 3 pending transfers
2024-11-08T13:04:40.280633Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d246a0, status=0, len=65536
2024-11-08T13:04:40.280700Z DEBUG cameleon::u3v::stream_handle: Waiting for completion: 327680, 2 pending transfers
2024-11-08T13:04:40.280951Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d24460, status=0, len=61120
2024-11-08T13:04:40.280982Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d248a0, status=0, len=32
2024-11-08T13:04:40.281011Z DEBUG cameleon::u3v::stream_handle: Received the main payload
2024-11-08T13:04:40.281027Z DEBUG cameleon::u3v::stream_handle: Waiting for completion trailer, 1 pending transfers
2024-11-08T13:04:40.281038Z DEBUG cameleon::u3v::stream_handle: Received all the packets
2024-11-08T13:04:40.281227Z DEBUG nusb::platform::macos_iokit::transfer: Submitted IN transfer 0x600000d248a0 on endpoint 81
2024-11-08T13:04:40.281303Z DEBUG nusb::platform::macos_iokit::transfer: Submitted IN transfer 0x600000d24460 on endpoint 81
2024-11-08T13:04:40.281352Z DEBUG nusb::platform::macos_iokit::transfer: Submitted IN transfer 0x600000d246a0 on endpoint 81
2024-11-08T13:04:40.281397Z DEBUG nusb::platform::macos_iokit::transfer: Submitted IN transfer 0x600000d243c0 on endpoint 81
2024-11-08T13:04:40.281457Z DEBUG nusb::platform::macos_iokit::transfer: Submitted IN transfer 0x600000d24a80 on endpoint 81
2024-11-08T13:04:40.281515Z DEBUG nusb::platform::macos_iokit::transfer: Submitted IN transfer 0x600000d24ba0 on endpoint 81
2024-11-08T13:04:40.281565Z DEBUG nusb::platform::macos_iokit::transfer: Submitted IN transfer 0x600000d24520 on endpoint 81
2024-11-08T13:04:40.281619Z DEBUG nusb::platform::macos_iokit::transfer: Submitted IN transfer 0x600000d24880 on endpoint 81
2024-11-08T13:04:40.281632Z DEBUG cameleon::u3v::stream_handle: trailer size: 56
2024-11-08T13:04:40.281642Z DEBUG cameleon::u3v::stream_handle: Submitted all the packets
2024-11-08T13:04:40.294321Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d248a0, status=0, len=52
2024-11-08T13:04:40.294422Z DEBUG cameleon::u3v::stream_handle: Received the leader
2024-11-08T13:04:40.294421Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d24460, status=0, len=65536
2024-11-08T13:04:40.294455Z DEBUG cameleon::u3v::stream_handle: Waiting for completion: 0, 7 pending transfers
2024-11-08T13:04:40.294765Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d246a0, status=0, len=65536
2024-11-08T13:04:40.295098Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d243c0, status=0, len=65536
2024-11-08T13:04:40.295461Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d24a80, status=0, len=65536
2024-11-08T13:04:40.295786Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d24ba0, status=0, len=65536
2024-11-08T13:04:40.296131Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d24520, status=0, len=61120
2024-11-08T13:04:40.296165Z DEBUG nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600000d24880, status=0, len=32

Is there something I should pay attention to?

@kevinmehall
Copy link
Owner

Hmm, if you are able to dive in and troubleshoot this, here's a tour of the code involved.

You're seeing this log message:

extern "C" fn transfer_callback(refcon: *mut c_void, result: IOReturn, len: *mut c_void) {
debug!(
"Completion callback for transfer {refcon:?}, status={result:x}, len={len}",
len = len as usize
);
unsafe {
let callback_data = {
let inner = &mut *(refcon as *mut TransferDataInner);
inner.actual_len = len as usize;
inner.status = result;
inner.callback_data
};
notify_completion::<super::TransferData>(callback_data)
}
}

When that calls notify_completion, it should find the transfer in STATE_PENDING, update it to STATE_COMPLETED and wake the waker:

/// Notify that a transfer has completed.
///
/// SAFETY: `transfer` must be a pointer previously passed to `submit`, and
/// the caller / kernel must no longer dereference it or its buffer.
pub(crate) unsafe fn notify_completion<P: PlatformTransfer>(transfer: *mut c_void) {
unsafe {
let transfer = transfer as *mut TransferInner<P>;
let waker = (*transfer).waker.take();
match (*transfer).state.swap(STATE_COMPLETED, Ordering::Release) {
STATE_PENDING => {
if let Some(waker) = waker {
waker.wake()
}
}
STATE_ABANDONED => {
drop(Box::from_raw(transfer));
}
s => panic!("Completing transfer in unexpected state {s}"),
}
}
}

The waker should have been previously registered here, which means that async task should wake up and poll again, and find the transfer in STATE_COMPLETED

fn poll_completion_generic(&mut self, cx: &Context) -> Poll<&mut P> {
let inner = self.inner();
inner.waker.register(cx.waker());
match inner.state.load(Ordering::Acquire) {
STATE_PENDING => Poll::Pending,
STATE_COMPLETED => {
// Relaxed because this doesn't synchronize with anything,
// just marks that we no longer need to drop the buffer
inner.state.store(STATE_IDLE, Ordering::Relaxed);
// SAFETY: while `TransferHandle` is alive, the only mutable access to `platform_data`
// is via this `TransferHandle`.
Poll::Ready(unsafe { &mut *inner.platform_data.get() })
}
s => panic!("Polling transfer in unexpected state {s}"),
}
}

You could start by adding additional logging to these (or using a debugger) to see which of those steps is not happening.

@hrouault
Copy link
Author

Hi,
Thanks for pointing me to the relevant functions. I didn't have time to look at it before but here are the results:

I've added logging in the function notify_completion and poll_completion_generic:

2024-11-15T09:45:50.086299Z DEBUG cameleon::u3v::stream_handle: Waiting for completion: 0, 7 pending transfers
2024-11-15T09:45:50.086324Z  INFO nusb::transfer::internal: state is completed
2024-11-15T09:45:50.086390Z DEBUG cameleon::u3v::stream_handle: Waiting for completion: 65536, 6 pending transfers
2024-11-15T09:45:50.086445Z  INFO nusb::transfer::internal: state is completed
2024-11-15T09:45:50.086501Z  INFO nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600001ad3e60, status=0, len=65536
2024-11-15T09:45:50.086517Z DEBUG cameleon::u3v::stream_handle: Waiting for completion: 131072, 5 pending transfers
2024-11-15T09:45:50.086551Z  INFO nusb::transfer::internal: state is pending
2024-11-15T09:45:50.086585Z  INFO nusb::transfer::internal: The transfer is pending
2024-11-15T09:45:50.086788Z  INFO nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600001ae3b60, status=0, len=65536
2024-11-15T09:45:50.086809Z  INFO nusb::transfer::internal: The transfer is pending
2024-11-15T09:45:50.087276Z  INFO nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600001ac0ea0, status=0, len=65536
2024-11-15T09:45:50.087296Z  INFO nusb::transfer::internal: The transfer is pending
2024-11-15T09:45:50.087492Z  INFO nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600001adf4e0, status=0, len=61120
2024-11-15T09:45:50.087515Z  INFO nusb::transfer::internal: The transfer is pending
2024-11-15T09:45:50.087535Z  INFO nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x600001a30320, status=0, len=32
2024-11-15T09:45:50.087547Z  INFO nusb::transfer::internal: The transfer is pending

The notify_completion seems to function as expected (the state was pending and the waker is called). However, the poll_completion_generic function is never called.

I would appreciate your further guidance. thanks

@hrouault hrouault changed the title next_complete waiting forever although the completion callback have been emitted next_complete waiting forever although the completion callback has been emitted Nov 15, 2024
@kevinmehall
Copy link
Owner

kevinmehall commented Nov 15, 2024

You should expect to see poll_completion_generic called at least twice for each next_complete().await -- once when the future is first polled, which will likely see STATE_PENDING. This will register the task's waker, so the task gets woken when the transfer is ready. When woken, the task will poll again, and see STATE_COMPLETED and return the data.

If it is not being called at all, I have to suspect that the returned future is not properly being polled.

The path from next_complete to poll_completion_generic is here:

nusb/src/transfer/queue.rs

Lines 158 to 192 in c4395fe

/// Return a `Future` that waits for the next pending transfer to complete, and yields its
/// buffer and status.
///
/// For an `IN` endpoint, the completion contains a [`Vec<u8>`].\
/// For an `OUT` endpoint, the completion contains a [`ResponseBuffer`][`super::ResponseBuffer`].
///
/// This future is cancel-safe: it can be cancelled and re-created without
/// side effects, enabling its use in `select!{}` or similar.
///
/// Panics if there are no transfers pending.
pub fn next_complete<'a>(
&'a mut self,
) -> impl Future<Output = Completion<R::Response>> + Unpin + Send + Sync + 'a {
poll_fn(|cx| self.poll_next(cx))
}
/// Get the next pending transfer if one has completed, or register the
/// current task for wakeup when the next transfer completes.
///
/// For an `IN` endpoint, the completion contains a [`Vec<u8>`].\
/// For an `OUT` endpoint, the completion contains a
/// [`ResponseBuffer`][`super::ResponseBuffer`].
///
/// Panics if there are no transfers pending.
pub fn poll_next(&mut self, cx: &mut Context) -> Poll<Completion<R::Response>> {
let res = self
.pending
.front_mut()
.expect("queue should have pending transfers when calling next_complete")
.poll_completion::<R>(cx);
if res.is_ready() {
self.cached = self.pending.pop_front();
}
res
}

The only thing I could see going wrong there is if front_mut() is the wrong transfer, such as if they somehow complete out of order, but even then, you'd still see it poll the first time before getting stuck.

What executor are you using / how are you polling the future or the future that .awaits it?

@hrouault
Copy link
Author

This was not clear in my first message, but this problem arises rarely: it hangs after thousand of successful transfers.

From what I see in successful transfers, the poll_completion_generic function is not always called twice for each next_complete().await: sometimes it is called only once when the transfer is already completed. Is it normal behavior? I thought this was expected, and an advantage of the queue system?

From what I understand, when it hangs, the poll_completion_generic function is not called after the notify_completion.

How would the transfer complete out of order? How could I check that?

I am used tasks from the iced gui, but it uses tokio under the hood. A thread is dedicated to the queue await s. I poll the future with functions similar to the one in the first code snippet of the first message. I have one such function for each stage of the transfer (3 functions in total, but all very similar).

@kevinmehall
Copy link
Owner

I found a race condition that could lead to lost wakeups. Can you try #92?

@hrouault
Copy link
Author

I cherry-picked the last commit from that branch and it seems to solve the issue (I waited for about 10 min when it was hanging in the first few minutes usually).
Thanks a lot for your reactivity!

@kevinmehall
Copy link
Owner

Fix released in v0.1.12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants