-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ensure iopub subscriptions propagate prior to accepting websocket connections #5908
Conversation
4fe3430
to
2b8e22c
Compare
66c5403
to
b4d41a5
Compare
46cbf72
to
d34bda6
Compare
I thought that there was one set of zmq channels (shell, iopub, etc.) shared across all websocket connections (but I haven't looked at the code carefully recently). Is that wrong, or is that changing in this PR? |
eb69745
to
caa1067
Compare
I think that it is a correct statement. |
b7c088d
to
430247d
Compare
430247d
to
7a3ae6e
Compare
Each websocket client gets its own zmq connection to the kernel, created here using This is what allows the notebook server to not need client-identifying logic, since reply-routing on e.g. stdin, stream channels is handled in the kernel. IOPub messages are indeed wastefully duplicated, however. |
Having poked and prodded quite a bit now, I understand how both branches need a nudge in case of restart:
So it's possible for a 'new' websocket connection, as part of the client's restart steps, to re-use zmq sockets and thus never nudging the new kernel's sockets. The downside of doing this nudge on every ws connect, even on established, already-nudged zmq sockets, is that websocket connections will not be accepted while the kernel is busy, even though they should be. Currently, it is valid to connect and start sending requests without waiting for iopub, but this can result in the issues folks are seeing. The most robust and efficient nudge would be on:
where requests on existing, open sockets are blocked until the nudge resolves, not just connection-opening. That's more complicated, and I think we can go with this solution as first pass, and refine it later, especially if the delay is bothering folks. The last thing I want to look at before merging is what happens if nudge is still outstanding when the connections are closed, then I think this is okay to merge. Thanks @SylvainCorlay! |
I noticed that there are test failures caused by a change in status messages delivered. The nudge waits for any iopub message, then resolves, which means it will tend to resolve on the 'busy' status message, and the 'idle' status message will end up propagating to the client after resolving the connection. It may be better to wait for specifically the idle message associated with the info request to preserve message expectations, even though this is not strictly required. |
ce24932
to
3165c2a
Compare
3165c2a
to
9467854
Compare
@minrk I think this is ready to go. When this gets in I will update the |
- connect iopub first (tiny effect on the race!) - docstrings, log details - resolve immediately if kernel is busy, rather than setting up timeouts, futures - use gen.with_timeout instead of separately managed timeout - use gen.multi to wait for both futures instead of duplicated check in each handler, third Future - add various cancel conditions (sockets closed, kernel stopped, etc.)
Thanks, @SylvainCorlay, @minrk, and @jasongrout! |
|
||
which in turn triggers cleanup | ||
""" | ||
for f in (info_future, iopub_future): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking - the function argument is f and the loop variable is f. Is that going to cause a problem?
@kevin-bates - what is the plan for a release with this PR merged? We're seeing consistent problems that this PR is designed to fix, and would love to have a release fixing the problem. I'm happy to help with the release process if that is helpful. |
I can work on a patch release, since @kevin-bates is on vacation this month. |
Awesome, thanks @Zsailer! Again, I'm happy to help if needed. |
Oops, I thought I was a maintainer on PyPI for this repo, but it looks like I was mistaken. 🤦 |
I thought I was too. Looks like @blink1073 and @minrk are owners of the |
I am not an owner, only a maintainer. |
Notebook 6.1.6 released with this PR included. |
Looks like the changes for jupyter/notebook#5908 introduced an additional status response that pushed the actual kernel_info_reply out of the loop's range. Increasing it by one resolves the issue.
Nudge kernel with info request upon opening websocket until some IOPub message is received.
cf jupyter/jupyter_client#593