-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect kernel readiness more reliably #2207
Conversation
Davis found this discussion: https://github.com/jupyter/enhancement-proposals/blob/master/65-jupyter-xpub/jupyter-xpub.md#current-solution It warns against the possibility of Shell connecting and servicing requests before the IOPub socket is fully connected. When this happens, the output does not reach the frontend and the user loses data. To prevent this, we now send dummy requests until we get a status message on IOPub. Our notion of readiness becomes:
|
6fb947f
to
fa001c7
Compare
// Ignore Busy and Idle statuses that emitted on IOPub during startup | ||
// (e.g. during the kernel-info request that we emit to detect kernel | ||
// readiness). This prevents switching from Starting to Busy before we | ||
// switch to Ready. | ||
if (this.isStarting() && this.isRuntimeStatus(status)) { | ||
// We've got a status, which signals completion of the first half | ||
// of the kernel startup | ||
this._receivedInitialStatus = true; | ||
|
||
// Return for now, we'll emit statuses after the second half of | ||
// startup (reception of a kernel-info reply) has completed | ||
return; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense in my head to keep setStatus()
as simple as possible.
How would you feel about moving this block into this._iopub?.onMessage(
? From what I can tell, it would go right before the switch (state) {
line. It feels "right" to me to push this as close as possible to the native handling of the raw message.
This would also prevent these initial statuses from getting into the _idleMessageIds
and _busyMessageIds
Sets, which seems nice.
It also aligns with the fact that the rest of the related code is in the similar hook, this._shell?.onMessage(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I have now done that
fa001c7
to
ece4ec0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, although I wish there were a better way to do this!
ece4ec0
to
405e32c
Compare
Closes #569 This PR fixes a race condition regarding subscriptions to IOPub that causes clients to miss IOPub messages: - On startup a client connects to the server sockets of a kernel. - The client sends a request on Shell. - The kernel starts processing the request and emits busy on IOPub. If the client hasn't been able to fully subscribe to IOPub, messages can be lost, in particular the Busy message that encloses the request output. On the Positron side we fixed it by sending kernel-info requests in a loop until we get a Ready message on IOPub. This signals Positron that the kernel is fully connected and in the Ready state: posit-dev/positron#2207. We haven't implemented a similar fix in our dummy clients for integration tests and we believe this is what is causing the race condition described in #569. As noted in posit-dev/positron#2207, there is an accepted JEP proposal (JEP 65) that aims at solving this problem by switching to XPUB. https://jupyter.org/enhancement-proposals/65-jupyter-xpub/jupyter-xpub.html jupyter/enhancement-proposals#65 The XPUB socket allows the server to get notified of all new subscriptions. A message of type `iopub_welcome` is sent to all connected clients. They should generally ignore it but clients that have just started up can use it as a cue that IOPub is correctly connected and that they won't miss any output from that point on. Approach: The subscription notification comes in as a message on the IOPub socket. This is problematic because the IOPub thread now needs to listens to its crossbeam channel and to the 0MQ socket at the same time, which isn't possible without resorting to timeout polling. So we use the same approach and infrastructure that we implemented in #58 for listeing to both input replies on the StdIn socket and interrupt notifications on a crossbeam channel. The forwarding thread now owns the IOPub socket and listens for subscription notifications and fowrards IOPub messages coming from the kernel components. --- * Start moving IOPub messages to forwarding thread * Remove unused import * Resolve the roundabout `Message` problem The solution was to move the conversion to `JupyterMessage<T>` up into the match, so we "know" what `T` is! * Use correct `Welcome` `MessageType` * Implement `SubscriptionMessage` support and switch to `XPUB` * The `Welcome` message doesn't come from ark * Use `amalthea::Result` * Add more comments --------- Co-authored-by: Davis Vaughan <davis@posit.co> Co-authored-by: Lionel Henry <lionel@posit.co>
Closes #569 This PR fixes a race condition regarding subscriptions to IOPub that causes clients to miss IOPub messages: - On startup a client connects to the server sockets of a kernel. - The client sends a request on Shell. - The kernel starts processing the request and emits busy on IOPub. If the client hasn't been able to fully subscribe to IOPub, messages can be lost, in particular the Busy message that encloses the request output. On the Positron side we fixed it by sending kernel-info requests in a loop until we get a Ready message on IOPub. This signals Positron that the kernel is fully connected and in the Ready state: posit-dev/positron#2207. We haven't implemented a similar fix in our dummy clients for integration tests and we believe this is what is causing the race condition described in #569. As noted in posit-dev/positron#2207, there is an accepted JEP proposal (JEP 65) that aims at solving this problem by switching to XPUB. https://jupyter.org/enhancement-proposals/65-jupyter-xpub/jupyter-xpub.html jupyter/enhancement-proposals#65 The XPUB socket allows the server to get notified of all new subscriptions. A message of type `iopub_welcome` is sent to all connected clients. They should generally ignore it but clients that have just started up can use it as a cue that IOPub is correctly connected and that they won't miss any output from that point on. Approach: The subscription notification comes in as a message on the IOPub socket. This is problematic because the IOPub thread now needs to listens to its crossbeam channel and to the 0MQ socket at the same time, which isn't possible without resorting to timeout polling. So we use the same approach and infrastructure that we implemented in #58 for listeing to both input replies on the StdIn socket and interrupt notifications on a crossbeam channel. The forwarding thread now owns the IOPub socket and listens for subscription notifications and fowrards IOPub messages coming from the kernel components. --- * Start moving IOPub messages to forwarding thread * Remove unused import * Resolve the roundabout `Message` problem The solution was to move the conversion to `JupyterMessage<T>` up into the match, so we "know" what `T` is! * Use correct `Welcome` `MessageType` * Implement `SubscriptionMessage` support and switch to `XPUB` * The `Welcome` message doesn't come from ark * Use `amalthea::Result` * Add more comments --------- Co-authored-by: Davis Vaughan <davis@posit.co> Co-authored-by: Lionel Henry <lionel@posit.co>
@DavisVaughan This is the idea we discussed yesterday to provide a more practical notion of readiness than just receiving a heartbeat from the kernel, in a way that's generic across kernels. Instead of switching to Ready as soon as we get a heartbeat, we first send a dummy kernel-info request to the backend. When the reply comes in we know the kernel init has sufficiently progressed that it's able to service a request about its internal state. That's our cue to switch to Ready.
Alternative to posit-dev/ark#234