
Slow Starting Kernels Proposal #592

Closed
blink1073 opened this issue Oct 14, 2021 · 8 comments · Fixed by #593
Comments

@blink1073
Contributor

Problem

Jupyter Notebook was originally built on the assumption that kernels would start quickly. This turns out
not to be true for some local kernels and most remote kernels.

Proposed Solution

  • We previously proposed changing the REST API to reflect kernels/sessions that were "pending". The downside to a REST API change is that the server would need to advertise capability through a versioned API or some other status, and clients would need to be updated to accommodate the changes.

  • An alternative method is to leave the current REST APIs intact and instead introduce the concept of a "pending" kernel that
    acts like a regular kernel from the client's perspective.

  • A POST to /api/sessions or /api/kernels would create a "pending" kernel and return immediately before starting the kernel.

  • It remains to be seen during implementation exactly what changes are needed in the handlers and managers, but at the very least we will use a scheduled callback to actually start the kernel while handling the POST (see the sketch after this list).
    The MappingKernelManager will also need to be updated so that its public methods handle pending kernels internally.

  • We should use the kernel manager to get the kernel id.

  • We need to think about how a kernel's failure to start is surfaced to the user. Previously, it could be reported to the user in the response to the POST.

  • We might even be able to add the pending logic to the handlers without needing to affect the managers (e.g. by calling save_state on the managers directly).

  • We might also want to address slow-stopping kernels as part of these changes.
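
A minimal sketch of that handler-level flow, assuming a hypothetical handler class and treating new_kernel_id as an available helper for reserving an id up front (all names below are illustrative, not the final API):

# Hypothetical sketch: respond with a "pending" kernel model immediately and
# schedule the actual start as a background task. Names are illustrative only.
import asyncio
import json

from tornado import web
from jupyter_server.base.handlers import APIHandler


class PendingKernelsHandler(APIHandler):
    @web.authenticated
    async def post(self):
        km = self.kernel_manager  # the MappingKernelManager
        body = self.get_json_body() or {}
        kernel_name = body.get("name", km.default_kernel_name)
        kernel_id = km.new_kernel_id()  # assumed helper to reserve an id

        # Minimal "pending" model, reusing the existing response schema.
        pending = {
            "id": kernel_id,
            "name": kernel_name,
            "execution_state": "starting",
            "connections": 0,
        }

        # The "scheduled callback": start the kernel outside this request,
        # assuming an async start_kernel that accepts a pre-assigned id.
        asyncio.ensure_future(
            km.start_kernel(kernel_id=kernel_id, kernel_name=kernel_name)
        )

        self.set_status(201)
        self.finish(json.dumps(pending))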

@echarles
Member

A POST to /api/sessions or /api/kernels would create a "pending" kernel and return immediately before starting the kernel.

Currently, the response to POST /api/sessions (see https://petstore.swagger.io/?url=https://raw.githubusercontent.com/jupyter/jupyter_server/master/jupyter_server/services/api/api.yaml#/sessions/post_api_sessions) looks like this:

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "path": "string",
  "name": "string",
  "type": "string",
  "kernel": {
    "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "name": "string",
    "last_activity": "string",
    "connections": 0,
    "execution_state": "string"
  }
}

Is the intent to add an additional status field with value pending for those cases?

@blink1073
Contributor Author

Thanks for the suggestion about state machines, but I agree that we should make this change as small as possible for consumers. Our thinking yesterday was that the kernel should still be thought of as "starting" from the point of view of the REST client; we are simply extending the "starting" phase to include starting the process.
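
In other words there should be no schema change for REST clients: the existing execution_state field in the kernel model would simply report "starting" until the process is actually up, e.g. (illustrative values only):

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "path": "Untitled.ipynb",
  "name": "",
  "type": "notebook",
  "kernel": {
    "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "name": "python3",
    "last_activity": "2021-10-14T00:00:00Z",
    "connections": 0,
    "execution_state": "starting"
  }
}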

@blink1073
Contributor Author

Following up on a suggestion @vidartf made during the meeting: there is in fact a _starting_kernels property in MultiKernelManager in jupyter_client, but since it only stores a future keyed by kernel id, we can't use it to create a model for the GET response. One option is to push this logic down into jupyter_client and have it use _starting_kernels in more of its public functions. What do folks think?
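
A rough sketch of what "pushing this down" might look like, subclassing jupyter_client's AsyncMultiKernelManager; the methods added here are assumptions for illustration, not existing API:

# Sketch only: let public methods account for kernels whose start is still
# in flight, using the existing _starting_kernels mapping (kernel_id -> Future).
from jupyter_client.multikernelmanager import AsyncMultiKernelManager


class PendingAwareKernelManager(AsyncMultiKernelManager):

    def list_kernel_ids(self):
        # Include kernels that have been requested but are not started yet.
        ids = super().list_kernel_ids()
        pending = [kid for kid in getattr(self, "_starting_kernels", {})
                   if kid not in ids]
        return ids + pending

    def is_pending(self, kernel_id):
        # True while the start future for this kernel has not resolved.
        fut = getattr(self, "_starting_kernels", {}).get(kernel_id)
        return fut is not None and not fut.done()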

@blink1073
Contributor Author

blink1073 commented Oct 16, 2021

We could make this an opt-in behavior at the level of MultiKernelManager in jupyter_client: whether to expose KernelManager objects before waiting for them to start. For consumers opting into this behavior, we add a KernelManager.ready property, a future that resolves when the kernel process has started. We may even have it wait until the "nudge" is complete. Then a consumer like jupyter_server's websocket handler could wait on the ready future before attempting to send or receive messages to the kernel.

When the opt-in behavior is selected, we do not wait for the future in _async_start_kernel and we instead add the kernel to our internal map of kernels immediately.
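
A minimal sketch of the consumer side of that opt-in, assuming a use_pending_kernels-style flag and the ready future described above (both names are assumptions here; the actual API was worked out in jupyter/jupyter_client#712):

# Sketch: a consumer (e.g. jupyter_server's websocket handler) waiting on the
# pending kernel's ready future before talking to it. Flag and attribute
# names are illustrative, not necessarily the final jupyter_client API.
from jupyter_client.multikernelmanager import AsyncMultiKernelManager


async def connect_to_kernel(kernel_name="python3"):
    mkm = AsyncMultiKernelManager()
    mkm.use_pending_kernels = True  # assumed opt-in flag

    # With the opt-in, start_kernel returns as soon as the KernelManager is
    # registered in the internal map, before the process has actually started.
    kernel_id = await mkm.start_kernel(kernel_name=kernel_name)
    km = mkm.get_kernel(kernel_id)

    # Block only at the point where the kernel is actually needed.
    await km.ready  # resolves once the process is up (and possibly "nudged")

    client = km.client()
    client.start_channels()
    return client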

@blink1073
Contributor Author

I opened jupyter/jupyter_client#712 to explore the ideas from the previous comment.

@mlucool

mlucool commented Oct 21, 2021

This is a great idea!

We worked with @Carreau to tackle slow kernels from another angle. For remote kernels, scheduling is a somewhat large fixed cost (which this proposal seems like it'll bring down). Restarting a remote kernel need not be slow since a user typically means "restart my kernel" and not "reschedule me". With https://github.com/Carreau/inplace_restarter, you can run a restart magic to just restart the kernel, which ends up being very fast for this use case.

If others like this idea, maybe it can be included as one way to help tackle the slow-starting-kernel problem for remote kernels. I understand if you feel this is far enough from the rest of this issue to warrant a separate discussion.

@blink1073
Contributor Author

blink1073 commented Oct 21, 2021

Interesting! Yes, I think that warrants its own discussion. There's also @echarles's recent efforts in https://github.com/datalayer/jupyterpool.

@echarles
Member

Interesting! Yes, I think that warrants its own discussion. There's also @echarles's recent efforts in https://github.com/datalayer/jupyterpool.

The jupyterpool effort came from my frustration as a user at waiting 30s (sometimes more) to get an up-and-running Spark on Hadoop (big data) kernel. BTW, things are much better nowadays in that specific area with faster Spark kernels, but if you extrapolate a bit, you can say: hey, I want a kernel preloaded with that 30TB dataset in a ready-to-use dataframe within a second.

Having ready-to-be-used Jupyter kernels to which a user/notebook can bind is something I am working on, and it is part of making the server more microservice-like, where the security, the code content, the kernel, the datasets... are separated concerns.

Having such a pool of kernels can be simple for a Python kernel, but it raises interesting questions in terms of user impersonation when you want to bind user foo to a running kernel and assign that kernel the permissions of user foo (thinking of, e.g., a pod running on a Kubernetes cluster).

To wrap up, I think this specific issue is a great quick win to build a better user interaction (say, showing the user a message like "Your kernel is starting, we will keep you updated"), but it is just the very first step on a long road that we need to discuss and address in many other issues and PRs.
