
Are servers culled if there are busy kernels? #10

Closed
stevenstetzler opened this issue Aug 7, 2020 · 8 comments
Labels
question Further information is requested

Comments

@stevenstetzler

When using a JupyterHub, users will sometimes start a long-running computation in a Jupyter notebook and then leave the server unattended (closing their laptop or the JupyterHub browser tab). I am hoping to have servers culled only if there isn't a busy kernel running. Does the jupyterhub idle culler take into account that there may be no activity from the user, but there may still be a running kernel?

I've traced how the latest activity from the server is computed: the server activity is sent to the JupyterHub (jupyterhub.singleuser) as the max of the latest activity from the server API, kernel activity, and terminal activity (notebook.notebookapp), and the kernel activity is updated only when there is a kernel communication (notebook.services.kernels.kernelmanager). Based on this, it doesn't seem that the culler considers whether a kernel is still busy when deciding to cull a server; it only declines to cull if the kernel has been interacted with recently.

Could anyone confirm that this feature isn't available in the idle culler? If it isn't, would it be feasible to implement?

Kernel status is available through the notebook REST API:

$ curl -H "Authorization: token <token>" <server-url>/api/kernels
[{"id": "<id>", "name": "python3", "last_activity": "2020-08-07T22:27:15.449630Z", "execution_state": "busy", "connections": 1}]

which includes an execution_state key. Additionally, it looks like the user model returned from the JupyterHub REST API, as used in the idle culler, has a server key from which the above <server-url> can be built:

$ curl -H "Authorization: token <token>" <hub-url>/hub/api/users/<user>
{"kind": "user", "name": "stevenstetzler", "admin": true, "groups": [], "server": "/user/stevenstetzler/", ...}

so I can see how it might be implemented. If there's interest, is this the right path to go towards implementing this behavior for a pull request?
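To make the proposal concrete, here is a minimal sketch of the culling decision described above. The function name and signature are hypothetical, and the kernel dicts are assumed to have the shape of the /api/kernels response shown earlier; none of this is existing idle-culler code.

```python
from datetime import datetime, timezone

def server_is_cullable(kernels, idle_timeout_s, now=None):
    """Decide whether a server may be culled, given its /api/kernels
    response (a list of kernel dicts as in the example above).

    A server with any kernel in the 'busy' execution state is never
    culled; otherwise it is cullable once every kernel has been idle
    longer than idle_timeout_s. Hypothetical sketch, not idle-culler code.
    """
    now = now or datetime.now(timezone.utc)
    for kernel in kernels:
        if kernel["execution_state"] == "busy":
            return False
        last = datetime.fromisoformat(
            kernel["last_activity"].replace("Z", "+00:00")
        )
        if (now - last).total_seconds() < idle_timeout_s:
            return False
    return True
```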

@welcome

welcome bot commented Aug 7, 2020

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@yuvipanda
Collaborator

Thanks for opening this issue, @stevenstetzler! I would like to have this functionality here too.

Right now, the culler makes API requests only to JupyterHub, and not to the individual notebooks. We could potentially change this, and have it make requests to each notebook. That gives us more flexibility to do things like this. However, as of now, we'll have to find a way to get this info into the JupyterHub 'last activity' API reported by jupyterhub.singleuser for the culler to know about it.

You could do some of this with the notebook config - once the notebook process dies, the pod can be garbage collected. However, I'm not sure that's a good long-term solution.

@minrk would know more.

@minrk
Member

minrk commented Aug 11, 2020

Could anyone confirm that this feature isn't available in the idle culler? If it isn't, would it be feasible to implement?

Short answer:

Correct, it's not available now, and qualified "yes" for feasibility, depending on your experience. To do this, you would need to write a new culler that retrieves activity data directly from single-user servers instead of considering only the information in the Hub API. This is doable, but requires:

  1. c.JupyterHub.admin_access = True enabled to authorize the activity API requests, and
  2. an API request to each server for the activity poll (possibly limited after filtering for culling candidates), which may be a performance/scalability concern.
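As a sketch of point 2, assuming the URL layout from the curl examples earlier in the thread (the function names here are hypothetical and this is not existing idle-culler code):

```python
import json
from urllib.request import Request, urlopen

def count_busy(kernels):
    """Count busy kernels in a parsed /api/kernels response."""
    return sum(1 for k in kernels if k.get("execution_state") == "busy")

def poll_server_kernels(hub_url, server_path, token):
    """Fetch one single-user server's kernel list with an admin token.

    Requires c.JupyterHub.admin_access = True (point 1 above) so the
    user's server accepts the admin-scoped token. The URL layout
    follows the curl examples in this thread.
    """
    req = Request(
        f"{hub_url}{server_path}api/kernels",
        headers={"Authorization": f"token {token}"},
    )
    with urlopen(req) as resp:
        return json.load(resp)
```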

Long answer:

For fine-grained culling, I do think the notebook server itself has the best control, since it can do things like cull idle kernels, consider active connections and execution_state as activity sources or not, etc. I actually don't think using the notebook config for culling is a bad solution. When working in concert with the hub culler, though, it's best if the notebook's internal culler is strictly more aggressive than the Hub activity culler, since the Hub has only a single timestamp to consider, while the notebook's internal logic has more fine-grained parameters.
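For reference, the notebook server's internal culler is driven by traitlets like these (example values in a jupyter_notebook_config.py; the MappingKernelManager cull options shipped in notebook >= 5.1, shutdown_no_activity_timeout in >= 5.4 - the numbers here are illustrative, not recommendations):

```python
# jupyter_notebook_config.py -- example values, tune to taste.
c.MappingKernelManager.cull_idle_timeout = 3600    # cull kernels idle > 1h
c.MappingKernelManager.cull_interval = 300         # check every 5 minutes
c.MappingKernelManager.cull_busy = False           # never cull busy kernels
c.MappingKernelManager.cull_connected = False      # cull even with open connections
c.NotebookApp.shutdown_no_activity_timeout = 7200  # stop the server after 2h idle
```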

The single-user server does publish the notebook's own last_activity, collected here, so it is an input to the Hub culler. I don't believe there is currently a mechanism to treat long-running 'busy' kernels as activity that propagates, though.

I opened jupyterhub/jupyterhub#3101 because I think we can update singleuser's activity-posting to publish more generic metrics, which would enable this feature at the hub-culler level, in a more flexible way, but that's a longer-term project, I think.

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/terminal-manager-cull-properties-not-being-applied-to-notebook-config/6222/1

@dipen-epi

dipen-epi commented Oct 5, 2020

I used the notebook's internal culler configs as suggested here because I wanted culling functionality similar to @stevenstetzler's.
However, in addition to the MappingKernelManager's timeouts, I added TerminalManager cull timeouts as well. This works well when I use jupyter/docker-stacks images, and I can see the logs for TerminalManager:

Polling every 600 seconds for terminals inactive for > 10800 seconds...
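For anyone following along, a configuration that produces a poll log like the one above would look roughly like this (trait names from the terminal-culling feature in notebook >= 6.1; treat them as my best recollection rather than authoritative):

```python
# jupyter_notebook_config.py -- terminal culling (notebook >= 6.1)
c.TerminalManager.cull_inactive_timeout = 10800  # cull terminals idle > 3h
c.TerminalManager.cull_interval = 600            # poll every 10 minutes
```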

When I try using an image built on top of docker-stacks' notebooks, the TerminalManager no longer gets initialized and I cannot see any logs for TerminalManager.

This is the modified image I'm using

FROM jupyter/datascience-notebook:notebook-6.0.3

USER $NB_UID
RUN conda install --quiet --yes -c conda-forge awscli && \
    conda clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

WORKDIR $HOME

Am I doing something wrong that causes the TerminalManager not to get initialized in the modified image, and therefore leads to terminals (and subsequently, pods) not getting terminated?

@bruwozniak

@dipen-epi I think the functionality was added in notebook version 6.1 (jupyter/notebook#5372), and I see you're building from 6.0.3, so it makes sense that it would not work.

@minrk minrk added the question Further information is requested label Apr 16, 2021
@mfloresVicomtech

I'm also interested in not dropping containers while there is activity on them. Colleagues expect to be able to leave ML processes running in the background, but those containers are actually being dropped.

@consideRatio
Member

The topic of this issue is a question, but I think there are related action points to it. Those action points are already represented by concrete issues of their own, though.

With these, one can implement custom metrics and take actions based on them, or opt to configure the notebook server's internal culler mechanisms etc.

I'd like to close this issue, as it has no concrete action point of its own as I see it. If it does have one that I missed, I suggest opening a new issue focused on that.
