
Are servers culled if there are busy kernels? #10

Closed
stevenstetzler opened this issue Aug 7, 2020 · 8 comments
Labels
question Further information is requested

Comments

@stevenstetzler

When using a JupyterHub, users will sometimes start a long-running computation in a Jupyter notebook and then leave the server unattended (closing their laptop or the JupyterHub browser tab). I am hoping to have servers culled only if there isn't a busy kernel running. Does the jupyterhub idle culler take into account that there may be no activity from the user, but there may still be a running kernel?

I've traced how the latest activity from the server is computed: the server activity is sent to the JupyterHub (jupyterhub.singleuser) as the max of the latest activity from the server API, kernel activity, and terminal activity (notebook.notebookapp), and the kernel activity is updated only when there is a kernel communication (notebook.services.kernels.kernelmanager). Based on this, it doesn't seem that the culler considers whether a kernel is still busy when deciding to cull a server; it only declines to cull if the kernel has been interacted with recently.

Could anyone confirm that this feature isn't available in the idle culler? If it isn't, would it be feasible to implement?

Kernel status is available through the notebook REST API:

$ curl -H "Authorization: token <token>" <server-url>/api/kernels
[{"id": "<id>", "name": "python3", "last_activity": "2020-08-07T22:27:15.449630Z", "execution_state": "busy", "connections": 1}]

which includes an execution_state key. Additionally, it looks like the user model returned from the JupyterHub REST API, as used in the idle culler, has a server key from which the above <server-url> can be built:

$ curl -H "Authorization: token <token>" <hub-url>/hub/api/users/<user>
{"kind": "user", "name": "stevenstetzler", "admin": true, "groups": [], "server": "/user/stevenstetzler/", ...}

so I can see how it might be implemented. If there's interest, is this the right path to go towards implementing this behavior for a pull request?
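To make the proposal concrete, here is a minimal sketch of the culling decision described above. The function name and signature are hypothetical, and the kernel dicts are assumed to have the shape of the /api/kernels response shown earlier; none of this is existing idle-culler code.

```python
from datetime import datetime, timezone

def server_is_cullable(kernels, idle_timeout_s, now=None):
    """Decide whether a server may be culled, given its /api/kernels
    response (a list of kernel dicts as in the example above).

    A server with any kernel in the 'busy' execution state is never
    culled; otherwise it is cullable once every kernel has been idle
    longer than idle_timeout_s. Hypothetical sketch, not idle-culler code.
    """
    now = now or datetime.now(timezone.utc)
    for kernel in kernels:
        if kernel["execution_state"] == "busy":
            return False
        last = datetime.fromisoformat(
            kernel["last_activity"].replace("Z", "+00:00")
        )
        if (now - last).total_seconds() < idle_timeout_s:
            return False
    return True
```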

@welcome

welcome bot commented Aug 7, 2020

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@yuvipanda
Collaborator

Thanks for opening this issue, @stevenstetzler! I would like to have this functionality here too.

Right now, the culler makes API requests only to JupyterHub, and not to the individual notebooks. We could potentially change this, and have it make requests to each notebook. That gives us more flexibility to do things like this. However, as of now, we'll have to find a way to get this info into the JupyterHub 'last activity' API reported by jupyterhub.singleuser for the culler to know about it.

You could do some of this with the notebook config - once the notebook process dies, the pod can be garbage collected. However, I'm not sure that's a good long-term solution.

@minrk would know more.

@minrk
Member

minrk commented Aug 11, 2020

Could anyone confirm that this feature isn't available in the idle culler? If it isn't, would it be feasible to implement?

Short answer:

Correct, it's not available now, and qualified "yes" for feasibility, depending on your experience. To do this, you would need to write a new culler that retrieves activity data directly from single-user servers instead of considering only the information in the Hub API. This is doable, but requires:

  1. c.JupyterHub.admin_access = True enabled to authorize the activity API requests, and
  2. an API request to each server for the activity poll (possibly limited after filtering for culling candidates), which may be a performance/scalability concern.
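As a sketch of point 2, assuming the URL layout from the curl examples earlier in the thread (the function names here are hypothetical and this is not existing idle-culler code):

```python
import json
from urllib.request import Request, urlopen

def count_busy(kernels):
    """Count busy kernels in a parsed /api/kernels response."""
    return sum(1 for k in kernels if k.get("execution_state") == "busy")

def poll_server_kernels(hub_url, server_path, token):
    """Fetch one single-user server's kernel list with an admin token.

    Requires c.JupyterHub.admin_access = True (point 1 above) so the
    user's server accepts the admin-scoped token. The URL layout
    follows the curl examples in this thread.
    """
    req = Request(
        f"{hub_url}{server_path}api/kernels",
        headers={"Authorization": f"token {token}"},
    )
    with urlopen(req) as resp:
        return json.load(resp)
```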

Long answer:

For fine-grained culling, I do think the notebook server itself has the best control, since it can do things like cull idle kernels, consider active connections and execution_state as activity sources or not, etc. I actually don't think using the notebook config for culling is a bad solution. When working in concert with the hub culler, though, it's best if the notebook's internal culler is strictly more aggressive than the Hub activity culler, since the Hub has only a single timestamp to consider, while the notebook's internal logic has more fine-grained parameters.
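For reference, the notebook server's internal culler is driven by traitlets like these (example values in a jupyter_notebook_config.py; the MappingKernelManager cull options shipped in notebook >= 5.1, shutdown_no_activity_timeout in >= 5.4 - the numbers here are illustrative, not recommendations):

```python
# jupyter_notebook_config.py -- example values, tune to taste.
c.MappingKernelManager.cull_idle_timeout = 3600    # cull kernels idle > 1h
c.MappingKernelManager.cull_interval = 300         # check every 5 minutes
c.MappingKernelManager.cull_busy = False           # never cull busy kernels
c.MappingKernelManager.cull_connected = False      # cull even with open connections
c.NotebookApp.shutdown_no_activity_timeout = 7200  # stop the server after 2h idle
```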

The single-user server does publish the notebook's own last_activity, collected here, so it is an input to the Hub culler. I don't believe there is currently a mechanism to treat long-running 'busy' kernels as activity that propagates, though.

I opened jupyterhub/jupyterhub#3101 because I think we can update singleuser's activity-posting to publish more generic metrics, which would enable this feature at the hub-culler level, in a more flexible way, but that's a longer-term project, I think.

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/terminal-manager-cull-properties-not-being-applied-to-notebook-config/6222/1

@dipen-epi

dipen-epi commented Oct 5, 2020

I used the notebook's internal culler configs as suggested here because I wanted culling functionality similar to @stevenstetzler's.
However, in addition to the MappingKernelManager's timeouts, I added TerminalManager cull timeouts as well. This works well when I use jupyter/docker-stacks images, and I can see the logs for TerminalManager:

Polling every 600 seconds for terminals inactive for > 10800 seconds...
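For anyone following along, a configuration that produces a poll log like the one above would look roughly like this (trait names from the terminal-culling feature in notebook >= 6.1; treat them as my best recollection rather than authoritative):

```python
# jupyter_notebook_config.py -- terminal culling (notebook >= 6.1)
c.TerminalManager.cull_inactive_timeout = 10800  # cull terminals idle > 3h
c.TerminalManager.cull_interval = 600            # poll every 10 minutes
```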

When I try using an image built on top of docker-stacks' notebooks, the TerminalManager no longer gets initialized and I cannot see any logs for TerminalManager.

This is the modified image I'm using

FROM jupyter/datascience-notebook:notebook-6.0.3

USER $NB_UID
RUN conda install --quiet --yes -c conda-forge awscli && \
    conda clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

WORKDIR $HOME

Am I doing something wrong that causes the TerminalManager not to get initialized in the modified image, and therefore leads to terminals (and subsequently, pods) not getting terminated?

@bruwozniak

@dipen-epi I think the functionality was added in notebook version 6.1 (jupyter/notebook#5372), and I see you're building from 6.0.3, so it makes sense that it would not work.

@minrk minrk added the question Further information is requested label Apr 16, 2021
@mfloresVicomtech

I'm also interested in not dropping containers while there is activity on them. Colleagues expect to be able to leave ML processes running in the background, but those containers are actually being dropped.

@consideRatio
Member

The topic of this issue is a question, but I think there are related action points to it. Those action points are already represented by concrete issues of their own, though.

With these, one can implement custom metrics and take actions based on them, or opt to configure the notebook server's internal culler mechanisms etc.

I'd like to close this issue, as it has no concrete action point of its own as I see it. If it does have one that I missed, I suggest opening a new issue focused on that.
