subprocesses of kernels are not killed by restart #104
I suspect that makes sense on Unix, but the question you linked has a couple of answers that would apply on Windows: one with psutil, and one that involves launching taskkill. |
ipyparallel has some better way to handle signals on windows: https://github.com/ipython/ipyparallel/blob/d35b4901adde6e651bb6ffc5c8aa9df0d20f1a73/ipyparallel/apps/launcher.py#L282-L288 Would you take a patch which uses taskkill? |
In principle I'd be OK with that. Though a note on this page is a bit concerning: "Home editions of Windows do not have TASKKILL, use TSKILL instead." Maybe that's outdated? Do we have a good way of checking? |
Not on this side: only pro systems here :-/ |
BTW: can someone on linux check if the problem exists there, too? E.g. run the cell in the initial message and restart the kernel and see if the subprocess is also gone...? |
Yes, with a bit of adapting (turning command string into command list), I can run that (on Linux), and when I restart the kernel, the subprocess is orphaned and continues running. However, thinking about this some more, I'm not sure that the proposed fixes will help. When we restart a kernel, we first ask it nicely to shut down, and only if that doesn't work do we forcibly kill it. So I guess it should be the kernel's own responsibility to deal with child processes in the first instance. |
Ok, a bit of googling turned up that taskkill is not available in Win7 Home. I've not found out about Win8 and Win10. So the workaround would be to try taskkill, check whether the command was found, and if not fall back to tskill. RE "subprocess cleanup": that's true as long as the kernel can answer, but sometimes that's not the case: e.g. if the user starts an http server for displaying a plot and then the plotting itself takes too long, so that the kernel is unreachable and can't answer the "shutdown kernel" message. Or the irkernel call chain mess, where the kernel isn't responding as long as it executes something... Another way might be using psutil: http://stackoverflow.com/questions/4789837/how-to-terminate-a-python-subprocess-launched-with-shell-true/25134985#25134985 https://github.com/giampaolo/psutil/blob/1b71ca531ba377e8c71d2c581debfce7eae35a40/psutil/tests/__init__.py#L326 -> If psutil is importable, use that method to kill all children, otherwise the old one? |
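The psutil-with-fallback idea suggested above could be sketched roughly as follows. This is a hypothetical helper (the name `kill_kernel_tree` is made up for illustration), not the actual jupyter_client code; note that the non-psutil Unix fallback only kills the parent, not the whole tree.

```python
import os
import signal
import subprocess
import sys


def kill_kernel_tree(pid):
    """Kill a process and all of its descendants (hypothetical helper).

    Uses psutil when it is importable -- the approach from the linked
    Stack Overflow answer -- and otherwise falls back to taskkill /T on
    Windows or a plain SIGKILL elsewhere.
    """
    try:
        import psutil
    except ImportError:
        if sys.platform == "win32":
            # /T kills the whole tree, /F forces termination.
            subprocess.call(["taskkill", "/F", "/T", "/PID", str(pid)])
        else:
            # Fallback without psutil: only the parent dies here.
            os.kill(pid, signal.SIGKILL)
        return
    try:
        parent = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return
    # Enumerate the whole tree first, then kill children before the parent.
    children = parent.children(recursive=True)
    for child in children:
        child.kill()
    parent.kill()
    psutil.wait_procs(children + [parent], timeout=5)
```

Whether the psutil dependency is acceptable for jupyter_client (optional import vs. hard requirement) is exactly the open question in this comment.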
The "kill all children" might also become a necessity if the future kernel nanny becomes unresponsive itself (not sure if this might ever happen...), e.g. how to kill the chain |
Part of the idea of the kernel nanny is that it should always be responsive, because it's not running user code, or any significant tasks. So the frontend would ask the nanny to shut down, and the nanny would deal with asking the kernel nicely and then forcefully killing it if necessary. However, since the normal case is a kernel that is responsive, I think the first thing is to work out how kernels should shut themselves down and terminate child processes, before we worry too much about what to do when they're not responsive. |
Is that for this issue or the kernel nanny proposal? If the first: that's over my head :-/ -> Should I still submit a PR for the taskkill (or even the psutil) way to kill the kernel and all its children? |
That's for this issue. I don't think the kernel nanny thing changes this in any significant way. I'd rather leave this in a consistent state, i.e. as it is now, than introduce a bunch of extra complexity that fixes it in some situations but not others. Especially if we don't know how to fix it in the most common case, where the kernel is responding. |
So, here is my argument for having such a "kernel massacre" function included in both the nanny and the jupyter_client code. I think there are at least two use cases here for such a function:
But IMO both cases are a nice feature of the kernel nanny (and the code in jupyter_client which would have the same role in case there is no nanny): cleaning up after the user:
|
psutil can find all children of a process, though |
Hi, Adding another simple example of this issue: Run the following code, then restart your notebook. Then run it again and you'll see the children piling up (this assumes you have no other things sleeping, for the demo to work).
If you want to clean up, run this later:
On a high level, this stems from the choice in this toy example to ignore SIGINT. Sometimes a subprocess may do this and it's out of your control so restarting kernels leaves behind a mess (if you don't ignore SIGINT things work as expected). Can we add logic to handle this variant of process cleanup? |
Hi @mlucool, Thanks for the test case! I've spent the better part of today looking into this and it's quite interesting (IMHO). There are three areas of interest, two of which are still a bit of a mystery to me.
What I don't understand is why handling One thing we could do relative to restarts is perform shutdown in an immediate fashion ( This would still leave an issue when just terminating a kernel that has set up its child processes to ignore For now, I'd like to fix the regression issue in Should we make the shutdown issued from It would be great to hear from someone with some kernel-side experience and how "shutdown-request" with ignored |
CC @Carreau I think what's happening is as follows (I am not familiar with the client or kernel code, so this is a guess):
Proposal (maybe this should move to IPykernel/IPython?):
|
I think the parent process should be responsible for the termination of its immediate children (and so on). We have the ability to cheat when the kernel is a local process and send a signal to the process group but (per my previous comment) that only occurs in "fallback" situations when the message-based shutdown request has proven ineffective. (Note: this was one of the primary reasons the interrupt was introduced in the shutdown sequence, so the kernel could respond to "shutdown_request" messages.) I would agree that when receiving a "shutdown_request" message, the kernel process should do whatever it can to terminate its children. This could probably be something like |
Fixes jupyter/jupyter_client#104. This should make sure we properly cull all subprocesses at shutdown. It changes one of the private methods from sync to async in order to not use time.sleep or a thread, so this may affect subclasses, though I doubt it. It's also not completely clear to me whether this works on Windows, as SIGINT I believe is not a thing there. Regardless, as this affects things like dask and others that are mostly on Unix, it should be an improvement. It does the following, stopping as soon as it does not find any more children of the current process:
- Send SIGINT to everything.
- Immediately send SIGTERM in a loop with an exponential backoff from 0.01 to 1 second, roughly multiplying the delay until the next send by 3 each time.
- Switch to sending SIGKILL with the same backoff.
There is no delay after SIGINT, as this is just a courtesy. The backoff delays are not configurable; I can imagine that on slow systems it may make sense to allow that.
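The escalating-signal strategy described in this PR text could be sketched as below. This is an illustration of the described algorithm, not the actual patch; `list_children` and `send_signal` are injected callables invented here so the loop stays testable without real processes.

```python
import signal
import time


def reap_children(list_children, send_signal):
    """Escalate SIGINT -> SIGTERM -> SIGKILL until no children remain.

    list_children() returns the current child PIDs; send_signal(pid, sig)
    delivers a signal.  Stops as soon as no children are found.
    """
    # One courtesy SIGINT to everything, with no delay afterwards.
    for pid in list_children():
        send_signal(pid, signal.SIGINT)
    # SIGTERM, then SIGKILL, each retried with exponential backoff:
    # 0.01s up to ~1s, multiplying the delay by 3 between attempts.
    for sig in (signal.SIGTERM, signal.SIGKILL):
        delay = 0.01
        while delay <= 1.0:
            pids = list_children()
            if not pids:
                return  # everything is gone; stop escalating
            for pid in pids:
                send_signal(pid, sig)
            time.sleep(delay)
            delay *= 3
```

In the well-behaved case the loop exits after the first SIGTERM round, so the SIGKILL stage (and its delays) never runs.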
FTR, ipykernel 6.9.2 fixes #104 (comment). |
Is there a way to make jupyter NOT stop sub-processes? |
Hi @alita-moore. Life-cycle management of the kernel process (i.e., the sub-process) is the responsibility of the kernel provisioner. This gives one the ability to implement whatever kind of process management they'd like beyond what is provided by the local provisioner by default. |
Restarting a kernel only kills the direct child, but not any opened subprocesses of that child.
This is a problem if the kernel spawns multiple child processes like the R kernel (on Windows: R - cmd - R - cmd - Rterm) or if the Python kernel opens a subprocess which sleeps. Issue on the IRkernel repo: IRkernel/IRkernel#226
Reproducible example (at least on Windows): put it into a cell, execute it, and then restart the kernel via the notebook UI.
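The original reproduction cell was not preserved in this rendering. Based on the description (and the later comment about turning a command string into a command list to run it on Linux), a sketch of an equivalent cell might be; the exact original code is an assumption:

```python
import subprocess
import sys

# Run this in a notebook cell, then restart the kernel via the UI.
# The kernel process itself is killed, but this sleeping child is
# orphaned and keeps running (check Task Manager on Windows or `ps`
# on Linux after the restart).
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(600)"]
)
print("child pid:", proc.pid)
```

Using a command list (rather than a command string with `shell=True`) keeps the same cell runnable on both Windows and Linux.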