Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child_process kill() swallows synchronously known problems (does not error out) #30668

Closed
jgehrcke opened this issue Nov 26, 2019 · 6 comments
Closed
Labels
child_process Issues and PRs related to the child_process subsystem. feature request Issues that request new features to be added to Node.js. stale

Comments

@jgehrcke
Copy link
Contributor

jgehrcke commented Nov 26, 2019

  • Version: v12.13.1
  • Platform: Linux 5.3.11
  • Subsystem: child_process

See https://gist.github.com/jgehrcke/ab4656353c1155173d2dde5ffceb0d0b for repro code. The output when executing this:

$ node kill_no_pid_repro_js.js 
2019-11-26T16:49:50.509Z info: waiting for the startup error to be handled
2019-11-26T16:49:50.511Z info: loop iteration
2019-11-26T16:49:50.514Z error: 'error' handler: Error: spawn does-not-exist-- ENOENT
    at Process.ChildProcess._handle.onexit (internal/child_process.js:264:19)
    at onErrorNT (internal/child_process.js:456:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: 'ENOENT',
  code: 'ENOENT',
  syscall: 'spawn does-not-exist--',
  path: 'does-not-exist--',
  spawnargs: []
}
2019-11-26T16:49:50.521Z error: child process startup error
2019-11-26T16:49:50.521Z info: send SIGTERM
2019-11-26T16:49:51.523Z info: send SIGTERM

The log output with millisecond resolution shows the chronological order of events.

In the repro code I use an asynchronous startup error detection technique, using the "error" event. In the log output we can see that this event is being caught, indicating ENOENT (command/executable not found, as desired in this repro). That's great.

After this one can still call the kill() method on the process. The call "succeeds", silently swallowing a problem. The problem as I would put it into words: "there is no process that you can kill here", and that problem should not be silently swallowed, because it makes it too easy to write code with race conditions.

I believe that this should be considered a bug in NodeJS: this is a programmer error, i.e. this should throw an Error, as it would in other programming environments. If unhandled, this should crash the code, indicating to the programmer that their assumption that the process is alive was wrong. The programmer should be required to explicitly handle an error thrown by kill().

It's not necessary to demonstrate the struggle, but maybe makes it easier to understand: the repro code calls kill() another time, about 1 second after the startup error had happened. That kill() also swallows the problem.

I don't know if internally the kill() method just is a noop in this case (where the runtime knows that there is no PID to issue a kill() system call to), or if it actually calls the system call and then swallows the ENOENT. But in both cases it makes a conscious choice, knows about the absence of the process, and hides the erroneous attempt from the programmer.

There could be two ways to check for the error synchronously:

  • The underlying kill() system call does fail with ENOENT, if it is even executed.
  • If the underlying kill() system call is not executed then the runtime seems to have internal state about the fact that the process is not there (I am pretty sure that this is what's happening, see my code comments in the repro code). That state could be used.

It is documented that

The ChildProcess object may emit an 'error' event if the signal cannot be delivered

If this is meant to be the only, reliable, documented way to find out that a kill() failed (not quite clear from the documentation) then I think the repro code is also quite insightful: in the repro code that event handler is not called after the erroneous kill(). That's in fact proven by the additional 1-second wait, during which I would expect that handler to be called.

In the repro code comments I have pointed out that the runtime magically detects that the child process is gone, and it terminates the code pre-maturely, to prevent an indefinitely long wait from happening.

That is, I think this issue mainly reveals a rather mean inconsistency where the runtime sometimes seems to consider the fact that there is no process, and sometimes it doesn't, making it difficult to write robust child process management code.

Note: currently there does not seem to be a documented synchronous way to detect a child process startup error (upon common system call errors such as ENOENT and EACCES), and no documented synchronous way to check for process "liveness", although both could be done synchronously with "fast" system calls. This might be strongly related to this topic here. Related: eclipse-theia/theia#3447 and nodejs/help#1191.

@cjihrig
Copy link
Contributor

cjihrig commented Nov 26, 2019

For your own reference, here is the code for subprocess.kill().

At least one thing that jumps out is that kill()'s return value is not documented. I've opened #30669 for that.

@jgehrcke
Copy link
Contributor Author

jgehrcke commented Nov 26, 2019

Thanks @cjihrig. I modified my repro to check for the return value. In my repro kill() indeed returns false in all cases. Following the implementation details -- since no Error is thrown, it looks like we are sure that in my case the code hits

    if (err === UV_ESRCH) {
      /* Already dead. */

It's a bit sad that the implementation does not expose the error specifics (errno) to the caller.

Given the current implementation we can say that UV_ESRCH ("no such process") definitely results in a false return value. But what other errors also do? That ambiguity is a bit painful. Without that ambiguity we could take your PR and clearly document that UV_ESRCH ("no such process") results in a return value of false. That would be useful. But with the ambiguity we can only say "no such process and some other errors result in the return value false".

@cjihrig
Copy link
Contributor

cjihrig commented Nov 26, 2019

It's not hitting the UV_ESRCH branch. It's not entering the if (this._handle) condition at all, so it just returns false. The _handle isn't created because the application you're trying to spawn doesn't exist. You should also be able to check subprocess.killed === false to see if kill() succeeded.

@jgehrcke
Copy link
Contributor Author

jgehrcke commented Nov 26, 2019

It's not entering the if (this._handle) condition at all [...] The _handle isn't created

Gotcha. For the record, that confirms what I wrote above:

If the underlying kill() system call is not executed then the runtime seems to have internal state about the fact that the process is not there (I am pretty sure that this is what's happening, see my code comments in the repro code).

That is, I can now more specifically say what my proposal is: to raise an explicit Error in that case, indicating to the programmer that there is no child process to manage (to exhibit the kill() on).

cjihrig added a commit to cjihrig/node that referenced this issue Nov 29, 2019
This commit documents the return value from subprocess.kill().

PR-URL: nodejs#30669
Refs: nodejs#30668
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
addaleax pushed a commit that referenced this issue Nov 30, 2019
This commit documents the return value from subprocess.kill().

PR-URL: #30669
Refs: #30668
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@devsnek devsnek added child_process Issues and PRs related to the child_process subsystem. feature request Issues that request new features to be added to Node.js. labels Dec 3, 2019
targos pushed a commit that referenced this issue Dec 5, 2019
This commit documents the return value from subprocess.kill().

PR-URL: #30669
Refs: #30668
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
MylesBorins pushed a commit that referenced this issue Dec 17, 2019
This commit documents the return value from subprocess.kill().

PR-URL: #30669
Refs: #30668
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2022

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Mar 4, 2022
@targos targos moved this to Pending Triage in Node.js feature requests Mar 6, 2022
@targos targos moved this from Pending Triage to Stale in Node.js feature requests Mar 6, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Apr 4, 2022

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process Issues and PRs related to the child_process subsystem. feature request Issues that request new features to be added to Node.js. stale
Projects
None yet
Development

No branches or pull requests

3 participants