child_process kill() swallows synchronously known problems (does not error out) #30668

jgehrcke · 2019-11-26T17:49:25Z

Version: v12.13.1
Platform: Linux 5.3.11
Subsystem: child_process

See https://gist.github.com/jgehrcke/ab4656353c1155173d2dde5ffceb0d0b for repro code. The output when executing this:

$ node kill_no_pid_repro_js.js 
2019-11-26T16:49:50.509Z info: waiting for the startup error to be handled
2019-11-26T16:49:50.511Z info: loop iteration
2019-11-26T16:49:50.514Z error: 'error' handler: Error: spawn does-not-exist-- ENOENT
    at Process.ChildProcess._handle.onexit (internal/child_process.js:264:19)
    at onErrorNT (internal/child_process.js:456:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: 'ENOENT',
  code: 'ENOENT',
  syscall: 'spawn does-not-exist--',
  path: 'does-not-exist--',
  spawnargs: []
}
2019-11-26T16:49:50.521Z error: child process startup error
2019-11-26T16:49:50.521Z info: send SIGTERM
2019-11-26T16:49:51.523Z info: send SIGTERM

The log output with millisecond resolution shows the chronological order of events.

In the repro code I use an asynchronous startup error detection technique, using the "error" event. In the log output we can see that this event is being caught, indicating ENOENT (command/executable not found, as desired in this repro). That's great.

After this one can still call the kill() method on the process. The call "succeeds", silently swallowing a problem. The problem as I would put it into words: "there is no process that you can kill here", and that problem should not be silently swallowed, because it makes it too easy to write code with race conditions.

I believe that this should be considered a bug in NodeJS: this is a programmer error, i.e. this should throw an Error, as it would in other programming environments. If unhandled, this should crash the code, indicating to the programmer that their assumption that the process is alive was wrong. The programmer should be required to explicitly handle an error thrown by kill().

It's not necessary to demonstrate the struggle, but maybe makes it easier to understand: the repro code calls kill() another time, about 1 second after the startup error had happened. That kill() also swallows the problem.

I don't know if internally the kill() method just is a noop in this case (where the runtime knows that there is no PID to issue a kill() system call to), or if it actually calls the system call and then swallows the ENOENT. But in both cases it makes a conscious choice, knows about the absence of the process, and hides the erroneous attempt from the programmer.

There could be two ways to check for the error synchronously:

The underlying kill() system call does fail with ENOENT, if it is even executed.
If the underlying kill() system call is not executed then the runtime seems to have internal state about the fact that the process is not there (I am pretty sure that this is what's happening, see my code comments in the repro code). That state could be used.

It is documented that

The ChildProcess object may emit an 'error' event if the signal cannot be delivered

If this is meant to be the only, reliable, documented way to find out that a kill() failed (not quite clear from the documentation) then I think the repro code is also quite insightful: in the repro code that event handler is not called after the erroneous kill(). That's in fact proven by the additional 1-second wait, during which I would expect that handler to be called.

In the repro code comments I have pointed out that the runtime magically detects that the child process is gone, and it terminates the code pre-maturely, to prevent an indefinitely long wait from happening.

That is, I think this issue mainly reveals a rather mean inconsistency where the runtime sometimes seems to consider the fact that there is no process, and sometimes it doesn't, making it difficult to write robust child process management code.

Note: currently there does not seem to be a documented synchronous way to detect a child process startup error (upon common system call errors such as ENOENT and EACCES), and no documented synchronous way to check for process "liveness", although both could be done synchronously with "fast" system calls. This might be strongly related to this topic here. Related: eclipse-theia/theia#3447 and nodejs/help#1191.

The text was updated successfully, but these errors were encountered:

cjihrig · 2019-11-26T18:39:59Z

For your own reference, here is the code for subprocess.kill().

At least one thing that jumps out is that kill()'s return value is not documented. I've opened #30669 for that.

jgehrcke · 2019-11-26T18:47:32Z

Thanks @cjihrig. I modified my repro to check for the return value. In my repro kill() indeed returns false in all cases. Following the implementation details -- since no Error is thrown, it looks like we are sure that in my case the code hits

    if (err === UV_ESRCH) {
      /* Already dead. */

It's a bit sad that the implementation does not expose the error specifics (errno) to the caller.

Given the current implementation we can say that UV_ESRCH ("no such process") definitely results in a false return value. But what other errors also do? That ambiguity is a bit painful. Without that ambiguity we could take your PR and clearly document that UV_ESRCH ("no such process") results in a return value of false. That would be useful. But with the ambiguity we can only say "no such process and some other errors result in the return value false".

cjihrig · 2019-11-26T19:02:48Z

It's not hitting the UV_ESRCH branch. It's not entering the if (this._handle) condition at all, so it just returns false. The _handle isn't created because the application you're trying to spawn doesn't exist. You should also be able to check subprocess.killed === false to see if kill() succeeded.

jgehrcke · 2019-11-26T19:21:09Z

It's not entering the if (this._handle) condition at all [...] The _handle isn't created

Gotcha. For the record, that confirms what I wrote above:

If the underlying kill() system call is not executed then the runtime seems to have internal state about the fact that the process is not there (I am pretty sure that this is what's happening, see my code comments in the repro code).

That is, I can now more specifically say what my proposal is: to raise an explicit Error in that case, indicating to the programmer that there is no child process to manage (to exhibit the kill() on).

This commit documents the return value from subprocess.kill(). PR-URL: nodejs#30669 Refs: nodejs#30668 Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com>

This commit documents the return value from subprocess.kill(). PR-URL: #30669 Refs: #30668 Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com>

github-actions · 2022-03-04T18:06:22Z

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

github-actions · 2022-04-04T01:29:49Z

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

cjihrig mentioned this issue Nov 26, 2019

child_process: document kill() return value #30669

Merged

3 tasks

devsnek added child_process Issues and PRs related to the child_process subsystem. feature request Issues that request new features to be added to Node.js. labels Dec 3, 2019

jgehrcke mentioned this issue May 26, 2020

child_process.spawn: does not throw error when uv_spawn() fails #33566

Closed

github-actions bot added the stale label Mar 4, 2022

targos moved this to Pending Triage in Node.js feature requests Mar 6, 2022

targos added this to Node.js feature requests Mar 6, 2022

targos moved this from Pending Triage to Stale in Node.js feature requests Mar 6, 2022

github-actions bot closed this as completed Apr 4, 2022

avivkeller removed this from Node.js feature requests Jun 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

child_process kill() swallows synchronously known problems (does not error out) #30668

child_process kill() swallows synchronously known problems (does not error out) #30668

jgehrcke commented Nov 26, 2019 •

edited

Loading

cjihrig commented Nov 26, 2019 •

edited

Loading

jgehrcke commented Nov 26, 2019 •

edited

Loading

cjihrig commented Nov 26, 2019

jgehrcke commented Nov 26, 2019 •

edited

Loading

github-actions bot commented Mar 4, 2022

github-actions bot commented Apr 4, 2022

child_process kill() swallows synchronously known problems (does not error out) #30668

child_process kill() swallows synchronously known problems (does not error out) #30668

Comments

jgehrcke commented Nov 26, 2019 • edited Loading

cjihrig commented Nov 26, 2019 • edited Loading

jgehrcke commented Nov 26, 2019 • edited Loading

cjihrig commented Nov 26, 2019

jgehrcke commented Nov 26, 2019 • edited Loading

github-actions bot commented Mar 4, 2022

github-actions bot commented Apr 4, 2022

jgehrcke commented Nov 26, 2019 •

edited

Loading

cjihrig commented Nov 26, 2019 •

edited

Loading

jgehrcke commented Nov 26, 2019 •

edited

Loading

jgehrcke commented Nov 26, 2019 •

edited

Loading