
'close' event for spawn() isn't emitted under unknown conditions #7159

Closed
iamdoron opened this issue Jun 5, 2016 · 21 comments
Assignees
Labels
child_process Issues and PRs related to the child_process subsystem. stream Issues and PRs related to the stream subsystem.

Comments

@iamdoron (Contributor) commented Jun 5, 2016

  • Version: v4.4.5
  • Platform: OS X 10.11.5 (15F34) [Darwin 15.5.0 Darwin Kernel Version 15.5.0: Tue Apr 19 18:36:36 PDT 2016; root:xnu-3248.50.21~8/RELEASE_X86_64 x86_64]
  • Subsystem: child_process

I wasn't able to create a code example for this issue, but it happens consistently in our integration tests. We spawn a Python process (2.7.10) with binary encoding; eventually we get the stderr 'end' and process 'exit' events, but no stdout 'end' or process 'close' events. It seems to be specific to some of the tests, but not all. I tested on v4.4.4 (and some other earlier versions) with no similar issues at all, so you might be able to pinpoint the issue without a proper code example.

A sketch of the code (the original assigned the child to a variable named `process`, shadowing the global; renamed to `child` here):

const { spawn } = require('child_process');
const child = spawn(pythonPath, command, { encoding: 'binary' });
child
  .on('exit', (code) => { console.log('exit'); })
  .on('close', (code) => { console.log('close'); }); // in some cases this event never arrives
@addaleax addaleax added the child_process Issues and PRs related to the child_process subsystem. label Jun 5, 2016
@bnoordhuis (Member)

There is nothing in the diff between v4.4.4 and v4.4.5 that stands out. I don't think we'll be able to do much without a test case.

@iamdoron (Contributor, Author) commented Jun 5, 2016

@bnoordhuis I think it might be connected to gunzip; the following test case demonstrates the issue (at least on my machine):

example.sh:

#!/bin/bash
cat ./example.gz

example.gz

JavaScript:

const spawn = require('child_process').spawn;
const zlib = require('zlib');

const child = spawn('./example.sh');
child.on('exit', () => {
  console.log('exit');
}).on('close', () => {
  console.log('close');
});

const gunzip = zlib.createGunzip();
gunzip.on('end', () => {
  console.log('gunzip ended');
});

child.stdout.pipe(gunzip).on('data', function() {});

output on 4.4.5 (on my machine)

exit

on 4.4.4:

exit
close
gunzip ended

@addaleax addaleax added the zlib Issues and PRs related to the zlib subsystem. label Jun 5, 2016
@addaleax (Member) commented Jun 5, 2016

48684af is the only zlib-related change in v4.4.5, and even though it touches the zlib close() logic, it does not seem like it could have this effect. I also can’t reproduce any change in behaviour from v4.4.4 to v4.4.5 using the above example on Linux.

There’s also my 51c0808, which touches the piping logic for streams, but I wouldn’t say that that’s a likely candidate here, either.

@addaleax (Member) commented Jun 5, 2016

@iamdoron Does anything change from Node v4.1.2 to v4.2.0 for you?

@iamdoron (Contributor, Author) commented Jun 5, 2016

I was able to reproduce it on Ubuntu Linux 3.13.0-71-generic #114-Ubuntu SMP Tue Dec 1 02:34:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux.

Just gunzip the example.gz file and append the output to a new file until you have ~96 MB of uncompressed data (maybe you'll need more; I guess it depends on the machine), then gzip it and try running the example again.
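The steps above can be scripted roughly as follows. This is only a sketch: the file names, the seed payload, and the `TARGET` variable are assumptions, and `TARGET` defaults to a small value so the script runs quickly; set `TARGET=100000000` to match the ~96 MB from the report.

```shell
#!/bin/sh
# Sketch (assumed file names): grow the uncompressed payload until it
# reaches TARGET bytes, then gzip it back into example.gz.
set -e
TARGET=${TARGET:-4096}
# Seed a tiny example.gz if one is not already present (assumption).
[ -f example.gz ] || printf 'example payload line\n' | gzip -c > example.gz
gunzip -c example.gz > payload
: > payload.big
while [ "$(wc -c < payload.big)" -lt "$TARGET" ]; do
  cat payload >> payload.big
done
gzip -c payload.big > example.gz
echo "uncompressed: $(wc -c < payload.big) bytes"
```

Running the JavaScript example against the resulting example.gz should then show the missing 'close' event on affected versions.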

@MylesBorins (Contributor)

@addaleax I plan to take a look into this on Monday. Taking advantage of GitHub's new ability to assign multiple people!

Feel free to unassign

@addaleax (Member) commented Jun 5, 2016

Can confirm that 51c0808 is actually responsible for this… sigh.

@addaleax addaleax added the stream Issues and PRs related to the stream subsystem. label Jun 5, 2016
@addaleax (Member) commented Jun 5, 2016

/cc @nodejs/streams I guess

@MylesBorins (Contributor)

Metadata from 51c0808:

Fixes: #5820
Fixes: #5257
PR-URL: #6023

@addaleax (Member) commented Jun 5, 2016

Sufficiently large example.gz for reproduction: example.gz

addaleax added a commit to addaleax/node that referenced this issue Jun 5, 2016
Reset the `readableState.awaitDrain` counter after manual calls to
`.resume()`.

What might happen otherwise is that a slow consumer at the end of the
pipe could end up stalling the piping in the following scenario:

1. The writable stream indicates that its buffer is full.
2. This leads the readable stream to `pause()` and increase its
   `awaitDrain` counter, which will be decreased by the writable’s next
   `drain` event.
3. Something calls `.resume()` manually.
4. The readable continues to pipe to the writable, but once again
   the writable stream indicates that the buffer is full.
5. The `awaitDrain` counter is thus increased again, but since it has
   now been increased twice for a single piping destination, the next
   `drain` event will not be able to reset `awaitDrain` to zero.
6. The pipe is stalled and no data is passed along anymore.

The solution in this commit is to reset the `awaitDrain` counter to
zero when `resume()` is called.

Fixes: nodejs#7159
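The stall in the six steps above can be sketched as a minimal simulation of the `awaitDrain` bookkeeping. The helper names are hypothetical and this is not the actual stream internals; it only models a single pipe destination.

```javascript
// Minimal model of the awaitDrain counter (hypothetical names).
function makeState() {
  return { awaitDrain: 0, flowing: true };
}

// write() returning false: bump the counter and pause the source (steps 1-2, 4-5).
function onWriteReturnedFalse(state) {
  state.awaitDrain++;
  state.flowing = false;
}

// A 'drain' event decrements the counter; flow resumes only at zero.
function onDrain(state) {
  if (state.awaitDrain > 0) state.awaitDrain--;
  if (state.awaitDrain === 0) state.flowing = true;
}

// The fix: resume() resets awaitDrain so a manual resume cannot
// leave a stale count that a single 'drain' can never clear.
function resume(state) {
  state.awaitDrain = 0;
  state.flowing = true;
}

// Buggy sequence (manual resume without the reset):
const s = makeState();
onWriteReturnedFalse(s); // counter = 1
s.flowing = true;        // step 3: manual resume, counter NOT reset
onWriteReturnedFalse(s); // counter = 2
onDrain(s);              // counter = 1, still nonzero
console.log(s.flowing);  // false -- the pipe is stalled (step 6)

// Fixed sequence:
const t = makeState();
onWriteReturnedFalse(t);
resume(t);               // counter reset to 0
onWriteReturnedFalse(t); // counter = 1
onDrain(t);              // counter = 0
console.log(t.flowing);  // true -- 'drain' restores the flow
```

With the buggy manual resume, the counter is incremented twice for a single destination, so one 'drain' event can never bring it back to zero; resetting it in `resume()` makes the next 'drain' sufficient.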
@addaleax (Member) commented Jun 5, 2016

PR: #7160

@timlesallen commented Jun 8, 2016

We have had to pin to 4.4.4 because of an issue like this. In our case we are piping from sox through ffmpeg, both of which we spawn(). We get the exit event for sox but not for ffmpeg.

@addaleax (Member) commented Jun 8, 2016

@thealphanerd Is there a reasonable chance the fix will make it into v4.5.0?

@addaleax (Member) commented Jun 8, 2016

For future reference, bisecting using @petrosagg’s test says that this specific problem with child processes was introduced in 34b535f (#4071).

@addaleax addaleax removed the zlib Issues and PRs related to the zlib subsystem. label Jun 8, 2016
@MylesBorins (Contributor)

@addaleax #7160 fixes @petrosagg’s problem too?

@addaleax (Member) commented Jun 8, 2016

@thealphanerd Yes, I think we pretty much worked out that it is a duplicate of this one here.

@MylesBorins (Contributor)

so this was not introduced in v4.4.5?

@addaleax (Member) commented Jun 8, 2016

Yes and no. It was introduced in v4.2.5, but it had no chance to affect anything because it was masked by another regression that was present from v4.2.0 to v4.4.5 (the one fixed by #6023 and reported in #5820/#5257, as you mentioned above).

@MylesBorins (Contributor) commented Jun 8, 2016

OK, so the regression that #6023 fixed was masking this other regression? TL;DR: we need the new fix and everything should be "better".

@addaleax (Member) commented Jun 8, 2016

Yes, exactly. If you decide that you want to let #7160 live in v6 for a longer time first, then it's probably better to revert the #6023 fix temporarily, because I'd consider this bug more serious… I'm pretty sure that #7160 is safe to land in v4 and won't introduce any more regressions, but I get that there are reasons for the rules around this process.

@MylesBorins (Contributor) commented Jun 8, 2016

@addaleax thanks for all the insight; we'll keep you in the loop about the decision.

evanlucas pushed a commit that referenced this issue Jun 16, 2016
(Same commit message as the one above.)

Fixes: #7159
PR-URL: #7160
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>

MylesBorins pushed a commit that referenced this issue Jun 23, 2016
(Same commit message as the one above.)

MylesBorins pushed a commit that referenced this issue Jun 24, 2016
(Same commit message as the one above.)

MylesBorins pushed a commit that referenced this issue Jun 24, 2016
(Same commit message as the one above.)

5 participants