Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send stream failures up the promise chain #1

Closed
wants to merge 1 commit into from

Conversation

lddubeau
Copy link
Contributor

@lddubeau lddubeau commented Jul 3, 2019

The problematic code prior to this PR is this:

function tryExtract (spec, tarStream, dest, opts) {
  return new BB((resolve, reject) => {
    tarStream.on('error', reject)
    setImmediate(resolve)
  })
    .then(() => rimraf(dest))
    .then(() => mkdirp(dest))
    .then(() => new BB((resolve, reject) => {
      const xtractor = extractStream(spec, dest, opts)
      tarStream.on('error', reject)
      xtractor.on('error', reject)
      xtractor.on('close', resolve)
      tarStream.pipe(xtractor)
    }))
    .catch(err => {
      if (err.code === 'EINTEGRITY') {
        err.message = `Verification failed while extracting ${spec}:\n${err.message}`
      }
      throw err
    })
}

There's a race condition in this code. The problem is that neither of the tarStream.on('error', reject) lines are capable of reporting all errors with tarStream. If a tarStream error happens prior to the outer resolve being called, then the first tarStream.on('error'... will pass the error up correctly. (Actually, I don't think this case is possible. At some point I had instrumented the callback passed to both calls to tarStream.on('error'... and the callback passed to setImmediate(...), and it never ever ever ever ever happened that the first tarStream.on('error' caught an error prior to the call scheduled with setImmediate.)

However, for those errors that cannot be detected immediately, the first tarStream.on('error' call is useless. By the time these errors are raised, the resolve call scheduled by setImmediate will have happened, and thus even if the first tarStream.on('error' calls reject, this will have no effect whatsoever because the promise has already resolved.

So there is a window of time between the first tarStream.on('error' and the second one where if a tarStream error occurs, it won't be sent up the promise chain. How big a window it is depends on how long rimraf and mkdirp take to do their work. If a tarStream error is raised during that window, the first error event handler cannot do anything about it, and the 2nd event handler is not setup yet so it cannot do anything either.

When I first filed this issue in the wrong place I had included a test, but further testing indicates that the test was not reliable. Replicating race conditions in a stable way is extremely hard unless the test can get control of all the conditions that participate in the race so that the events that lead to the failure can always be replicated in the same order. I thought I had it, but nope... hope springs eternal I guess.

The problem I initially ran into was that I was running npm in a Docker build where git was not installed, and I got the utterly non-descriptive cb() never called error message. There was no other message with it. I've seen cases where cb() never called is accompanied with some more information but in my case that was the sole error message and --loglevel=silly did not help either. The versions involved were Node v12.5.0 and npm v6.9.2. This macro-issue can be replicated in the following way:

  1. Create a new temporary directory for the test, and move into it.

  2. We want to create a directory that contains just enough executables to allow npm to run overall but does not allow it to run git specifically:

mkdir jail
cd jail
ln -s `which node` .
ln -s `which npm` .
cd ..
  1. Create a package.json with this:
{
  "dependencies": {
    "dragula": "git+https://github.com/lddubeau/dragula.git"
  }
}

(The main thing here is to have a dependency that requires the use of git.)

  1. Run:
npm install
rm -rf node_modules

(This first run of npm install to do a real install is necessary. If PATH=jail is used here, npm will error but the error message is useful.)

  1. Run:
PATH=jail npm install

This results in:

npm ERR! cb() never called!

npm ERR! This is an error with npm itself. Please report this error at:
npm ERR!     <https://npm.community>

npm ERR! A complete log of this run can be found in:
[path to log]

Running the same, with the PR being proposed here, results in:

npm WARN t4 No description
npm WARN t4 No repository field.
npm WARN t4 No license field.

npm ERR! path git
npm ERR! code ENOENT
npm ERR! errno ENOENT
npm ERR! syscall spawn git
npm ERR! enoent Error while executing:
npm ERR! enoent undefined ls-remote -h -t https://github.com/lddubeau/dragula.git
npm ERR! enoent 
npm ERR! enoent 
npm ERR! enoent spawn git ENOENT
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent 

npm ERR! A complete log of this run can be found in:
[path to log]

Again, this is a race condition so I would not be surprised if some folks try the above and don't get cb() never called.

Prior to the change, race conditions could cause errors on the stream to be
lost. The symptom for npm users would be the infamous "cb() never called" error.
@isaacs
Copy link
Contributor

isaacs commented Jul 16, 2019

The test in the original PR fails consistently for me without this patch, and passes consistently with it. It's a race condition, so yeah, I wouldn't be surprised if the test passes spuriously sometimes, but it's still better than nothing. Reading through the code, this is clearly a race condition, and the correct way to fix it. Good job. I'll cut a release with this shortly.

@isaacs isaacs closed this Jul 16, 2019
@isaacs isaacs reopened this Jul 16, 2019
@isaacs isaacs closed this in 7f07b5d Jul 16, 2019
@isaacs
Copy link
Contributor

isaacs commented Jul 17, 2019

Hey, interesting news!

So, as of this commit, the test/network/registry.js npm/cli test started failing with an EINTEGRITY error. I'd seen that error before, but it was super flaky, and usually very hard to reproduce.

The problem was that the test was installing a package from the test registry, and getting the wrong integrity value. But, always the same integrity value was desired, and always the same one was showing up instead, so it seemed pretty deterministic. Git bisect found this as the first bad commit.

After a few hours of digging, I discovered that it was getting an integrity value from the package-lock.json file from a prior test's install which wasn't cleaned up. (The test was removing node_modules, but not package-lock.json.) The interesting part is that it should have been failing all along, but the error was usually falling between the cracks in the race condition here.

⚡️ this patch exposes a class of integrity issues which were completely ignored before ⚡️

Good job. I'm even more convinced that this was a legit problem before you found and fixed it. Thanks!

@lddubeau
Copy link
Contributor Author

That's great! I suspected this fix was probably going to solve a bunch of difficult-to-diagnose issues, but I did not know how far it would reach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants