src: fix race condition in ~NodeTraceBuffer #25896

Closed
wants to merge 2 commits into from

Member

@addaleax addaleax commented Feb 2, 2019

Stress test CI looks good so far: https://ci.nodejs.org/job/node-stress-single-test/2146/ (✔️)


Libuv does not guarantee that handles have their close
callbacks called in the order in which they were added
(and in fact, currently calls them in reverse order).

This patch ensures that the `flush_signal_` handle
is no longer in use (i.e. its close callback has already
been run) when we signal to the main thread that
`~NodeTraceBuffer` may be destroyed.
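
For readers unfamiliar with the pattern, here is a minimal, libuv-free sketch of the ordering problem and of one way to avoid it by chaining the closes. `FakeLoop`, `CloseSideBySide`, and `CloseChained` are hypothetical names invented for illustration. `FakeLoop` only mimics the reverse-order behaviour described above; it is not libuv's API, and this is not the code in the patch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop (NOT libuv). Close() queues a
// close callback; RunClosing() runs queued callbacks in reverse order of
// the Close() calls, mimicking the ordering described above.
class FakeLoop {
 public:
  using CloseCb = std::function<void()>;

  void Close(CloseCb cb) {
    assert(cb);  // every close in this sketch must carry a callback
    pending_.push_back(std::move(cb));
  }

  void RunClosing() {
    while (!pending_.empty()) {
      // Move the callback out first so it can safely call Close() again.
      CloseCb cb = std::move(pending_.back());
      pending_.pop_back();
      cb();
    }
  }

 private:
  std::vector<CloseCb> pending_;
};

// Racy variant: both handles are closed side by side and "exited" is
// signalled from the exit handle's callback. Because callbacks run in
// reverse order, "exited" is logged while flush_signal is still open.
std::vector<std::string> CloseSideBySide() {
  FakeLoop loop;
  std::vector<std::string> log;
  loop.Close([&] { log.push_back("flush_signal closed"); });
  loop.Close([&] {
    log.push_back("exit_signal closed");
    log.push_back("exited");
  });
  loop.RunClosing();
  return log;
}

// Chained variant: the exit handle is only closed from inside the flush
// handle's close callback, so "exited" cannot be logged before
// "flush_signal closed", whatever order the loop picks.
std::vector<std::string> CloseChained() {
  FakeLoop loop;
  std::vector<std::string> log;
  loop.Close([&] {
    log.push_back("flush_signal closed");
    loop.Close([&] {
      log.push_back("exit_signal closed");
      log.push_back("exited");
    });
  });
  loop.RunClosing();
  return log;
}
```

In this model `CloseSideBySide()` logs "exited" before "flush_signal closed", while `CloseChained()` always logs "flush_signal closed" first. The chained form does not depend on the loop's callback order at all, which is the property the description above asks for.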

Credit for debugging goes to Gireesh Punathil.

Fixes: #25512

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Feb 2, 2019
@addaleax addaleax added trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. flaky-test Issues and PRs related to the tests with unstable failures on the CI. labels Feb 2, 2019
@addaleax
Member Author

addaleax commented Feb 2, 2019

(btw, @gireeshpunathil, if you want I can add a Co-authored-by: for you -- you really did most of the work here!)

@addaleax addaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 2, 2019
@gireeshpunathil
Member

gireeshpunathil commented Feb 3, 2019

I see the author-ready label; I request some more time before anyone lands this, as I am running the test and digesting the changes (it is not that my review is critical, but I have a lot of context on the issue).

@addaleax - sure, that makes sense to me, but can it wait until I get back to this? Thanks for the kind words! For me, isolating and determining problems in order to bring CI back to a stable state is itself a great motivation. Walking backwards from the crash site up to the design is painful and error-prone, and the work is complete only when it is complemented and supported by experts like you, who interpret the bits and pieces and make the translation quickly and promptly!

@gireeshpunathil
Member

Unfortunately the test still segfaults, but it manifests in a slightly different manner. We still need to determine whether the fix is incomplete, whether this is a side effect of the fix, or whether it is a totally unrelated issue. Analysis is in #25512.

@gireeshpunathil
Member

Does the change in NodeTraceBuffer::ExitSignalCb apply to NodeTraceWriter::ExitSignalCb too?

@Trott Trott removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 3, 2019
@Trott
Member

Trott commented Feb 3, 2019

(I removed author ready because @gireeshpunathil wants a bit more time to test and interpret.)

@gireeshpunathil
Member

@addaleax - I can confirm that the change is relevant to NodeTraceWriter::ExitSignalCb as well.

@gireeshpunathil
Member

@addaleax - I pushed a change to that effect, pls have a look.

@addaleax
Member Author

addaleax commented Feb 3, 2019

@gireeshpunathil Yes, your update LGTM 👍

CI: https://ci.nodejs.org/job/node-test-pull-request/20551/

@addaleax
Member Author

addaleax commented Feb 3, 2019

CI is good – @gireeshpunathil I assume you’re okay with landing this after the usual 48 hours?

@gireeshpunathil
Member

@addaleax - yes, LGTM.
(don't know of a process clause around approving a co-authored PR - a queer situation!)

@addaleax
Member Author

addaleax commented Feb 4, 2019

Landed in 5506dcd 🎉

@addaleax addaleax closed this Feb 4, 2019
@addaleax addaleax deleted the trace-buffer-close branch February 4, 2019 17:19
addaleax added a commit that referenced this pull request Feb 4, 2019
Libuv does not guarantee that handles have their close
callbacks called in the order in which they were added
(and in fact, currently calls them in reverse order).

This patch ensures that the `flush_signal_` handle
is no longer in use (i.e. its close callback has already
been run) when we signal to the main thread that
`~NodeTraceBuffer` may be destroyed.

The same applies for `~NodeTraceWriter`.

Credit for debugging goes to Gireesh Punathil.

Fixes: #25512
Co-authored-by: Gireesh Punathil <gpunathi@in.ibm.com>

PR-URL: #25896
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Eugene Ostroukhov <eostroukhov@google.com>
addaleax added a commit that referenced this pull request Feb 6, 2019
@targos targos mentioned this pull request Feb 14, 2019
MylesBorins pushed a commit that referenced this pull request May 16, 2019
MylesBorins pushed a commit that referenced this pull request May 16, 2019
@BethGriggs BethGriggs mentioned this pull request May 16, 2019
Successfully merging this pull request may close these issues.

Windows CI failures: parallel/test-trace-events-fs-sync