-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: clear stalled jobs on POSIX CI hosts #11246
Conversation
Makefile
Outdated
@@ -188,6 +188,9 @@ test/addons/.buildstamp: config.gypi \ | |||
# TODO(bnoordhuis) Force rebuild after gyp update. | |||
build-addons: $(NODE_EXE) test/addons/.buildstamp | |||
|
|||
clear-stalled: | |||
ps awwwx | grep Release/node | grep -v grep | awk '{print $1}' | xargs kill |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about simply using pkill
instead? Isn't that available on all *nix platforms in CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not on our AIX hosts.
e537333
to
f402ec6
Compare
f402ec6
to
f4695f4
Compare
CI is green. Removing /cc @nodejs/build for reviews |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this, we've had to clean the AIX machines several times in the last few days so this is great to see.
I'm terrible at grok'ing Makefile stuff. I put this as an order-only prerequisite. If someone knowledgable could indicate whether that's the right thing to do or not in this case, that would be appreciated. |
The If the target is declared phony, I'm not sure that it matters if it is order-only or not as a prerequisite. |
Related to nodejs/build#591 As discussed in that issue, I'd rather have something in the test runner that kept track of processes that hadn't been cleaned up at the end of each run, and failed the job if there were any. However I do think this is better than what we currently have (later jobs randomly failing and Currently I don't think this is logging what processes it is cleaning up though, which is information I do think we need, so I'd be -1 on this without that. I'm also not sure under what circumstances processes get left behind, doesn't tools/test.py clean up once the TIMEOUT happens? I assumed it was for orphaned node subprocesses, but I'm not sure. |
Makefile
Outdated
XARGS = xargs -r | ||
endif | ||
clear-stalled: | ||
ps awwwx | grep Release/node | grep -v grep | awk '{print $$1}' | $(XARGS) kill |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could just add the same line up to the awk above to list the processes that will be killed, i.e.:
ps awwwx | grep Release/node | grep -v grep
ps awwwx | grep Release/node | grep -v grep | awk '{print $$1}' | $(XARGS) kill
That seems right to me. Always seems to be subprocesses from cluster or debug tests. |
f4695f4
to
08735d5
Compare
Implemented changes per @richardlau and @gibfahn. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code addition LGTM, no idea about the Makefile syntax
I agree on this. If a test is leaving processes behind I think it should fail.
I also agree on this. |
@santigimeno So with the latest update this will now list the processes it's killing, so I'd say this is an improvement over what we currently do. Are you -1 on this landing as a step in the right direction? If we could fix |
Sorry, I had overlooked the |
08735d5
to
0140171
Compare
Updated Re-running CI: https://ci.nodejs.org/job/node-test-pull-request/6317/ |
@santigimeno wrote:
Queued up for a subsequent PR: Trott@48693fc |
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: nodejs#11246 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
Landed in 90ab68b |
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: #11246 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: nodejs#11246 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: nodejs#11246 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
would need backport PRs to land on v6 and v4 |
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: nodejs#11246 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: #11246 Backport-PR-URL: #13754 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: #11246 Backport-PR-URL: #13754 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: #11246 Backport-PR-URL: #13754 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Josh Gavant <josh.gavant@outlook.com>
Sometimes, after a cluster or debug test fails, a fixture hangs around
and holds onto a needed port, causing subsequent CI runs to fail. This
adds a command I've been running manually when this occurs. The command
will clear the stalled jobs before a CI run.
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)
build