Investigate flaky addons/dlopen-ping-pong/test-worker #27186

Closed
Trott opened this issue Apr 11, 2019 · 2 comments
Labels
aix Issues and PRs related to the AIX platform. flaky-test Issues and PRs related to the tests with unstable failures on the CI.

Comments

@Trott
Member

Trott commented Apr 11, 2019

https://ci.nodejs.org/job/node-test-commit-aix/22493/nodes=aix61-ppc64/console

test-osuosl-aix61-ppc64_be-2

00:14:04 not ok 2360 addons/dlopen-ping-pong/test-worker
00:14:04   ---
00:14:04   duration_ms: 0.275
00:14:04   severity: fail
00:14:04   exitcode: 1
00:14:04   stack: |-
00:14:04     Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
00:14:04         at Object.mustCall (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common/index.js:335:10)
00:14:04         at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/addons/dlopen-ping-pong/test-worker.js:17:23)
00:14:04         at Module._compile (internal/modules/cjs/loader.js:766:30)
00:14:04         at Object.Module._extensions..js (internal/modules/cjs/loader.js:777:10)
00:14:04         at Module.load (internal/modules/cjs/loader.js:635:32)
00:14:04         at Function.Module._load (internal/modules/cjs/loader.js:562:12)
00:14:04         at Function.Module.runMain (internal/modules/cjs/loader.js:830:10)
00:14:04         at internal/main/run_main_module.js:17:11
00:14:04   ...
@Trott Trott added flaky-test Issues and PRs related to the tests with unstable failures on the CI. aix Issues and PRs related to the AIX platform. labels Apr 11, 2019
@richardlau
Member

@BethGriggs kicked off a stress run https://ci.nodejs.org/job/node-stress-single-test/18/nodes=aix61-ppc64/console but it failed with a different error while building the addons:

09:34:18 /home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/tools/build-addons.js:58
09:34:18 main(process.argv[3]).catch((err) => setImmediate(() => { throw err; }));
09:34:18                                                           ^
09:34:18 
09:34:18 Error: Command failed: /home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/out/Release/node /home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/deps/npm/node_modules/node-gyp/bin/node-gyp.js rebuild --directory=/home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/test/addons/async-hooks-id
09:34:18 g++: error: vfork: Resource temporarily unavailable
09:34:18 collect2: fatal error: gcc returned 1 exit status
09:34:18 compilation terminated.
09:34:18 gmake[1]: *** [Release/obj.target/binding.node] Error 1
09:34:18 
09:34:18     at ChildProcess.exithandler (child_process.js:299:12)
09:34:18     at ChildProcess.emit (events.js:217:5)
09:34:18     at maybeClose (internal/child_process.js:1026:16)
09:34:18     at Socket.<anonymous> (internal/child_process.js:441:11)
09:34:18     at Socket.emit (events.js:217:5)
09:34:18     at Pipe.<anonymous> (net.js:662:12) {
09:34:18   killed: false,
09:34:18   code: 1,
09:34:18   signal: null,
09:34:18   cmd: '/home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/out/Release/node /home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/deps/npm/node_modules/node-gyp/bin/node-gyp.js rebuild --directory=/home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/test/addons/async-hooks-id',
09:34:18   stdout: "gmake[1]: Entering directory `/home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/test/addons/async-hooks-id/build'\n" +
09:34:18     '  CXX(target) Release/obj.target/binding/binding.o\n' +
09:34:18     '  SOLINK_MODULE(target) Release/obj.target/binding.node\n' +
09:34:18     "gmake[1]: Leaving directory `/home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/test/addons/async-hooks-id/build'\n",
09:34:18   stderr: 'g++: error: vfork: Resource temporarily unavailable\n' +
09:34:18     'collect2: fatal error: gcc returned 1 exit status\n' +
09:34:18     'compilation terminated.\n' +
09:34:18     'gmake[1]: *** [Release/obj.target/binding.node] Error 1\n'
09:34:18 }
09:34:18 gmake: *** [test/addons/.buildstamp] Error 1

This new error looks like something we saw in the CITGM job (nodejs/build#1908 (comment)), where parallelism based on the number of CPUs (20 on our current AIX CI hosts) was causing resource issues.

const parallelization = +process.env.JOBS || require('os').cpus().length;
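For illustration, here is a minimal sketch of how that expression behaves with and without `JOBS` set. The helper name `resolveParallelization` is hypothetical and only wraps the same expression so it can be called with different inputs; it is not part of the Node.js tooling.

```javascript
'use strict';
const os = require('os');

// Hypothetical helper: same expression as the line quoted above, but with
// the environment and CPU count passed in so the fallback is easy to see.
function resolveParallelization(env, cpuCount) {
  // Unary + turns the JOBS string into a number. An unset JOBS gives NaN
  // and '0' gives 0; both are falsy, so the CPU count is used instead.
  return +env.JOBS || cpuCount;
}

console.log(resolveParallelization({ JOBS: '5' }, 20)); // 5
console.log(resolveParallelization({}, 20));            // 20
console.log(resolveParallelization({ JOBS: '0' }, 20)); // 20

// With the real environment and the real CPU count of this machine:
console.log(resolveParallelization(process.env, os.cpus().length));
```

So on a 20-CPU host with no `JOBS` exported, the sketch would pick 20 parallel jobs, which is consistent with the resource exhaustion described above.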

I've modified the stress job to export JOBS=5 on AIX (as we do in the https://ci.nodejs.org/job/node-test-commit-aix/ job) and rerun: https://ci.nodejs.org/job/node-stress-single-test/nodes=aix61-ppc64/20/

No failures in 100 runs. I don't think we've seen this fail recently, so maybe close and reopen if it reoccurs?

There's some follow-up work to set the JOBS environment variable at the AIX CI host level (rather than in the individual jobs), but that should be handled separately from this issue.

cc FYI @nodejs/build @nodejs/platform-aix

@Trott
Member Author

Trott commented Nov 27, 2019

> No failures in 100 runs. I don't think we've seen this fail recently, so maybe close and reopen if it reoccurs?

Sounds good to me.
