Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate flaky test-inspector-port-zero-cluster #13343

Closed
Trott opened this issue May 31, 2017 · 21 comments · Fixed by #13373
Closed

Investigate flaky test-inspector-port-zero-cluster #13343

Trott opened this issue May 31, 2017 · 21 comments · Fixed by #13373
Labels
inspector Issues and PRs related to the V8 inspector protocol test Issues and PRs related to the tests.

Comments

@Trott
Copy link
Member

Trott commented May 31, 2017

  • Version: v9.0.0-pre
  • Platform: fedora24
  • Subsystem: test,inspector

https://ci.nodejs.org/job/node-test-commit-linux/10262/nodes=fedora24/console

not ok 1411 inspector/test-inspector-port-zero-cluster
  ---
  duration_ms: 0.265
  severity: fail
  stack: |-
    Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
        at Object.exports.mustCall (/home/iojs/build/workspace/node-test-commit-linux/nodes/fedora24/test/common/index.js:483:10)
        at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/fedora24/test/inspector/test-inspector-port-zero-cluster.js:13:33)
        at Module._compile (module.js:569:30)
        at Object.Module._extensions..js (module.js:580:10)
        at Module.load (module.js:503:32)
        at tryModuleLoad (module.js:466:12)
        at Function.Module._load (module.js:458:3)
        at Function.Module.runMain (module.js:605:10)
        at startup (bootstrap_node.js:158:16)
    Debugger listening on ws://127.0.0.1:32831/7ed5db69-ee20-4f65-951b-1e3aa9097229
    For help see https://nodejs.org/en/docs/inspector
    Debugger listening on ws://127.0.0.1:32833/d212c3cd-5602-4f5d-a22b-19f8e63e1d7a
    For help see https://nodejs.org/en/docs/inspector
    Starting inspector on 127.0.0.1:32832 failed: address already in use
    Debugger listening on ws://127.0.0.1:32834/cad0eb49-5a4d-463f-90bc-56fe914ad0b3
    For help see https://nodejs.org/en/docs/inspector
@Trott
Copy link
Member Author

Trott commented May 31, 2017

/cc @bnoordhuis

@Trott Trott added test Issues and PRs related to the tests. inspector Issues and PRs related to the V8 inspector protocol labels May 31, 2017
@Trott
Copy link
Member Author

Trott commented Jun 1, 2017

Happened again, this time on ubuntu1604_docker_alpine34-64.

https://ci.nodejs.org/job/node-test-commit-linux/10277/nodes=ubuntu1604_docker_alpine34-64/console

not ok 1413 inspector/test-inspector-port-zero-cluster
  ---
  duration_ms: 0.329
  severity: fail
  stack: |-
    Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
        at Object.exports.mustCall (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604_docker_alpine34-64/test/common/index.js:483:10)
        at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604_docker_alpine34-64/test/inspector/test-inspector-port-zero-cluster.js:13:33)
        at Module._compile (module.js:569:30)
        at Object.Module._extensions..js (module.js:580:10)
        at Module.load (module.js:503:32)
        at tryModuleLoad (module.js:466:12)
        at Function.Module._load (module.js:458:3)
        at Function.Module.runMain (module.js:605:10)
        at startup (bootstrap_node.js:158:16)
    Debugger listening on ws://127.0.0.1:45189/8f57c8a5-a17f-4f7e-809b-1f1672a54ae3
    For help see https://nodejs.org/en/docs/inspector
    Debugger listening on ws://127.0.0.1:45191/38e5e2f4-cadb-4c0a-a1fd-af925e5e7e36
    For help see https://nodejs.org/en/docs/inspector
    Starting inspector on 127.0.0.1:45190 failed: address already in use
    Debugger listening on ws://127.0.0.1:45192/c595ccac-dbea-4443-8c2b-8cd003d52a9a
    For help see https://nodejs.org/en/docs/inspector

@Trott
Copy link
Member Author

Trott commented Jun 1, 2017

In both cases, one worker fails because the port is already in use:

    Starting inspector on 127.0.0.1:32832 failed: address already in use
    Starting inspector on 127.0.0.1:45190 failed: address already in use

It looks like this is not guaranteeing an available port but perhaps just using sequential ports?

@Trott
Copy link
Member Author

Trott commented Jun 1, 2017

/cc @cjihrig @sam-github @refack

@Trott
Copy link
Member Author

Trott commented Jun 1, 2017

This failure is replicable by running tools/test.py --repeat 10000 test/inspector/test-inspector-port-zero-cluster.js in two separate shells at once. Tests aren't necessarily expected to be multi-process safe, but in this case, it seems like they ought not interfere with each other, but they do. And the resulting error is similar to be what we're seeing in CI with a port in use and therefore the callback not firing for one of the workers:

$ tools/test.py --repeat 10000 test/inspector/test-inspector-port-zero-cluster.js 
=== release test-inspector-port-zero-cluster ===                    
Path: inspector/test-inspector-port-zero-cluster
Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
    at Object.exports.mustCall (/Users/trott/io.js/test/common/index.js:483:10)
    at Object.<anonymous> (/Users/trott/io.js/test/inspector/test-inspector-port-zero-cluster.js:13:33)
    at Module._compile (module.js:569:30)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:503:32)
    at tryModuleLoad (module.js:466:12)
    at Function.Module._load (module.js:458:3)
    at Function.Module.runMain (module.js:605:10)
    at startup (bootstrap_node.js:158:16)
Debugger listening on ws://127.0.0.1:59247/6206a5c6-560a-4d48-85ce-86da803afe59
For help see https://nodejs.org/en/docs/inspector
Starting inspector on 127.0.0.1:59248 failed: address already in use
Debugger listening on ws://127.0.0.1:59249/75356721-69d4-47df-bdb9-a856946f42a1
For help see https://nodejs.org/en/docs/inspector
Debugger listening on ws://127.0.0.1:59250/8c1ec737-202f-4dd7-9fa8-9765fffed167
For help see https://nodejs.org/en/docs/inspector
Command: out/Release/node --inspect=0 /Users/trott/io.js/test/inspector/test-inspector-port-zero-cluster.js
[snip]

@refack
Copy link
Contributor

refack commented Jun 1, 2017

@Trott for context, is the /inspector/ suite run sequentially?
Ref: #12941
Any way I'll try to make the worker creation more synchronous?

@Trott
Copy link
Member Author

Trott commented Jun 1, 2017

@refack Yes, AFAICT, the inspector suite is run sequentially.

@refack
Copy link
Contributor

refack commented Jun 1, 2017

Tests aren't necessarily expected to be multi-process safe

Than that's a new "known-issue" since cluster has always set sequential debug ports for it's children (initial port=0 or not)
https://github.com/nodejs/node/blob/master/lib/internal/cluster/master.js#L113

My assumption in #13373 is that there might be an test-internal race.

@Trott
Copy link
Member Author

Trott commented Jun 1, 2017

cluster has always set sequential debug ports for it's children

Ooof. That seems like a bug to me, albeit one that would cause problems only infrequently and one that might be very tricky to fix.

@Trott
Copy link
Member Author

Trott commented Jun 1, 2017

OK, so to summarize my understanding thus far:

  • The test is probably not internally safe. Workers can both claim the same port. That needs to be fixed in the test itself and @refack is on the case in test: fix test-inspector-port-zero-cluster #13373. (EDIT: Actually, I'm not sure this is the problem because the CI output above does not show two workers grabbing the same port. Maybe it's conflicting with a pre-existing process on the host? In which case, I'm not sure there's anything we can do about it in the test itself?)

  • Incrementing port numbers this way is probably not ideal, but it's not clear that there's a better solution. Perhaps there's a way to use port 0 in lib/internal/cluster/master.js and have everything work in harmony, but there might be all sorts of problems that arise in trying to implement that because we're going from something deterministic and synchronous to something non-deterministic and asynchronous. But it might be A Good Thing to do because...

  • ...it seems likely that the sort of bug we're seeing in the test here probably occurs in the real world. We probably don't hear about it because it causes problems only intermittently.

Sound about right?

@refack
Copy link
Contributor

refack commented Jun 1, 2017

There is discussion in #12941 about how to open debugPort allocation to user customization.

For some use-cases all 0 will work, some need all workers to always use the same predetermined port.

@Trott
Copy link
Member Author

Trott commented Jun 1, 2017

Perhaps use 0 if the cluster master is called with 0, otherwise use predetermined ports so as to be backwards-compatible?

@Trott
Copy link
Member Author

Trott commented Jun 1, 2017

Failed on linux-fips now too.

https://ci.nodejs.org/job/node-test-commit-linux-fips/8796/nodes=ubuntu1404-64/console

not ok 1413 inspector/test-inspector-port-zero-cluster
  ---
  duration_ms: 0.513
  severity: fail
  stack: |-
    Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
        at Object.exports.mustCall (/home/iojs/build/workspace/node-test-commit-linux-fips/nodes/ubuntu1404-64/test/common/index.js:483:10)
        at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux-fips/nodes/ubuntu1404-64/test/inspector/test-inspector-port-zero-cluster.js:13:33)
        at Module._compile (module.js:569:30)
        at Object.Module._extensions..js (module.js:580:10)
        at Module.load (module.js:503:32)
        at tryModuleLoad (module.js:466:12)
        at Function.Module._load (module.js:458:3)
        at Function.Module.runMain (module.js:605:10)
        at startup (bootstrap_node.js:158:16)
    Debugger listening on ws://127.0.0.1:59463/8b9592c9-26b4-4b0a-bb00-380cc41c308e
    For help see https://nodejs.org/en/docs/inspector
    Debugger listening on ws://127.0.0.1:59464/1f88ca44-cbd7-4433-89b0-d36c34b0b4d5
    For help see https://nodejs.org/en/docs/inspector
    Debugger listening on ws://127.0.0.1:59465/55d232eb-bac6-4d3a-9f6a-9eb7d6251cf7
    For help see https://nodejs.org/en/docs/inspector
    Starting inspector on 127.0.0.1:59466 failed: address already in use

@Trott
Copy link
Member Author

Trott commented Jun 1, 2017

@nodejs/testing @mcollina

@refack
Copy link
Contributor

refack commented Jun 1, 2017

NOOOOO my favorite linuxone:

1413	inspector/test-inspector-port-zero-cluster	
duration_ms	0.135
severity	fail
stack	
Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
    at Object.exports.mustCall (/data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/common/index.js:483:10)
    at Object.<anonymous> (/data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/inspector/test-inspector-port-zero-cluster.js:13:33)
    at Module._compile (module.js:569:30)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:503:32)
    at tryModuleLoad (module.js:466:12)
    at Function.Module._load (module.js:458:3)
    at Function.Module.runMain (module.js:605:10)
    at startup (bootstrap_node.js:158:16)
Debugger listening on ws://127.0.0.1:54152/cb7e8873-8d49-4edd-85d1-dd22a1a787f1
For help see https://nodejs.org/en/docs/inspector
Debugger listening on ws://127.0.0.1:54153/49cb3db2-cf0a-4ba0-b80b-41811fd63ad2
For help see https://nodejs.org/en/docs/inspector
Starting inspector on 127.0.0.1:54154 failed: address already in use
Debugger listening on ws://127.0.0.1:54155/ab76c17a-25a0-4c3a-9bd9-7b707623a7d8
For help see https://nodejs.org/en/docs/inspector

https://ci.nodejs.org/job/node-test-commit-linuxone/6296/nodes=rhel72-s390x/

@mcollina
Copy link
Member

mcollina commented Jun 2, 2017

I'm a bit lost, is this fixed by #13375?

@Trott
Copy link
Member Author

Trott commented Jun 2, 2017

@mcollina No, that just marked the test as flaky so CI will be yellow instead of red when this test fails.

@bnoordhuis
Copy link
Member

bnoordhuis commented Jun 3, 2017

cluster has always set sequential debug ports for it's children

Ooof. That seems like a bug to me, albeit one that would cause problems only infrequently and one that might be very tricky to fix.

No, it's a feature and one that should be preserved.

If the flakiness is caused by concurrent test runs biting each other, we need to figure out a way to make it run exclusively (moving to test/sequential won't be enough) or accept a certain amount of flakiness.

@Trott
Copy link
Member Author

Trott commented Jun 3, 2017

No, it's a feature and one that should be preserved.

Does it make sense to preserve that behavior when a port is specified or the default port is used, but not when started with the port 0 option?

If that makes sense, then lets do that.

If that doesn't make sense, then the test needs to be changed to allow for unpredictable port collisions.

@bnoordhuis
Copy link
Member

Does it make sense to preserve that behavior when a port is specified or the default port is used, but not when started with the port 0 option?

I'm personally of the opinion that having predictable port numbers is an important feature for ease-of-use in debugging.

I don't expect that users are going to run into port conflicts often, that's just something we'll have to deal with in our test suite.

Trott added a commit to Trott/io.js that referenced this issue Jun 16, 2017
With a properly functioning test, it is possible for a cluster worker to
fail to launch due to a port collision. For better or for worse, this is
working as expected and so the test now accommodates that reality.

Fixes: nodejs#13343
@Trott
Copy link
Member Author

Trott commented Jun 16, 2017

Proposed fix: #13711

refack added a commit to refack/node that referenced this issue Jun 16, 2017
* re-implemented test to parse args instead of post binding (exit 12)
* saved failing case as known issue

PR-URL: nodejs#13373
Fixes: nodejs#13343
Reviewed-By: James M Snell <jasnell@gmail.com>
addaleax pushed a commit that referenced this issue Jun 17, 2017
* re-implemented test to parse args instead of post binding (exit 12)
* saved failing case as known issue

PR-URL: #13373
Fixes: #13343
Reviewed-By: James M Snell <jasnell@gmail.com>
addaleax pushed a commit that referenced this issue Jun 21, 2017
* re-implemented test to parse args instead of post binding (exit 12)
* saved failing case as known issue

PR-URL: #13373
Fixes: #13343
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
inspector Issues and PRs related to the V8 inspector protocol test Issues and PRs related to the tests.
Projects
None yet
4 participants