Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cluster: guard against undefined message handlers #6902

Merged
merged 1 commit into from
May 25, 2016

Conversation

cjihrig
Copy link
Contributor

@cjihrig cjihrig commented May 20, 2016

Checklist
  • tests and code linting passes
  • the commit message follows commit guidelines
Affected core subsystem(s)

cluster

Description of change

cluster's internal message handling includes a cache of callback functions. Once the message for that callback is received, it is removed from the cache. If, for any reason, the same message ID is processed twice, the callback will be missing from the cache and cluster will try to call undefined as a function. This commit guards against this scenario.

Fixes #6561

@nodejs-github-bot nodejs-github-bot added the cluster Issues and PRs related to the cluster subsystem. label May 20, 2016
@santigimeno
Copy link
Member

What about adding a test based on the code from #6561 (comment)?

@cjihrig
Copy link
Contributor Author

cjihrig commented May 20, 2016

@santigimeno I could definitely do that. I wasn't sure if relying on undefined behavior (doing async things in the uncaughtException handler) in our tests was a great idea or not.

@santigimeno
Copy link
Member

@cjihrig I understand. You're probably right.
As you explain, it looks like the worker is sending the same message twice. Shouldn't that be fixed as well?

@santigimeno
Copy link
Member

Sorry, I meant, the worker is receiving the same message twice:

{"cmd":"NODE_CLUSTER","data":null,"ack":1,"key":":3000:4::0","errno":-48,"seq":0}

@santigimeno
Copy link
Member

@cjihrig I've tried a different fix: santigimeno@7dae5d7. Not too sure about it. What do you think?

@cjihrig
Copy link
Contributor Author

cjihrig commented May 20, 2016

@santigimeno I was just commenting. The problem, as your solution shows, is that the IPC channel doesn't get a chance to consume anything. I found the problem by adding a try...catch here. I'm trying to see what the best way forward would be. I think the changes from this PR are still relevant though.

@santigimeno
Copy link
Member

I think the changes from this PR are still relevant though.

Agreed

Something interesting: changing the scheduling policy to SCHED_NONE causes a different kind of error:

cluster.js:726
      handle.owner.close(checkWaitingCount);
                  ^

TypeError: Cannot read property 'close' of null
    at Worker._disconnect (cluster.js:726:19)
    at Worker.disconnect (cluster.js:686:17)
    at process.on.e (/Users/sgimeno/node/node/test_cluser.js:14:22)
    at emitOne (events.js:96:13)
    at process.emit (events.js:188:7)
    at process._fatalException (node.js:272:26)

@cjihrig
Copy link
Contributor Author

cjihrig commented May 20, 2016

@santigimeno I opened #6909 as an alternative/complementary idea for this. I solved it in internal/child_process instead of cluster, as I think the problem is not specific to cluster, that's just where we observed it.

@cjihrig cjihrig force-pushed the 6561 branch 2 times, most recently from c6c4c2a to 4a64666 Compare May 23, 2016 14:18
@cjihrig
Copy link
Contributor Author

cjihrig commented May 23, 2016

@santigimeno I came up with a test for this one. It fails for me before this change, and passes after. PTAL

@santigimeno
Copy link
Member

@cjihrig I wonder if it's a good idea having this change as it may mask other hidden issues in the code (as #6909).

Anyway, the test itself LGTM

@cjihrig
Copy link
Contributor Author

cjihrig commented May 23, 2016

it may mask other hidden issues in the code

I definitely see where you're coming from with that comment, but I also think it would be good to make the code a little more robust. @bnoordhuis thoughts?

@bnoordhuis
Copy link
Member

The change looks alright to me. LGTM.

@cjihrig
Copy link
Contributor Author

cjihrig commented May 25, 2016

@cjihrig
Copy link
Contributor Author

cjihrig commented May 25, 2016

CI was green minus what appears to be a Jenkins hiccup on ARM.

cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: nodejs#6561
PR-URL: nodejs#6902
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
@cjihrig cjihrig merged commit 33c7b45 into nodejs:master May 25, 2016
@cjihrig cjihrig deleted the 6561 branch May 25, 2016 16:08
Fishrock123 pushed a commit to Fishrock123/node that referenced this pull request May 30, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: nodejs#6561
PR-URL: nodejs#6902
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
rvagg pushed a commit that referenced this pull request Jun 2, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
@MylesBorins
Copy link
Contributor

@cjihrig lts?

@cjihrig
Copy link
Contributor Author

cjihrig commented Jul 5, 2016

Yes, please.

MylesBorins pushed a commit that referenced this pull request Jul 11, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jul 12, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jul 12, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Jul 12, 2016
MylesBorins pushed a commit that referenced this pull request Jul 14, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jul 14, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cluster Issues and PRs related to the cluster subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants