test-child-process-fork-regr-gh-2847 - fails on pLinux BE and AIX #3245
Comments
@mhdawson you've probably already checked -- but in case you didn't, have a look and see if there are trailing processes from prior runs. That is the most common cause of ECONNREFUSED.
@gireeshpunathil you'd see both: workers not being able to connect, and the master not being able to spawn. Not saying this is the issue here, though.
@jbergstroem, I believe trailing processes from prior runs would cause EADDRINUSE, not ECONNREFUSED. The scenario this test case covers is a cluster topology in which the worker exits on the first message from the master, and the master sends a dummy message to the worker each time connectivity is established with a server, which is done 100 times back-to-back. The intention is to make sure that process.disconnect() waits until the data queued up in the channel has been flushed. I believe there are two issues with the test case:
#Error: channel closed
2. It assumes that by the time the first message is received in the worker and its close callback is subsequently invoked in the master, the next message has already been dispatched by the master. If this does not happen because of timing differences, the server gets closed and net.connect() fails, as the server is already closed: Error: connect ECONNREFUSED 127.0.0.1:12346
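The difference between the two error codes discussed above can be reproduced in isolation. This is only a minimal sketch, not the test from the repository: the helper names `listenTwice` and `connectAfterClose` are made up for illustration, and it assumes the loopback interface is available and that no other process grabs the ephemeral port between the close and the connect.

```javascript
'use strict';
const net = require('net');

// Hypothetical helper: bind a second server to a port that is still in use.
// The second listen() is expected to fail with EADDRINUSE.
function listenTwice() {
  return new Promise((resolve) => {
    const first = net.createServer();
    first.listen(0, '127.0.0.1', () => {
      const port = first.address().port;
      const second = net.createServer();
      second.once('error', (err) => {
        first.close(() => resolve(err.code));
      });
      second.once('listening', () => {
        // Not expected; clean up so the process can still exit.
        second.close(() => first.close(() => resolve('listening')));
      });
      second.listen(port, '127.0.0.1');
    });
  });
}

// Hypothetical helper: connect to a port whose server has already closed.
// With nothing listening, the connect is expected to fail with ECONNREFUSED.
function connectAfterClose() {
  return new Promise((resolve) => {
    const server = net.createServer();
    server.listen(0, '127.0.0.1', () => {
      const port = server.address().port;
      server.close(() => {
        const socket = net.connect(port, '127.0.0.1');
        socket.once('error', (err) => resolve(err.code));
        socket.once('connect', () => {
          socket.destroy();
          resolve('connected');
        });
      });
    });
  });
}
```

The second helper mirrors the race described in point 2: once the server has been closed, any connection attempt still in flight has nothing to connect to.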
@jbergstroem, thanks for the quick reply. I confirmed that this is not the case here with a clean run, using a new, free port number as well. Also, with a small timing change (changing the for loop to a setInterval, delaying the second worker.send call, etc.) I am able to reproduce these two errors on Linux IA32 as well.
Gireesh, could you add an update on your investigation so far?
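As a rough illustration of the timing change mentioned above, the back-to-back for loop can be replaced by a timer that spaces the iterations out. This is a sketch under assumptions: `runStaggered` is a hypothetical helper, not code from the test, and the interval used in the actual experiment is not stated in this thread.

```javascript
'use strict';

// Hypothetical helper: call task(i) `count` times, one call every
// `delayMs` milliseconds, instead of issuing all calls back-to-back.
function runStaggered(count, delayMs, task) {
  return new Promise((resolve) => {
    if (count <= 0) {
      resolve();
      return;
    }
    let i = 0;
    const timer = setInterval(() => {
      task(i);
      i += 1;
      if (i === count) {
        clearInterval(timer);
        resolve();
      }
    }, delayMs);
  });
}
```

In the test, `task` would stand in for the connect-and-send step; spacing the calls out gives the worker time to exit between iterations, which is what exposes the race.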
Further debugging revealed the following:
I propose these changes to the test:
Please let me know what you think of this approach. If you agree, I will come up with a PR.
It seems to fail on PPC BE consistently in the CI https://ci.nodejs.org/job/node-test-commit-plinux/17/nodes=ppcbe-fedora20/console
not ok 45 test-child-process-fork-regr-gh-2847.js
#events.js:141
#      throw er; // Unhandled 'error' event
#      ^
#
#Error: connect ECONNREFUSED 127.0.0.1:12346
#    at Object.exports._errnoException (util.js:889:11)
#    at exports._exceptionWithHostPort (util.js:912:20)
#    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1063:14)
  ---
  duration_ms: 2.712
The suggested approach seems reasonable. Validating that at least some number of requests get through will catch the case where we might otherwise ignore real errors. Gireesh, if you can put together a pull request, or even just a branch that includes your changes, we can start node-test-commit-plinux pointed at that in order to see if it resolves the problems seen on PPC in the CI runs.
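The validation idea above might be sketched as a small tally. This is an assumption about the shape of the check, not the code from the branch or the PR: `createTally`, `minDelivered`, and the decision to tolerate only ECONNREFUSED are all hypothetical.

```javascript
'use strict';

// Hypothetical bookkeeping for a test that tolerates refused connections
// once the worker has gone away, but still requires that some requests
// were delivered before that point.
function createTally(minDelivered) {
  let delivered = 0;
  let refused = 0;
  return {
    recordDelivered() {
      delivered += 1;
    },
    recordError(err) {
      // Assumption: only ECONNREFUSED is treated as expected; any other
      // error is rethrown so that real failures are not swallowed.
      if (err.code !== 'ECONNREFUSED') throw err;
      refused += 1;
    },
    check() {
      if (delivered < minDelivered) {
        throw new Error(
          `only ${delivered} request(s) delivered, expected at least ${minDelivered}`
        );
      }
      return { delivered, refused };
    },
  };
}
```

Calling `check()` at the end of the run is what keeps the tolerance honest: a run in which every connection was refused would fail instead of passing silently.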
@mhdawson, thanks. Here is the test in a branch with the proposed modification for your validation in CI. A pull request is on its way.
CI run here: https://ci.nodejs.org/job/node-test-commit/883/ It shows that the failure is resolved for PPC. There are some formatting issues reported by the linter, though. You should be able to catch these with a complete make test run and fix them up when putting together the PR. The other CI failures look unrelated to the test change to me.
Since Gireesh is on vacation this week, I went ahead, resolved the lint issues, and created PR #3459.
PR closed, closing.
Failure on pLinux in CI https://ci.nodejs.org/job/iojs+pr+ppc/nodes=ppcbe-fedora20/21/console:
I see the AIX failures in our internal builds, but not the one on pLinux.