test: investigate child-process-pass-fd #15656

Closed
BridgeAR opened this issue Sep 28, 2017 · 16 comments
Labels
aix Issues and PRs related to the AIX platform. child_process Issues and PRs related to the child_process subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI. test Issues and PRs related to the tests.

Comments

@BridgeAR
Member

This failed on a recent CI run.

https://ci.nodejs.org/job/node-test-commit-aix/8817/nodes=aix61-ppc64/console

not ok 1753 sequential/test-child-process-pass-fd
  ---
  duration_ms: 2.0
  severity: fail
  stack: |-
    events.js:182
          throw er; // Unhandled 'error' event
          ^
    
    Error: spawn /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/out/Release/node EAGAIN
        at _errnoException (util.js:1018:13)
        at Process.ChildProcess._handle.onexit (internal/child_process.js:202:19)
        at onErrorNT (internal/child_process.js:390:16)
        at _combinedTickCallback (internal/process/next_tick.js:138:11)
        at process._tickCallback (internal/process/next_tick.js:180:9)
        at Function.Module.runMain (module.js:643:11)
        at startup (bootstrap_node.js:187:16)
        at bootstrap_node.js:608:3
  ...
BridgeAR added the aix, flaky-test, and test labels Sep 28, 2017
@mhdawson
Member

I think I've seen it fail a few times today.

@mhdawson
Member

I don't see any changes to the test recently.

@mhdawson
Member

@nodejs/platform-aix

mscdex added the child_process label Sep 28, 2017
@richardlau
Member

Stress test (with corrected RUN_TESTS parameter): https://ci.nodejs.org/job/node-stress-single-test/1425/

@mhdawson
Member

Even though the stress test passed, I'm still seeing failures, including one just this morning: https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/8867/console

@BridgeAR
Member Author

BridgeAR commented Oct 1, 2017

This is the test that I see failing most often currently. It would really be great to get this fixed.

Ping @nodejs/testing @nodejs/platform-aix

@Trott
Member

Trott commented Oct 1, 2017

Stress test with 10K runs rather than 200: https://ci.nodejs.org/job/node-stress-single-test/1431/

@Trott
Member

Trott commented Oct 1, 2017

This test does spawn 80 copies of itself, so perhaps an EAGAIN is something it should reasonably accommodate (either by ignoring or by retrying).

Pinging @santigimeno for an opinion on that approach as they wrote the test...
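
For illustration, the "retrying" option could look roughly like the sketch below. This is not the test's actual code: `spawnWithRetry`, its `retries` and `delayMs` parameters, and the injectable `spawnFn` argument are hypothetical names chosen for the example. It assumes a failed spawn surfaces as an `'error'` event whose `err.code` is `'EAGAIN'`, which matches the stack trace at the top of this issue.

```javascript
'use strict';
const { spawn } = require('child_process');

// Hypothetical helper: retry a spawn a few times when it fails with EAGAIN.
// spawnFn defaults to child_process.spawn and is injectable only so the
// retry logic can be exercised without creating real processes.
function spawnWithRetry(command, args, options,
                        retries = 3, delayMs = 100, spawnFn = spawn) {
  return new Promise((resolve, reject) => {
    const attempt = (remaining) => {
      const child = spawnFn(command, args, options);
      let failed = false;

      child.once('error', (err) => {
        failed = true;
        if (err.code === 'EAGAIN' && remaining > 0) {
          // Process limit was likely hit; wait briefly, then try again.
          setTimeout(attempt, delayMs, remaining - 1);
        } else {
          // Any other error, or retries exhausted: give up.
          reject(err);
        }
      });

      child.once('exit', (code, signal) => {
        // 'exit' may or may not follow an 'error'; ignore it if it does.
        if (!failed) resolve({ code, signal });
      });
    };
    attempt(retries);
  });
}
```

A caller would use it as `spawnWithRetry(process.execPath, ['child.js'], { stdio: 'inherit' })` (with `child.js` standing in for whatever script is being spawned) and wait on the returned promise. Whether retrying is preferable to simply tolerating the error depends on what the test is meant to verify.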

@Trott
Member

Trott commented Oct 2, 2017

Build failure on the stress test. Let's try again: https://ci.nodejs.org/job/node-stress-single-test/1432/nodes=aix61-ppc64/console

EDIT: Hmm, I guess these 10K runs will take more than a day to complete. Still, I would like to let it run for a few hours at least...

EDIT AGAIN: ...and the weekend is probably the best time for that so....

@Trott
Member

Trott commented Oct 2, 2017

844 successful runs and no failures. I'm going to cancel the stress test.

@Trott
Member

Trott commented Oct 2, 2017

Small sample size, but I suspect the issue may be host-specific.

The two failures documented in this issue both were on test-osuosl-aix61-ppc64_be-2. The two successful stress tests were both on test-osuosl-aix61-ppc64_be-1.

Perhaps not coincidentally, test-osuosl-aix61-ppc64_be-2 is exhibiting more extensive problems right now. https://ci.nodejs.org/computer/test-osuosl-aix61-ppc64_be-2/builds is all red right now. It's not even building, erroring out with this:

Started by upstream project "node-test-commit-aix" build number 8918
originally caused by:
 Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on test-osuosl-aix61-ppc64_be-2 (aix61-ppc64) in workspace /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64
Cloning the remote Git repository
Cloning repository https://github.com/nodejs/node.git
 > git init /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64 # timeout=30
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Could not init /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:717)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:511)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
	at hudson.remoting.UserRequest.perform(UserRequest.java:152)
	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
	at hudson.remoting.Request$2.run(Request.java:332)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
	at hudson.remoting.Engine$1$1.run(Engine.java:85)
	at java.lang.Thread.run(Thread.java:809)
	at ......remote call to Channel to /140.211.9.100(Native Method)
	at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1545)
	at hudson.remoting.UserResponse.retrieve(UserRequest.java:253)
	at hudson.remoting.Channel.call(Channel.java:830)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
	at sun.reflect.GeneratedMethodAccessor539.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
	at com.sun.proxy.$Proxy79.execute(Unknown Source)
	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1075)
	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1115)
	at hudson.scm.SCM.checkout(SCM.java:496)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1281)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
	at hudson.model.Run.execute(Run.java:1728)
	at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
	at hudson.model.ResourceController.execute(ResourceController.java:98)
	at hudson.model.Executor.run(Executor.java:405)
Caused by: hudson.plugins.git.GitException: Error performing command: git init /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1931)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1892)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1888)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1533)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:715)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:511)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
	at hudson.remoting.UserRequest.perform(UserRequest.java:152)
	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
	at hudson.remoting.Request$2.run(Request.java:332)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
	at hudson.remoting.Engine$1$1.run(Engine.java:85)
	at java.lang.Thread.run(Thread.java:809)
Caused by: java.io.IOException: Cannot run program "git" (in directory "/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64"): error=11, Resource temporarily unavailable
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1059)
	at hudson.Proc$LocalProc.<init>(Proc.java:245)
	at hudson.Proc$LocalProc.<init>(Proc.java:214)
	at hudson.Launcher$LocalLauncher.launch(Launcher.java:846)
	at hudson.Launcher$ProcStarter.start(Launcher.java:384)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1920)
	... 16 more
Caused by: java.io.IOException: error=11, Resource temporarily unavailable
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:212)
	at java.lang.ProcessImpl.start(ProcessImpl.java:164)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1040)
	... 21 more
ERROR: Error cloning remote repo 'origin'
Run condition [Always] enabling perform for step [[]]
TAP Reports Processing: START
Looking for TAP results report in workspace using pattern: *.tap
Did not find any matching files. Setting build result to FAILURE.
Checking ^not ok
Jenkins Text Finder: File set '*.tap' is empty
Sending e-mails to: michael_dawson@ca.ibm.com gib@uk.ibm.com Bethany.Griggs@uk.ibm.com
Notifying upstream projects of job completion
Finished: FAILURE

With any luck, re-imaging the host or whatever will make this all go away.

In the meantime, I'm going to take the node out of Jenkins.

I'll open an issue in the build repo, but /cc @nodejs/build in case there's more insight about the specific test.

@mhdawson
Member

mhdawson commented Oct 2, 2017

I was going to check be-1 but I see that there is a stress test running there and it seems to be passing ok:

https://ci.nodejs.org/job/node-stress-single-test/nodes=aix61-ppc64/1433/console

1247 successful runs so far ...

@mhdawson
Member

mhdawson commented Oct 2, 2017

This was opened and handled to address the build failure seen earlier: nodejs/build#900 (comment)

@apapirovski
Member

@Trott @mhdawson any status updates on this? It seems like this should maybe be closed now?

@Trott
Member

Trott commented Apr 12, 2018

Seems likely that it was fixed either via infra fixes or else in d64b0a8. Haven't noticed it in a long time and can't find it in CI. Closing. Feel free to re-open if you think I'm wrong.

Trott closed this as completed Apr 12, 2018