
dind deployment flake: Timeout waiting for 3 nodes to report readiness #11315

Closed
0xmichalis opened this issue Oct 11, 2016 · 10 comments
Labels: area/tests, component/networking, kind/test-flake, lifecycle/rotten, priority/P2

0xmichalis (Contributor) commented Oct 11, 2016

Any dind deployment failure in CI results in an error message similar to the following:

............................................................
[ERROR] Timeout waiting for 3 nodes to report readiness
[ERROR] PID 30612: hack/dind-cluster.sh:45: `return 1` exited with status 1.
[INFO]      Stack Trace: 
[INFO]        1: hack/dind-cluster.sh:45: `return 1`
[INFO]        2: hack/dind-cluster.sh:114: wait-for-cluster
[INFO]        3: hack/dind-cluster.sh:344: start
[INFO]   Exiting with code 1.
[INFO] Saving cluster configuration
[ERROR] Failed to deploy cluster for plugin: {multitenant}

Diagnosing the problem will require looking at the dind cluster artifacts saved by the job.
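
For context, the failing step is just a bounded polling loop that waits until the API server lists the expected number of Ready nodes. A minimal sketch of that pattern is below; the function name, flags, and timeout are illustrative assumptions, not the actual wait-for-cluster code in hack/dind-cluster.sh:

# Illustrative sketch only; not the real hack/dind-cluster.sh implementation.
wait_for_node_readiness() {
  local expected_nodes="$1"        # e.g. 3
  local max_attempts="${2:-120}"   # assumed bound; each failed check prints one '.'
  local attempt=0
  while [[ "${attempt}" -lt "${max_attempts}" ]]; do
    # Count nodes whose STATUS column reads exactly "Ready".
    local ready
    ready="$(oc get nodes --no-headers 2>/dev/null | awk '$2 == "Ready"' | wc -l)"
    if [[ "${ready}" -ge "${expected_nodes}" ]]; then
      echo "Done"
      return 0
    fi
    echo -n "."
    sleep 1
    attempt=$(( attempt + 1 ))
  done
  echo "[ERROR] Timeout waiting for ${expected_nodes} nodes to report readiness"
  return 1   # the `return 1` that shows up in the stack trace above
}

When the loop runs out of attempts it returns 1 and the callers (wait-for-cluster, then start) propagate the failure, which is what the stack trace above shows; the useful information is therefore in the saved per-node artifacts rather than in this output.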

@0xmichalis added the component/networking, area/tests, and kind/test-flake labels on Oct 11, 2016
@marun self-assigned this and unassigned knobunc on Oct 11, 2016
@marun changed the title from "[ERROR] Failed to deploy cluster for plugin: {multitenant}" to "dind deployment flake: Failed to deploy cluster for plugin" on Oct 11, 2016
@marun changed the title from "dind deployment flake: Failed to deploy cluster for plugin" to "dind deployment flake: Timeout waiting for 3 nodes to report readiness" on Oct 11, 2016
@0xmichalis reopened this on Jul 10, 2017
@marun assigned knobunc and unassigned marun on Sep 7, 2017
tnozicka (Contributor) commented Oct 5, 2017

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/14910/test_pull_request_origin_extended_networking_minimal/8761/

Stopping dind cluster 'nettest'
Starting dind cluster 'nettest' with plugin 'redhat/openshift-ovs-multitenant' and runtime 'dockershim'
Waiting for ok
............................................................................
Done
Waiting for 3 nodes to report readiness
........................................................................................................................
[ERROR] Timeout waiting for 3 nodes to report readiness
[ERROR] PID 31630: hack/dind-cluster.sh:41: `return 1` exited with status 1.
... skipping 7 lines ...
[ERROR] Failed to deploy cluster for plugin: {multitenant}
... skipping 10712 lines ...
[ERROR] 1 plugin(s) failed one or more tests
[ERROR] PID 29201: test/extended/networking-minimal.sh:6: `NETWORKING_E2E_MINIMAL=1 "${OS_ROOT}/test/extended/networking.sh"` exited with status 1.
... skipping 7 lines ...
########## FINISHED STAGE: FAILURE: RUN EXTENDED TESTS [00h 08m 10s] ##########
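
When the flake reproduces locally (in CI the cluster is torn down, so only the saved artifacts are available), a rough set of diagnostic steps is sketched below; the container-name filter is an assumption about how the dind node containers are named, so check docker ps for the real names:

# Illustrative diagnostic commands for a local reproduction; not part of the CI job.
# 1. See which of the 3 nodes never reported Ready.
oc get nodes -o wide

# 2. Inspect the unready node's conditions and events for the reason
#    (network plugin not ready, kubelet not posting status, etc.).
oc describe node <node-name>

# 3. The dind "nodes" are docker containers, so their logs can be read directly.
#    The name filter is a guess; adjust to whatever `docker ps` shows.
docker ps --filter "name=node"
docker logs <node-container-id> 2>&1 | tail -n 100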

@sosiouxme (Member)

gabemontero (Contributor) commented Feb 22, 2018

@danwinship self-assigned this and unassigned knobunc on Feb 27, 2018
danwinship (Contributor) commented

(The recent burst of failures was caused by the CA serial number bug that was fixed by #18713)

danwinship (Contributor) commented

Actually, the recent (and still continuing) burst of failures is caused by an actual docker-in-docker setup bug revealed by the new CA-serial-number-handling code (#18765), not related to #18713.

openshift-bot (Contributor) commented

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on May 28, 2018
openshift-bot (Contributor) commented

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jun 27, 2018
openshift-bot (Contributor) commented

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close
