Configure nodes [localhost] nickhammond.logrotate : nickhammond.logrotate | Setup logrotate.d scripts #18527

Closed · aveshagarwal opened this issue Feb 8, 2018 · 11 comments

Labels: kind/test-flake (Categorizes issue or PR as related to test flakes.), priority/P1, sig/master, sig/pod
@aveshagarwal (Contributor):

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18475/test_pull_request_origin_extended_conformance_install/7120/

 [WARNING]: Could not create retry file '/usr/share/ansible/openshift-
ansible/playbooks/deploy_cluster.retry'.         [Errno 13] Permission denied:
u'/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'

PLAY RECAP *********************************************************************
localhost                  : ok=399  changed=149  unreachable=0    failed=1   


INSTALLER STATUS ***************************************************************
Initialization             : Complete (0:00:16)
Health Check               : Complete (0:00:22)
etcd Install               : Complete (0:00:39)
Master Install             : Complete (0:02:00)
Master Additional Install  : Complete (0:00:27)
Node Install               : In Progress (0:03:22)
	This phase can be restarted by running: playbooks/openshift-node/config.yml



Failure summary:


  1. Hosts:    localhost
     Play:     Configure nodes
     Task:     restart node
     Message:  Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.
               
++ export status=FAILURE
++ status=FAILURE
+ set +o xtrace
########## FINISHED STAGE: FAILURE: INSTALL ORIGIN [00h 07m 15s] ##########

Node logs:

Feb 08 17:01:41 ip-172-18-6-38.ec2.internal origin-node[28890]: F0208 17:01:41.141377   28890 network.go:46] SDN node startup failed: failed to validate network configuration: master has not created a default cluster network, network plugin "redhat/openshift-ovs-subnet" can not start
Feb 08 17:01:41 ip-172-18-6-38.ec2.internal systemd[1]: origin-node.service: main process exited, code=exited, status=255/n/a
Feb 08 17:01:41 ip-172-18-6-38.ec2.internal systemd[1]: Failed to start OpenShift Node.
Feb 08 17:01:41 ip-172-18-6-38.ec2.internal systemd[1]: Unit origin-node.service entered failed state.
Feb 08 17:01:41 ip-172-18-6-38.ec2.internal systemd[1]: origin-node.service failed.
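
The fatal line above is the SDN node's startup validation refusing to proceed because the master has not yet created the default cluster network. A minimal Go sketch of that ordering check (the map and function names are hypothetical stand-ins for the API lookup the real SDN node performs, not the actual origin code):

```go
package main

import "fmt"

// clusterNetworks stands in for the ClusterNetwork objects the master
// creates during startup; an empty map models a master that has not
// finished network setup. (Hypothetical stand-in for the API lookup
// the real SDN node performs.)
var clusterNetworks = map[string]string{}

// validateNetworkConfig mirrors the check behind the fatal log line:
// the node plugin refuses to start until the master has created the
// "default" cluster network.
func validateNetworkConfig(plugin string) error {
	if _, ok := clusterNetworks["default"]; !ok {
		return fmt.Errorf("master has not created a default cluster network, network plugin %q can not start", plugin)
	}
	return nil
}

func main() {
	if err := validateNetworkConfig("redhat/openshift-ovs-subnet"); err != nil {
		fmt.Println("SDN node startup failed: failed to validate network configuration:", err)
	}
}
```

Until the master-side setup writes the default network, every node restart hits the same fatal path, which is why systemd keeps reporting the failure above.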

@sdodson (Member) commented Feb 8, 2018:

Need @openshift/networking to check why the sdn registration failed.

@jwforres (Member) commented Feb 8, 2018:

@openshift/sig-networking

@jwforres added the kind/test-flake label Feb 8, 2018
@aveshagarwal (Contributor, Author):

This has been failing so often that PR #18475 (comment) seems to be stuck.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18475/test_pull_request_origin_extended_conformance_install/7162/

Feb 09 03:28:47 ip-172-18-5-65.ec2.internal origin-node[29111]: I0209 03:28:47.204835   29111 manager.go:174] CRI-O not connected: Get http://%2Fvar%2Frun%2Fcrio%2Fcrio.sock/info: dial unix /var/run/crio/crio.sock: connect: no such file or directory
Feb 09 03:28:47 ip-172-18-5-65.ec2.internal origin-node[29111]: F0209 03:28:47.217138   29111 network.go:46] SDN node startup failed: failed to validate network configuration: master has not created a default cluster network, network plugin "redhat/openshift-ovs-subnet" can not start
Feb 09 03:28:47 ip-172-18-5-65.ec2.internal systemd[1]: origin-node.service: main process exited, code=exited, status=255/n/a
Feb 09 03:28:47 ip-172-18-5-65.ec2.internal systemd[1]: Failed to start OpenShift Node.
Feb 09 03:28:47 ip-172-18-5-65.ec2.internal systemd[1]: Unit origin-node.service entered failed state.
Feb 09 03:28:47 ip-172-18-5-65.ec2.internal systemd[1]: origin-node.service failed.

@danwinship (Contributor):

The node is failing to start because the master is failing to start: origin-master-controllers.service is crash-looping, failing each time with:

Feb 08 16:57:44 ip-172-18-6-38.ec2.internal origin-master-controllers[21128]: F0208 16:57:44.937222   21128 plugins.go:234] Invalid configuration: Predicate type not found for NoVolumeNodeConflict

@aveshagarwal (Contributor, Author):

NoVolumeNodeConflict has been removed, so it should no longer be referenced.
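
In other words, the scheduler policy in use still names a predicate that no longer exists in the registry, so configuration validation fails fatally at startup. A minimal Go sketch of that lookup (the registry contents and function names are illustrative, not the actual scheduler code):

```go
package main

import "fmt"

// predicateRegistry maps scheduler predicate names to (stubbed) fit
// functions. The names are illustrative; the real registry lives in
// the scheduler's plugin factory. "NoVolumeNodeConflict" is absent
// because it was removed upstream.
var predicateRegistry = map[string]func() bool{
	"NoDiskConflict":   func() bool { return true },
	"PodFitsResources": func() bool { return true },
}

// validatePolicy rejects any policy that still references an
// unregistered predicate, mirroring the fatal
// "Invalid configuration: Predicate type not found for ..." error.
func validatePolicy(predicates []string) error {
	for _, name := range predicates {
		if _, ok := predicateRegistry[name]; !ok {
			return fmt.Errorf("Invalid configuration: Predicate type not found for %s", name)
		}
	}
	return nil
}

func main() {
	err := validatePolicy([]string{"NoDiskConflict", "NoVolumeNodeConflict"})
	fmt.Println(err)
}
```

The fix is on the configuration side: regenerate or edit the scheduler policy so it only references predicates that are still registered.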

@jwforres (Member):

@openshift/sig-master

@deads2k (Contributor) commented Feb 22, 2018:

@openshift/sig-master

@openshift/sig-pod
/assign @aveshagarwal
@jwforres I think I saw @aveshagarwal link an ansible pull somewhere.

@deads2k (Contributor) commented Feb 22, 2018:

@openshift/sig-master

@jwforres oh, also, pretty sure that's in the scheduler


@sjenning (Contributor):

This is fixed.

@jboyd01 (Contributor) commented Mar 28, 2018:

test_pull_request_origin_extended_conformance_install seems to be hitting this issue repeatedly with #19117, even though this issue was closed as fixed.

10 participants