Commit
cmd/openshift-install/analyze: Attempt to analyze bootstrap tarballs

Instead of just dropping them into the user's lap ("here's a big
tarball, have fun"), look through them for obvious things that we can
summarize.  With:

   func runGatherBootstrapCmd(directory string) error {
  +    return analyzeGatheredBootstrap("/tmp/log-bundle.tar.gz")

to feed [1] into the analysis logic, the output looks like:

  WARNING control-plane/10.0.134.229 had failing systemd units: crio.service
  WARNING control-plane/10.0.134.229: crio.service:
  ● crio.service - Open Container Initiative Daemon
     Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/crio.service.d
             └─10-default-env.conf
     Active: failed (Result: exit-code) since Thu 2019-10-24 11:11:31 UTC; 320ms ago
       Docs: https://github.com/cri-o/cri-o
    Process: 8491 ExecStart=/usr/bin/crio $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS (code=exited, status=1/FAILURE)
   Main PID: 8491 (code=exited, status=1/FAILURE)
        CPU: 144ms

  Oct 24 11:11:31 ip-10-0-134-229 systemd[1]: Starting Open Container Initiative Daemon...
  Oct 24 11:11:31 ip-10-0-134-229 crio[8491]: time="2019-10-24 11:11:31.895986612Z" level=fatal msg="opening seccomp profile (/etc/crio/seccomp.json) failed: open /etc/crio/seccomp.json: no such file or directory"
  Oct 24 11:11:31 ip-10-0-134-229 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
  Oct 24 11:11:31 ip-10-0-134-229 systemd[1]: crio.service: Failed with result 'exit-code'.
  Oct 24 11:11:31 ip-10-0-134-229 systemd[1]: Failed to start Open Container Initiative Daemon.
  Oct 24 11:11:31 ip-10-0-134-229 systemd[1]: crio.service: Consumed 144ms CPU time
  WARNING control-plane/10.0.134.243 had failing systemd units: crio.service
  WARNING control-plane/10.0.134.243: crio.service:
  ● crio.service - Open Container Initiative Daemon
     Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/crio.service.d
             └─10-default-env.conf
     Active: failed (Result: exit-code) since Thu 2019-10-24 11:11:35 UTC; 8s ago
       Docs: https://github.com/cri-o/cri-o
    Process: 8439 ExecStart=/usr/bin/crio $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS (code=exited, status=1/FAILURE)
   Main PID: 8439 (code=exited, status=1/FAILURE)
        CPU: 151ms

  Oct 24 11:11:35 ip-10-0-134-243 systemd[1]: Starting Open Container Initiative Daemon...
  Oct 24 11:11:35 ip-10-0-134-243 crio[8439]: time="2019-10-24 11:11:35.238163016Z" level=fatal msg="opening seccomp profile (/etc/crio/seccomp.json) failed: open /etc/crio/seccomp.json: no such file or directory"
  Oct 24 11:11:35 ip-10-0-134-243 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
  Oct 24 11:11:35 ip-10-0-134-243 systemd[1]: crio.service: Failed with result 'exit-code'.
  Oct 24 11:11:35 ip-10-0-134-243 systemd[1]: Failed to start Open Container Initiative Daemon.
  Oct 24 11:11:35 ip-10-0-134-243 systemd[1]: crio.service: Consumed 151ms CPU time
  WARNING control-plane/10.0.157.61 had failing systemd units: crio.service
  WARNING control-plane/10.0.157.61: crio.service:
  ● crio.service - Open Container Initiative Daemon
     Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/crio.service.d
             └─10-default-env.conf
     Active: failed (Result: exit-code) since Thu 2019-10-24 11:11:36 UTC; 1s ago
       Docs: https://github.com/cri-o/cri-o
    Process: 8379 ExecStart=/usr/bin/crio $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS (code=exited, status=1/FAILURE)
   Main PID: 8379 (code=exited, status=1/FAILURE)
        CPU: 158ms

  Oct 24 11:11:36 ip-10-0-157-61 systemd[1]: Starting Open Container Initiative Daemon...
  Oct 24 11:11:36 ip-10-0-157-61 crio[8379]: time="2019-10-24 11:11:36.807284677Z" level=fatal msg="opening seccomp profile (/etc/crio/seccomp.json) failed: open /etc/crio/seccomp.json: no such file or directory"
  Oct 24 11:11:36 ip-10-0-157-61 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
  Oct 24 11:11:36 ip-10-0-157-61 systemd[1]: crio.service: Failed with result 'exit-code'.
  Oct 24 11:11:36 ip-10-0-157-61 systemd[1]: Failed to start Open Container Initiative Daemon.
  Oct 24 11:11:36 ip-10-0-157-61 systemd[1]: crio.service: Consumed 158ms CPU time

That's maybe a bit noisy, but mostly because all three control-plane
machines failed the same way.

Journal field docs are linked from the Go docstrings.  In case those
docs evolve later, the fields I'm using are from [2], except for UNIT,
for which I have opened an upstream documentation request [3].

It might be worth exposing this as:

  $ openshift-install analyze bootstrap PATH

so folks could look at bootstrap logs which had been gathered by
third parties, but I'm punting on that for now.

[1]: https://storage.googleapis.com/origin-ci-test/logs/release-promote-openshift-machine-os-content-e2e-aws-4.3/2455/artifacts/e2e-aws/installer/log-bundle-20191024111122.tar
[2]: https://github.com/systemd/systemd/blob/v246/man/systemd.journal-fields.xml
[3]: systemd/systemd#17538