Add call to installer-gather.sh on failure #3475
Conversation
This will be awesome! @openshift/sig-master |
/test pj-rehearse |
Why is this WIP?
It looks like there's an issue with the SSH-agent: $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/3475/rehearse-3475-pull-ci-openshift-ovn-kubernetes-master-e2e-aws/1/artifacts/e2e-aws/container-logs/teardown.log.gz | gunzip | grep -3 'Could not open a connection to your authentication agent'
curl --insecure --silent --connect-timeout 5 --retry 3 --cert /tmp/artifacts/installer/tls/journal-gatewayd.crt --key /tmp/artifacts/installer/tls/journal-gatewayd.key --url https://3.84.112.162:19531/entries?_SYSTEMD_UNIT=openshift.service
curl --insecure --silent --connect-timeout 5 --retry 3 --cert /tmp/artifacts/installer/tls/journal-gatewayd.crt --key /tmp/artifacts/installer/tls/journal-gatewayd.key --url https://3.84.112.162:19531/entries?_SYSTEMD_UNIT=kubelet.service
curl --insecure --silent --connect-timeout 5 --retry 3 --cert /tmp/artifacts/installer/tls/journal-gatewayd.crt --key /tmp/artifacts/installer/tls/journal-gatewayd.key --url https://3.84.112.162:19531/entries?_SYSTEMD_UNIT=crio.service
Could not open a connection to your authentication agent.
No user exists for uid 1263960000
unknown user 1263960000
oc --insecure-skip-tls-verify --request-timeout=5s get apiserver.config.openshift.io authentication.config.openshift.io build.config.openshift.io console.config.openshift.io dns.config.openshift.io featuregate.config.openshift.io image.config.openshift.io infrastructure.config.openshift.io ingress.config.openshift.io network.config.openshift.io oauth.config.openshift.io project.config.openshift.io scheduler.config.openshift.io -o json |
I was trying to figure out if it worked or not. I'll have to look into how to fix the agent. |
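For what it's worth, a minimal sketch of getting an agent running in the teardown container before calling ssh-add — roughly what later revisions of this PR end up doing (the key path is the /etc/openshift-installer mount pointed out further down, so treat it as an assumption at this point in the thread):

```bash
# Start an ssh-agent in this shell so ssh-add has a socket to talk to,
# then load the cluster key. The path is assumed from the CI secret mount
# referenced later in this PR.
eval "$(ssh-agent)"
ssh-add /etc/openshift-installer/ssh-privatekey
```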
/test pj-rehearse |
Last rehearsal didn't seem to include the teardown from this job. I thought it would only select jobs that would be affected by the change? |
/test pj-rehearse |
I'm relatively certain that the items required to find the bootstrap IP are also there in the Ansible job, but I at least want to get proof that this works in general before I add it there as well. Roll 1d100 and hope for the best... |
/test pj-rehearse |
ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd | ||
fi | ||
fi | ||
ssh-add /tmp/cluster/ssh-privatekey |
the key exists at /etc/openshift-installer/ssh-privatekey
This lets us SSH from the teardown container into the cluster without hitting:

    $ ssh -A core@$bootstrap_ip
    No user exists for uid 1051910000

OpenSSH has a very early getpwuid call [1] with no provision for bypassing via HOME or USER environment variables like we did for Bazel [2]. OpenShift runs with the random UIDs by default [3]:

    By default, all containers that we try and launch within OpenShift, are set blocked from “RunAsAny” which basically means that they are not allowed to use a root user within the container. This prevents root actions such as chown or chmod from being run and is a sensible security precaution as, should a user be able to perform a local exploit to break out of the container, then they would not be running as root on the underlying container host. NB what about user-namespaces some of you are no doubt asking, these are definitely coming but the testing/hardening process is taking a while and whilst companies such as Red Hat are working hard in this space, there is still a way to go until they are ready for the mainstream.

while Kubernetes sorts out user namespacing [4]. Despite the high UIDs, all users on the cluster are GID 0, so the g+w is sufficient (vs. a+w), and maybe this mitigates concerns about increased writability for such an important file. The main mitigation is that these are throw-away CI containers, and not long-running production containers where we are concerned about malicious entry. A more polished fix has landed in CRI-O [5], but the CI cluster is stuck on OpenShift 3.11 and Docker at the moment. Our SSH use case is gathering logs in the teardown container [6], but we've been using the tests image for both tests and teardown since b16dcfc (images/tests/Dockerfile*: Install gzip for compressing logs, 2019-02-19, openshift#22094).

[1]: https://github.com/openssh/openssh-portable/blob/V_7_4_P1/ssh.c#L577
[2]: openshift/release#1185
[3]: https://blog.openshift.com/getting-any-docker-image-running-in-your-own-openshift-cluster/
[4]: kubernetes/enhancements#127
[5]: cri-o/cri-o#2022
[6]: openshift/release#3475
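A minimal sketch of the workaround that commit message describes, mirroring the diff excerpt earlier in this thread (USER_NAME and HOME are assumed to be set by the pod template; "default" is the fallback used here):

```bash
# If the container's random UID has no passwd entry, whoami (and OpenSSH's
# early getpwuid() call) fails; append a synthetic entry so lookups succeed.
# This relies on /etc/passwd being group-writable for GID 0, as discussed above.
if ! whoami &> /dev/null; then
    if [ -w /etc/passwd ]; then
        echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd
    fi
fi
```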
depends on openshift/origin#22592 |
/test pj-rehearse |
Force-pushed from 980e92f to 3a0f9c9
eval $(ssh-agent)
ssh-add /etc/openshift-installer/ssh-privatekey
ssh -A -o PreferredAuthentications=publickey -o StrictHostKeyChecking=false -o UserKnownHostsFile=/dev/null core@${bootstrap_ip} /bin/bash -x /usr/local/bin/installer-gather.sh
scp -o PreferredAuthentications=publickey -o StrictHostKeyChecking=false -o UserKnownHostsFile=/dev/null core@${bootstrap_ip}:log-bundle.tar.gz /tmp/artifacts/bootstrap-logs.tar.gz |
Would /tmp/artifacts/installer/bootstrap-logs.tar.gz be a better choice?
Sure, I'll change that real quick.
sh-4.2$ bootstrap_ip=$(python -c \
> 'import sys, json; d=reduce(lambda x,y: dict(x.items() + y.items()), map(lambda x: x["resources"], json.load(sys.stdin)["modules"])); k="aws_instance.bootstrap"; print d[k]["primary"]["attributes"]["public_ip"] if k in d else ""' \
> < /tmp/artifacts/installer/terraform.tfstate
> )
sh-4.2$ whoami
whoami: cannot find name for user ID 1031310000
sh-4.2$ ls -lah /etc/passwd
-rw-rw-r--. 1 root root 630 Mar 6 02:32 /etc/passwd
sh-4.2$ echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd
sh-4.2$ whoami
default
sh-4.2$ eval $(ssh-agent)
Agent pid 36
sh-4.2$ ssh-add /etc/openshift-installer/ssh-privatekey
Identity added: /etc/openshift-installer/ssh-privatekey (/etc/openshift-installer/ssh-privatekey)
sh-4.2$ ssh -A -o PreferredAuthentications=publickey -o StrictHostKeyChecking=false -o UserKnownHostsFile=/dev/null core@${bootstrap_ip} /bin/bash -x /usr/local/bin/installer-gather.sh
Could not create directory '/.ssh'.
Warning: Permanently added '54.209.181.157' (ECDSA) to the list of known hosts.
Gathering bootstrap journals ...
+ ARTIFACTS=/tmp/artifacts
+ echo 'Gathering bootstrap journals ...'
+ mkdir -p /tmp/artifacts/bootstrap/journals
+ for service in bootkube openshift kubelet crio
+ journalctl --boot --no-pager --output=short --unit=bootkube
+ for service in bootkube openshift kubelet crio
+ journalctl --boot --no-pager --output=short --unit=openshift
+ for service in bootkube openshift kubelet crio
+ journalctl --boot --no-pager --output=short --unit=kubelet
+ for service in bootkube openshift kubelet crio
+ journalctl --boot --no-pager --output=short --unit=crio
Gathering bootstrap containers ...
+ echo 'Gathering bootstrap containers ...'
+ mkdir -p /tmp/artifacts/bootstrap/containers
+ sudo crictl ps --all --quiet
+ read -r container
++ grep -oP 'Name: \K(.*)'
++ sudo crictl ps -a --id 76d398bdbefdbe43f513e8adb7c4e84b22000c35f02d662fdf03b0204b7e83ea -v
+ container_name=machine-config-server
+ sudo crictl logs 76d398bdbefdbe43f513e8adb7c4e84b22000c35f02d662fdf03b0204b7e83ea
+ sudo crictl inspect 76d398bdbefdbe43f513e8adb7c4e84b22000c35f02d662fdf03b0204b7e83ea
+ read -r container
++ sudo crictl ps -a --id ef2290a9d7b8899dbb35b8894134f6f1f91f318c66c8a326a172857b5314b6bc -v
++ grep -oP 'Name: \K(.*)'
+ container_name=machine-config-controller
+ sudo crictl logs ef2290a9d7b8899dbb35b8894134f6f1f91f318c66c8a326a172857b5314b6bc
+ sudo crictl inspect ef2290a9d7b8899dbb35b8894134f6f1f91f318c66c8a326a172857b5314b6bc
+ read -r container
+ mkdir -p /tmp/artifacts/bootstrap/pods
+ read -r container
+ sudo podman ps --all --quiet
+ sudo podman logs 192cada536d7
+ sudo podman inspect 192cada536d7
+ read -r container
+ sudo podman logs 8b11a7008838
+ sudo podman inspect 8b11a7008838
+ read -r container
+ sudo podman logs 770e4f9df136
+ sudo podman inspect 770e4f9df136
+ read -r container
+ sudo podman logs 9abfd8340668
+ sudo podman inspect 9abfd8340668
+ read -r container
+ sudo podman logs e98b0ea3dd36
+ sudo podman inspect e98b0ea3dd36
+ read -r container
+ sudo podman logs a4047d8c4229
+ sudo podman inspect a4047d8c4229
+ read -r container
+ sudo podman logs a390c27012c8
+ sudo podman inspect a390c27012c8
+ read -r container
+ sudo podman logs d0b1eae518df
+ sudo podman inspect d0b1eae518df
+ read -r container
+ mkdir -p /tmp/artifacts/control-plane /tmp/artifacts/resources
Gathering cluster resources ...
+ echo 'Gathering cluster resources ...'
+ queue resources/nodes.list oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get nodes -o jsonpath --template '{range .items[*]}{.metadata.name}{"\n"}{end}'
+ local TARGET=/tmp/artifacts/resources/nodes.list
+ shift
++ jobs
++ wc -l
+ local LIVE=0
+ [[ 0 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/masters.list oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get nodes -o jsonpath -l node-role.kubernetes.io/master --template '{range .items[*]}{.metadata.name}{"\n"}{end}'
+ local TARGET=/tmp/artifacts/resources/masters.list
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get nodes -o jsonpath --template '{range .items[*]}{.metadata.name}{"\n"}{end}'
++ wc -l
++ jobs
+ local LIVE=1
+ [[ 1 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/containers oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get pods --all-namespaces --template '{{ range .items }}{{ $name := .metadata.name }}{{ $ns := .metadata.namespace }}{{ range .spec.containers }}-n {{ $ns }} {{ $name }} -c {{ .name }}{{ "\n" }}{{ end }}{{ range .spec.initContainers }}-n {{ $ns }} {{ $name }} -c {{ .name }}{{ "\n" }}{{ end }}{{ end }}'
+ local TARGET=/tmp/artifacts/resources/containers
+ shift
++ wc -l
++ jobs
+ local LIVE=2
+ [[ 2 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/api-pods oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get pods -l apiserver=true --all-namespaces --template '{{ range .items }}-n {{ .metadata.namespace }} {{ .metadata.name }}{{ "\n" }}{{ end }}'
+ local TARGET=/tmp/artifacts/resources/api-pods
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get pods --all-namespaces --template '{{ range .items }}{{ $name := .metadata.name }}{{ $ns := .metadata.namespace}}{{ range .spec.containers }}-n {{ $ns }} {{ $name }} -c {{ .name }}{{ "\n" }}{{ end }}{{ range .spec.initContainers }}-n {{ $ns }} {{ $name }} -c {{ .name }}{{ "\n" }}{{ end }}{{ end }}'
++ jobs
++ wc -l
+ local LIVE=3
+ [[ 3 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/apiservices.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get apiservices -o json
+ local TARGET=/tmp/artifacts/resources/apiservices.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get nodes -o jsonpath -l node-role.kubernetes.io/master --template '{range .items[*]}{.metadata.name}{"\n"}{end}'
++ wc -l
++ jobs
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get pods -l apiserver=true --all-namespaces --template '{{ range .items }}-n {{ .metadata.namespace }} {{ .metadata.name }}{{ "\n" }}{{ end }}'
+ local LIVE=4
+ [[ 4 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/clusteroperators.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get clusteroperators -o json
+ local TARGET=/tmp/artifacts/resources/clusteroperators.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get apiservices -o json
++ wc -l
++ jobs
+ local LIVE=5
+ [[ 5 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/clusterversion.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get clusterversion -o json
+ local TARGET=/tmp/artifacts/resources/clusterversion.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get clusteroperators -o json
++ wc -l
++ jobs
+ local LIVE=6
+ [[ 6 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/configmaps.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get configmaps --all-namespaces -o json
+ local TARGET=/tmp/artifacts/resources/configmaps.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get clusterversion -o json
++ wc -l
++ jobs
+ local LIVE=7
+ [[ 7 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/csr.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get csr -o json
+ local TARGET=/tmp/artifacts/resources/csr.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get configmaps --all-namespaces -o json
++ wc -l
++ jobs
+ local LIVE=8
+ [[ 8 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/endpoints.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get endpoints --all-namespaces -o json
+ local TARGET=/tmp/artifacts/resources/endpoints.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get csr -o json
++ jobs
++ wc -l
+ local LIVE=9
+ [[ 9 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/events.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get events --all-namespaces -o json
+ local TARGET=/tmp/artifacts/resources/events.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get endpoints --all-namespaces -o json
++ jobs
++ wc -l
+ local LIVE=10
+ [[ 10 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/kubeapiserver.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get kubeapiserver -o json
+ local TARGET=/tmp/artifacts/resources/kubeapiserver.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get events --all-namespaces -o json
++ jobs
++ wc -l
+ local LIVE=11
+ [[ 11 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/kubecontrollermanager.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get kubecontrollermanager -o json
+ local TARGET=/tmp/artifacts/resources/kubecontrollermanager.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get kubeapiserver -o json
++ jobs
++ wc -l
+ local LIVE=12
+ [[ 12 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/machineconfigpools.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get machineconfigpools -o json
+ local TARGET=/tmp/artifacts/resources/machineconfigpools.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get kubecontrollermanager -o json
++ wc -l
++ jobs
+ local LIVE=13
+ [[ 13 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/machineconfigs.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get machineconfigs -o json
+ local TARGET=/tmp/artifacts/resources/machineconfigs.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get machineconfigpools -o json
++ jobs
++ wc -l
+ local LIVE=14
+ [[ 14 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/namespaces.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get namespaces -o json
+ local TARGET=/tmp/artifacts/resources/namespaces.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get machineconfigs -o json
++ jobs
++ wc -l
+ local LIVE=15
+ [[ 15 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/nodes.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get nodes -o json
+ local TARGET=/tmp/artifacts/resources/nodes.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get namespaces -o json
++ jobs
++ wc -l
+ local LIVE=16
+ [[ 16 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/openshiftapiserver.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get openshiftapiserver -o json
+ local TARGET=/tmp/artifacts/resources/openshiftapiserver.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get nodes -o json
++ wc -l
++ jobs
+ local LIVE=17
+ [[ 17 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/pods.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get pods --all-namespaces -o json
+ local TARGET=/tmp/artifacts/resources/pods.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get openshiftapiserver -o json
++ wc -l
++ jobs
+ local LIVE=18
+ [[ 18 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/rolebindings.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get rolebindings --all-namespaces -o json
+ local TARGET=/tmp/artifacts/resources/rolebindings.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get pods --all-namespaces -o json
++ jobs
++ wc -l
+ local LIVE=19
+ [[ 19 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/roles.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get roles --all-namespaces -o json
+ local TARGET=/tmp/artifacts/resources/roles.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get rolebindings --all-namespaces -o json
++ jobs
++ wc -l
+ local LIVE=20
+ [[ 20 -ge 45 ]]
+ [[ -n '' ]]
+ queue resources/services.json oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get services --all-namespaces -o json
+ local TARGET=/tmp/artifacts/resources/services.json
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get roles --all-namespaces -o json
++ jobs
++ wc -l
+ local LIVE=21
+ [[ 21 -ge 45 ]]
+ [[ -n '' ]]
+ FILTER=gzip
+ queue resources/openapi.json.gz oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get --raw /openapi/v2
+ local TARGET=/tmp/artifacts/resources/openapi.json.gz
+ shift
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get services --all-namespaces -o json
++ jobs
++ wc -l
Waiting for logs ...
+ local LIVE=22
+ [[ 22 -ge 45 ]]
+ [[ -n gzip ]]
+ echo 'Waiting for logs ...'
+ wait
+ gzip
+ sudo oc --config=/opt/openshift/auth/kubeconfig --request-timeout=5s get --raw /openapi/v2
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "clusteroperators"
error: the server doesn't have a resource type "apiservices"
error: the server doesn't have a resource type "configmaps"
error: the server doesn't have a resource type "clusterversion"
error: the server doesn't have a resource type "csr"
error: the server doesn't have a resource type "endpoints"
error: the server doesn't have a resource type "kubecontrollermanager"
error: the server doesn't have a resource type "kubeapiserver"
error: the server doesn't have a resource type "events"
error: the server doesn't have a resource type "machineconfigpools"
error: the server doesn't have a resource type "nodes"
error: the server doesn't have a resource type "machineconfigs"
error: the server doesn't have a resource type "namespaces"
error: the server doesn't have a resource type "openshiftapiserver"
error: the server doesn't have a resource type "pods"
error: the server doesn't have a resource type "roles"
Error from server (NotFound): the server could not find the requested resource
error: the server doesn't have a resource type "services"
error: the server doesn't have a resource type "rolebindings"
Gather remote logs
+ echo 'Gather remote logs'
+ MASTERS=()
+ export MASTERS
+ '[' 0 -ne 0 ']'
++ stat --printf=%s /tmp/artifacts/resources/masters.list
+ '[' 0 -ne 0 ']'
++ sudo oc --config=/opt/openshift/auth/kubeconfig whoami --show-server
++ grep -oP 'api.\K([a-z\.]*)'
+ DOMAIN=ci
+ mapfile -t MASTERS
++ dig -t SRV _etcd-server-ssl._tcp.ci +short
++ cut -f 4 -d ' '
++ sed 's/.$//'
/usr/local/bin/installer-gather.sh: line 92: $(dig -t SRV "_etcd-server-ssl._tcp.${DOMAIN}" +short | cut -f 4 -d ' ' | sed 's/.$//'): No such file or directory
+ tar cz -C /tmp/artifacts .
Log bundle written to ~/log-bundle.tar.gz
+ echo 'Log bundle written to ~/log-bundle.tar.gz'
sh-4.2$ scp -o PreferredAuthentications=publickey -o StrictHostKeyChecking=false -o UserKnownHostsFile=/dev/null core@${bootstrap_ip}:log-bundle.tar.gz /tmp/artifacts/bootstrap-logs.tar.gz
Could not create directory '/.ssh'.
Warning: Permanently added '54.209.181.157' (ECDSA) to the list of known hosts.
log-bundle.tar.gz 100% 26KB 938.4KB/s 00:00
sh-4.2$ ls -lah /tmp/artifacts/
total 28K
drwxrwsrwx. 3 root 1031310000 52 Apr 18 21:09 .
drwxrwxrwt. 1 root root 61 Apr 18 21:08 ..
-rw-r--r--. 1 default 1031310000 27K Apr 18 21:09 bootstrap-logs.tar.gz
drwxr-sr-x. 4 default 1031310000 199 Apr 18 21:06 installer

This looks like this is working. :yay: |
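One wrinkle visible in the transcript above: installer-gather.sh's masters lookup fails with "$(dig ...): No such file or directory", which suggests the command substitution was quoted and got treated as a literal filename to redirect from rather than being executed. A hedged sketch of feeding mapfile via process substitution instead (DOMAIN as derived in the transcript; the actual fix would belong in the installer repo's gather script):

```bash
# Populate MASTERS from the etcd SRV records; process substitution makes
# dig actually run instead of being parsed as a filename.
mapfile -t MASTERS < <(dig -t SRV "_etcd-server-ssl._tcp.${DOMAIN}" +short | cut -f 4 -d ' ' | sed 's/.$//')
```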
Force-pushed from 3a0f9c9 to 476bb0c
Force-pushed from 476bb0c to 58ecd9e
@@ -385,7 +385,15 @@ objects:
--key /tmp/artifacts/installer/tls/journal-gatewayd.key \
--url "https://${bootstrap_ip}:19531/entries?_SYSTEMD_UNIT=${service}.service"
done
fi |
I think you need to keep a closing fi on if [ -n "${bootstrap_ip}" ], although it should live after the block you add below, but before the else for if [ -f /tmp/artifacts/installer/terraform.tfstate ] (see the nesting sketch below).
eval $(ssh-agent)
ssh-add /etc/openshift-installer/ssh-privatekey
ssh -A -o PreferredAuthentications=publickey -o StrictHostKeyChecking=false -o UserKnownHostsFile=/dev/null core@${bootstrap_ip} /bin/bash -x /usr/local/bin/installer-gather.sh
scp -o PreferredAuthentications=publickey -o StrictHostKeyChecking=false -o UserKnownHostsFile=/dev/null core@${bootstrap_ip}:log-bundle.tar.gz /tmp/artifacts/installer/bootstrap-logs.tar.gz
fi
Ah, also we want to drop the fi here.
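A hedged sketch of the nesting these two comments describe, with the curl loop abridged from the hunk above; the bootstrap_ip extraction and the exact surrounding template context are assumptions here, not the final template:

```bash
if [ -f /tmp/artifacts/installer/terraform.tfstate ]; then
    # bootstrap_ip is assumed to have been extracted from terraform.tfstate above
    if [ -n "${bootstrap_ip:-}" ]; then
        # existing journal-gatewayd gathering
        for service in bootkube openshift kubelet crio; do
            curl --insecure --silent --connect-timeout 5 --retry 3 \
                --cert /tmp/artifacts/installer/tls/journal-gatewayd.crt \
                --key /tmp/artifacts/installer/tls/journal-gatewayd.key \
                --url "https://${bootstrap_ip}:19531/entries?_SYSTEMD_UNIT=${service}.service"
        done
        # new installer-gather.sh block
        eval "$(ssh-agent)"
        ssh-add /etc/openshift-installer/ssh-privatekey
        ssh -A -o PreferredAuthentications=publickey -o StrictHostKeyChecking=false \
            -o UserKnownHostsFile=/dev/null "core@${bootstrap_ip}" \
            /bin/bash -x /usr/local/bin/installer-gather.sh
        scp -o PreferredAuthentications=publickey -o StrictHostKeyChecking=false \
            -o UserKnownHostsFile=/dev/null "core@${bootstrap_ip}:log-bundle.tar.gz" \
            /tmp/artifacts/installer/bootstrap-logs.tar.gz
    fi  # closes the bootstrap_ip check, after the new block; no extra fi after scp
else
    : # existing non-terraform artifact gathering stays here
fi
```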
Force-pushed from 58ecd9e to 1ce1902
eval $(ssh-agent)
ssh-add /etc/openshift-installer/ssh-privatekey
ssh -A -o PreferredAuthentications=publickey -o StrictHostKeyChecking=false -o UserKnownHostsFile=/dev/null core@${bootstrap_ip} /bin/bash -x /usr/local/bin/installer-gather.sh
scp -o PreferredAuthentications=publickey -o StrictHostKeyChecking=false -o UserKnownHostsFile=/dev/null core@${bootstrap_ip}:log-bundle.tar.gz /tmp/artifacts/installer/bootstrap-logs.tar.gz
fi
Still have a trailing fi here.
Don't remove existing artifact gathering just yet. Depends on openshift/installer#1561
Force-pushed from 1ce1902 to 3661841
@sdodson: The following tests failed, say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm Dunno what's up with CI, though. |
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson, wking

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. |
@sdodson: Updated the following 8 configmaps:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…athering

Carries changes from [1] and [2]

[1]: openshift#3573
[2]: openshift#3475