
[libvirt] Failed to get console route for v0.10.0 tag and ImagePullBackoff for clusterapi-manager-controllers #1078

Closed
praveenkumar opened this issue Jan 16, 2019 · 49 comments


@praveenkumar
Contributor

Version

$ openshift-install version
bin/openshift-install v0.10.0

Platform (aws|libvirt|openstack):

libvirt

What happened?

The installer fails to get the console route and exits after 10 minutes due to a context deadline exceeded error.

What you expected to happen?

The installer should be able to create the cluster without any issues.

How to reproduce it (as minimally and precisely as possible)?

$ env TF_VAR_libvirt_master_memory=8192 TF_VAR_libvirt_master_vcpu=4 ./bin/openshift-install create cluster --dir=test --log-level debug
[...]
INFO Waiting up to 10m0s for the openshift-console route to be created... 
DEBUG Still waiting for the console route...       
DEBUG Still waiting for the console route...
[...]
DEBUG Still waiting for the console route...       
FATAL waiting for openshift-console URL: context deadline exceeded 

$ export KUBECONFIG=/home/prkumar/work/github/practice/go/src/github.com/openshift/installer/test/auth/kubeconfig

$ kubectl get pods --all-namespaces
NAMESPACE                                    NAME                                                         READY     STATUS             RESTARTS   AGE
kube-system                                  etcd-member-crcont-master-0                                  1/1       Running            0          17m
openshift-apiserver-operator                 openshift-apiserver-operator-7fc9bc59d9-wpbw6                1/1       Running            0          16m
openshift-apiserver                          apiserver-j6g59                                              1/1       Running            2          10m
openshift-cluster-api                        cluster-autoscaler-operator-7f74bdf7f9-26tsp                 1/1       Running            0          12m
openshift-cluster-api                        clusterapi-manager-controllers-db4fbd5fc-f7x6x               2/4       ImagePullBackOff   0          11m

$ kubectl logs clusterapi-manager-controllers-db4fbd5fc-f7x6x -n openshift-cluster-api -c nodelink-controller | less
[...]
W0116 08:26:01.170879       1 main.go:379] no matching machine found for node
I0116 08:26:01.170935       1 main.go:312] finished syncing node, duration: 207.751µs
I0116 08:26:01.170988       1 main.go:296] Error syncing node crcont-master-0: no matching machine found for node: crcont-master-0
I0116 08:26:04.914170       1 main.go:147] updating node: crcont-master-0
I0116 08:26:04.914282       1 main.go:310] syncing node
I0116 08:26:04.914334       1 main.go:368] searching machine cache for IP match for node
W0116 08:26:04.914373       1 main.go:379] no matching machine found for node
I0116 08:26:04.914410       1 main.go:312] finished syncing node, duration: 129.151µs
I0116 08:26:04.914447       1 main.go:296] Error syncing node crcont-master-0: no matching machine found for node: crcont-master-0

$ kubectl logs clusterapi-manager-controllers-db4fbd5fc-f7x6x -n openshift-cluster-api -c machine-healthcheck
[...]
I0116 08:25:58.615279       1 machinehealthcheck_controller.go:73] Reconciling MachineHealthCheck triggered by /crcont-master-0
W0116 08:25:58.615409       1 machinehealthcheck_controller.go:92] No machine annotation for node crcont-master-0
I0116 08:26:08.629294       1 machinehealthcheck_controller.go:73] Reconciling MachineHealthCheck triggered by /crcont-master-0
W0116 08:26:08.629354       1 machinehealthcheck_controller.go:92] No machine annotation for node crcont-master-0
I0116 08:26:18.649707       1 machinehealthcheck_controller.go:73] Reconciling MachineHealthCheck triggered by /crcont-master-0
W0116 08:26:18.649756       1 machinehealthcheck_controller.go:92] No machine annotation for node crcont-master-0

Anything else we need to know?

Looks like the issue is with the clusterapi-manager-controllers-db4fbd5fc-f7x6x pod, since it is in ImagePullBackOff state and the logs show that it is not able to identify the master node.
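The quickest way to see why a pod is stuck in ImagePullBackOff is to read its events. A minimal sketch (pod and namespace names are taken from this issue; the live command is commented out since it needs a running cluster):

```shell
# On a live cluster (not executed here):
#   kubectl describe pod clusterapi-manager-controllers-db4fbd5fc-f7x6x \
#       -n openshift-cluster-api | grep -E 'Failed|BackOff'

# The same filter applied to a captured event list:
events='Normal   Pulling  kubelet  pulling image "registry.svc.ci.openshift.org/..."
Warning  Failed   kubelet  Error: ImagePullBackOff'
printf '%s\n' "$events" | grep -E 'Failed|BackOff'
# prints only the Warning/Failed line, which carries the pull error
```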

@johwes

johwes commented Jan 16, 2019

I have the same issue; it fails to pull the image. Doing an "oc describe pod" shows the following.

Normal Scheduled 46m default-scheduler Successfully assigned openshift-cluster-api/clusterapi-manager-controllers-db4fbd5fc-bmlhw to ocp-master-0
Normal Pulled 46m kubelet, ocp-master-0 Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d1b7554af827639b86372f1d9c0fbdbe609c878381f40c433a1f788ec0fe5a7" already present on machine
Normal Started 46m kubelet, ocp-master-0 Started container
Normal Created 46m kubelet, ocp-master-0 Created container
Normal Pulled 46m kubelet, ocp-master-0 Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d1b7554af827639b86372f1d9c0fbdbe609c878381f40c433a1f788ec0fe5a7" already present on machine
Normal Started 46m kubelet, ocp-master-0 Started container
Normal Created 46m kubelet, ocp-master-0 Created container
Normal Pulling 46m (x2 over 46m) kubelet, ocp-master-0 pulling image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225"
Warning Failed 46m (x4 over 46m) kubelet, ocp-master-0 Error: ImagePullBackOff
Normal BackOff 46m (x4 over 46m) kubelet, ocp-master-0 Back-off pulling image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225"
Warning Failed 46m (x2 over 46m) kubelet, ocp-master-0 Error: ErrImagePull
Warning Failed 46m (x2 over 46m) kubelet, ocp-master-0 Failed to pull image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225": rpc error: code = Unknown desc = Error reading manifest sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225 in registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905: unauthorized: authentication required
Normal BackOff 36m (x39 over 46m) kubelet, ocp-master-0 Back-off pulling image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225"
Warning Failed 1m (x188 over 46m) kubelet, ocp-master-0 Error: ImagePullBackOff

@praveenkumar
Contributor Author

I tried today with the master branch and didn't see this issue, but with the 0.10.0 tag it occurs again, so it might be something to do with the way we tag the payload for this release?

@johwes

johwes commented Jan 17, 2019

I also rebuilt from the 0.9.1 tag and got it working with that.

@crawford
Contributor

0.10.0 is a special release. It is the beta1 build, which means that it targets a different set of content than 0.9.1. I also noticed that the libvirt container isn't pushed to quay (unlike its AWS counterpart), so I think it was just missed in the release process.

@praveenkumar
Contributor Author

I also noticed that the libvirt container isn't pushed to quay (unlike its AWS counterpart), so I think it was just missed in the release process.

@crawford thanks, that explains why it is happening only with libvirt. Do we have any plans to push the missing libvirt containers to the quay registry?

@bbrowning

I just wanted to add that my team is hitting this issue as well and is stuck on 0.9.1 for now, until we find a way to run 0.10.x locally with libvirt.

@e-minguez
Contributor

Same issue with 0.10.1: the console is not deployed because there are no workers available... because the clusterapi-manager-controllers pod is not up... because it is trying to pull its image from an internal registry which I cannot access:

5m          5m           2         clusterapi-manager-controllers-db4fbd5fc-nhqnb.157c848781ca4d28   Pod          spec.containers{controller-manager}            Warning   Failed              kubelet, minwi-master-0                                                             Failed to pull image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225": rpc error: code = Unknown desc = Error reading manifest sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225 in registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905: unauthorized: authentication required
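For what it's worth, the pull failure can be checked outside the cluster. A small sketch (the image reference is copied from the events above; skopeo is assumed to be installed when run for real):

```shell
IMG='registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225'

# Without credentials for the internal CI registry this reproduces the same
# "unauthorized: authentication required" error (not executed here):
#   skopeo inspect "docker://$IMG"

# Split the reference to see which registry would need credentials:
echo "registry: ${IMG%%/*}"   # registry.svc.ci.openshift.org
echo "digest:   ${IMG##*@}"
```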

@wking
Member

wking commented Jan 23, 2019

This is a release issue; the installer just pins the update payloads that the release folks push to quay.io. It's being tracked here.
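If the pinned payload is unreachable, one possible workaround is to point the installer at a payload you can actually pull. Sketch only: the environment variable name is assumed from the installer's documentation, and the image reference below is purely illustrative, so verify both before relying on this:

```shell
# Variable name assumed from openshift-install docs; the image reference is
# illustrative, not a known-good payload:
export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE='registry.svc.ci.openshift.org/openshift/origin-release:v4.0'
echo "override: $OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE"
# then: ./bin/openshift-install create cluster --dir=test --log-level debug
```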

@ghost

ghost commented Jan 29, 2019

@wking any update on this one? I am using the latest master and have exactly the same issue.

@smarterclayton
Contributor

You must have a pull secret for api.ci in order to use libvirt, because the installer team has chosen not to build libvirt for OCP.

@ghost

ghost commented Jan 29, 2019

@smarterclayton thanks. How do I get one?

@smarterclayton
Contributor

If you're not in the openshift GitHub organization, you can't get one.

libvirt isn't supported in the official installer. You need to use the origin variant or not use libvirt.

@ghost

ghost commented Jan 29, 2019

@smarterclayton what's the origin variant? Flavor isn't that important to me. I need a local 4.0 cluster :)

@smarterclayton
Contributor

git clone openshift/installer, run hack/build-go.sh, and that's origin

@wking
Member

wking commented Jan 29, 2019

git clone openshift/installer, run hack/build-go.sh, and that's origin

For libvirt, you need to set TAGS=libvirt when building.
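Putting the two comments above together, a minimal build sketch (a standard GOPATH layout is assumed; the live commands are commented out since they need the repository checked out):

```shell
# On a live machine (not executed here):
#   cd "$GOPATH/src/github.com/openshift/installer"
#   TAGS=libvirt hack/build.sh     # emits bin/openshift-install

# Internally, build.sh turns on cgo only when the libvirt tag is present,
# roughly like this (mirrors the build trace later in this thread):
TAGS="libvirt release"
if echo "$TAGS" | grep -q libvirt; then
  export CGO_ENABLED=1             # the libvirt provider needs cgo
fi
echo "CGO_ENABLED=$CGO_ENABLED"    # -> CGO_ENABLED=1
```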

@wking
Member

wking commented Jan 29, 2019

Flavor isn't that important to me.

Unless you take steps to preserve the public (I think?) OKD builds at registry.svc.ci.openshift.org/openshift/origin-release, they're going to get garbage-collected after a few days. Master installer builds (currently the only way to get libvirt compiled in) point there by default, so your cluster should run fine for a few days and then probably start to die as the backing images get garbage-collected. Should be fine for dev-work (the libvirt target), but it's not going to work for long-running tasks out of the box.

@ghost

ghost commented Jan 30, 2019

@smarterclayton @wking but this is exactly what I am doing:

eugene@ivantsoft ~/go/src/github.com/openshift/installer ((HEAD detached at v0.10.0)) $ TAGS=libvirt hack/build.sh
+ RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.0.0-0.1
+ RHCOS_BUILD_NAME=47.249
+ minimum_go_version=1.10
++ go version
++ cut -d ' ' -f 3
+ current_go_version=go1.10.3
++ version 1.10.3
++ IFS=.
++ printf '%03d%03d%03d\n' 1 10 3
++ unset IFS
++ version 1.10
++ IFS=.
++ printf '%03d%03d%03d\n' 1 10
++ unset IFS
+ '[' 001010003 -lt 001010000 ']'
+ LAUNCH_PATH=/home/eugene/go/src/github.com/openshift/installer
++ dirname hack/build.sh
+ cd hack/..
++ go list -e -f '{{.Dir}}' github.com/openshift/installer
+ PACKAGE_PATH=/home/eugene/go/src/github.com/openshift/installer
+ test -z /home/eugene/go/src/github.com/openshift/installer
+ LOCAL_PATH=/home/eugene/go/src/github.com/openshift/installer
+ test /home/eugene/go/src/github.com/openshift/installer '!=' /home/eugene/go/src/github.com/openshift/installer
+ MODE=release
++ git describe --always --abbrev=40 --dirty
+ LDFLAGS=' -X main.version=v0.10.0'
+ TAGS=libvirt
+ OUTPUT=bin/openshift-install
+ export CGO_ENABLED=0
+ CGO_ENABLED=0
+ case "${MODE}" in
+ TAGS='libvirt release'
+ test -n quay.io/openshift-release-dev/ocp-release:4.0.0-0.1
+ LDFLAGS=' -X main.version=v0.10.0 -X github.com/openshift/installer/pkg/asset/ignition/bootstrap.defaultReleaseImage=quay.io/openshift-release-dev/ocp-release:4.0.0-0.1'
+ test -n 47.249
+ LDFLAGS=' -X main.version=v0.10.0 -X github.com/openshift/installer/pkg/asset/ignition/bootstrap.defaultReleaseImage=quay.io/openshift-release-dev/ocp-release:4.0.0-0.1 -X github.com/openshift/installer/pkg/rhcos.buildName=47.249'
+ test '' '!=' y
+ go generate ./data
writing assets_vfsdata.go
+ echo 'libvirt release'
+ grep -q libvirt
+ export CGO_ENABLED=1
+ CGO_ENABLED=1
+ go build -ldflags ' -X main.version=v0.10.0 -X github.com/openshift/installer/pkg/asset/ignition/bootstrap.defaultReleaseImage=quay.io/openshift-release-dev/ocp-release:4.0.0-0.1 -X github.com/openshift/installer/pkg/rhcos.buildName=47.249' -tags 'libvirt release' -o bin/openshift-install ./cmd/openshift-install
eugene@ivantsoft ~/go/src/github.com/openshift/installer ((HEAD detached at v0.10.0)) $ bin/openshift-install create cluster
? SSH Public Key /home/eugene/.ssh/id_rsa.pub
? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain tt.testing
? Cluster Name codenvy
? Pull Secret [? for help] ****************************************************************************************************************************************************************************************
INFO Fetching OS image: redhat-coreos-maipo-47.249-qemu.qcow2.gz 
INFO Creating cluster...                          
INFO Waiting up to 30m0s for the Kubernetes API... 
INFO API v1.11.0+c69f926354 up                    
INFO Waiting up to 30m0s for the bootstrap-complete event... 
ERROR: logging before flag.Parse: E0130 08:28:24.401392    6213 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 148 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 10m0s for the openshift-console route to be created... 
FATAL waiting for openshift-console URL: context deadline exceeded 

Events:
events.log

Is this expected with 0.10.0? It works with 0.9.1 for me on Fedora 28 with libvirt. Unfortunately, however, 0.9.1 has a bug (already fixed in openshift/console#1112), so that version does not work for me since I need to work with OperatorHub and the integration of the Eclipse Che operator.

What would be the best way to proceed? Give up on a local install with libvirt and look for AWS resources?

@praveenkumar
Contributor Author

@eivantsov https://bugzilla.redhat.com/show_bug.cgi?id=1666561 is where this is tracked; do put your comments there.

@wking
Member

wking commented Jan 30, 2019

...(HEAD detached at v0.10.0)...

Builds from tagged releases get update payloads from quay.io; see the Bugzilla bug linked above (twice now ;). Building from master should work better, but comes with its own caveats.

What would be the best way to proceed? Give up on a local install with libvirt and look for AWS resources?

We will sell AWS support, so yeah, I expect that is the best route if you want fewer quirks at this stage.

@ghost

ghost commented Jan 30, 2019

@praveenkumar i don't have access to this issue

@wking would it be fair to say that 0.10.0+ libvirt installation is broken now?

@praveenkumar
Contributor Author

i don't have access to this issue

@eivantsov if you log in using your Red Hat account, you will be able to access it at the moment.

would it be fair to say that 0.10.0+ libvirt installation is broken now?

@eivantsov this is only broken for tagged releases, which have released payloads; it does work from master, as @wking said.

@ghost

ghost commented Jan 30, 2019

@praveenkumar I have the same problem with master too

And I am logged in with my RH email

@ghost

ghost commented Jan 30, 2019

@wking @praveenkumar
From master, libvirt, Fedora 28

eugene@ivantsoft ~/go/src/github.com/openshift/fourdotoh $ ../installer/bin/openshift-install create cluster
? SSH Public Key /home/eugene/.ssh/id_rsa.pub
? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain tt.testing
? Cluster Name eugenious
? Pull Secret [? for help] ***********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
INFO Fetching OS image: redhat-coreos-maipo-47.287-qemu.qcow2.gz 
INFO Creating cluster...                          
INFO Waiting up to 30m0s for the Kubernetes API... 
INFO API v1.12.4+3434dda up                       
INFO Waiting up to 30m0s for the bootstrap-complete event... 
ERROR: logging before flag.Parse: E0130 11:06:07.008600   14435 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 2318 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 10m0s for the openshift-console route to be created... 
FATAL waiting for openshift-console URL: context deadline exceeded 

Last lines from install log:

time="2019-01-30T11:12:45+02:00" level=debug msg="Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)"
time="2019-01-30T11:13:15+02:00" level=debug msg="Still waiting for the console route..."
time="2019-01-30T11:14:04+02:00" level=debug msg="Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)"
time="2019-01-30T11:14:34+02:00" level=debug msg="Still waiting for the console route: the server is currently unable to handle the request (get routes.route.openshift.io)"
time="2019-01-30T11:15:04+02:00" level=debug msg="Still waiting for the console route: the server is currently unable to handle the request (get routes.route.openshift.io)"
time="2019-01-30T11:15:44+02:00" level=debug msg="Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)"
time="2019-01-30T11:16:15+02:00" level=debug msg="Still waiting for the console route..."
time="2019-01-30T11:16:45+02:00" level=debug msg="Still waiting for the console route..."
time="2019-01-30T11:17:22+02:00" level=debug msg="Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)"
time="2019-01-30T11:17:52+02:00" level=debug msg="Still waiting for the console route..."
time="2019-01-30T11:18:07+02:00" level=fatal msg="waiting for openshift-console URL: context deadline exceeded"

There are a couple of failed pods with connection refused errors:

oc logs openshift-kube-apiserver-operator-5689d5dd48-nbnq5 -n=openshift-kube-apiserver-operator

I0130 09:22:49.259690       1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"9b2e14ef-246d-11e9-bcb2-664f163f5f0f", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'InstallerPodFailed' Failed to create installer pod for revision 7 on node "eugenious-master-0": Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods/installer-7-eugenious-master-0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.267089       1 installer_controller.go:636] key failed with : Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods/installer-7-eugenious-master-0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.268766       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Role: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-kube-apiserver/roles?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.287194       1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.Image: Get https://172.30.0.1:443/apis/config.openshift.io/v1/images?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.291247       1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.Authentication: Get https://172.30.0.1:443/apis/config.openshift.io/v1/authentications?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.294226       1 reflector.go:134] github.com/openshift/cluster-kube-apiserver-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to list *v1alpha1.KubeAPIServerOperatorConfig: Get https://172.30.0.1:443/apis/kubeapiserver.operator.openshift.io/v1alpha1/kubeapiserveroperatorconfigs?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.296771       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ClusterRoleBinding: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.298489       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.RoleBinding: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-kube-apiserver/rolebindings?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.662625       1 resourcesync_controller.go:233] key failed with : Put https://172.30.0.1:443/apis/kubeapiserver.operator.openshift.io/v1alpha1/kubeapiserveroperatorconfigs/instance/status: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.737589       1 leaderelection.go:270] error retrieving resource lock openshift-kube-apiserver-operator/openshift-cluster-kube-apiserver-operator-lock: Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver-operator/configmaps/openshift-cluster-kube-apiserver-operator-lock: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.864810       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Secret: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/secrets?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.059761       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Secret: Get https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.260944       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Secret: Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/secrets?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.270199       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Role: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-kube-apiserver/roles?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.290580       1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.Image: Get https://172.30.0.1:443/apis/config.openshift.io/v1/images?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.291841       1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.Authentication: Get https://172.30.0.1:443/apis/config.openshift.io/v1/authentications?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.297195       1 reflector.go:134] github.com/openshift/cluster-kube-apiserver-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to list *v1alpha1.KubeAPIServerOperatorConfig: Get https://172.30.0.1:443/apis/kubeapiserver.operator.openshift.io/v1alpha1/kubeapiserveroperatorconfigs?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.299254       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ClusterRoleBinding: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.300194       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.RoleBinding: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-kube-apiserver/rolebindings?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.463036       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ConfigMap: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:57.736504       1 event.go:259] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' '8406e42e-2470-11e9-9a93-0a580a800003 stopped leading'
I0130 09:22:57.736563       1 leaderelection.go:249] failed to renew lease openshift-kube-apiserver-operator/openshift-cluster-kube-apiserver-operator-lock: failed to tryAcquireOrRenew context deadline exceeded
F0130 09:22:57.736585       1 leaderelection.go:65] leaderelection lost
oc logs openshift-kube-scheduler-eugenious-master-0 -n=openshift-kube-scheduler

130 09:28:55.539845       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.StatefulSet: Get https://eugenious-api.tt.testing:6443/apis/apps/v1/statefulsets?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
I0130 09:28:55.563234       1 reflector.go:169] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:131
E0130 09:28:55.563941       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.PersistentVolume: Get https://eugenious-api.tt.testing:6443/api/v1/persistentvolumes?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
I0130 09:28:55.567032       1 reflector.go:169] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/client-go/informers/factory.go:131
E0130 09:28:55.567702       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1beta1.PodDisruptionBudget: Get https://eugenious-api.tt.testing:6443/apis/policy/v1beta1/poddisruptionbudgets?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
I0130 09:28:55.574635       1 reflector.go:169] Listing and watching *v1.ReplicationController from k8s.io/client-go/informers/factory.go:131
I0130 09:28:55.585226       1 reflector.go:169] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:131
E0130 09:28:55.596884       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ReplicationController: Get https://eugenious-api.tt.testing:6443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
E0130 09:28:55.597899       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.PersistentVolumeClaim: Get https://eugenious-api.tt.testing:6443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
I0130 09:28:55.598112       1 reflector.go:169] Listing and watching *v1.ReplicaSet from k8s.io/client-go/informers/factory.go:131
I0130 09:28:55.599863       1 reflector.go:169] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:131
I0130 09:28:55.600301       1 reflector.go:169] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:131
I0130 09:28:55.602682       1 reflector.go:169] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:131
I0130 09:28:55.609265       1 reflector.go:169] Listing and watching *v1.Pod from k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:178
E0130 09:28:55.610041       1 reflector.go:134] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:178: Failed to list *v1.Pod: Get https://eugenious-api.tt.testing:6443/api/v1/pods?fieldSelector=status.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
E0130 09:28:55.610211       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Node: Get https://eugenious-api.tt.testing:6443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
E0130 09:28:55.610363       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.StorageClass: Get https://eugenious-api.tt.testing:6443/apis/storage.k8s.io/v1/storageclasses?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
E0130 09:28:55.610724       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ReplicaSet: Get https://eugenious-api.tt.testing:6443/apis/apps/v1/replicasets?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
E0130 09:28:55.614051       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Service: Get https://eugenious-api.tt.testing:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
I0130 09:28:56.540950       1 reflector.go:169] Listing and watching *v1.StatefulSet from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.564111       1 reflector.go:169] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.568644       1 reflector.go:169] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.597321       1 reflector.go:169] Listing and watching *v1.ReplicationController from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.608238       1 reflector.go:169] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.611688       1 reflector.go:169] Listing and watching *v1.Pod from k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:178
I0130 09:28:56.614043       1 reflector.go:169] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.616628       1 reflector.go:169] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.617114       1 reflector.go:169] Listing and watching *v1.ReplicaSet from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.618578       1 reflector.go:169] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:131
I0130 09:29:02.086276       1 wrap.go:47] GET /metrics: (2.775502ms) 200 [Prometheus/2.6.0 192.168.126.51:48566]
I0130 09:29:02.089486       1 wrap.go:47] GET /metrics: (5.676885ms) 200 [Prometheus/2.6.0 192.168.126.51:49234]
E0130 09:29:03.403148       1 event.go:259] Could not construct reference to: '&v1.Endpoints{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Subsets:[]v1.EndpointSubset(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' 'eugenious-master-0_d2bc12a7-2470-11e9-936a-52fdfc072182 stopped leading'
I0130 09:29:03.403228       1 leaderelection.go:249] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E0130 09:29:03.403256       1 server.go:207] lost master
lost lease

Events:

install_events.log

@e-minguez
Contributor

e-minguez commented Jan 30, 2019

Current master (commit id d3ff3af):

  • Bootstrap node and master are created
  • Deployment happens
  • Bootstrap node is deleted
  • Worker is created
  • Installation finished unsuccessfully while waiting for the console to be created (FATAL waiting for openshift-console URL: context deadline exceeded)
    $ oc get pods -o wide --all-namespaces
    NAMESPACE                                    NAME                                                       READY     STATUS              RESTARTS   AGE       IP               NODE                   NOMINATED NODE
    kube-system                                  etcd-member-minwi-master-0                                 1/1       Running             0          17m       192.168.126.11   minwi-master-0         <none>
    openshift-apiserver-operator                 openshift-apiserver-operator-74cb6bbbfc-bf877              1/1       Running             1          10m       10.128.0.26      minwi-master-0         <none>
    openshift-apiserver                          apiserver-t78ct                                            1/1       Running             0          3m51s     10.128.0.48      minwi-master-0         <none>
    openshift-authentication-operator            openshift-authentication-operator-7899cdcfd5-9ldbf         1/1       Running             0          6m56s     10.128.0.33      minwi-master-0         <none>
    openshift-authentication-operator            openshift-authentication-operator-7899cdcfd5-kzslq         1/1       Running             0          6m56s     10.128.0.34      minwi-master-0         <none>
    openshift-authentication-operator            openshift-authentication-operator-7899cdcfd5-zdg2r         1/1       Running             0          6m56s     10.128.0.35      minwi-master-0         <none>
    openshift-authentication-operator            origin-cluster-authentication-operator1-77868f4756-dzc6h   1/1       Running             1          6m56s     10.128.0.36      minwi-master-0         <none>
    openshift-cloud-credential-operator          cloud-credential-operator-c8c99b889-gq9cr                  1/1       Running             0          7m5s      10.128.0.32      minwi-master-0         <none>
    openshift-cluster-api                        cluster-autoscaler-operator-59fbb7468d-s2qdc               1/1       Running             0          10m       10.128.0.25      minwi-master-0         <none>
    openshift-cluster-api                        clusterapi-manager-controllers-5764dd8cd7-r5h6v            4/4       Running             0          9m51s     10.128.0.28      minwi-master-0         <none>
    openshift-cluster-api                        machine-api-operator-587656b779-vx9cc                      1/1       Running             0          10m       10.128.0.27      minwi-master-0         <none>
    openshift-cluster-machine-approver           machine-approver-64fbd8bc6c-mqrrr                          1/1       Running             0          18m       192.168.126.11   minwi-master-0         <none>
    openshift-cluster-version                    cluster-version-operator-c4599b87d-27fgp                   1/1       Running             0          18m       192.168.126.11   minwi-master-0         <none>
    openshift-controller-manager-operator        openshift-controller-manager-operator-677b796b6f-g896f     1/1       Running             4          7m56s     10.128.0.30      minwi-master-0         <none>
    openshift-controller-manager                 controller-manager-tqkcc                                   1/1       Running             1          3m23s     10.128.0.51      minwi-master-0         <none>
    openshift-core-operators                     openshift-service-cert-signer-operator-65664df755-tctzz    1/1       Running             0          18m       10.128.0.2       minwi-master-0         <none>
    openshift-dns-operator                       dns-operator-6cddb84ddd-264mp                              1/1       Running             0          18m       10.128.0.3       minwi-master-0         <none>
    openshift-dns                                dns-default-6bzhh                                          2/2       Running             0          8m4s      10.129.0.2       minwi-worker-0-52gn4   <none>
    openshift-dns                                dns-default-g2tpm                                          2/2       Running             0          15m       10.128.0.8       minwi-master-0         <none>
    openshift-image-registry                     cluster-image-registry-operator-5544bb9f48-8flqd           1/1       Running             0          6m55s     10.128.0.37      minwi-master-0         <none>
    openshift-image-registry                     image-registry-77f87b797f-fhvrf                            1/1       Running             0          6m19s     10.129.0.6       minwi-worker-0-52gn4   <none>
    openshift-image-registry                     node-ca-955vc                                              1/1       Running             0          6m11s     10.128.0.40      minwi-master-0         <none>
    openshift-image-registry                     node-ca-kgzl8                                              1/1       Running             0          6m11s     10.129.0.7       minwi-worker-0-52gn4   <none>
    openshift-ingress-operator                   ingress-operator-5ff8c7dfdd-8hkc5                          1/1       Running             0          6m53s     10.128.0.38      minwi-master-0         <none>
    openshift-ingress                            router-default-654ff569fd-qpkjd                            1/1       Running             0          6m32s     192.168.126.51   minwi-worker-0-52gn4   <none>
    openshift-kube-apiserver-operator            openshift-kube-apiserver-operator-5689d5dd48-6m9d5         0/1       CrashLoopBackOff    4          18m       10.128.0.4       minwi-master-0         <none>
    openshift-kube-apiserver                     installer-1-minwi-master-0                                 0/1       Completed           0          14m       10.128.0.9       minwi-master-0         <none>
    openshift-kube-apiserver                     installer-2-minwi-master-0                                 0/1       Completed           0          14m       10.128.0.11      minwi-master-0         <none>
    openshift-kube-apiserver                     installer-3-minwi-master-0                                 0/1       Completed           0          6m2s      10.128.0.41      minwi-master-0         <none>
    openshift-kube-apiserver                     installer-4-minwi-master-0                                 0/1       Completed           0          4m44s     10.128.0.46      minwi-master-0         <none>
    openshift-kube-apiserver                     installer-5-minwi-master-0                                 0/1       Completed           0          3m21s     10.128.0.52      minwi-master-0         <none>
    openshift-kube-apiserver                     installer-6-minwi-master-0                                 0/1       Completed           0          100s      10.128.0.56      minwi-master-0         <none>
    openshift-kube-apiserver                     openshift-kube-apiserver-minwi-master-0                    1/1       Running             0          66s       192.168.126.11   minwi-master-0         <none>
    openshift-kube-apiserver                     revision-pruner-0-minwi-master-0                           0/1       Completed           0          14m       10.128.0.10      minwi-master-0         <none>
    openshift-kube-apiserver                     revision-pruner-2-minwi-master-0                           0/1       Completed           0          14m       10.128.0.12      minwi-master-0         <none>
    openshift-kube-apiserver                     revision-pruner-3-minwi-master-0                           0/1       Completed           0          6m2s      10.128.0.43      minwi-master-0         <none>
    openshift-kube-apiserver                     revision-pruner-4-minwi-master-0                           0/1       Completed           0          4m44s     10.128.0.47      minwi-master-0         <none>
    openshift-kube-apiserver                     revision-pruner-5-minwi-master-0                           0/1       Completed           0          3m21s     10.128.0.53      minwi-master-0         <none>
    openshift-kube-apiserver                     revision-pruner-6-minwi-master-0                           0/1       Completed           0          100s      10.128.0.57      minwi-master-0         <none>
    openshift-kube-controller-manager-operator   kube-controller-manager-operator-5b8fcd96c8-96v8d          1/1       Running             4          13m       10.128.0.14      minwi-master-0         <none>
    openshift-kube-controller-manager            installer-1-minwi-master-0                                 0/1       Completed           0          12m       10.128.0.16      minwi-master-0         <none>
    openshift-kube-controller-manager            installer-2-minwi-master-0                                 0/1       Completed           0          12m       10.128.0.17      minwi-master-0         <none>
    openshift-kube-controller-manager            installer-3-minwi-master-0                                 0/1       Completed           0          5m4s      10.128.0.44      minwi-master-0         <none>
    openshift-kube-controller-manager            installer-4-minwi-master-0                                 0/1       Completed           0          3m35s     10.128.0.49      minwi-master-0         <none>
    openshift-kube-controller-manager            installer-5-minwi-master-0                                 0/1       Completed           0          2m        10.128.0.54      minwi-master-0         <none>
    openshift-kube-controller-manager            installer-6-minwi-master-0                                 0/1       ContainerCreating   0          6s        <none>           minwi-master-0         <none>
    openshift-kube-controller-manager            openshift-kube-controller-manager-minwi-master-0           1/1       Running             3          109s      192.168.126.11   minwi-master-0         <none>
    openshift-kube-controller-manager            revision-pruner-0-minwi-master-0                           0/1       Completed           0          12m       10.128.0.19      minwi-master-0         <none>
    openshift-kube-controller-manager            revision-pruner-2-minwi-master-0                           0/1       Completed           0          12m       10.128.0.18      minwi-master-0         <none>
    openshift-kube-controller-manager            revision-pruner-3-minwi-master-0                           0/1       Completed           0          5m3s      10.128.0.45      minwi-master-0         <none>
    openshift-kube-controller-manager            revision-pruner-4-minwi-master-0                           0/1       Completed           0          3m36s     10.128.0.50      minwi-master-0         <none>
    openshift-kube-controller-manager            revision-pruner-5-minwi-master-0                           0/1       Completed           0          2m        10.128.0.55      minwi-master-0         <none>
    openshift-kube-controller-manager            revision-pruner-6-minwi-master-0                           0/1       ContainerCreating   0          7s        <none>           minwi-master-0         <none>
    openshift-kube-scheduler-operator            openshift-kube-scheduler-operator-85cff5b9bd-2k6v8         1/1       Running             0          13m       10.128.0.13      minwi-master-0         <none>
    openshift-kube-scheduler                     installer-1-minwi-master-0                                 0/1       Completed           0          13m       10.128.0.15      minwi-master-0         <none>
    openshift-kube-scheduler                     openshift-kube-scheduler-minwi-master-0                    1/1       Running             4          12m       192.168.126.11   minwi-master-0         <none>
    openshift-kube-scheduler                     revision-pruner-0-minwi-master-0                           0/1       OOMKilled           0          11m       10.128.0.20      minwi-master-0         <none>
    openshift-machine-config-operator            machine-config-controller-d7c89fcb5-gmzh2                  1/1       Running             0          10m       10.128.0.24      minwi-master-0         <none>
    openshift-machine-config-operator            machine-config-daemon-m8chj                                1/1       Running             0          8m4s      192.168.126.51   minwi-worker-0-52gn4   <none>
    openshift-machine-config-operator            machine-config-daemon-mf4jb                                1/1       Running             0          10m       192.168.126.11   minwi-master-0         <none>
    openshift-machine-config-operator            machine-config-operator-6d45f58bbc-92kl2                   1/1       Running             0          11m       10.128.0.21      minwi-master-0         <none>
    openshift-machine-config-operator            machine-config-server-6gr95                                1/1       Running             0          10m       192.168.126.11   minwi-master-0         <none>
    openshift-monitoring                         cluster-monitoring-operator-f576575b5-cxbjq                1/1       Running             0          6m53s     10.128.0.39      minwi-master-0         <none>
    openshift-monitoring                         grafana-78765ddcc7-cr9bq                                   2/2       Running             1          112s      10.129.0.8       minwi-worker-0-52gn4   <none>
    openshift-monitoring                         prometheus-operator-6df5775484-jsrnj                       1/1       Running             2          6m30s     10.129.0.4       minwi-worker-0-52gn4   <none>
    openshift-multus                             multus-5cwqx                                               1/1       Running             4          17m       192.168.126.11   minwi-master-0         <none>
    openshift-multus                             multus-g8dbk                                               1/1       Running             0          8m4s      192.168.126.51   minwi-worker-0-52gn4   <none>
    openshift-network-operator                   network-operator-6484475bbd-zv66l                          1/1       Running             0          18m       192.168.126.11   minwi-master-0         <none>
    openshift-operator-lifecycle-manager         catalog-operator-58b8fb9564-8wf28                          1/1       Running             0          11m       10.128.0.23      minwi-master-0         <none>
    openshift-operator-lifecycle-manager         olm-operator-686859c7c9-qppqh                              1/1       Running             0          11m       10.128.0.22      minwi-master-0         <none>
    openshift-operator-lifecycle-manager         olm-operators-5cv8v                                        1/1       Running             0          10m       10.129.0.3       minwi-worker-0-52gn4   <none>
    openshift-operator-lifecycle-manager         packageserver-7c95d754d6-mpwmz                             1/1       Running             2          6m3s      10.128.0.42      minwi-master-0         <none>
    openshift-sdn                                ovs-q65kv                                                  1/1       Running             0          8m4s      192.168.126.51   minwi-worker-0-52gn4   <none>
    openshift-sdn                                ovs-x5jfc                                                  1/1       Running             0          17m       192.168.126.11   minwi-master-0         <none>
    openshift-sdn                                sdn-controller-26skg                                       0/1       CrashLoopBackOff    4          17m       192.168.126.11   minwi-master-0         <none>
    openshift-sdn                                sdn-gksbr                                                  1/1       Running             0          8m4s      192.168.126.51   minwi-worker-0-52gn4   <none>
    openshift-sdn                                sdn-xbth8                                                  1/1       Running             1          17m       192.168.126.11   minwi-master-0         <none>
    openshift-service-cert-signer                apiservice-cabundle-injector-5f54f9578b-bcvqb              1/1       Running             0          15m       10.128.0.6       minwi-master-0         <none>
    openshift-service-cert-signer                configmap-cabundle-injector-54cc474585-7hhc7               1/1       Running             0          15m       10.128.0.5       minwi-master-0         <none>
    openshift-service-cert-signer                service-serving-cert-signer-d9987ff6d-57nr6                1/1       Running             0          15m       10.128.0.7       minwi-master-0         <none>
     
    $ oc get nodes -o wide
    NAME                   STATUS    ROLES     AGE       VERSION              INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION              CONTAINER-RUNTIME
    minwi-master-0         Ready     master    18m       v1.12.4+f39ab668d3   192.168.126.11   <none>        Red Hat CoreOS 4.0   3.10.0-957.5.1.el7.x86_64   cri-o://1.12.5-1.rhaos4.0.git97ebf9b.el7-dev
    minwi-worker-0-52gn4   Ready     worker    8m40s     v1.12.4+f39ab668d3   192.168.126.51   <none>        Red Hat CoreOS 4.0   3.10.0-957.5.1.el7.x86_64   cri-o://1.12.5-1.rhaos4.0.git97ebf9b.el7-dev
     
     
    # installer output
    ...
    INFO Waiting up to 10m0s for the openshift-console route to be created...
    DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
    DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
    DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
    DEBUG Still waiting for the console route...      
    DEBUG Still waiting for the console route...      
    DEBUG Still waiting for the console route...      
    DEBUG Still waiting for the console route...      
    DEBUG Still waiting for the console route...      
    DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
    DEBUG Still waiting for the console route...      
    DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
    DEBUG Still waiting for the console route: the server is currently unable to handle the request (get routes.route.openshift.io)
    DEBUG Still waiting for the console route: the server is currently unable to handle the request (get routes.route.openshift.io)
    DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
    DEBUG Still waiting for the console route...      
    DEBUG Still waiting for the console route...      
    DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
    FATAL waiting for openshift-console URL: context deadline exceeded

Findings:

  • Some pods in the master are oomkilled/crashloopback
  • Some 'the server is currently unable to handle the request' messages happen. I think this is because the master is out of resources
  • openshift-install still uses a huge amount of ram...

I think the issues can be worked around by adding more CPU/RAM to the master node (i.e. create the manifests, modify the CPU/RAM specs for the masters, then create the cluster), but I will need to find somewhere else to test it; my laptop is not capable of doing that.

Just in case, in order to make this 'work' on my laptop (t480s, 16 GB RAM) I need to:

  • Close all apps (except the terminal)
  • Add 4 GB of swap (fallocate -l 4G /swapfile to create the swapfile, then mkswap /swapfile and swapon /swapfile)
  • Also, after closing all apps, drop the disk caches to free up some memory (echo 3 > /proc/sys/vm/drop_caches)

@ghost

ghost commented Jan 30, 2019

@e-minguez i have 24GB and while installing I do not see all of my RAM being used.

@e-minguez
Contributor

@e-minguez i have 24GB and while installing I do not see all of my RAM being used.

I have 16 GB, and unless I add 4 GB more of swap (goodbye nvme!) the installer is OOM-killed :)

@rhopp

rhopp commented Jan 30, 2019

It seems that I stumbled upon the same issue using the 0.11.0 version and AWS as the target infrastructure.
Here's the installer output:

INFO Creating cluster...
INFO Waiting up to 30m0s for the Kubernetes API...
INFO API v1.11.0+8868a98a7b up
INFO Waiting up to 30m0s for the bootstrap-complete event...
ERROR: logging before flag.Parse: E0130 12:44:05.888016 25969 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 297
INFO Destroying the bootstrap resources...
INFO Waiting up to 10m0s for the openshift-console route to be created...
FATAL waiting for openshift-console URL: context deadline exceeded

Here's the .openshift_install.log: https://paste.fedoraproject.org/paste/zhe4v4FPxfel9vm01Nvvjw

@jkremser

I am in completely the same situation as @eivantsov. I need to test the operator in the marketplace and I am also hitting this issue in the clusterapi-manager-controllers pod. No console for me :(

Failed to pull image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225": rpc error: code = Unknown desc = Error reading manifest sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225 in registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905: unauthorized: authentication required

and I can confirm that it's almost impossible to run the installer on a t480s with 16 GB of RAM. 0.9.1 worked, but the marketplace was broken there. Isn't it possible to run the 0.9.1 version of the installer and upgrade only the console in it? The whole cluster comes with the console operator; can't I just modify the CRD for the console to use the version with the fix?

@praveenkumar
Contributor Author

@wking Since this payload is now available on quay.io, and according to https://bugzilla.redhat.com/show_bug.cgi?id=166656 it should be available with a new tag of the installer, when can we cut a new tag release of the installer and have the whole payload come from the quay.io side?

@cynepco3hahue

I can still see that issue on the master

  Warning  Failed     3m (x2 over 4m)  kubelet, test-1-master-0  Error: ErrImagePull
  Warning  Failed     3m (x2 over 4m)  kubelet, test-1-master-0  Failed to pull image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-25-205123@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225": rpc error: code = Unknown desc = Error reading manifest sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225 in registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-25-205123: unauthorized: authentication required
  Warning  Failed     3m (x4 over 4m)  kubelet, test-1-master-0  Error: ImagePullBackOff
  Normal   BackOff    3m (x3 over 4m)  kubelet, test-1-master-0  Back-off pulling image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-25-205123@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225"

@steven-ellis

I've been having similar issues, which I put down to memory sizing (#1041).
I've increased the memory for master-0 but I'm still not getting any workers starting.

@ghost

ghost commented Mar 8, 2019

Hi guys,

The minimum required is:

  • 1 x master
  • 2 x workers

*The router needs two workers.

Best,
Fábio Sbano

@wking
Member

wking commented Mar 8, 2019

*The router needs two workers.

Is there a router pull or docs I can link for that? I guess we need to bump our libvirt default to catch up.

@ghost

ghost commented Mar 8, 2019

wking,

I'll create a howto.

See it running at https://youtu.be/ZOZPmwUwWj8

Best Regards,
Fábio Sbano

@ghost

ghost commented Mar 8, 2019

Wking,

Are you using the latest version of the installer?

Regards,
Fábio Sbano

@wking
Member

wking commented Mar 8, 2019

Are you using the latest version of the installer?

I haven't run it on libvirt in a while, but if the router for some reason needs 2+ compute nodes now, we'd want to update the default and some validation. Or is the issue total compute memory constraints or similar, and not actually compute replica count?

@ghost

ghost commented Mar 8, 2019

@wking,

With replica count=2, you cannot listen on ports 80 and 443 twice on the same physical or virtual host, so each router replica needs its own worker.

Best Regards,
Fábio Sbano

@ghost

ghost commented Mar 8, 2019

I needed to change some things in my setup configuration file, adjust the memory, and solve a problem with dnsmasq.

Regards,
Fábio Sbano
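For the dnsmasq part, a common workaround on libvirt hosts (an assumption about the setup here, not something stated in this comment) is a wildcard entry pointing *.apps at the ingress node via a NetworkManager dnsmasq drop-in; the path and addresses below are hypothetical:

```
# /etc/NetworkManager/dnsmasq.d/openshift.conf (hypothetical values)
server=/tt.testing/192.168.126.1
address=/.apps.tt.testing/192.168.126.51
```

After writing the drop-in, reload NetworkManager so dnsmasq picks it up.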

@ghost

ghost commented Mar 8, 2019

My .tf settings:

  • bootstrap - 32 GB
  • master - 32 GB
  • 2 x worker - 4 GB

and a wildcard "*.apps.test1.tt.testing" in bind, listening on IP 192.168.126.1.

Regards,
Fábio Sbano

@ghost

ghost commented Mar 9, 2019

Hi,

[fsbano@voyager-1 ~]$ oc get deployment
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
router-default   0/2     2            0           8m23s
[fsbano@voyager-1 ~]$

[fsbano@voyager-1 ~]$ oc describe deployment/router-default
Name: router-default
Namespace: openshift-ingress
CreationTimestamp: Fri, 08 Mar 2019 21:16:46 -0300
Labels: app=router
ingress.openshift.io/clusteringress=default
Annotations: deployment.kubernetes.io/revision=1
Selector: app=router,router=router-default
Replicas: 2 desired | 2 updated | 2 total | 0 available | 2 unavailable

Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  8m    deployment-controller  Scaled up replica set router-default-779745f684 to 2
[fsbano@voyager-1 ~]$

[fsbano@voyager-1 ~]$ oc edit deployments/router-default

 nodeSelector:
    node-role.kubernetes.io/worker: ""

Best Regards,
Fábio Sbano

@ghost

ghost commented Mar 9, 2019

Hi,

My install-config.yaml file

[root@voyager-1 ~]# more install-config.yaml
apiVersion: v1beta3
baseDomain: tt.testing
compute:
- name: worker
  platform: {}
  replicas: 2
controlPlane:
  name: master
  platform: {}
  replicas: 1
metadata:
  creationTimestamp: null
  name: test1
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostSubnetLength: 9
  machineCIDR: 192.168.126.0/24
  serviceCIDR: 172.30.0.0/16
  type: OpenShiftSDN
platform:
  libvirt:
    URI: qemu+tcp://192.168.122.1/system
    network:
      if: tt0

$ ./openshift-install create cluster --dir . --log-level debug

Best Regards,
Fábio Sbano

@ghost

ghost commented Mar 9, 2019

Hey,

Step-by-Step

[fsbano@voyager-1 ~]$ oc get pod --all-namespaces | egrep -v '(Running|Completed)'
NAMESPACE                           NAME                                      READY   STATUS             RESTARTS   AGE
openshift-console                   console-6d6ffd4444-9299h                  0/1     CrashLoopBackOff   13         59m
openshift-console                   console-6d6ffd4444-p2hpf                  0/1     CrashLoopBackOff   13         59m
openshift-kube-controller-manager   revision-pruner-3-jaguar-kt74v-master-0   0/1     OOMKilled          0          75m
[fsbano@voyager-1 ~]$

[fsbano@voyager-1 ~]$ sudo cat /var/named/named.apps.jaguar.fsbano.io
$TTL 1D
@	IN SOA	@ rname.invalid. (
					0	; serial
					1D	; refresh
					1H	; retry
					1W	; expire
					3H )	; minimum
	  NS	@
	A     192.168.126.1
*       A     192.168.126.51
*       A     192.168.126.52
[fsbano@voyager-1 ~]$ 

[fsbano@voyager-1 ~]$ host prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io
Host prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io not found: 3(NXDOMAIN)
[fsbano@voyager-1 ~]$

[fsbano@voyager-1 ~]$ sudo service named restart
Redirecting to /bin/systemctl restart named.service
[fsbano@voyager-1 ~]$

[fsbano@voyager-1 ~]$ host prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io
prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io has address 192.168.126.51
prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io has address 192.168.126.52
[fsbano@voyager-1 ~]$ 👍

[fsbano@voyager-1 ~]$ oc scale deployment.apps/console --replicas=0
deployment.apps/console scaled
[fsbano@voyager-1 ~]$ oc get pod
NAME                       READY   STATUS        RESTARTS   AGE
console-6d6ffd4444-9299h   0/1     Terminating   14         64m
[fsbano@voyager-1 ~]$

[fsbano@voyager-1 ~]$ oc get pod
No resources found.
[fsbano@voyager-1 ~]$

[fsbano@voyager-1 ~]$ oc get pod
NAME                       READY   STATUS              RESTARTS   AGE
console-6d6ffd4444-6c4p2   0/1     ContainerCreating   0          3s
console-6d6ffd4444-9tfhv   0/1     ContainerCreating   0          3s
[fsbano@voyager-1 ~]$

[fsbano@voyager-1 ~]$ oc get pod
NAME                       READY   STATUS    RESTARTS   AGE
console-6d6ffd4444-skhg6   1/1     Running   0          2m52s
console-6d6ffd4444-wcd6g   1/1     Running   0          2m52s
[fsbano@voyager-1 ~]$ 💯

Best Regards,
Good Night Everybody
Fábio Sbano

@ghost

ghost commented Mar 9, 2019

Images:

  • okd-preview-4-libvirt (screenshot)
  • okd-preview-4-cluster-status-libvirt (screenshot)
  • okd-preview-4-machines-libvirt (screenshot)

Best Regards,
Fábio Sbano

@ghost

ghost commented Mar 11, 2019

*The router needs two workers.

Is there a router pull or docs I can link for that? I guess we need to bump our libvirt default to catch up.

Can I send a pull request?

Best Regards,
Fabio Sbano

@praveenkumar
Contributor Author

praveenkumar commented Mar 11, 2019

@ssbano From which commit (component) does the requirement of 2 workers come on the libvirt platform? If this is a hard requirement, it will be problematic for us (the CodeReady Containers team): we are trying a single-node cluster (with no workers).

I tested the 0.14.0 tag with a single worker and everything worked as expected, but today when I tried out master I got the following error (is this because of that limitation?):

$ oc get events -n openshift-ingress
LAST SEEN   TYPE      REASON              OBJECT                                 MESSAGE
8m23s       Warning   FailedCreate        replicaset/router-default-76d66d6844   Error creating: pods "router-default-76d66d6844-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 80: Host ports are not allowed to be used spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 443: Host ports are not allowed to be used spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 1936: Host ports are not allowed to be used]

@ghost

ghost commented Mar 11, 2019

@praveenkumar,

I was using master until yesterday. This morning I downgraded to 0.14 and it is also running perfectly.

Could you describe your setup?

PS: I saw that they updated the image to 20190310.

Regards,
Fábio Sbano

@ghost

ghost commented Mar 11, 2019

@praveenkumar,

With two workers it works; with only one worker the router will always be "Pending".

Please,

oc project openshift-ingress
oc describe deployment router-default

see #1395

Regards,
Fábio Sbano

@zeenix
Contributor

zeenix commented Apr 29, 2019

I think this is likely a duplicate of #1007

@praveenkumar
Contributor Author

Already fixed.
