This repository has been archived by the owner on Aug 2, 2019. It is now read-only.

I cannot re-deploy on OCP4 cluster #36

Open
mvazquezc opened this issue Apr 24, 2019 · 3 comments

Comments

@mvazquezc

I've deployed this repo on a working OCP4 cluster following the instructions in the README:

git clone https://github.com/openshift-cloud-functions/knative-operators
cd knative-operators/
git fetch --tags  
git checkout openshift-v0.4.0
./etc/scripts/install.sh

Then I wanted to delete everything related to Knative, so I followed these steps:

oc delete ns istio-operator istio-system knative-build knative-eventing knative-serving myproject
oc -n openshift-operator-lifecycle-manager get catalogsource -o name | grep -E "knative|istio|maistra" | xargs oc -n openshift-operator-lifecycle-manager delete
oc get pods -o name -n openshift-operator-lifecycle-manager | grep -E "knative|maistra" | xargs oc -n openshift-operator-lifecycle-manager delete
oc get crd -o name | grep -E "istio|knative" | xargs oc delete
oc get clusterroles -o name | grep -E "istio|knative|maistra" | xargs oc delete
oc get clusterrolebinding -o name | grep -E "istio|maistra|openshift-ansible-installer-cluster-role-binding" | xargs oc delete
oc get service -o name -n openshift-operator-lifecycle-manager | grep -E "knative|istio|maistra" | xargs oc -n openshift-operator-lifecycle-manager delete
for namespace in $(oc get ns -o name | awk -F "/" '{print $2}')
do
  echo "Cleaning ns $namespace"
  oc get configmap -o name -n $namespace | grep -E "istio|knative|maistra" | xargs oc -n $namespace delete
  oc get secret -o name -n $namespace | grep -E "istio|knative|maistra" | xargs oc -n $namespace delete
done
oc -n openshift-monitoring -o name get clusterserviceversion | grep -E "istio|knative|maistra" | xargs oc -n openshift-monitoring delete
oc -n openshift-operators -o name get clusterserviceversion | grep -E "istio|knative|maistra" | xargs oc -n openshift-operators delete
oc -n openshift-operator-lifecycle-manager -o name get clusterserviceversion | grep -E "istio|knative|maistra" | xargs oc -n openshift-operator-lifecycle-manager delete

Now, when trying to deploy the project again, it fails while waiting for the pods in the OLM namespace to come up:

+ eval oc get pods -n openshift-operator-lifecycle-manager '|' grep knative
++ oc get pods -n openshift-operator-lifecycle-manager
++ grep knative
+ sleep 5
+ [[ 127 -gt 120 ]]
+ echo 'ERROR: Timed out'
ERROR: Timed out
+ exit -1

The catalog-operator pod is showing these errors:

E0424 14:17:53.709605       1 queueinformer_operator.go:177] Sync "openshift-operator-lifecycle-manager/knative-build" failed: error ensuring pod: : error creating new pod: knative-build-: Internal error occurred: failed calling admission webhook "sidecar-injector.istio.io": Post https://istio-sidecar-injector.istio-system.svc:443/inject?timeout=30s: service "istio-sidecar-injector" not found
time="2019-04-24T14:17:54Z" level=warning msg="couldn't ensure registry server" error="error ensuring pod: : error creating new pod: knative-eventing-: Internal error occurred: failed calling admission webhook \"sidecar-injector.istio.io\": Post https://istio-sidecar-injector.istio-system.svc:443/inject?timeout=30s: service \"istio-sidecar-injector\" not found" id=wSCMk source=knative-eventing
time="2019-04-24T14:17:54Z" level=info msg="retrying openshift-operator-lifecycle-manager/knative-eventing"
E0424 14:17:54.307872       1 queueinformer_operator.go:177] Sync "openshift-operator-lifecycle-manager/knative-eventing" failed: error ensuring pod: : error creating new pod: knative-eventing-: Internal error occurred: failed calling admission webhook "sidecar-injector.istio.io": Post https://istio-sidecar-injector.istio-system.svc:443/inject?timeout=30s: service "istio-sidecar-injector" not found
time="2019-04-24T14:17:55Z" level=warning msg="couldn't ensure registry server" error="error ensuring pod: : error creating new pod: maistra-operators-: Internal error occurred: failed calling admission webhook \"sidecar-injector.istio.io\": Post https://istio-sidecar-injector.istio-system.svc:443/inject?timeout=30s: service \"istio-sidecar-injector\" not found" id=+F2Ul source=maistra-operators
time="2019-04-24T14:17:55Z" level=info msg="retrying openshift-operator-lifecycle-manager/maistra-operators"
E0424 14:17:55.307927       1 queueinformer_operator.go:177] Sync "openshift-operator-lifecycle-manager/maistra-operators" failed: error ensuring pod: : error creating new pod: maistra-operators-: Internal error occurred: failed calling admission webhook "sidecar-injector.istio.io": Post https://istio-sidecar-injector.istio-system.svc:443/inject?timeout=30s: service "istio-sidecar-injector" not found
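
Since every pod creation in the OLM namespace is still being routed through the sidecar-injector.istio.io admission webhook even though istio-system is gone, my guess (and it's only a guess) is that a cluster-scoped webhook configuration survived the cleanup above. A quick way to check:

# List any Istio-related admission webhook configurations left behind after the cleanup
oc get mutatingwebhookconfigurations,validatingwebhookconfigurations -o name | grep -E "istio|sidecar"
# If one shows up, deleting it should stop the API server from calling the missing injector service, e.g.:
# oc delete mutatingwebhookconfiguration <name-from-the-list-above>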

Any suggestions?

Thanks,

@davgordo

@mvazquezc the main issue here seems to be service "istio-sidecar-injector" not found. Is that service provisioned in the istio-system namespace?

If you do not see the istio-system namespace at all, that is a big clue. I'll explain...

oc delete project ... is an asynchronous command. The project (namespace) is only deleted once all of its pods are spun down. In the case of istio-system there are quite a few pods to spin down, and it looks like you're deleting many namespaces at once. There is a possibility that you ran the installer before all of the namespaces were fully deleted. It is only a guess, but it might be something worth observing.
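
If that's what happened, something along these lines (a rough sketch; the namespace list is my assumption of what the installer creates, adjust as needed) would make sure the cleanup has fully settled before re-running the installer:

# Wait until every namespace removed by the cleanup has actually finished terminating
for ns in istio-operator istio-system knative-build knative-eventing knative-serving; do
  while oc get ns "$ns" >/dev/null 2>&1; do
    echo "namespace $ns is still terminating..."
    sleep 10
  done
done
echo "All namespaces are gone; it should be safe to re-run ./etc/scripts/install.sh"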

@mvazquezc
Author

I've tried a fresh install using tag openshift-v0.4.0.

The installer timed out while waiting for the istio-operator pod:

+ eval oc get pods -n istio-operator '&&' '[[' '$(oc' get pods -n istio-operator '2>&1' '|' grep -c -v -E ''\''(Running|Completed|Terminating|STATUS)'\'')' -eq 0 ']]'
++ oc get pods -n istio-operator
No resources found.
+++ oc get pods -n istio-operator
+++ grep -c -v -E '(Running|Completed|Terminating|STATUS)'
++ [[ 1 -eq 0 ]]
+ sleep 5
+ [[ 305 -gt 300 ]]
+ echo 'ERROR: Timed out'
ERROR: Timed out
+ exit -1

The catalog-operator logs:

E0426 10:09:55.099691       1 queueinformer_operator.go:177] Sync "istio-operator" failed: {maistra alpha  {maistra-operators openshift-operator-lifecycle-manager}} not found: CatalogSource {maistra-operators openshift-operator-lifecycle-manager} not found
time="2019-04-26T10:09:55Z" level=info msg="building connection to registry" currentSource="{knative-serving openshift-operator-lifecycle-manager}" id=l7CCe source=knative-serving
time="2019-04-26T10:09:55Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{knative-serving openshift-operator-lifecycle-manager}" id=l7CCe source=knative-serving
time="2019-04-26T10:09:55Z" level=info msg="retrying istio-operator"
E0426 10:09:55.140085       1 queueinformer_operator.go:177] Sync "istio-operator" failed: {maistra alpha  {maistra-operators openshift-operator-lifecycle-manager}} not found: CatalogSource {maistra-operators openshift-operator-lifecycle-manager} not found
time="2019-04-26T10:09:55Z" level=info msg="building connection to registry" currentSource="{maistra-operators openshift-operator-lifecycle-manager}" id=8IQIm source=maistra-operators
time="2019-04-26T10:09:55Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{maistra-operators openshift-operator-lifecycle-manager}" id=8IQIm source=maistra-operators

@mvazquezc
Author

Deploying the project from the master branch worked.

I'll try to delete everything and check whether the cluster remains operational. Last time, after deleting everything with the steps outlined above, every new deployment got stuck waiting for the Istio sidecar to be injected...
