
Unable to bundle-upgrade using operator-sdk cli tool #6204

Closed
talsharon48 opened this issue Nov 27, 2022 · 10 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. triage/support Indicates an issue that is a support question.
Comments

@talsharon48

Bug Report

What did you do?

Hello,
I am using operator-sdk v1.24.0 to create bundles and use them in an OpenShift cluster via OLM.
I wrote a demo operator, made a bundle from it, and created a catalog image using make catalog-build and catalog-push.
I managed to create a CatalogSource from it, saw it in the UI, and installed the operator. When I tried upgrading the operator, I created a new bundle version using make bundle VERSION=0.0.2, bundle-build and bundle-push, and used operator-sdk run bundle-upgrade to upgrade it.
I encountered an error saying "Failed to run bundle upgrade: install plan is not available for subscription : timed out waiting for condition", although when browsing the catalog I can see the latest version is v0.0.2 and not v0.0.1.
When I uninstalled and reinstalled the operator manually, the desired v0.0.2 was deployed.
NOTE: I am working in an on-premise environment, so I can't upload any logs or code snippets.
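
Roughly, the upgrade attempt came down to these commands, typed out here manually (the image name is a placeholder for my internal registry path):

make bundle VERSION=0.0.2
make bundle-build bundle-push BUNDLE_IMG=<registry>/<project>/repo-bundle:v0.0.2
operator-sdk run bundle-upgrade <registry>/<project>/repo-bundle:v0.0.2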

What did you expect to see?

Running the operator-sdk run bundle-upgrade command should deploy the new version (CSV) of my operator.

What did you see instead? Under which circumstances?

After running the command, a registry pod came up and served the new version,
but nothing else happened and the error mentioned above was raised.

Environment

Operator type:

/language go

Kubernetes cluster type:

openshift v4.6.15

$ operator-sdk version

operator-sdk version: "v1.24.0", commit: "de6a14d03de3c36dcc9de3891af788b49d15f0f3", kubernetes version: "1.24.2", go version: "go1.18.6", GOOS: "linux", GOARCH: "amd64"

$ go version (if language is Go)

go version go1.18.6 linux/amd64

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"8df677dc147fe8297d90c4757154469a931bdb90", GitTreeState:"clean", BuildDate:"2022-11-04T15:44:27Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}

@varshaprasad96
Member

Can you check by running the same command with an increased timeout using the --timeout flag? In some cases we have observed that the default timeout is too short for OpenShift clusters.
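
For example, something like this (the bundle image name is a placeholder for yours):

operator-sdk run bundle-upgrade <your-registry>/<project>/repo-bundle:v0.0.2 --timeout 10m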

@varshaprasad96 varshaprasad96 added the triage/support Indicates an issue that is a support question. label Nov 28, 2022
@varshaprasad96 varshaprasad96 added this to the Backlog milestone Nov 28, 2022
@talsharon48
Author

Hello,
I have tried the --timeout=15m0s flag, giving it 15 minutes to complete, and got the same result:
FATA[0900] Failed to run bundle upgrade: install plan is not available for the subscription olm-test: timed out waiting for the condition

@everettraven
Contributor

NOTE: I am working in an on-premise environment, so I can't upload any logs or code snippets.

@talsharon48 just to make sure - you can't share any logs at all? Without some logs or more information I'm not quite sure we will be able to help very much. That said, I'll share a brief overview of what I would check (the corresponding commands are sketched after the list):

  1. Check the logs of the registry pod that is created by the operator-sdk run bundle-upgrade command. If this pod is failing to start properly, OLM may not have the information needed to create the InstallPlan resource.
  2. Check the status of the CatalogSource resource used by operator-sdk run bundle-upgrade. If there is a problem it should be present in the status. You should be able to get detailed information with kubectl describe catalogsource <catalogsource-name>
  3. Check the status of the Subscription resource used by operator-sdk run bundle-upgrade. If there is a problem it should be present in the status. You should be able to get detailed information with kubectl describe subscription <subscription-name>
  4. Check the CSVs and see if one was created for upgrading to v0.0.2 with kubectl get clusterserviceversion. If one was created, the operator should have been upgraded successfully but for some reason reused the same InstallPlan resource for approval as when it was installed. I believe operator-sdk run bundle-upgrade expects a new one to be used, and since that new one doesn't exist it failed with that error (I haven't seen this be the case recently so I'm not sure what may be causing it).
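
Roughly, the commands for those checks (resource names and the namespace are placeholders for whatever run bundle-upgrade created in your cluster):

kubectl logs <registry-pod-name> -n <namespace>
kubectl describe catalogsource <catalogsource-name> -n <namespace>
kubectl describe subscription <subscription-name> -n <namespace>
kubectl get clusterserviceversion -n <namespace>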

I hope this helps a bit!

@talsharon48
Author

talsharon48 commented Nov 30, 2022

@everettraven I can investigate the logs from any pod; I'll just have to type any findings here manually for you to see.

  1. The logs from the pod raised by the run bundle-upgrade command look good, ending with serving the registry (when I
    choose the operator in OperatorHub I can see v0.0.2, the new one, available)
  2. The CatalogSource also looks good, with a status saying the ConnectionState is READY and reachable
  3. The Subscription looks good, with a status showing the details of the current version v0.0.1, the InstallPlan of v0.0.1, and CatalogHealth which is true
  4. There is no v0.0.2 CSV available, only v0.0.1

Any other suggestions, or logs you need?

@everettraven
Contributor

@talsharon48 The only other thing I can think to check is whether the Subscription has the field installPlanApproval: Manual.

If it does not, that means OLM is automatically approving the upgrade - I'm not sure what impact this has on the actual generation of a new InstallPlan. I believe the operator-sdk run bundle-upgrade command expects it to be set to manual approval so that OLM doesn't automatically perform any upgrades.
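
One quick way to check (the subscription name and namespace are placeholders):

kubectl get subscription <subscription-name> -n <namespace> -o jsonpath='{.spec.installPlanApproval}'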

If that doesn't seem to be causing the problem, I can't really think of any other reason why this issue is happening.

Is it possible for you to provide a way that this problem can be replicated? If I can replicate the problem I can try to dig a bit further.

@talsharon48
Author

talsharon48 commented Dec 1, 2022

@everettraven I would be thankful if you could help me investigate the problem. I'll explain the steps I did so you can replicate it:

  1. Initialize a new project using operator-sdk init --domain=<domain>.io
  2. Generate a new API using operator-sdk create api --group olm --version v1alpha1 --kind UpgradePOC --controller --resource
  3. Build the controller image. I am using a Harbor registry to store the images: export IMG=<harbor-fqdn/project/repo:v0.0.1>
    make docker-build docker-push
  4. Generate the bundle using make bundle VERSION=0.0.1
  5. Build the bundle image using: export BUNDLE_IMG=<harbor-fqdn/project/repo-bundle:v0.0.1>
    make bundle-build bundle-push
  6. I had a problem where the registry pod raised by the run bundle-upgrade command gets permission denied creating the cache dir, as a result of the opm registry add command it runs. So I made a workaround: a new binary image for make catalog-build, which uses the opm index add command (see the Makefile). I created a Dockerfile that looks like this:
    FROM quay.io/operator-framework/opm:latest
    USER 0
    which forces the user to be root to avoid the permission denial.
    I built the image and pushed it to the registry with: docker build -t <harbor-fqdn/project/opm:tag> . && docker push <harbor-fqdn/project/opm:tag>
  7. To let the make catalog-build command use my own binary image, I slightly edited the Makefile with the following snippet:
ifneq ($(origin BINARY_IMG), undefined)
BINARY_IMG_OPT := --binary-image $(BINARY_IMG)
endif

Also changed the catalog-build target from:

.PHONY: catalog-build
catalog-build: opm ## Build a catalog image.
    $(OPM) index add --container-tool docker --mode semver --tag $(CATALOG_IMG) --bundles $(BUNDLE_IMGS) $(FROM_INDEX_OPT)

To:

.PHONY: catalog-build
catalog-build: opm ## Build a catalog image.
    $(OPM) index add --container-tool docker --mode semver --tag $(CATALOG_IMG) --bundles $(BUNDLE_IMGS) $(FROM_INDEX_OPT) $(BINARY_IMG_OPT)
  8. Export the binary image and the catalog image names: export CATALOG_IMG=<harbor-fqdn/project/index-catalog:v0.0.1>
    export BINARY_IMG=<harbor-fqdn/project/opm:tag> and build the catalog image using: make catalog-build catalog-push
  9. Create a CatalogSource CR using the catalog image we just built, with the following YAML:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: test-catalog
  namespace: openshift-operators
spec:
  image: <harbor-fqdn/project/index-catalog:v0.0.1>
  displayName: Test_Catalog
  sourceType: grpc
  10. Install the operator by creating a Subscription or via the OpenShift UI (a rough sketch of such a Subscription is shown right after this list)
  11. Build v0.0.2 of the controller image: export IMG=<harbor-fqdn/project/repo:v0.0.2>
    make docker-build docker-push
  12. Generate the new bundle version using make bundle VERSION=0.0.2
  13. Build the new bundle image using: export BUNDLE_IMG=<harbor-fqdn/project/repo-bundle:v0.0.2>
    make bundle-build bundle-push
  14. Upgrade the bundle: operator-sdk run bundle-upgrade <harbor-fqdn/project/repo-bundle:v0.0.2> --skip-tls-verify --skip-tls --timeout 15m0s
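
A rough sketch of the Subscription from step 10, applied with kubectl (the package name and channel are placeholders and depend on what make bundle generated for the operator):

cat <<EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: upgradepoc-subscription        # placeholder name
  namespace: openshift-operators
spec:
  channel: alpha                        # placeholder: the channel defined in the bundle
  name: <operator-package-name>         # placeholder: package name from the bundle annotations
  source: test-catalog                  # the CatalogSource created in step 9
  sourceNamespace: openshift-operators
  installPlanApproval: Automatic
EOF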

I hope this helps you reproduce my situation, looking forward to your response!
Thank you.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 2, 2023
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 1, 2023
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this as completed May 2, 2023
@openshift-ci

openshift-ci bot commented May 2, 2023

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
