Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeployableByOLM" err="<nil>" result=ERROR for open-access ibm-mq-operator 1.3.1 #844

Closed
tkrishtop opened this issue Nov 29, 2022 · 8 comments · Fixed by #847
Closed

DeployableByOLM" err="<nil>" result=ERROR for open-access ibm-mq-operator 1.3.1 #844

tkrishtop opened this issue Nov 29, 2022 · 8 comments · Fixed by #847
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@tkrishtop
Copy link
Contributor

Bug Description

There is an unclear error produced by DeployableByOLM on an open-access IBM operator:

time="2022-11-28T21:14:20Z" level=debug msg="running check: DeployableByOLM"
time="2022-11-28T21:14:23Z" level=info msg="check completed: DeployableByOLM" err="<nil>" result=ERROR

Could we have a bit more logs in that situation?

Version and Command Invocation

preflight version: the same error on pr843 and 1.4.2

podman run --rm --privileged --security-opt=label=disable 
-e PFLT_JUNIT=true
-e PFLT_ARTIFACTS=/artifacts 
-e PFLT_LOGFILE=/artifacts/preflight.log 
-e PFLT_LOGLEVEL=trace 
-e DOCKER_CONFIG=/opt 
-e PFLT_DOCKERCONFIG=/opt/config.json 
-e KUBECONFIG=/kubeconfig 
-e PFLT_INDEXIMAGE=registry.dfwt5g.lab:4443/telcoci/preflight/disconnected-catalog@sha256:794292f20a050c6f628aafc0959de871c8527d23d2abaf9443fe3f0ce2b9765e -e PFLT_NAMESPACE=preflight-testing 
-e PFLT_SERVICEACCOUNT=default 
-e PFLT_SCORECARD_IMAGE=quay.io/operator-framework/scorecard-test@sha256:f9bb5c28e4c2b0aecab97f5f17efa5a39e310a9e2b9f6e2127f81a08c2f57749 
-v /tmp/preflight_tmp_dir.gv4tzvvl/kubeconfig:/kubeconfig 
-v /tmp/preflight_operator_artifacts.brormnsy:/artifacts
-v /tmp/preflight_tmp_dir.gv4tzvvl/config.json:/opt/config.json  
registry.dfwt5g.lab:4443/preflight/preflight:pr843 check operator 
docker.io/ibmcom/ibm-mq-operator-bundle@sha256:723e3abcf5f8e2eea1458289c35b67cdcef92c3c256d3f384d6e4275302f0a89

Expected Result

Get a bit more logs to understand why DeployableByOLM threw an error

Actual Result

DeployableByOLM" err="<nil>" result=ERROR

@tkrishtop tkrishtop added the kind/bug Categorizes issue or PR as related to a bug. label Nov 29, 2022
@tkrishtop
Copy link
Contributor Author

Adding more details because it seems to be a regression introduced between preflight 1.0.8 and preflight 1.1.0.

Here I used another open-access operator for the tests, it passed DeployableByOlm on preflight 1.0.8 and failed DeployableByOlm starting from preflight 1.1.0:

quay.io/rh-nfv-int/testpmd-operator:v0.2.9
index image: quay.io/rh-nfv-int/nfv-example-cnf-catalog:v0.2.9

Logs on 1.0.8 (DeployableByOlm passed):

time="2022-04-25T04:52:53Z" level=info msg="certification library version 1.0.8 <commit: f6eb6893c33c39f5a9ba9e46c0c217f3079dd3ac>"
-- snip --
time="2022-04-25T04:53:04Z" level=info msg="running check: DeployableByOLM"
time="2022-04-25T04:53:07Z" level=trace msg="reading annotations file from the bundle"
time="2022-04-25T04:53:07Z" level=debug msg="mounted directory is /tmp/preflight-653131518/fs"
time="2022-04-25T04:53:07Z" level=trace msg="searching for key 
-- snip --
time="2022-04-25T04:54:26Z" level=info msg="check completed: DeployableByOLM" result=PASSED

Logs on 1.1.0 (DeployableByOlm failed):

time="2022-04-26T02:29:18Z" level=info msg="certification library version 1.1.0+500bf9eb354cbf9bf7bf0d78eae4b12c6469396a <commit: 500bf9eb354cbf9bf7bf0d78eae4b12c6469396a>"
-- snip --
time="2022-04-26T02:29:29Z" level=debug msg="running check: DeployableByOLM"
time="2022-04-26T02:29:29Z" level=trace msg="reading annotations file from the bundle"
time="2022-04-26T02:29:29Z" level=debug msg="mounted directory is /tmp/preflight-287613090/fs"
time="2022-04-26T02:29:29Z" level=debug msg="Command being run: [operator-sdk bundle validate -b none --output json-alpha1 --select-optional name=community --select-optional name=operatorhub --verbose /tmp/preflight-287613090/fs]"
time="2022-04-26T02:29:29Z" level=info msg="check completed: DeployableByOLM" result=FAILED

Logs on 1.4.2 (DeployableByOlm failed):

time="2022-11-29T07:42:08Z" level=info msg="certification library version 1.4.2 <commit: f9cff772837132149df69f8ae251d3caf81c49ac>"
-- snip --
time="2022-11-29T07:42:18Z" level=debug msg="running check: DeployableByOLM"
time="2022-11-29T07:42:21Z" level=trace msg="reading annotations file from the bundle"
time="2022-11-29T07:42:21Z" level=debug msg="image extraction directory is /tmp/preflight-2534365746/fs"
time="2022-11-29T07:42:21Z" level=info msg="check completed: DeployableByOLM" err="<nil>" result=ERROR

@komish komish self-assigned this Dec 1, 2022
@komish
Copy link
Contributor

komish commented Dec 2, 2022

@tkrishtop thanks for reporting this!

The operator image you gave me there is the operator application container itself, but quay.io/rh-nfv-int/testpmd-operator-bundle:v0.2.9 is the bundle we're testing against for that operator. I was able to replicate the issue here with that bundle on preflight built from the main branch.

The core of the issue is that the bundle is malformed. The ValidateOperatorBundle check result is not included in your snippets, but I would guess that it is also failing in your execution, indicating that the service account in the manifests/ directory matches the service account value found in your cluster service version - and that this is invalid.

"errors": [
              "Error: Value testpmd-operator-controller-manager: invalid service account found in bundle. This service account testpmd-operator-controller-manager in your bundle is not valid, because a service account with the same name was already specified in your CSV. If this was unintentional, please remove the service account manifest from your bundle. If it was intentional to specify a separate service account, please rename the SA in either the bundle manifest or the CSV."
            ],

To test, I rebuilt the bundle with that service account manifest specifying a different service account and all tests passed without a problem. For reference, I've temporarily published that bundle here quay.io/komish/rh-nfv-int_testpmd-operator-bundle:v0.2.9. I'll probably clean it up from my Quay namespace at some arbitrary time in the future.

The ScorecardOlmSuiteCheck check throws an error as well for the same reason. I do find it odd that this is being listed as an error, and will look into why - but if I had to guess, it's because the scorecard results themselves report an error instead of a failure. I'll check our logic that causes a scorecard check value to get reported as an erroring check vs. a failing check, but there may not be much we can do about this.

With that said, DeployableByOLM still fails with, seemingly, no error. I'll look into the reasons why this is happening and see what I can do to fix it soon.


All of this relates to the testpmd-operator-bundle - but you initially reported this issue with another bundle that's not accessible to me. It's entirely possible that the case there may be different, but have a look and see if this may be the cause. If not, let us know and I'm happy to troubleshoot further - but we may need to work with you get a better look at the bundle in question.

Let us know.

@tkrishtop
Copy link
Contributor Author

Hi @komish, thank you for having a look!

Just a quick update before heading into the weekend:

you initially reported this issue with another bundle that's not accessible to me

The IBM operator is also an open-access one:

bundle_image: "docker.io/ibmcom/ibm-mq-operator-bundle@sha256:723e3abcf5f8e2eea1458289c35b67cdcef92c3c256d3f384d6e4275302f0a89" # v1.3.1

index_image: "docker.io/ibmcom/ibm-mq-operator-catalog:v1.3.1"

@komish
Copy link
Contributor

komish commented Dec 2, 2022

Awesome, thank you! I'll check the same with this bundle. Have a great weekend!

@komish
Copy link
Contributor

komish commented Dec 2, 2022

This bundle was also failing validation, and DeployableByOLM performs a validation as a preventative measure, but logs failure as an error because it's an executional failure and not a deployment failure. I've submitted a PR to fix the issue with the nil errors when the DeployableByOLM check passes, which should help provide more information here.

@tkrishtop
Copy link
Contributor Author

tkrishtop commented Dec 5, 2022

@komish, so starting from preflight 1.1.0, DeployableByOLM is not an independent check anymore, it will fail if ValidateOperatorBundle fails. Is that correct?

@komish
Copy link
Contributor

komish commented Dec 5, 2022

It is still an independent check, but because we perform a validation within DeployableByOLM to reduce the likelihood of a failure in deployment that is similar to Bundle Validation, a failure in bundle validation will probably cause a failure in DeployableByOLM.

In other words, if the bundle does not validate, deployable will probably fail very early.

@tkrishtop
Copy link
Contributor Author

In other words, if the bundle does not validate, deployable will probably fail very early.

Well, at least, it's not the case for ibm-mq-operator and testpmd-operator. These operators passed DeployableByOlm on preflight 1.0.8, that's why the issue was open.

Although the current situation looks logical and the error is now clear. Thank you for the fix, @komish! Going to close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants