Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolve operators on catalog availability change #216

Merged

Conversation

ankitathomas
Copy link
Contributor

@ankitathomas ankitathomas commented May 16, 2023

This PR triggers resolution whenever the available catalogsources on cluster change. Resolution is triggered for all operators on cluster when there is any change to a catalogsource.

See: #205

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 16, 2023
Comment on lines 33 to 37
catalogReady, err := isCatalogReady(e.Object)
if err != nil {
fmt.Println(err)
return false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the catalog is not ready, is that something we would want to reflect in the status of the Operator (either now, or in the future)?

I think I'm of the opinion that we should just blindly forward along all CatalogSource events and deal with the state of it during reconciliation where we have a change to touch the Operator status if necessary.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on whether we have a caching layer for catalog contents at the operator controller level, we may not want to reflect a non-ready state for catalogs at the operator-controller level. I don't think it makes sense to run the resolver to trigger on any catalog change especially if the catalog is not ready, but I agree that we can have that check moved further down the resolution process. We'd need to verify that none of our sources are stale/non-ready to avoid resolving unnecessarily.

@ankitathomas ankitathomas force-pushed the reresolve-catsrc-update branch 4 times, most recently from 467e627 to 555895f Compare May 19, 2023 18:30
}
})

It("resolves again when a new catalog is available", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can avoid all the long timeouts here by simplifying the test expectations. I think all we really want to assert is "whenever a catalog is created/updated/deleted, all of the existing operators are reconciled". Is there something that would show up in the Operator CR that we could use to verify that a reconcile happened after the catalog event?

That might be organized something like this:

Describe("reconcile Operators on catalog events")
   - BeforeEach() -> create 3 operators
   - AfterEach() -> delete 3 operators
   - When("catalog is created")
        - It("reconciles 3 operators")
   - When("catalog exists")
       - BeforeEach() -> create a catalog
       - AfterEach() -> delete the catalog
       - When("catalog is updated")
           - It("reconciles 3 operators")
       - When("catalog is deleted")
           - It("reconciles 3 operators")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe even this is too much to do in the e2e? Is there a way to test the controller and the watches without a full-blown e2e?

If so, we could verify all these scenarios there, and then just focus on the happy path in the e2e?

I'm honestly not sure what's the best option here, but I think we should really focus on keeping our overall e2e run as short (in terms of overall run time) as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be ideal if we could test this outside of the e2e tests, but as far as I am aware there isn't a good way to test watches being triggered without actually starting the controller and performing operations against a cluster that would trigger those watches. I think it could be done with envtest but would likely involve a lot of stubbing out resources and would essentially boil down to the question of "do we prefer complex unit tests or longer running e2e tests?"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a minimal catalog cuts the wait down by a good chunk, so the e2e suite shouldn't be as long lived. I'm not sure if we'd want to expose resolution/reconcile details on the operator CR just for the sake of CI, in case it can be confusing to a user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're concerned about adding to the status API, how about examining the logs. We control the logs, and the tests can examine them for particular log entries. It wouldn't necessarily be exposing the logs as an API since it would be for e2e/unit tests, which can be modified when the logs are.

Copy link
Member

@joelanford joelanford May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this PR boil down to an updated controller configuration with a new watch that adds some extra stuff to the workqueue and calls a reconciler. So I'm suggesting it might be possible to isolate that specific behavior if we refactored a little bit (e.g. what if instead of SetupWithManager being a method on the reconciler, we made it a function that we could pass a reconciler to). In our main.go, we'd pass the real reconciler, but in a test, we'd pass a fake reconciler that does the "did I get triggered" assertions

Then we could put some envtest-based unit tests together alongside the reconciler and we would essentially split the testing into two chunks:

  1. Does the controller configuration enqueue the expected reconcile requests?
  2. Does the reconciler handle reconcile requests in the expected way?

We'd still want an e2e that runs through some scenarios of the integrated/real main.go, but we would avoid the combinatoric problem here, which would save lots of time in e2e runs.

Copy link
Member

@joelanford joelanford May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about examining the logs

Yeah +1 if the unit test path isn't taken. We do this in the SDK helm-operator e2e tests.

https://github.com/operator-framework/operator-sdk/blob/5347d9375658cab9117e7ac8f691d35bb9154ff9/test/e2e/helm/cluster_test.go#L125-L131

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm opening another issue to track the refactoring #247. I don't think it makes sense to block this PR on the refactor, I'm in favor of having that in a separate PR.

@ankitathomas ankitathomas changed the title WIP: resolve operators on catalog availability change resolve operators on catalog availability change May 25, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 25, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 26, 2023
@ankitathomas ankitathomas force-pushed the reresolve-catsrc-update branch 2 times, most recently from 4c372d1 to 9755169 Compare May 26, 2023 13:35
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 26, 2023
Copy link
Contributor

@tmshort tmshort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this generally looks good, but would like to see @joelanford's last comment addressed, whether any changes are made or not.

@tmshort
Copy link
Contributor

tmshort commented May 26, 2023

Not sure how I clicked "requested changes" instead of "approve"!

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 30, 2023
Signed-off-by: Ankita Thomas <ankithom@redhat.com>
Signed-off-by: Ankita Thomas <ankithom@redhat.com>
Signed-off-by: Ankita Thomas <ankithom@redhat.com>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 1, 2023
Signed-off-by: Ankita Thomas <ankithom@redhat.com>
@joelanford
Copy link
Member

/approve

@ankitathomas ankitathomas merged commit 25cdc88 into operator-framework:main Jun 2, 2023
5 checks passed
awgreene pushed a commit to awgreene/operator-controller that referenced this pull request Oct 13, 2023
)

* resolve operators on catalog availability change

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

* removing catalog state check for operator reconcile

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

* e2e tests for catalog watch

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

* use smaller test index image, cleanup after tests

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

---------

Signed-off-by: Ankita Thomas <ankithom@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants