resolve operators on catalog availability change #216

ankitathomas · 2023-05-16T20:06:42Z

This PR triggers resolution whenever the set of CatalogSources available on the cluster changes. Any change to a CatalogSource triggers resolution for all Operators on the cluster.

See: #205

API Go Documentation
Tests: Unit Tests (and E2E Tests, if appropriate)
Comprehensive Commit Messages
Links to related GitHub Issue(s)

controllers/catalog_predicates.go

joelanford · 2023-05-16T21:04:10Z

controllers/catalog_predicates.go

+	catalogReady, err := isCatalogReady(e.Object)
+	if err != nil {
+		fmt.Println(err)
+		return false
+	}


If the catalog is not ready, is that something we would want to reflect in the status of the Operator (either now, or in the future)?

I think I'm of the opinion that we should just blindly forward along all CatalogSource events and deal with the state of it during reconciliation, where we have a chance to touch the Operator status if necessary.

WDYT?

Depending on whether we have a caching layer for catalog contents at the operator-controller level, we may not want to reflect a non-ready state for catalogs there. I don't think it makes sense to trigger the resolver on every catalog change, especially if the catalog is not ready, but I agree that we can move that check further down the resolution process. We'd need to verify that none of our sources are stale or non-ready to avoid resolving unnecessarily.
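The "forward every catalog event and sort out readiness during reconciliation" idea from this thread can be sketched without controller-runtime. This is a minimal stdlib-only model, not the PR's code: `Request` and `requestsForCatalogEvent` are hypothetical names, and the real watch would go through controller-runtime's event handlers.

```go
package main

import "sort"

// Request identifies one Operator to reconcile. It is a hypothetical
// stand-in for controller-runtime's reconcile request type.
type Request struct {
	Name string
}

// requestsForCatalogEvent deliberately ignores which catalog changed and
// whether it is ready. It returns one request per known Operator, sorted
// by name so the result is deterministic. Readiness is left for the
// reconciler to inspect.
func requestsForCatalogEvent(operatorNames []string) []Request {
	names := append([]string(nil), operatorNames...) // copy; do not mutate the caller's slice
	sort.Strings(names)
	reqs := make([]Request, 0, len(names))
	for _, n := range names {
		reqs = append(reqs, Request{Name: n})
	}
	return reqs
}
```

Calling `requestsForCatalogEvent([]string{"b", "a"})` yields requests for `a` and `b`, regardless of the catalog's state.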

controllers/catalog_predicates.go

joelanford · 2023-05-20T00:40:48Z

test/e2e/install_test.go

+			}
+		})
+
+		It("resolves again when a new catalog is available", func() {


I wonder if we can avoid all the long timeouts here by simplifying the test expectations. I think all we really want to assert is "whenever a catalog is created/updated/deleted, all of the existing operators are reconciled". Is there something that would show up in the Operator CR that we could use to verify that a reconcile happened after the catalog event?

That might be organized something like this:

Describe("reconcile Operators on catalog events")
  - BeforeEach() -> create 3 operators
  - AfterEach() -> delete 3 operators
  - When("catalog is created")
    - It("reconciles 3 operators")
  - When("catalog exists")
    - BeforeEach() -> create a catalog
    - AfterEach() -> delete the catalog
    - When("catalog is updated")
      - It("reconciles 3 operators")
    - When("catalog is deleted")
      - It("reconciles 3 operators")

Or maybe even this is too much to do in the e2e? Is there a way to test the controller and the watches without a full-blown e2e?

If so, we could verify all these scenarios there, and then just focus on the happy path in the e2e?

I'm honestly not sure what's the best option here, but I think we should really focus on keeping our overall e2e run as short (in terms of overall run time) as possible.

I agree that it would be ideal if we could test this outside of the e2e tests, but as far as I am aware, there isn't a good way to test watches being triggered without actually starting the controller and performing operations against a cluster that would trigger those watches. I think it could be done with envtest, but it would likely involve a lot of stubbing out resources and would essentially boil down to the question of "do we prefer complex unit tests or longer running e2e tests?"

Using a minimal catalog cuts the wait down by a good chunk, so the e2e suite shouldn't run as long. I'm not sure we'd want to expose resolution/reconcile details on the Operator CR just for the sake of CI, since that could be confusing to a user.

If we're concerned about adding to the status API, how about examining the logs? We control the logs, and the tests can examine them for particular log entries. It wouldn't necessarily be exposing the logs as an API, since it would be for e2e/unit tests, which can be modified when the logs are.

The changes in this PR boil down to an updated controller configuration with a new watch that adds some extra items to the workqueue and calls a reconciler. So I'm suggesting it might be possible to isolate that specific behavior if we refactored a little bit (e.g. what if, instead of SetupWithManager being a method on the reconciler, we made it a function that we could pass a reconciler to?). In our main.go, we'd pass the real reconciler, but in a test, we'd pass a fake reconciler that does the "did I get triggered" assertions.

Then we could put some envtest-based unit tests together alongside the reconciler and we would essentially split the testing into two chunks:

1. Does the controller configuration enqueue the expected reconcile requests?
2. Does the reconciler handle reconcile requests in the expected way?

We'd still want an e2e that runs through some scenarios of the integrated/real main.go, but we would avoid the combinatoric problem here, which would save lots of time in e2e runs.
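A rough sketch of the "pass a fake reconciler" idea, using only the standard library. Everything here is an assumption for illustration: `Reconciler`, `CatalogEvent`, `HandleCatalogEvents`, and `fakeReconciler` are hypothetical names, and the real version would wire the reconciler into a controller-runtime manager (likely under envtest) rather than loop over a slice of events.

```go
package main

import "sync"

// Reconciler is a hypothetical, simplified reconciler interface.
type Reconciler interface {
	Reconcile(operatorName string)
}

// CatalogEvent is a hypothetical stand-in for a create/update/delete
// event on a catalog.
type CatalogEvent struct {
	Catalog string
	Kind    string
}

// HandleCatalogEvents plays the role of the controller configuration:
// for every catalog event it invokes the reconciler once per operator.
// It takes the reconciler as an argument so a test can pass a fake.
func HandleCatalogEvents(r Reconciler, operators []string, events []CatalogEvent) {
	for range events {
		for _, op := range operators {
			r.Reconcile(op)
		}
	}
}

// fakeReconciler records how many times each operator was reconciled,
// which is all the "did I get triggered" assertions need.
type fakeReconciler struct {
	mu    sync.Mutex
	calls map[string]int
}

func newFakeReconciler() *fakeReconciler {
	return &fakeReconciler{calls: map[string]int{}}
}

func (f *fakeReconciler) Reconcile(name string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.calls[name]++
}
```

A test would pass `newFakeReconciler()` together with three operator names and a list of catalog events, then assert that every operator's call count equals the number of events. The real reconciler's behavior is then covered by its own, separate tests.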

> how about examining the logs

Yeah +1 if the unit test path isn't taken. We do this in the SDK helm-operator e2e tests.

https://github.com/operator-framework/operator-sdk/blob/5347d9375658cab9117e7ac8f691d35bb9154ff9/test/e2e/helm/cluster_test.go#L125-L131
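As a sketch of the log-examination approach, the snippet below captures log output in memory and counts matching entries. It assumes a hypothetical log message (`reconciling operator "<name>"`) and hypothetical helper names. An actual e2e test would read the controller pod's logs instead of an in-process buffer, and the real log format may differ.

```go
package main

import (
	"bytes"
	"log"
	"strings"
)

// newBufferLogger returns a logger that writes to an in-memory buffer,
// standing in for the controller's real log stream.
func newBufferLogger() (*log.Logger, *bytes.Buffer) {
	buf := &bytes.Buffer{}
	return log.New(buf, "", 0), buf
}

// reconcileWithLog is a hypothetical reconciler that only emits the log
// line a test would look for. The message text is an assumption.
func reconcileWithLog(logger *log.Logger, operator string) {
	logger.Printf("reconciling operator %q", operator)
}

// countLogLines returns how many lines of logs contain substr.
func countLogLines(logs, substr string) int {
	n := 0
	for _, line := range strings.Split(logs, "\n") {
		if strings.Contains(line, substr) {
			n++
		}
	}
	return n
}
```

A test can record the log contents before the catalog event, trigger the event, and then assert that the count of matching lines grew by the number of operators. Because the assertion keys on a message the project controls, the test and the message can be changed together.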

I'm opening another issue, #247, to track the refactoring. I don't think it makes sense to block this PR on the refactor; I'm in favor of having that in a separate PR.

controllers/operator_controller.go

tmshort

I think this generally looks good, but would like to see @joelanford's last comment addressed, whether any changes are made or not.

tmshort · 2023-05-26T18:55:43Z

Not sure how I clicked "requested changes" instead of "approve"!

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

joelanford · 2023-06-01T20:25:36Z

/approve

* resolve operators on catalog availability change
* removing catalog state check for operator reconcile
* e2e tests for catalog watch
* use smaller test index image, cleanup after tests

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 16, 2023

ankitathomas force-pushed the reresolve-catsrc-update branch from 225d8b3 to 9eaf91f Compare May 16, 2023 20:07

awgreene reviewed May 16, 2023

controllers/catalog_predicates.go (outdated, resolved)

awgreene reviewed May 16, 2023

controllers/catalog_predicates.go (outdated, resolved)

joelanford reviewed May 16, 2023

controllers/catalog_predicates.go (outdated, resolved)

ankitathomas force-pushed the reresolve-catsrc-update branch 4 times, most recently from 467e627 to 555895f Compare May 19, 2023 18:30

joelanford reviewed May 20, 2023

everettraven reviewed May 22, 2023

controllers/operator_controller.go (outdated, resolved)

ankitathomas changed the title ~~WIP: resolve operators on catalog availability change~~ resolve operators on catalog availability change May 25, 2023

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 25, 2023

tmshort reviewed May 25, 2023

controllers/operator_controller.go (outdated, resolved)

tmshort reviewed May 25, 2023

controllers/operator_controller.go (outdated, resolved)

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 26, 2023

ankitathomas force-pushed the reresolve-catsrc-update branch 2 times, most recently from 4c372d1 to 9755169 Compare May 26, 2023 13:35

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 26, 2023

ankitathomas requested review from joelanford, tmshort and everettraven May 26, 2023 15:10

tmshort requested changes May 26, 2023

tmshort approved these changes May 26, 2023

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 30, 2023

ankitathomas added 3 commits June 1, 2023 11:59

resolve operators on catalog availability change

82a834a

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

removing catalog state check for operator reconcile

1544404

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

e2e tests for catalog watch

b688885

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

ankitathomas force-pushed the reresolve-catsrc-update branch from 9755169 to cd88cc7 Compare June 1, 2023 20:03

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 1, 2023

use smaller test index image, cleanup after tests

67caff4

Signed-off-by: Ankita Thomas <ankithom@redhat.com>

ankitathomas force-pushed the reresolve-catsrc-update branch from cd88cc7 to 67caff4 Compare June 1, 2023 20:05

joelanford mentioned this pull request Jun 1, 2023

refactor e2e for speed and clarity, add helpers #244

Closed

4 tasks

ankitathomas merged commit 25cdc88 into operator-framework:main Jun 2, 2023
5 checks passed

anik120 mentioned this pull request Jun 2, 2023

(e2e) Use custom built test catalog for e2e testing #250

Merged

4 tasks
