
(chore): Refactor CatalogSource controller #30

Merged

Conversation

everettraven (Collaborator)

Description
Cleans up the controllers by:

  • Removing the unused Package and BundleMetadata controllers
  • Refactoring the CatalogSource controller to follow a similar approach to rukpak and operator-controller for consistency
  • Renaming a couple of functions to more accurately convey what they do
  • Adding comments to almost all functions (reconcile() excepted) to help new contributors more easily understand what each function does


Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@anik120 (Collaborator) left a comment


First round.

}

func (r *CatalogSourceReconciler) reconcile(ctx context.Context, catalogSource *corev1beta1.CatalogSource) (ctrl.Result, error) {
	job, err := r.ensureUnpackJob(ctx, catalogSource)

Collaborator:

  • When I first create a CatalogSource, the reconciler creates the unpack job and creates the package/BM CRs.
  • On the next reconciliation, the unpack job is detected, so I get the unpack job back, but we still proceed to check whether the unpack job is completed and to create the BM/package CRs again? It seems benign from a "what the user sees" perspective, but it's just going to keep eating up unnecessary CPU cycles.
  • After a few reconciliation loops, if the job has been GCed, ensureUnpackJob will unpack the same index image again and then reattempt to create the child resources. Now this is eating up unnecessary network bandwidth AND CPU cycles.

everettraven (Collaborator, Author):

  • On the next reconciliation, the unpack job is detected, so I get the unpack job back, but we still proceed to check whether the unpack job is completed and to create the BM/package CRs again? It seems benign from a "what the user sees" perspective, but it's just going to keep eating up unnecessary CPU cycles.
  • After a few reconciliation loops, if the job has been GCed, ensureUnpackJob will unpack the same index image again and then reattempt to create the child resources. Now this is eating up unnecessary network bandwidth AND CPU cycles.

You are correct that this is the case, and it's something we need to consider for the future. This PR doesn't actually change the original process that was followed. IMO making these changes is outside the scope of this PR and should be captured in another issue. I'm not sure off the top of my head what the solution would look like (maybe a check for ready status == true, or a label/annotation, before running the unpack logic?).
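
For illustration, the kind of guard being floated could look roughly like this at the top of reconcile() (just a sketch; it assumes CatalogSource.Status.Conditions is a standard []metav1.Condition slice and uses meta.IsStatusConditionTrue from k8s.io/apimachinery/pkg/api/meta, none of which this PR establishes):

// Sketch of the "skip if already unpacked" idea; not part of this PR.
if meta.IsStatusConditionTrue(catalogSource.Status.Conditions, "Ready") {
	// Catalog contents are already unpacked and available, so skip the unpack flow.
	return ctrl.Result{}, nil
}
job, err := r.ensureUnpackJob(ctx, catalogSource)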

Member:

Agree with @everettraven. Also, a good chunk of the brainstorming here is rendered moot by the likely switchover to an aggregated apiserver. We'll no longer be creating/updating/deleting CRs for the underlying catalog metadata.

A similar problem will exist, though: can we skip syncing the apiserver storage if it is already up to date? We should write a general issue for this.

everettraven (Collaborator, Author):

#35

Collaborator:

by the likely switchover to an aggregated apiserver. We'll no longer be creating/updating/deleting CRs for the underlying catalog metadata.

We won't be creating CRs, but we'll still be creating some objects? E.g. I think the logic for creating the package manifests today is encoded in the packageserver, but we should probably have the aggregated apiservice be responsible only for administering the objects, instead of creating them, so the catalogd controller would still be creating them.

pkg/controllers/core/catalogsource_controller.go (outdated; thread resolved)
Comment on lines 193 to 224
if jobCopy.Status.CompletionTime != nil {
	// Loop through the conditions and pull out all the conditions
	conds := map[batchv1.JobConditionType]batchv1.JobCondition{}
	for _, cond := range jobCopy.Status.Conditions {
		conds[cond.Type] = cond
	}

	// Check for signs of failure first. If any of the below
	// conditions have a status of True return an error
	failConds := []batchv1.JobConditionType{
		batchv1.JobFailed,
		batchv1.JobFailureTarget,
		batchv1.JobSuspended,
	}
	for _, failCond := range failConds {
		if cond, ok := conds[failCond]; ok {
			if cond.Status == v1.ConditionTrue {
				return false, fmt.Errorf("unpack job has condition %q with a status of %q", failCond, v1.ConditionTrue)
			}
		}
	}

	// No failures so ensure the job has a completed status
	// TODO: This is probably redundant. Check to see if we can just return true if we made it here
	if cond, ok := conds[batchv1.JobComplete]; ok {
		if cond.Status == v1.ConditionTrue {
			return true, nil
		}
	}
}

return false, nil

Collaborator:

if job.Status.CompletionTime != nil {
	for _, cond := range jobCopy.Status.Conditions {
		if cond.Status == v1.ConditionTrue {
			if cond.Type == batchv1.JobComplete {
				return true, nil
			} else {
				return false, fmt.Errorf("unpack job has condition %q with a status of %q", cond.Type, cond.Status)
			}
		}
	}
}
return false, nil

everettraven (Collaborator, Author):

I could be wrong here, but my understanding was that there could be multiple conditions set at the same time. For example, a Job could have both the batchv1.JobFailed and batchv1.JobComplete conditions. That being said, I do think this implementation could be improved to something like:

if job.Status.CompletionTime != nil {
	for _, cond := range jobCopy.Status.Conditions {
		if cond.Status == v1.ConditionTrue && cond.Type != batchv1.JobComplete {
			return false, nil
		}
	}
	return true, nil
}
return false, nil

everettraven (Collaborator, Author):

Done in f5f99c5

Collaborator:

for _, cond := range jobCopy.Status.Conditions {
	if cond.Status == v1.ConditionTrue && cond.Type != batchv1.JobComplete {
		return false, nil
	}
}

gotta return the error instead of nil

return false, fmt.Errorf("unpack job has condition %q with a status of %q", cond.Type, cond.Status)
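
Putting the two pieces together, the completed-job check would presumably end up roughly like this (a sketch of the combined logic only, reusing the identifiers from the quoted snippets):

if jobCopy.Status.CompletionTime != nil {
	for _, cond := range jobCopy.Status.Conditions {
		// Any condition other than Complete that is True (e.g. Failed) is treated as an unpack failure.
		if cond.Status == v1.ConditionTrue && cond.Type != batchv1.JobComplete {
			return false, fmt.Errorf("unpack job has condition %q with a status of %q", cond.Type, cond.Status)
		}
	}
	// CompletionTime is set and no failure condition is True.
	return true, nil
}
return false, nil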

everettraven (Collaborator, Author):

🤦 - good catch


Comment on lines +240 to 244
// updateStatusError will update the CatalogSource.Status.Conditions
// to have the condition Type "Ready" with a Status of "False" and a Reason
// of "UnpackError". This function is used to signal that a CatalogSource
// is in an error state and that catalog contents are not available on cluster
func updateStatusError(catalogSource *corev1beta1.CatalogSource, err error) {

Collaborator:

If this is the purpose of the function, it feels like it should be renamed to updateStatusUnpackError instead of the generic updateStatusError, which should instead have a signature like updateStatusError(reason Reason, catalogSource *corev1beta1.CatalogSource).
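
Roughly, the split being suggested might look like the following (illustrative only; it keeps the err parameter from the current signature, and the Reason type, the ReasonUnpackError value, and the use of apimachinery's meta.SetStatusCondition are assumptions, not code from this PR):

// Reason is a hypothetical type for condition reasons.
type Reason string

const ReasonUnpackError Reason = "UnpackError"

// updateStatusUnpackError keeps the existing call sites as one-liners.
func updateStatusUnpackError(catalogSource *corev1beta1.CatalogSource, err error) {
	updateStatusError(ReasonUnpackError, catalogSource, err)
}

// updateStatusError is the generic variant: it sets the "Ready" condition to False with the supplied reason.
func updateStatusError(reason Reason, catalogSource *corev1beta1.CatalogSource, err error) {
	meta.SetStatusCondition(&catalogSource.Status.Conditions, metav1.Condition{
		Type:    "Ready",
		Status:  metav1.ConditionFalse,
		Reason:  string(reason),
		Message: err.Error(),
	})
}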

everettraven (Collaborator, Author):

This function was implemented as part of #17. Since #6 targets better error handling and status writing, I think this is a bit outside the scope of this PR and would be better addressed by the PR that fixes #6. That being said, if there are strong feelings that it should be done as part of this PR, I am okay with that - I'm just trying to keep the scope/overlap as small as possible.

func (r *CatalogSourceReconciler) buildBundleMetadata(ctx context.Context, declCfg *declcfg.DeclarativeConfig, catalogSource corev1beta1.CatalogSource) error {
// createBundleMetadata will create a `BundleMetadata` resource for each
// "olm.bundle" object that exists for the given catalog contents. Returns an
// error if any are encountered.

Collaborator:

We should probably not return an error at the first sign of trouble and abandon the operation. Returning an aggregated list of errors is probably the way to go here. (This'll be useful when we talk about changing the child resources to something else, too.)
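
For illustration, the aggregated flavour could look roughly like this (a sketch only; createBundleMetadataFor is a hypothetical stand-in for the per-bundle body of the existing loop, and utilerrors is k8s.io/apimachinery/pkg/util/errors):

// Sketch: collect every per-bundle failure instead of returning on the first one.
var errs []error
for _, bundle := range declCfg.Bundles {
	if err := r.createBundleMetadataFor(ctx, bundle, catalogSource); err != nil {
		errs = append(errs, fmt.Errorf("creating BundleMetadata for bundle %q: %w", bundle.Name, err))
	}
}
// NewAggregate returns nil for an empty slice, otherwise a single error wrapping all of them.
return utilerrors.NewAggregate(errs)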

everettraven (Collaborator, Author):

I don't disagree. IMO this is better left to be addressed by the fix for #6, but if there are strong feelings about doing it in this PR, that's fine by me. I'm just trying to be conscious of keeping the scope/overlap as small as possible.

// "olm.package" object that exists for the given catalog contents.
// `Package.Spec.Channels` is populated by filtering all "olm.channel" objects
// where the "packageName" == `Package.Name`. Returns an error if any are encountered.
func (r *CatalogSourceReconciler) createPackages(ctx context.Context, declCfg *declcfg.DeclarativeConfig, catalogSource *corev1beta1.CatalogSource) error {

Collaborator:

Same comment here, regarding the error being returned at the first sign of trouble.


Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@everettraven changed the title from "(chore): controller cleanup" to "(chore): Refactor CatalogSource controller" on Apr 12, 2023

@anik120 (Collaborator) commented Apr 12, 2023

One last comment here; otherwise it's cool to address the other concerns with #6 and #35.

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@everettraven merged commit 5053412 into operator-framework:main on Apr 12, 2023
Successfully merging this pull request may close these issues: Refactor CatalogSource controller