
(chore): Refactor CatalogSource controller #30

Merged

Conversation

everettraven (Collaborator)

Description
Cleans up the controllers by:

  • Removing the unused Package and BundleMetadata controllers
  • Refactoring the CatalogSource controller to follow a similar approach to rukpak and operator-controller for consistency
  • Renaming a couple of functions to more accurately convey what they do
  • Adding comments to almost all functions (reconcile() excepted) to help new contributors more easily understand what each function does


Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@anik120 (Collaborator) left a comment


First round.

}

func (r *CatalogSourceReconciler) reconcile(ctx context.Context, catalogSource *corev1beta1.CatalogSource) (ctrl.Result, error) {
	job, err := r.ensureUnpackJob(ctx, catalogSource)

Collaborator:

  • When I first create a CatalogSource, the reconciler creates the unpack job and creates the package/BM CRs.
  • On the next reconciliation, the unpack job is detected, so I get the unpack job back, but we still proceed to check whether the unpack job is completed and to create the BM/package CRs again? It seems benign from a "what the user sees" perspective, but it's just going to keep eating up unnecessary CPU cycles.
  • After a few reconciliation loops, if the job has been GCed, ensureUnpackJob will unpack the same index image again and then reattempt to create the child resources. Now this is eating up unnecessary network bandwidth AND CPU cycles.

everettraven (Collaborator, Author):

  • On the next reconciliation, the unpack job is detected, so I get the unpack job back, but we still proceed to check whether the unpack job is completed and to create the BM/package CRs again? It seems benign from a "what the user sees" perspective, but it's just going to keep eating up unnecessary CPU cycles.
  • After a few reconciliation loops, if the job has been GCed, ensureUnpackJob will unpack the same index image again and then reattempt to create the child resources. Now this is eating up unnecessary network bandwidth AND CPU cycles.

You are correct that this is the case, and it's something we need to consider for the future. This PR doesn't actually change the original process that was followed. IMO making these changes is outside the scope of this PR and should be captured in another issue. I'm not sure off the top of my head what the solution would look like (maybe a check for ready status == true, or a label/annotation, before running the unpack logic?).
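
For illustration, the kind of guard being floated could look roughly like this at the top of reconcile() (just a sketch; it assumes CatalogSource.Status.Conditions is a standard []metav1.Condition slice and uses meta.IsStatusConditionTrue from k8s.io/apimachinery/pkg/api/meta, none of which this PR establishes):

// Sketch of the "skip if already unpacked" idea; not part of this PR.
if meta.IsStatusConditionTrue(catalogSource.Status.Conditions, "Ready") {
	// Catalog contents are already unpacked and available, so skip the unpack flow.
	return ctrl.Result{}, nil
}
job, err := r.ensureUnpackJob(ctx, catalogSource)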

Member:

Agree with @everettraven. Also, a good chunk of the brainstorming here is rendered moot by the likely switchover to an aggregated apiserver. We'll no longer be creating/updating/deleting CRs for the underlying catalog metadata.

A similar problem will exist, though: can we skip syncing the apiserver storage if it is already up to date? We should write a general issue for this.

everettraven (Collaborator, Author):

#35

Collaborator:

by the likely switchover to an aggregated apiserver. We'll no longer be creating/updating/deleting CRs for the underlying catalog metadata.

We won't be creating CRs, but we'll still be creating some objects? E.g. I think the logic for creating the package manifests today is encoded in the packageserver, but we should probably have the aggregated apiservice be responsible only for administering the objects, instead of creating them, so the catalogd controller would still be creating them.

pkg/controllers/core/catalogsource_controller.go (outdated; thread resolved)
Comment on lines 193 to 224
if jobCopy.Status.CompletionTime != nil {
	// Loop through the conditions and pull out all the conditions
	conds := map[batchv1.JobConditionType]batchv1.JobCondition{}
	for _, cond := range jobCopy.Status.Conditions {
		conds[cond.Type] = cond
	}

	// Check for signs of failure first. If any of the below
	// conditions have a status of True return an error
	failConds := []batchv1.JobConditionType{
		batchv1.JobFailed,
		batchv1.JobFailureTarget,
		batchv1.JobSuspended,
	}
	for _, failCond := range failConds {
		if cond, ok := conds[failCond]; ok {
			if cond.Status == v1.ConditionTrue {
				return false, fmt.Errorf("unpack job has condition %q with a status of %q", failCond, v1.ConditionTrue)
			}
		}
	}

	// No failures so ensure the job has a completed status
	// TODO: This is probably redundant. Check to see if we can just return true if we made it here
	if cond, ok := conds[batchv1.JobComplete]; ok {
		if cond.Status == v1.ConditionTrue {
			return true, nil
		}
	}
}

return false, nil

Collaborator:

if job.Status.CompletionTime != nil {
	for _, cond := range jobCopy.Status.Conditions {
		if cond.Status == v1.ConditionTrue {
			if cond.Type == batchv1.JobComplete {
				return true, nil
			} else {
				return false, fmt.Errorf("unpack job has condition %q with a status of %q", cond.Type, cond.Status)
			}
		}
	}
}
return false, nil

everettraven (Collaborator, Author):

I could be wrong here, but my understanding was that there could be multiple conditions set at the same time. For example, a Job could have both the batchv1.JobFailed and batchv1.JobComplete conditions. That being said, I do think this implementation could be improved to something like:

if job.Status.CompletionTime != nil {
	for _, cond := range jobCopy.Status.Conditions {
		if cond.Status == v1.ConditionTrue && cond.Type != batchv1.JobComplete {
			return false, nil
		}
	}
	return true, nil
}
return false, nil

everettraven (Collaborator, Author):

Done in f5f99c5

Collaborator:

for _, cond := range jobCopy.Status.Conditions {
	if cond.Status == v1.ConditionTrue && cond.Type != batchv1.JobComplete {
		return false, nil
	}
}

gotta return the error instead of nil

return false, fmt.Errorf("unpack job has condition %q with a status of %q", cond.Type, cond.Status)
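
Putting the two pieces together, the completed-job check would presumably end up roughly like this (a sketch of the combined logic only, reusing the identifiers from the quoted snippets):

if jobCopy.Status.CompletionTime != nil {
	for _, cond := range jobCopy.Status.Conditions {
		// Any condition other than Complete that is True (e.g. Failed) is treated as an unpack failure.
		if cond.Status == v1.ConditionTrue && cond.Type != batchv1.JobComplete {
			return false, fmt.Errorf("unpack job has condition %q with a status of %q", cond.Type, cond.Status)
		}
	}
	// CompletionTime is set and no failure condition is True.
	return true, nil
}
return false, nil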

everettraven (Collaborator, Author):

🤦 - good catch


Comment on lines +240 to 244
// updateStatusError will update the CatalogSource.Status.Conditions
// to have the condition Type "Ready" with a Status of "False" and a Reason
// of "UnpackError". This function is used to signal that a CatalogSource
// is in an error state and that catalog contents are not available on cluster
func updateStatusError(catalogSource *corev1beta1.CatalogSource, err error) {

Collaborator:

If this is the purpose of the function, it feels like it should be renamed to updateStatusUnpackError instead of the generic updateStatusError, which should instead have a signature like updateStatusError(reason Reason, catalogSource *corev1beta1.CatalogSource).
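
Roughly, the split being suggested might look like the following (illustrative only; it keeps the err parameter from the current signature, and the Reason type, the ReasonUnpackError value, and the use of apimachinery's meta.SetStatusCondition are assumptions, not code from this PR):

// Reason is a hypothetical type for condition reasons.
type Reason string

const ReasonUnpackError Reason = "UnpackError"

// updateStatusUnpackError keeps the existing call sites as one-liners.
func updateStatusUnpackError(catalogSource *corev1beta1.CatalogSource, err error) {
	updateStatusError(ReasonUnpackError, catalogSource, err)
}

// updateStatusError is the generic variant: it sets the "Ready" condition to False with the supplied reason.
func updateStatusError(reason Reason, catalogSource *corev1beta1.CatalogSource, err error) {
	meta.SetStatusCondition(&catalogSource.Status.Conditions, metav1.Condition{
		Type:    "Ready",
		Status:  metav1.ConditionFalse,
		Reason:  string(reason),
		Message: err.Error(),
	})
}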

everettraven (Collaborator, Author):

This function was implemented as part of #17. Since #6 targets better error handling and status writing, I think this is a bit outside the scope of this PR and would be better addressed by the PR that fixes #6. That being said, if there are strong feelings that it should be done as part of this PR, I am okay with that - I'm just trying to keep the scope/overlap as small as possible.

func (r *CatalogSourceReconciler) buildBundleMetadata(ctx context.Context, declCfg *declcfg.DeclarativeConfig, catalogSource corev1beta1.CatalogSource) error {
// createBundleMetadata will create a `BundleMetadata` resource for each
// "olm.bundle" object that exists for the given catalog contents. Returns an
// error if any are encountered.

Collaborator:

We should probably not return an error at the first sign of trouble and abandon the operation. Returning an aggregated list of errors is probably the way to go here. (This'll be useful when we talk about changing the child resources to something else, too.)
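
For illustration, the aggregated flavour could look roughly like this (a sketch only; createBundleMetadataFor is a hypothetical stand-in for the per-bundle body of the existing loop, and utilerrors is k8s.io/apimachinery/pkg/util/errors):

// Sketch: collect every per-bundle failure instead of returning on the first one.
var errs []error
for _, bundle := range declCfg.Bundles {
	if err := r.createBundleMetadataFor(ctx, bundle, catalogSource); err != nil {
		errs = append(errs, fmt.Errorf("creating BundleMetadata for bundle %q: %w", bundle.Name, err))
	}
}
// NewAggregate returns nil for an empty slice, otherwise a single error wrapping all of them.
return utilerrors.NewAggregate(errs)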

everettraven (Collaborator, Author):

I don't disagree. IMO this is better left to be addressed by the fix for #6, but if there are strong feelings about doing it in this PR, that's fine by me. I'm just trying to be conscious of keeping the scope/overlap as small as possible.

// "olm.package" object that exists for the given catalog contents.
// `Package.Spec.Channels` is populated by filtering all "olm.channel" objects
// where the "packageName" == `Package.Name`. Returns an error if any are encountered.
func (r *CatalogSourceReconciler) createPackages(ctx context.Context, declCfg *declcfg.DeclarativeConfig, catalogSource *corev1beta1.CatalogSource) error {

Collaborator:

Same comment here, regarding the error being returned at the first sign of trouble.


Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@everettraven changed the title from "(chore): controller cleanup" to "(chore): Refactor CatalogSource controller" on Apr 12, 2023

@anik120 (Collaborator) commented Apr 12, 2023

One last comment here; otherwise it's cool to address the other concerns with #6 and #35.

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@everettraven merged commit 5053412 into operator-framework:main on Apr 12, 2023
Successfully merging this pull request may close these issues: Refactor CatalogSource controller