Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Integrate App with Extension reconciler #625

Merged

Conversation

varshaprasad96
Copy link
Member

@varshaprasad96 varshaprasad96 commented Feb 9, 2024

Description

This PR introduces support for installing App CR with extension reconciler.

Each of the TODOs below, will have a separate issue created:

  • Add e2e tests for extension reconciler
  • Remove the use of resolver, and instead work on an alternative that identifies if the same bundle is present in the namespace or not and if cluster scoped APIs already exist on cluster.
  • Improve status reporting based on App's statuses.
  • Introduce support for registryV1 bundles through YTT templating mechanism.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 9, 2024
@varshaprasad96
Copy link
Member Author

/hold

Based on top of #598

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 9, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 21, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 26, 2024
@varshaprasad96 varshaprasad96 marked this pull request as ready for review February 26, 2024 23:02
@varshaprasad96 varshaprasad96 requested a review from a team as a code owner February 26, 2024 23:02
@varshaprasad96 varshaprasad96 changed the title WIP: Integrate App with Extension reconciler ✨ Integrate App with Extension reconciler Feb 26, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 26, 2024
@varshaprasad96 varshaprasad96 removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 26, 2024
@varshaprasad96 varshaprasad96 force-pushed the extension-kapp branch 2 times, most recently from 51fa95f to 2c32a98 Compare February 26, 2024 23:32
Copy link

codecov bot commented Feb 26, 2024

Codecov Report

Attention: Patch coverage is 4.86726% with 215 lines in your changes are missing coverage. Please review.

Project coverage is 64.10%. Comparing base (d51d908) to head (744c54c).

Files Patch % Lines
internal/controllers/extension_controller.go 0.00% 207 Missing ⚠️
cmd/manager/main.go 57.89% 5 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #625       +/-   ##
===========================================
- Coverage   78.91%   64.10%   -14.82%     
===========================================
  Files          22       22               
  Lines        1148     1368      +220     
===========================================
- Hits          906      877       -29     
- Misses        192      441      +249     
  Partials       50       50               
Flag Coverage Δ
e2e 47.44% <4.86%> (-8.23%) ⬇️
unit 58.41% <0.00%> (-14.51%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@joelanford joelanford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds a bunch of resolution code that I don't think we'll want in the long run.

Instead of bickering over it in this PR, what do you think about going ahead with a simple (as simple as you want it for now) filter/sort/pick highest approach?

I'd propose that in this PR, the simple filtering could just be "give me the highest version for the package is that is in the specified channel and range".

And then we could follow-up with a separate PR to handle the various upgrade concerns.

cmd/manager/main.go Outdated Show resolved Hide resolved
internal/controllers/extension_controller.go Outdated Show resolved Hide resolved
internal/controllers/extension_controller.go Outdated Show resolved Hide resolved
Comment on lines 157 to 159
// TODO: Instead of using a SAT solver and generating variables based on installed and
// available bundles, work on a filtering mechanism to evaluate if the bundle and its
// APIs are installed on a cluster or not. Feature tracked in a separate issue.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to evaluate if the bundle and its APIs are already installed on the cluster.

For now, the "resolution" without deppy would be:

  1. Get package, channel, bundle info from catalogd for ext.Spec.Source.Package.Name
  2. Set candidates to all the bundles in the package.
  3. If ext.Spec.Source.Package.Channel is set, remove candidates unless they have an entries[].name in the named channel.
  4. If ext.Spec.Source.Package.Version is set, remove candidates unless their version is in the specified range.
  5. If an App already exists for the Extension, remove candidates that are not the existing installed bundle or a successor of the existing installed bundle. (Ideally this would also account for UpgradeConstraintPolicy and the ForceSemver feature gate).
  6. If no candidates remain, "resolution" fails
  7. Sort the remaining candidates by semver, and pick the highest.

So this is all just "grab from the catalog(s) and filter based on what's in the spec and what is installed"

Feel free to capture in a follow-up!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've captured this for a follow up #654 and done basic filtering based on version, package and channel. Meanwhile, looking into the code, realized that we do have helpers in place to filter out catalog bundles based on relevant info from extension that currently deppy uses to create variables. We can re-use them to make filtering cleaner.

internal/controllers/extension_controller.go Show resolved Hide resolved
internal/controllers/extension_controller.go Outdated Show resolved Hide resolved
@varshaprasad96
Copy link
Member Author

Surprised that rukpak is breaking the tilt tests, which is not related to this PR at all.

Copy link
Contributor

@ankitathomas ankitathomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from a couple of nits, looks good!

internal/controllers/extension_controller.go Show resolved Hide resolved
internal/controllers/extension_controller.go Show resolved Hide resolved
gallettilance
gallettilance previously approved these changes Feb 29, 2024
Copy link
Member

@gallettilance gallettilance left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Seems like a great jumping off point! Just to sanity check I understood things properly. It looks like:

  1. If the kapp apis are installed after olm is, olm won't be aware of them unless you restart the pod manually
  2. We don't have a good way of knowing if the kapp apis will be reconciled
  3. The only thing this extension reconciler does is
    • install plain bundles
    • map these manifests to an app cr
    • relay status of app cr back to extension api

Let me know if I misunderstood anything / if there's more I didn't see.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 29, 2024
@varshaprasad96
Copy link
Member Author

varshaprasad96 commented Feb 29, 2024

@gallettilance yes, you're right. Adding a few more thoughts:

If the kapp apis are installed after olm is, olm won't be aware of them unless you restart the pod manually

The intention is to have kapp-controller setup along with relevant APIs during OLM installation itself. This is just an additional check just to ensure if kapp APIs are in place before we run the reconcile - since an independent installation is also possible and these are external APIs.

We don't have a good way of knowing if the kapp apis will be reconciled

Could you elaborate more on this? If you meat the successful reconcile for an App CR - yeah, for now we are just relaying a ReconcileSuccessful status, but there is a follow up issue created to track other statuses - like reconcile failed, deleting etc.

The only thing this extension reconciler does is
install plain bundles
map these manifests to an app cr
relay status of app cr back to extension api

This is mostly right, except that extension reconciler does not directly install bundles, instead it creates an App CR which triggers the kapp-controller to unpack and install bundles. Over all the flow looks like:

  1. User creates an Extension.
  2. Extension reconciler
    2.1. does resolution based on the extension spec and the info from the catalog. Sets appropriate resolution status.
    2.2. Extension reconciler creates an App CR - checks if an App CR with the similar spec exists and patches accordingly based on diff.
    2.3 When the App CR is created - it triggers the kapp-ctrl (which already has watches set up on App API) to unpack and install bundle content.
    2.4 Based on the status present in the App CR, it sets a relevant one in Extension.
    2.5 Evaluates the deprecation status.

Comment on lines +195 to +196
// originally Reason: ocv1alpha1.ReasonInstallationFailed
ext.Status.InstalledBundleResource = ""
setInstalledStatusConditionFailed(&ext.Status.Conditions, err.Error(), ext.GetGeneration())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a previous App is installed? It seems like we should be able to keep this status reflecting the truth of what's actually installed? This can be a follow-up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If existing app with the same spec exists, we will be returning with a nil error after doing the equality check: https://github.com/operator-framework/operator-controller/pull/625/files#diff-c04822d94b6cb8ff190859213d75fa6cd5c8b7c5284aaa2cf93cd504f727f5a9R363. After which we map the status of App we return from ensureApp - which is going to be either the existing one or the patched one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we have a new spec (e.g. we're doing an upgrade), and we patch the App, does the App status have a dual indication of:

  • Still have this old thing active
  • Progressing on rolling out the new thing

It would be nice, for example, that if something failed during one of (fetch, template, preflight) we could tell Extension users that the previous version is still installed, but this upgrade needs some attention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could tell Extension users that the previous version is still installed, but this upgrade needs some attention.

Upgrade scenario needs to be verified with kapp-controller. I think the App status still reflects "Reconciling", I don't see anything specific related to it in their status. Will create a follow up to look into it.

This commit introduces support for creating App CR
in extension controller.

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 1, 2024
Copy link

openshift-ci bot commented Mar 1, 2024

New changes are detected. LGTM label has been removed.

@varshaprasad96 varshaprasad96 force-pushed the extension-kapp branch 2 times, most recently from 66bfed6 to 744c54c Compare March 1, 2024 15:38
Copy link
Member

@joelanford joelanford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome PR @varshaprasad96! Thanks!

Did you ever figure out what was going on with that tilt job? I wonder if there's something that's just broken about it because of the passage of time and some non-pinned configuration of something?

@varshaprasad96
Copy link
Member Author

Did you ever figure out what was going on with that tilt job? I wonder if there's something that's just broken about it because of the passage of time and some non-pinned configuration of something?

operator-framework/rukpak#830 should fix it. Though I think we need a new release, because Tiltfile is configured to pull in a release (from go mod) rather than the latest commit from main.

@varshaprasad96 varshaprasad96 added this pull request to the merge queue Mar 1, 2024
Merged via the queue into operator-framework:main with commit 92a87d2 Mar 1, 2024
12 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants