✨ Add upgrade E2E #1003

Draft: m1kola wants to merge 1 commit into main from upgrade-e2e-skeleton

m1kola
Member

@m1kola m1kola commented Jul 2, 2024

Description

Testing upgrade from the latest release to the current commit.

Fixes #856

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

openshift-ci bot added the do-not-merge/work-in-progress label Jul 2, 2024

netlify bot commented Jul 2, 2024

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 0cc4c2e
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/668569651e73ab00082ae182
😎 Deploy Preview https://deploy-preview-1003--olmv1.netlify.app

codecov bot commented Jul 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.19%. Comparing base (ceba614) to head (0cc4c2e).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1003   +/-   ##
=======================================
  Coverage   77.19%   77.19%           
=======================================
  Files          17       17           
  Lines        1206     1206           
=======================================
  Hits          931      931           
  Misses        193      193           
  Partials       82       82           
Flag Coverage Δ
e2e 56.54% <ø> (ø)
unit 51.90% <ø> (ø)

@m1kola m1kola force-pushed the upgrade-e2e-skeleton branch 3 times, most recently from adc912f to 90c602f Compare July 2, 2024 14:56
Testing upgrade from the latest release to the current commit

Signed-off-by: Mikalai Radchuk <mradchuk@redhat.com>
@tmshort
Contributor

tmshort commented Jul 3, 2024

My general question is... do we want to use bash scripts for this, or do we want to use golang?

./hack/pre-upgrade-setup.sh $(CATALOG_IMG) $(TEST_CLUSTER_CATALOG_NAME) $(TEST_CLUSTER_EXTENSION_NAME)

.PHONY: post-upgrade-checks
post-upgrade-checks:

Member

Can we also run the standard e2e after an upgrade as well?

Member Author

We were considering this, but decided not to because:

  1. It would be equivalent to running E2E on the current commit, which we already do in a separate E2E job.
  2. It would increase feedback time and could add noise to the signal (e.g. the upgrade succeeded, but a post-upgrade E2E test flaked), so you would have to re-test and wait again.


kubectl wait --for=condition=Available --timeout=60s -n olmv1-system deployment --all
kubectl wait --for=condition=Unpacked --timeout=60s ClusterCatalog $TEST_CLUSTER_CATALOG_NAME
kubectl wait --for=condition=Installed --timeout=60s ClusterExtension $TEST_CLUSTER_EXTENSION_NAME

Member

+1 on checking that the existing ClusterCatalog and ClusterExtension remain Unpacked and Installed. But two questions:

  1. My instinct is that these waits might not actually check anything. We've already waited for them to be Unpacked and Installed prior to the upgrade, so they will still be Unpacked and Installed after the upgrade at least until catalogd and operator-controller have reconciled them. Seems like we have a race condition here where these commands will execute before the upgraded reconcilers have finished reconciling them.
  2. I think we should probably do a few extra tasks post-upgrade
    a. Change the ClusterExtension to specify a broader version range so that it finds and upgrades to a new version from the existing catalog
    b. Add a new ClusterExtension that installs another package from the existing ClusterCatalog

Also, potentially out-of-scope in this PR, but in the catalogd-specific upgrade tests, I think we'll want a scenario where the catalog image referenced by a ClusterCatalog receives an update after the upgrade, which would ensure that our polling logic doesn't break. That scenario may be relevant in operator-controller because we expect catalog updates to trigger reconciles (and potentially upgrades) of ClusterExtension objects.

Member Author

> 1. My instinct is that these waits might not actually check anything. We've already waited for them to be Unpacked and Installed prior to the upgrade, so they will still be Unpacked and Installed after the upgrade at least until catalogd and operator-controller have reconciled them. Seems like we have a race condition here where these commands will execute before the upgraded reconcilers have finished reconciling them.

You are right, the new deployment might be ready (first check), but it is likely that we perform the ClusterCatalog and ClusterExtension checks before the manager picks them up for reconciling.

Any ideas how, in the happy scenario, we can check that, say, a ClusterExtension was reconciled by the new version?

If a ClusterExtension is Installed before the upgrade and Installed after it, then I think there will be no change in the resource. E.g. lastTransitionTime on conditions will stay the same.

> 2. I think we should probably do a few extra tasks post-upgrade
>     a. Change the ClusterExtension to specify a broader version range so that it finds and upgrades to a new version from the existing catalog
>     b. Add a new ClusterExtension that installs another package from the existing ClusterCatalog

Are you suggesting we use "a" to trigger reconciliation, as a workaround for my question above?

Why do you think we need "b"?

> Also, potentially out-of-scope in this PR, but in the catalogd-specific upgrade tests, I think we'll want a scenario where the catalog image referenced by a ClusterCatalog receives an update after the upgrade, which would ensure that our polling logic doesn't break. That scenario may be relevant in operator-controller because we expect catalog updates to trigger reconciles (and potentially upgrades) of ClusterExtension objects.

This sounds like a regular E2E scenario for catalogd. I'm not sure I understand why it needs to be part of the upgrade test. Could you please expand on this (ideally on the relevant catalogd issue)?

Member Author

> Are you suggesting we use "a" to trigger reconciliation, as a workaround for my question above?

Note to self: lastTransitionTime won't change even if we bump a version. But we can check observedGeneration.
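
A minimal sketch of what such an observedGeneration check could look like in the post-upgrade script. It assumes that each condition in status.conditions carries an observedGeneration field and that the check runs after a spec change (such as a version bump) has incremented metadata.generation. The function names are hypothetical and not part of this PR.

```shell
# Succeeds only if the given condition is True and was recorded for the
# resource's current metadata.generation.
# Arguments: <kind> <name> <condition type>
# Assumption: conditions expose observedGeneration (not confirmed by this PR).
condition_is_current() {
  kind="$1"; name="$2"; cond="$3"
  generation=$(kubectl get "$kind" "$name" -o jsonpath='{.metadata.generation}')
  status=$(kubectl get "$kind" "$name" \
    -o jsonpath="{.status.conditions[?(@.type==\"$cond\")].status}")
  observed=$(kubectl get "$kind" "$name" \
    -o jsonpath="{.status.conditions[?(@.type==\"$cond\")].observedGeneration}")
  [ -n "$generation" ] && [ "$status" = "True" ] && [ "$observed" = "$generation" ]
}

# Poll condition_is_current once per second until it succeeds or the
# timeout expires.
# Arguments: <kind> <name> <condition type> <timeout in seconds>
wait_until_current() {
  timeout="$4"
  elapsed=0
  until condition_is_current "$1" "$2" "$3"; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

After changing the ClusterExtension spec post-upgrade, a call such as `wait_until_current ClusterExtension "$TEST_CLUSTER_EXTENSION_NAME" Installed 60` would only pass once the upgraded controller has reconciled the new generation, which avoids the race described above.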

@joelanford
Member

> Do we want to use bash scripts for this, or do we want to use golang?

This is a good question. I don't necessarily think it's an either-or thing. I could see something like:

sh <installPreviousRelease>
sh <setupPreUpgradeState>
sh <upgradeIt>
go test ./test/upgrade-e2e/...

@joelanford
Member

Another interesting upgrade question: At what point do we need to actually define an upgrade process? An upgrade is generally:

  1. Create objects that are net new in the new release
  2. Update objects that are changed from the old release to the new release
  3. Delete objects that were present in the old release, but not the new release

We don't do step 3 right now, right? Is doing step 3 a prereq for valid upgrade testing?
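
The three steps above can be sketched as a set comparison between the objects each release ships. This is only an illustration: the helper names and the input files (one object identifier per line, in an assumed "Kind/namespace/name" format) are hypothetical and do not exist in the repository, and the sketch says nothing about how a deletion should actually be carried out.

```shell
# Each helper takes two files: objects in the old release, objects in the
# new release. One identifier per line; the "Kind/namespace/name" format
# is an assumption of this sketch.

# Internal helper: run comm with the given column flags on sorted,
# de-duplicated copies of both files.
_release_comm() {
  flags="$1"
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  LC_ALL=C sort -u "$2" > "$old_sorted"
  LC_ALL=C sort -u "$3" > "$new_sorted"
  LC_ALL=C comm "$flags" "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}

# Step 1: objects that appear only in the new release.
objects_to_create() { _release_comm -13 "$1" "$2"; }

# Step 2: objects present in both releases (candidates for an update).
objects_to_update() { _release_comm -12 "$1" "$2"; }

# Step 3: objects that appear only in the old release.
objects_to_delete() { _release_comm -23 "$1" "$2"; }
```

Applying manifests on top of an existing install covers the first two lists; anything printed by `objects_to_delete old.txt new.txt` would be left behind in the cluster.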

@m1kola
Member Author

m1kola commented Jul 4, 2024

> Another interesting upgrade question: At what point do we need to actually define an upgrade process? An upgrade is generally:
>
>   1. Create objects that are net new in the new release
>   2. Update objects that are changed from the old release to the new release
>   3. Delete objects that were present in the old release, but not the new release
>
> We don't do step 3 right now, right? Is doing step 3 a prereq for valid upgrade testing?

We were talking about this with @ankitathomas. It sounds a bit like we are testing something that doesn't exist at this moment (the upgrade process).

We decided not to open a can of worms for now and assume that the upgrade process is just:

  • For operator-controller - run the install script on top of what already exists
  • For catalogd - apply manifests on top of what already exists

This should be enough short-term: it should signal when someone adds something breaking to the manifests or the install script.

But long term we probably want to define the upgrade process and create a tool for it. E.g. we will likely need some migration tool that can clean up old in-cluster objects from previous releases. We can look at OpenShift's Cluster Version Operator and how it handles deletions.

But IMO this deserves its own epic. What do you think? Should we put this on hold and first define the upgrade process and create the necessary tools? Or should we proceed with this naive approach of just applying things on top?

> Do we want to use bash scripts for this, or do we want to use golang?
>
> This is a good question. I don't necessarily think it's an either-or thing. I could see something like:
>
> sh <installPreviousRelease>
> sh <setupPreUpgradeState>
> sh <upgradeIt>
> go test ./test/upgrade-e2e/...

I decided to keep bash in the draft for now. We can switch to Go when/if we decide to do something more sophisticated here.

Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[epic] Implement upgrade tests for OperatorController
3 participants