✨ Add upgrade E2E #1003

m1kola · 2024-07-02T12:15:08Z

Description

Testing upgrade from the latest release to the current commit.

Fixes #856

Reviewer Checklist

API Go Documentation
Tests: Unit Tests (and E2E Tests, if appropriate)
Comprehensive Commit Messages
Links to related GitHub Issue(s)

netlify · 2024-07-02T12:15:25Z

✅ Deploy Preview for olmv1 ready!

Name	Link
🔨 Latest commit	`0cc4c2e`
🔍 Latest deploy log	https://app.netlify.com/sites/olmv1/deploys/668569651e73ab00082ae182
😎 Deploy Preview	https://deploy-preview-1003--olmv1.netlify.app

codecov · 2024-07-02T12:21:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.19%. Comparing base (ceba614) to head (0cc4c2e).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1003   +/-   ##
=======================================
  Coverage   77.19%   77.19%           
=======================================
  Files          17       17           
  Lines        1206     1206           
=======================================
  Hits          931      931           
  Misses        193      193           
  Partials       82       82

Flag	Coverage Δ
e2e	`56.54% <ø> (ø)`
unit	`51.90% <ø> (ø)`


Testing upgrade from the latest release to the current commit

Signed-off-by: Mikalai Radchuk <mradchuk@redhat.com>

tmshort · 2024-07-03T18:43:35Z

My general question is... do we want to use bash scripts for this, or do we want to use golang?

joelanford · 2024-07-03T20:09:39Z

Makefile

+	./hack/pre-upgrade-setup.sh $(CATALOG_IMG) $(TEST_CLUSTER_CATALOG_NAME) $(TEST_CLUSTER_EXTENSION_NAME)
+
+.PHONY: post-upgrade-checks
+post-upgrade-checks:


Can we also run the standard e2e after an upgrade as well?

m1kola · 2024-07-04

We considered this, but decided against it because:

1. It would be equivalent to running E2E on the current commit, which we already do in a separate E2E job.

2. It would increase feedback time and could add noise to the signal (e.g. the upgrade was successful, but a post-upgrade E2E test flaked), so you would have to re-test and wait again.

joelanford · 2024-07-03T20:21:41Z

hack/post-upgrade-checks.sh

+
+kubectl wait --for=condition=Available --timeout=60s -n olmv1-system deployment --all
+kubectl wait --for=condition=Unpacked --timeout=60s ClusterCatalog $TEST_CLUSTER_CATALOG_NAME
+kubectl wait --for=condition=Installed --timeout=60s ClusterExtension $TEST_CLUSTER_EXTENSION_NAME


+1 on checking that the existing ClusterCatalog and ClusterExtension remain Unpacked and Installed. But two questions:

1. My instinct is that these waits might not actually check anything. We've already waited for them to be Unpacked and Installed prior to the upgrade, so they will still be Unpacked and Installed after the upgrade at least until catalogd and operator-controller have reconciled them. Seems like we have a race condition here where these commands will execute before the upgraded reconcilers have finished reconciling them.

2. I think we should probably do a few extra tasks post-upgrade:
   a. Change the ClusterExtension to specify a broader version range so that it finds and upgrades to a new version from the existing catalog
   b. Add a new ClusterExtension that installs another package from the existing ClusterCatalog

Also, potentially out-of-scope in this PR, but in the catalogd-specific upgrade tests, I think we'll want a scenario where the catalog image referenced by a ClusterCatalog receives an update after the upgrade, which would ensure that our polling logic doesn't break. That scenario may be relevant in operator-controller because we expect catalog updates to trigger reconciles (and potentially upgrade) ClusterExtension objects.

m1kola · 2024-07-04

> My instinct is that these waits might not actually check anything. We've already waited for them to be Unpacked and Installed prior to the upgrade, so they will still be Unpacked and Installed after the upgrade at least until catalogd and operator-controller have reconciled them. Seems like we have a race condition here where these commands will execute before the upgraded reconcilers have finished reconciling them.

You are right: the new deployment might be ready (first check), but we are likely to perform the ClusterCatalog and ClusterExtension checks before the manager picks them up for reconciliation.

Any ideas on how, in the happy-path scenario, we can check that, say, a ClusterExtension was reconciled by the new version?

If a ClusterExtension is Installed before the upgrade and Installed after the upgrade, then I think there will be no change in the resource. E.g. lastTransitionTime on its conditions will stay the same.

> I think we should probably do a few extra tasks post-upgrade
> a. Change the ClusterExtension to specify a broader version range so that it finds and upgrades to a new version from the existing catalog
> b. Add a new ClusterExtension that installs another package from the existing ClusterCatalog

Are you suggesting we use "a" to trigger reconciliation? As a workaround for my question above?

Why do you think we need "b"?

> Also, potentially out-of-scope in this PR, but in the catalogd-specific upgrade tests, I think we'll want a scenario where the catalog image referenced by a ClusterCatalog receives an update after the upgrade, which would ensure that our polling logic doesn't break. That scenario may be relevant in operator-controller because we expect catalog updates to trigger reconciles (and potentially upgrade) ClusterExtension objects.

This sounds like a regular E2E scenario for catalogd. I'm not sure I understand why it is needed as part of the upgrade test. Could you please expand on this (ideally on the relevant catalogd issue)?

m1kola · 2024-07-04

> Are you suggesting we use "a" to trigger reconciliation? As a workaround for my question above?

Note to self: lastTransitionTime won't change even if we bump a version. But we can check observedGeneration.
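
The observedGeneration idea could be sketched roughly as below. This is a minimal illustration, not code from this PR: the `Condition` type is a local stand-in for the relevant fields of `metav1.Condition`, `conditionCurrent` is a hypothetical helper, and it assumes the controllers set `observedGeneration` on the `Installed` condition. The point is that after bumping the spec (which increments `metadata.generation`), a check passes only once the condition has been recomputed for the new generation.

```go
package main

import "fmt"

// Condition mirrors the subset of metav1.Condition fields used here.
// It is a local stand-in so the sketch has no external dependencies.
type Condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// conditionCurrent reports whether the named condition is "True" and was
// computed against the given generation of the object (hypothetical helper).
func conditionCurrent(conds []Condition, condType string, generation int64) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True" && c.ObservedGeneration == generation
		}
	}
	// A missing condition is treated as "not reconciled yet".
	return false
}

func main() {
	// Hypothetical state: the spec was edited after the upgrade (generation 2),
	// but the Installed condition still reflects generation 1.
	stale := []Condition{{Type: "Installed", Status: "True", ObservedGeneration: 1}}
	// The same object after the upgraded controller has reconciled it.
	fresh := []Condition{{Type: "Installed", Status: "True", ObservedGeneration: 2}}

	fmt.Println(conditionCurrent(stale, "Installed", 2)) // false
	fmt.Println(conditionCurrent(fresh, "Installed", 2)) // true
}
```

A post-upgrade check built this way would poll until the helper returns true, rather than relying on a condition status that may have been set before the upgrade.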

joelanford · 2024-07-03T20:25:26Z

> Do we want to use bash scripts for this, or do we want to use golang?

This is a good question. I don't necessarily think it's an either-or thing. I could see something like:

sh <installPreviousRelease>
sh <setupPreUpgradeState>
sh <upgradeIt>
go test ./test/upgrade-e2e/...

joelanford · 2024-07-03T20:28:02Z

Another interesting upgrade question: At what point do we need to actually define an upgrade process? An upgrade is generally:

1. Create objects that are net new in the new release
2. Update objects that are changed from the old release to the new release
3. Delete objects that were present in the old release, but not the new release

We don't do step 3 right now, right? Is doing step 3 a prereq for valid upgrade testing?
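
To make the three steps concrete, here is a rough sketch of how an upgrade plan could be derived by comparing two releases. It is purely illustrative and not part of this PR: `Plan` and `planUpgrade` are hypothetical names, and each release is modeled as a simple map from an object key to a string standing in for its manifest.

```go
package main

import (
	"fmt"
	"sort"
)

// Plan lists object keys grouped by the action an upgrade would take.
type Plan struct {
	Create, Update, Delete []string
}

// planUpgrade compares two releases, each modeled as a map from an object key
// (e.g. "Kind/name") to a hash or serialized form of its manifest.
func planUpgrade(oldRel, newRel map[string]string) Plan {
	var p Plan
	for key, newManifest := range newRel {
		oldManifest, existed := oldRel[key]
		switch {
		case !existed:
			// Step 1: net new in the new release.
			p.Create = append(p.Create, key)
		case oldManifest != newManifest:
			// Step 2: present in both releases, but changed.
			p.Update = append(p.Update, key)
		}
	}
	for key := range oldRel {
		if _, kept := newRel[key]; !kept {
			// Step 3: present in the old release only.
			p.Delete = append(p.Delete, key)
		}
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Strings(p.Create)
	sort.Strings(p.Update)
	sort.Strings(p.Delete)
	return p
}

func main() {
	oldRel := map[string]string{"Deployment/a": "v1", "ConfigMap/b": "v1", "Role/c": "v1"}
	newRel := map[string]string{"Deployment/a": "v2", "ConfigMap/b": "v1", "Service/d": "v1"}

	p := planUpgrade(oldRel, newRel)
	fmt.Println(p.Create, p.Update, p.Delete) // [Service/d] [Deployment/a] [Role/c]
}
```

Applying new manifests on top of an existing install covers only the Create and Update sets; anything in the Delete set would be left behind in the cluster.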

m1kola · 2024-07-04T13:48:00Z

> Another interesting upgrade question: At what point do we need to actually define an upgrade process? An upgrade is generally:
>
> 1. Create objects that are net new in the new release
> 2. Update objects that are changed from the old release to the new release
> 3. Delete objects that were present in the old release, but not the new release
>
> We don't do step 3 right now, right? Is doing step 3 a prereq for valid upgrade testing?

We were talking about this with @ankitathomas. It sounds a bit like we are testing something that doesn't exist at this moment (the upgrade process).

We decided not to open a can of worms for now and assume that the upgrade process is just:

For operator-controller - run the install script on top of what already exists
For catalogd - apply manifests on top of what already exists

This should be enough short-term: it should signal when someone adds something breaking to manifests/install script.

But long term we probably want to define the upgrade process and probably create a tool for it. E.g. we will likely need some migration tool that makes it possible to clean up old in-cluster objects from previous releases. We could look at how OpenShift's Cluster Version Operator handles deletions.

But IMO this deserves its own epic. What do you think? Should we put this on hold and first define the upgrade process and create the necessary tools? Or should we proceed with this naive approach of just applying things on top?

> > Do we want to use bash scripts for this, or do we want to use golang?
>
> This is a good question. I don't necessarily think it's an either-or thing. I could see something like:
>
> sh <installPreviousRelease>
> sh <setupPreUpgradeState>
> sh <upgradeIt>
> go test ./test/upgrade-e2e/...

I decided to keep bash in the draft for now. We can switch to Go when/if we decide to do something more sophisticated here.

openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Jul 2, 2024

m1kola force-pushed the upgrade-e2e-skeleton branch 3 times, most recently from adc912f to 90c602f, on July 2, 2024 14:56

Add upgrade E2E

0cc4c2e


m1kola force-pushed the upgrade-e2e-skeleton branch from 90c602f to 0cc4c2e on July 3, 2024 15:08

joelanford reviewed Jul 3, 2024
