-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Add upgrade E2E #1003
base: main
Are you sure you want to change the base?
✨ Add upgrade E2E #1003
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1003 +/- ##
=======================================
Coverage 77.19% 77.19%
=======================================
Files 17 17
Lines 1206 1206
=======================================
Hits 931 931
Misses 193 193
Partials 82 82
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
adc912f
to
90c602f
Compare
Testing upgrade from the latest release to the current commit Signed-off-by: Mikalai Radchuk <mradchuk@redhat.com>
90c602f
to
0cc4c2e
Compare
My general question is... do we want to use |
./hack/pre-upgrade-setup.sh $(CATALOG_IMG) $(TEST_CLUSTER_CATALOG_NAME) $(TEST_CLUSTER_EXTENSION_NAME) | ||
|
||
.PHONY: post-upgrade-checks | ||
post-upgrade-checks: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also run the standard e2e after an upgrade as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were considering this, but decided not to because:
- It will be an equivalent of running E2E on the current commit and we are doing it anyway in a separate E2E job.
- It will increase feedback time and has potential to add noise to the signal (e.g. upgrade was successful, but post-upgrade E2E test flaked) - you have to re-test and wait again.
|
||
kubectl wait --for=condition=Available --timeout=60s -n olmv1-system deployment --all | ||
kubectl wait --for=condition=Unpacked --timeout=60s ClusterCatalog $TEST_CLUSTER_CATALOG_NAME | ||
kubectl wait --for=condition=Installed --timeout=60s ClusterExtension $TEST_CLUSTER_EXTENSION_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on checking that the existing ClusterCatalog and ClusterExtension remain Unpacked and Installed. But two questions:
- My instinct is that these waits might not actually check anything. We've already waited for them to be Unpacked and Installed prior to the upgrade, so they will still be Unpacked and Installed after the upgrade at least until catalogd and operator-controller have reconciled them. Seems like we have a race condition here where these commands will execute before the upgraded reconcilers have finished reconciling them.
- I think we should probably do a few extra tasks post-upgrade
a. Change the ClusterExtension to specify a broader version range so that it finds and upgrades to a new version from the existing catalog
b. Add a new ClusterExtension that installs another package from the existing ClusterCatalog
Also, potentially out-of-scope in this PR, but in the catalogd-specific upgrade tests, I think we'll want a scenario where the catalog image referenced by a ClusterCatalog receives an update after the upgrade, which would ensure that our polling logic doesn't break. That scenario may be relevant in operator-controller because we expect catalog updates to trigger reconciles (and potentially upgrade) ClusterExtension objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- My instinct is that these waits might not actually check anything. We've already waited for them to be Unpacked and Installed prior to the upgrade, so they will still be Unpacked and Installed after the upgrade at least until catalogd and operator-controller have reconciled them. Seems like we have a race condition here where these commands will execute before the upgraded reconcilers have finished reconciling them.
You are right, the new deployment might be ready (first check), but it is likely that we perform ClusterCatalog
and ClusterExtension
checks before the manger picks them up for reconciling.
Any ideas how in happy scenario we can check that say ClusterExtension
was reconciled by a new version?
If ClusterExtension
is Installed
before upgrade and Installed
after upgrade then there will be no change in the resource I think. E.g. lastTransitionTime
on conditions will stay the same.
- I think we should probably do a few extra tasks post-upgrade
a. Change the ClusterExtension to specify a broader version range so that it finds and upgrades to a new version from the existing catalog
b. Add a new ClusterExtension that installs another package from the existing ClusterCatalog
Are you suggesting to use "a" to trigger reconciliation? A workaround to my above question?
Why do you think we need "b"?
Also, potentially out-of-scope in this PR, but in the catalogd-specific upgrade tests, I think we'll want a scenario where the catalog image referenced by a ClusterCatalog receives an update after the upgrade, which would ensure that our polling logic doesn't break. That scenario may be relevant in operator-controller because we expect catalog updates to trigger reconciles (and potentially upgrade) ClusterExtension objects.
This sounds like a regular E2E scenario for catalogd. Not sure that I understand the need for this as part of upgrade test. Could you please expand on this (ideally on the relevant catalogd issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting to use "a" to trigger reconciliation? A workaround to my above question?
Note to self: lastTransitionTime
won't change even if we bump a version. But we can check observedGeneration
.
This is a good question. I don't necessarily think its an either-or thing. I could see something like: sh <installPreviousRelease>
sh <setupPreUpgradeState>
sh <upgradeIt>
go test ./test/upgrade-e2e/... |
Another interesting upgrade question: At what point do we need to actually define an upgrade process? An upgrade is generally:
We don't do step 3 right now, right? Is doing step 3 a prereq for valid upgrade testing? |
We were talking about this with @ankitathomas. It sounds a bit like we are testing something that doesn't exist at this moment (the upgrade process). We decided not to open a can of worms for now and assume that the upgrade process is just:
This should be enough short-term: it should signal when someone adds something breaking to manifests/install script. But long term we probably want to define upgrade process and probably create tool for that. E.g. we will likely need to some migration tool where it is possible to clean up old in-cluster objects from previous releases. We can use OpenShift's Cluster Version Operator and how it handles deletions. But IMO - this deserves its own epic. What do you think? Should we put this on hold and define upgrade process & create necessary tools first? Or should we proceed with this naive approach of just applying things on top?
I decided to keep bash in the draft for now. We can switch to Go when/if we decide to do something more sophisticated here. |
Description
Testing upgrade from the latest release to the current commit.
Fixes #856
Reviewer Checklist