Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add upgrade E2E #1003

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 13 additions & 2 deletions .github/workflows/e2e.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,7 @@ jobs:
go-version-file: go.mod

- name: Run the extension developer e2e test
run: |
make extension-developer-e2e
run: make extension-developer-e2e

e2e-kind:
runs-on: ubuntu-latest
Expand Down Expand Up @@ -48,3 +47,15 @@ jobs:
files: e2e-cover.out
flags: e2e
token: ${{ secrets.CODECOV_TOKEN }}

upgrade-e2e:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- uses: actions/setup-go@v5
with:
go-version-file: go.mod

- name: Run the upgrade e2e test
run: make test-upgrade-e2e
18 changes: 18 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,24 @@ extension-developer-e2e: KUSTOMIZE_BUILD_DIR := config/overlays/cert-manager
extension-developer-e2e: KIND_CLUSTER_NAME := operator-controller-ext-dev-e2e #EXHELP Run extension-developer e2e on local kind cluster
extension-developer-e2e: run image-registry test-ext-dev-e2e kind-clean

.PHONY: run-latest-release
run-latest-release:
curl -L -s https://github.com/operator-framework/operator-controller/releases/latest/download/install.sh | bash -s

.PHONY: pre-upgrade-setup
pre-upgrade-setup:
./hack/pre-upgrade-setup.sh $(CATALOG_IMG) $(TEST_CLUSTER_CATALOG_NAME) $(TEST_CLUSTER_EXTENSION_NAME)

.PHONY: post-upgrade-checks
post-upgrade-checks:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also run the standard e2e after an upgrade as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were considering this, but decided not to because:

  1. It will be an equivalent of running E2E on the current commit and we are doing it anyway in a separate E2E job.
  2. It will increase feedback time and has potential to add noise to the signal (e.g. upgrade was successful, but post-upgrade E2E test flaked) - you have to re-test and wait again.

./hack/post-upgrade-checks.sh $(TEST_CLUSTER_CATALOG_NAME) $(TEST_CLUSTER_EXTENSION_NAME)

.PHONY: test-upgrade-e2e
test-upgrade-e2e: KIND_CLUSTER_NAME := operator-controller-upgrade-e2e
test-upgrade-e2e: TEST_CLUSTER_CATALOG_NAME := test-catalog
test-upgrade-e2e: TEST_CLUSTER_EXTENSION_NAME := test-package
test-upgrade-e2e: kind-cluster run-latest-release image-registry build-push-e2e-catalog registry-load-bundles pre-upgrade-setup docker-build kind-load kind-deploy post-upgrade-checks kind-clean #HELP Run upgrade e2e tests on a local kind cluster

.PHONY: e2e-coverage
e2e-coverage:
COVERAGE_OUTPUT=./e2e-cover.out ./hack/e2e-coverage.sh
Expand Down
36 changes: 36 additions & 0 deletions hack/post-upgrade-checks.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
#!/bin/bash

set -euo pipefail

help="post-upgrade-checks.sh is used to perform tests after upgrade from the previous release.

Usage:
post-upgrade-checks.sh [TEST_CLUSTER_CATALOG_NAME] [TEST_CLUSTER_EXTENSION_NAME]
"

if [[ "$#" -ne 2 ]]; then
echo "Illegal number of arguments passed"
echo "${help}"
exit 1
fi

function getObservedGeneration() {
kubectl get ClusterExtension $TEST_CLUSTER_EXTENSION_NAME -o=jsonpath='{.status.conditions[?(@.type=="Installed")].observedGeneration}'
}

TEST_CLUSTER_CATALOG_NAME=$1
TEST_CLUSTER_EXTENSION_NAME=$2

kubectl wait --for=condition=Unpacked --timeout=60s ClusterCatalog $TEST_CLUSTER_CATALOG_NAME
kubectl wait --for=condition=Installed --timeout=60s ClusterExtension $TEST_CLUSTER_EXTENSION_NAME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on checking that the existing ClusterCatalog and ClusterExtension remain Unpacked and Installed. But two questions:

  1. My instinct is that these waits might not actually check anything. We've already waited for them to be Unpacked and Installed prior to the upgrade, so they will still be Unpacked and Installed after the upgrade at least until catalogd and operator-controller have reconciled them. Seems like we have a race condition here where these commands will execute before the upgraded reconcilers have finished reconciling them.
  2. I think we should probably do a few extra tasks post-upgrade
    a. Change the ClusterExtension to specify a broader version range so that it finds and upgrades to a new version from the existing catalog
    b. Add a new ClusterExtension that installs another package from the existing ClusterCatalog

Also, potentially out-of-scope in this PR, but in the catalogd-specific upgrade tests, I think we'll want a scenario where the catalog image referenced by a ClusterCatalog receives an update after the upgrade, which would ensure that our polling logic doesn't break. That scenario may be relevant in operator-controller because we expect catalog updates to trigger reconciles (and potentially upgrade) ClusterExtension objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. My instinct is that these waits might not actually check anything. We've already waited for them to be Unpacked and Installed prior to the upgrade, so they will still be Unpacked and Installed after the upgrade at least until catalogd and operator-controller have reconciled them. Seems like we have a race condition here where these commands will execute before the upgraded reconcilers have finished reconciling them.

You are right, the new deployment might be ready (first check), but it is likely that we perform ClusterCatalog and ClusterExtension checks before the manger picks them up for reconciling.

Any ideas how in happy scenario we can check that say ClusterExtension was reconciled by a new version?

If ClusterExtension is Installed before upgrade and Installed after upgrade then there will be no change in the resource I think. E.g. lastTransitionTime on conditions will stay the same.

  1. I think we should probably do a few extra tasks post-upgrade
    a. Change the ClusterExtension to specify a broader version range so that it finds and upgrades to a new version from the existing catalog
    b. Add a new ClusterExtension that installs another package from the existing ClusterCatalog

Are you suggesting to use "a" to trigger reconciliation? A workaround to my above question?

Why do you think we need "b"?

Also, potentially out-of-scope in this PR, but in the catalogd-specific upgrade tests, I think we'll want a scenario where the catalog image referenced by a ClusterCatalog receives an update after the upgrade, which would ensure that our polling logic doesn't break. That scenario may be relevant in operator-controller because we expect catalog updates to trigger reconciles (and potentially upgrade) ClusterExtension objects.

This sounds like a regular E2E scenario for catalogd. Not sure that I understand the need for this as part of upgrade test. Could you please expand on this (ideally on the relevant catalogd issue).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to use "a" to trigger reconciliation? A workaround to my above question?

Note to self: lastTransitionTime won't change even if we bump a version. But we can check observedGeneration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the scripts to initially install version 1.0.0 and then upgrade to 1.0.1 & added a check to make sure that the reconciliation actually works by checking installed contain again & its observedGeneration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it would be better if we could

  1. Wait for reconcile of the pre-upgrade state of the ClusterExtension before we modify it.
  2. Make some assertions
  3. Change the ClusterExtension
  4. Make more assertions

To do (1), we can check operator-controller logs, which should have enough info to key off of.



oldObservedGeneration=$(getObservedGeneration)
kubectl patch ClusterExtension $TEST_CLUSTER_EXTENSION_NAME --type='merge' -p '{"spec": {"version": "1.0.1"}}'
kubectl wait --for=condition=Installed --timeout=60s ClusterExtension $TEST_CLUSTER_EXTENSION_NAME
newObservedGeneration=$(getObservedGeneration)

if [[ "$oldObservedGeneration" == "$newObservedGeneration" ]]; then
echo "Expected ClusterExtension ($TEST_CLUSTER_EXTENSION_NAME) to have new value in observedGeneration of the Installed condition, but it remained unchanged - $newObservedGeneration"
exit 1
fi
49 changes: 49 additions & 0 deletions hack/pre-upgrade-setup.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
#!/bin/bash

set -euo pipefail

help="pre-upgrade-setup.sh is used to create some basic resources
which will later be used in upgrade testing.

Usage:
post-upgrade-checks.sh [TEST_CATALOG_IMG] [TEST_CATALOG_NAME] [TEST_CLUSTER_EXTENSION_NAME]
"

if [[ "$#" -ne 3 ]]; then
echo "Illegal number of arguments passed"
echo "${help}"
exit 1
fi

TEST_CATALOG_IMG=$1
TEST_CLUSTER_CATALOG_NAME=$2
TEST_CLUSTER_EXTENSION_NAME=$3

kubectl apply -f - << EOF
apiVersion: catalogd.operatorframework.io/v1alpha1
kind: ClusterCatalog
metadata:
name: ${TEST_CLUSTER_CATALOG_NAME}
spec:
source:
type: image
image:
ref: ${TEST_CATALOG_IMG}
pollInterval: 24h
insecureSkipTLSVerify: true
EOF


kubectl apply -f - << EOF
apiVersion: olm.operatorframework.io/v1alpha1
kind: ClusterExtension
metadata:
name: ${TEST_CLUSTER_EXTENSION_NAME}
spec:
installNamespace: default
packageName: prometheus
version: 1.0.0
EOF

kubectl wait --for=condition=Unpacked --timeout=60s ClusterCatalog $TEST_CLUSTER_CATALOG_NAME
kubectl wait --for=condition=Installed --timeout=60s ClusterExtension $TEST_CLUSTER_EXTENSION_NAME
Loading