Backport CI fixes to 2.7 #6623

thbkrkr · 2023-03-29T17:07:42Z

Backport the following commits to 2.7:

…6529) Signed-off-by: Michael Montgomery <mmontg1@gmail.com>

This fixes several bugs in the publish-dockerhub.sh script.

* The credentials stored in the docker-registry-elastic secret cannot push to the eck-dev namespace of the Elastic Docker registry, only to eck-ci. That is the reason for the error `server message: insufficient_scope: authorization failed`.
* The test that checks whether docker buildx should be installed is inverted.
* docker buildx v0.8.2 does not support multiple repositories; at least v0.9.0 must be installed for that.
* The URL used to install docker buildx is wrong: it is the one for arm instead of amd.
* The docker login command doesn't include the name of the registry, which is required in dry-run mode to authenticate to the Elastic Docker registry.
* The secret used in dry-run mode to get the Docker registry credentials is swapped with the one used in live mode.
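The inverted install check and the minimum-version requirement come down to a single comparison. The following is only an illustrative sketch in Python, not the script itself (publish-dockerhub.sh is a shell script); the function names and the `MIN_BUILDX` constant are hypothetical.

```python
import re
from typing import Optional, Tuple

# Minimum docker buildx version that supports multiple repositories,
# according to the description above.
MIN_BUILDX = "v0.9.0"


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn 'v0.9.0' or '0.9.0' into a comparable (0, 9, 0) tuple."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch))


def needs_buildx_install(installed: Optional[str], minimum: str = MIN_BUILDX) -> bool:
    """Install when buildx is missing or older than the minimum.

    An inverted check would amount to negating this result.
    """
    if installed is None:
        return True
    return parse_version(installed) < parse_version(minimum)
```

Comparing integer tuples rather than strings keeps `0.10.0` ordered after `0.9.0`, which a plain string comparison would get wrong.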

This adds a Go pipeline generator for running e2e tests in Buildkite. This makes it possible to:

- centralize the definition of all nightly e2e tests (see .buildkite/e2e/nightly-main-matrix.yaml)
- support triggering any e2e tests from a PR comment
- avoid duplication in the Buildkite pipeline
- remove the dependency on `.ci/setenvconfig`

The generator can take `stdin` or command-line flags as arguments to define the e2e tests to run. It can display the result as a Buildkite pipeline or as an environment file to reproduce a single run locally. Combining the two gives 3 modes:

- nightly-main/merge-main/pr-commit e2e tests: `cat matrix.yml | pipeline-gen | buildkite-agent pipeline upload`
- pr-comment e2e tests: `pipeline-gen "$GITHUB_PR_TRIGGER_COMMENT_ARGS" | buildkite-agent pipeline upload`
- dev: `pipeline-gen -o envfile -f p=gke,k=1.23,s=8.6.0,t=TestSmoke | tee ../../../.env`

Test suite runs can be organized in groups. Each group has an optional label, fixed environment variables shared by all the combinations in the group, and mixed environment variables, which are a list of environment variables for each combination. Specifying everything through environment variables can be verbose, but it is very flexible. The pipeline generator doesn't validate the uniqueness of test combinations or verify the environment variables. A variable defined in the `fixed` field can be overridden in the `mixed` field. The only required variable for each combination is `E2E_PROVIDER`.

```yaml
- label: stack
  fixed:
    E2E_PROVIDER: gke
  mixed:
    - E2E_STACK_VERSION: "8.6.1"
    - E2E_STACK_VERSION: "8.7.0-SNAPSHOT"
      BUILD_LICENSE_PUBKEY: dev
- label: kind
  fixed:
    E2E_PROVIDER: kind
    TESTS_MATCH: TestSmoke
  mixed:
    - DEPLOYER_KIND_NODE_IMAGE: kindest/node:v1.21.12@sha256:f316b33dd88f8196379f38feb80545ef3ed44d9197dca1bfd48bcb1583210207
    - DEPLOYER_KIND_NODE_IMAGE: kindest/node:v1.22.9@sha256:8135260b959dfe320206eb36b3aeda9cffcb262f4b44cda6b33f7bb73f453105
    - DEPLOYER_KIND_NODE_IMAGE: kindest/node:v1.22.9@sha256:8135260b959dfe320206eb36b3aeda9cffcb262f4b44cda6b33f7bb73f453105
      DEPLOYER_KIND_IP_FAMILY: ipv6
```

I've added 'p,k,s,t' shortcuts for the most used variables to make PR comments more usable.

```
buildkite test this -f p=gke,k=1.23,s=8.6.0,t=TestSmoke
```

This is the final piece to stop relying on `setenvconfig`. Its logic is now separated into 3 parts:

- build vars used by steps are in `.buildkite/build/setenv.sh` (introduced in elastic#6377)
- test runner vars are managed directly in the pipeline generator
- deployer vars are in `.buildkite/scripts/test/set-deployer-config.sh`

The only variable shared by the test runner and the deployer is `CLUSTER_NAME`, which is now generated by the pipeline generator instead of being set by us.

There are two differences when not in CI: `make (operator|e2e)-image` is used instead of Buildkite metadata to set the images to use, and test license secrets are written to disk instead of being held in memory.

The generation and upload of the JUnit XML report for each e2e test run is enabled.

Some commands to test:

```sh
cat ../nightly-main-matrix.yaml | go run main.go | tee pipeline-nightly.yml

echo '
- label: "kind/TestSmoke"
  fixed:
    E2E_PROVIDER: kind
    TESTS_MATCH: TestSmoke
' | go run main.go | tee pipeline-pr.yml

go run main.go -o envfile -f p=kind,t=TestSmoke
```
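The way a group expands into test combinations can be sketched as follows. This is an illustrative Python sketch of the semantics described above (each `mixed` entry layered over `fixed`, `mixed` winning on conflict, `E2E_PROVIDER` required), not the generator's actual Go code; `expand_group` is a hypothetical name, and treating a group with no `mixed` list as a single combination is an assumption based on the fixed-only `kind/TestSmoke` example.

```python
from typing import Dict, List

Env = Dict[str, str]


def expand_group(group: dict) -> List[Env]:
    """Return one environment per `mixed` entry, layered over `fixed`."""
    fixed = group.get("fixed") or {}
    # Assumption: a group without `mixed` yields one combination from `fixed` alone.
    mixed = group.get("mixed") or [{}]
    combinations: List[Env] = []
    for overrides in mixed:
        # Later keys win, so a variable in `mixed` overrides the one in `fixed`.
        env = {key: str(value) for key, value in {**fixed, **overrides}.items()}
        if "E2E_PROVIDER" not in env:
            raise ValueError("E2E_PROVIDER is required for each combination")
        combinations.append(env)
    return combinations


stack = {
    "label": "stack",
    "fixed": {"E2E_PROVIDER": "gke"},
    "mixed": [
        {"E2E_STACK_VERSION": "8.6.1"},
        {"E2E_STACK_VERSION": "8.7.0-SNAPSHOT", "BUILD_LICENSE_PUBKEY": "dev"},
    ],
}
for env in expand_group(stack):
    print(env)
```

For the `stack` group this produces two environments, both with `E2E_PROVIDER` set to `gke`, and only the second one carrying `BUILD_LICENSE_PUBKEY`.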

…lastic#6588) For EKS, we set our e2e storage class to use local volumes instead of depending on the default storage class that uses network storage because from k8s 1.23 network storage requires the installation of the Amazon EBS CSI driver and the deployer does not yet support this. See elastic#6515.

- Update kind to v0.17.0 and node images for this version (see https://github.com/kubernetes-sigs/kind/releases)
- Add k8s v1.25.3 and v1.26.0

Indirectly related to this:

- Keep only GKE, updated to k8s v1.25 in the deployer plans

Looks for `BUILD_LICENSE_PUBKEY` in the env of the test to run (mixed and fixed env vars) instead of the current env to know if the operator needs to be suffixed.
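The difference between the two lookups can be shown with a small Python sketch. It is illustrative only: the function name is hypothetical, and the rule that any non-empty value triggers the suffix is an assumption, since the exact condition is not stated here.

```python
import os
from typing import Mapping


def operator_needs_suffix(test_env: Mapping[str, str]) -> bool:
    """Decide from the env of the test to run, not from the current process env.

    `test_env` is the merged fixed + mixed environment of one combination.
    Assumption: a non-empty BUILD_LICENSE_PUBKEY means the suffix is needed.
    """
    return test_env.get("BUILD_LICENSE_PUBKEY", "") != ""


# The current process env is deliberately ignored by the lookup.
os.environ["BUILD_LICENSE_PUBKEY"] = "dev"
print(operator_needs_suffix({"E2E_PROVIDER": "gke"}))
print(operator_needs_suffix({"E2E_PROVIDER": "gke", "BUILD_LICENSE_PUBKEY": "dev"}))
```

With a lookup in the current env instead, every combination generated in the same process would get the same answer, whatever its own variables say.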

This fixes an issue when creating a Tanzu cluster where the installer state directory is not persisted in the Azure storage container.

- Use a predictable name for the installer state directory, so that it doesn't change between an upload and a download made in different containers
- Use `az blob upload --recursive` instead of `azcopy blob sync`
- Set workdir to home for az container to work with relative paths
- Use the directory basename to get a relative path that works in the tanzu cli container and the az cli container
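The idea behind the predictable name and the basename can be sketched in Python. This is only an illustration of the reasoning, not the deployer's code: the function names, the `tanzu-installer-state-` prefix and the home directories are all hypothetical.

```python
import posixpath


def installer_state_dir(home: str, cluster_name: str) -> str:
    """Build a deterministic path: the same cluster name always gives the
    same directory name, whichever container computes it.
    The naming scheme is a hypothetical example."""
    return posixpath.join(home, f"tanzu-installer-state-{cluster_name}")


def relative_state_dir(state_dir: str) -> str:
    """Keep only the basename, which is a valid relative path in any
    container whose workdir is its home directory."""
    return posixpath.basename(state_dir.rstrip("/"))


# Two containers with different home directories agree on the relative path.
in_tanzu_cli = installer_state_dir("/root", "my-cluster")
in_az_cli = installer_state_dir("/home/az", "my-cluster")
print(relative_state_dir(in_tanzu_cli) == relative_state_dir(in_az_cli))
```

A randomly generated directory name would differ between the container that uploads the state and the one that downloads it, which is the failure the predictable name avoids.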

Currently the Helm charts are published before the operator images are published because it's faster. This is not a problem at the moment, but I think it's better and logical to publish Helm charts after the operator images. It avoids the very rare case where you update the helm repository during the release and end up with an operator pod that won't start because the image hasn't been published yet.

Remove last migrated Jenkins jobs, only keep the release job.

- Quick fix for the issue when an env var has spaces, by no longer using the .env file in the set-deployer-script.sh script.
- Also switch to a comma-separated list for the Go build tags, as the space-separated list is deprecated.

Current limitation: E2E_TAGS with multiple tags does not work with the pipeline-gen used with flags because the comma is already used to separate the k=v tuples in --mixed and --fixed.
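The limitation follows directly from how a comma-separated list of k=v tuples has to be split. Here is a minimal Python sketch of such a parser, for illustration only; `parse_kv_list` is a hypothetical name, the real generator is written in Go, and the tag names `es` and `kb` are made-up examples.

```python
from typing import Dict


def parse_kv_list(arg: str) -> Dict[str, str]:
    """Parse 'k1=v1,k2=v2' by splitting on commas, then on the first '='."""
    env: Dict[str, str] = {}
    for item in arg.split(","):
        key, separator, value = item.partition("=")
        if not separator:
            # A value that itself contains a comma ends up here as a stray item.
            raise ValueError(f"not a k=v tuple: {item!r}")
        env[key] = value
    return env


print(parse_kv_list("E2E_PROVIDER=kind,TESTS_MATCH=TestSmoke"))

try:
    # Two tags in E2E_TAGS: the second tag is split off as its own item.
    parse_kv_list("E2E_PROVIDER=kind,E2E_TAGS=es,kb")
except ValueError as err:
    print(err)
```

Since the comma is both the tuple separator and the tag separator, the parser cannot tell a second tag from a malformed tuple without an escaping or quoting convention.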

This avoids having to maintain two nearly identical Dockerfiles in two different places.

…ic#6603) This allows us to manage our pipelines ourselves in this repository instead of having to do PRs in another repository. --------- Co-authored-by: Michael Montgomery <mmontg1@gmail.com>

* Remove failfast from the e2e tests. Signed-off-by: Michael Montgomery <mmontg1@gmail.com> Co-authored-by: Michael Morello <michael.morello@gmail.com> Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com> Co-authored-by: Thibault Richard <thbkrkr@users.noreply.github.com>

* Remove disabling of agent e2e tests because of 6331. Signed-off-by: Michael Montgomery <mmontg1@gmail.com>

This makes sure e2e test reports are uploaded on failure, with 2 changes:

* the JSON go test results are uploaded regardless of the status of the make e2e-run command
* a new step is run to download all test results and convert them to JUnit XML reports

JUnit XML reports are uploaded temporarily, until we build a Buildkite annotation to summarize the test failures.
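The first change follows a common pattern: run the tests, upload the results whatever happened, then report the original status. A minimal Python sketch of that pattern, under the assumption that both steps can be modelled as callables (the names are hypothetical and the real pipeline steps are shell commands):

```python
from typing import Callable, List


def run_then_upload(run_tests: Callable[[], int], upload_results: Callable[[], None]) -> int:
    """Run the tests, always upload the results, and return the tests' exit code."""
    try:
        exit_code = run_tests()
    finally:
        # Runs on success, on a non-zero exit code, and if run_tests raises.
        upload_results()
    return exit_code


uploads: List[str] = []
status = run_then_upload(lambda: 2, lambda: uploads.append("e2e-tests.json"))
print(status, uploads)
```

Returning the test exit code after the upload keeps the step failing when the tests fail, so the upload does not mask the failure.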

This adds a `build.message` to trigger in one call the release of all (eck-operator and eck-resources) Helm charts, and enables the automatic release of all Helm charts for final tags.

naemono

Wow! There sure were a lot of changes here recently.

naemono and others added 19 commits March 29, 2023 19:01

Use cache mount in dockerfile to speed up "go mod download" (elastic#…

f3f68cf

…6529) Signed-off-by: Michael Montgomery <mmontg1@gmail.com>

Update build badge status (elastic#6580)

477e24b

Set kind gcp agents machine type to n1-standard-16 (elastic#6586)

698b78c

Test k8s v1.25.3 and v1.26.0 on kind (elastic#6578)

d7409c4

- Update kind to v0.17.0 and node images for this version (see https://github.com/kubernetes-sigs/kind/releases)
- Add k8s v1.25.3 and v1.26.0

Indirectly related to this:

- Keep only GKE, updated to k8s v1.25 in the deployer plans

[ci/pipeline-gen] Lookup test env vars not current env (elastic#6590)

86318dc

Looks for `BUILD_LICENSE_PUBKEY` in the env of the test to run (mixed and fixed env vars) instead of the current env to know if the operator needs to be suffixed.

[ci] Test OpenShift 4.12 (elastic#6592)

66df9cd

Remove migrated Jenkins jobs (elastic#6595)

0001cfe

Remove last migrated Jenkins jobs, only keep the release job.

Use Buildkite agent image for ci container on GCP agent (elastic#6604)

8526734

This avoids having to maintain two nearly identical Dockerfiles in two different places.

[ci] Add Backstage configuration to manage Buildkite pipelines (elast…

f6de729

…ic#6603) This allows us to manage our pipelines ourselves in this repository instead of having to do PRs in another repository. --------- Co-authored-by: Michael Montgomery <mmontg1@gmail.com>

[e2e] Remove disabling of Agent e2e tests because of 6331 (elastic#6611)

abe2165

* Remove disabling of agent e2e tests because of 6331. Signed-off-by: Michael Montgomery <mmontg1@gmail.com>

[ci] Adjust condition to release Helm charts (elastic#6620)

952b4e2

This adds a `build.message` to trigger in one call the release of all (eck-operator and eck-resources) Helm charts, and enables the automatic release of all Helm charts for final tags.

thbkrkr added the backport (For backport PRs) and v2.7.0 labels Mar 29, 2023

botelastic bot added the triage label Mar 29, 2023

thbkrkr added the >bug (Something isn't working) and >enhancement (Enhancement of existing functionality) labels and removed the triage label Mar 29, 2023

naemono approved these changes Mar 29, 2023


thbkrkr merged commit 0ef8d5e into elastic:2.7 Mar 29, 2023

thbkrkr deleted the backport-ci-to-2.7 branch April 4, 2023 07:39
