Backport CI fixes to 2.7 #6623

Merged: thbkrkr merged 19 commits into elastic:2.7 from backport-ci-to-2.7 on Mar 29, 2023

naemono and others added 19 commits March 29, 2023 19:01
…6529)

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
This fixes several bugs in the publish-dockerhub.sh script.

* The credentials stored in the docker-registry-elastic secret cannot push to the eck-dev namespace of the Elastic Docker registry, only to eck-ci. This is the cause of the server error message: insufficient_scope: authorization failed.
* The test that checks whether docker buildx should be installed is inverted.
* docker buildx v0.8.2 does not support multiple repositories; at least v0.9.0 must be installed for that.
* The URL used to install docker buildx is wrong: it points to the arm build instead of the amd one.
* The docker login command is missing the registry name, which is required in dry-run mode to authenticate to the Elastic Docker registry.
* The secret used in dry-run mode to get the docker registry creds is swapped with the one used in live mode.
This adds a go pipeline generator for running e2e-tests in Buildkite.

This makes it possible to:
- centralize the definition of all nightly e2e tests (see .buildkite/e2e/nightly-main-matrix.yaml)
- support triggering any e2e tests from a PR comment
- avoid duplications in the buildkite pipeline
- remove the dependency on `.ci/setenvconfig`

The generator can take `stdin` or command-line flags as arguments to define the e2e tests to run.
It can display the result as a buildkite pipeline or an environment file to reproduce a single run locally.

Combining the two gives 3 modes:
- nightly-main/merge-main/pr-commit e2e tests: `cat matrix.yml | pipeline-gen | buildkite-agent pipeline upload`
- pr-comment e2e-tests: `pipeline-gen "$GITHUB_PR_TRIGGER_COMMENT_ARGS" | buildkite-agent pipeline upload`
- dev: `pipeline-gen -o envfile -f p=gke,k=1.23,s=8.6.0,t=TestSmoke | tee ../../../.env`
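
As a rough illustration of the flag-driven `dev` mode, the sketch below parses a `k=v` list and renders it as an environment file. It is not the generator's actual code: `parseVars`, `envfile`, and the `shortcuts` table are hypothetical names, and the variables that the `p`, `k`, `s`, `t` letters expand to are assumptions (the name used for `k` in particular is a guess).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// shortcuts maps single-letter keys to full variable names.
// ASSUMPTION: the letters come from the PR description, but the variable
// names they expand to are guesses made for this sketch.
var shortcuts = map[string]string{
	"p": "E2E_PROVIDER",
	"k": "DEPLOYER_K8S_VERSION", // hypothetical name
	"s": "E2E_STACK_VERSION",
	"t": "TESTS_MATCH",
}

// parseVars turns "p=gke,t=TestSmoke" into a map of environment variables,
// expanding any known shortcut key.
func parseVars(arg string) (map[string]string, error) {
	env := map[string]string{}
	if strings.TrimSpace(arg) == "" {
		return env, nil
	}
	for _, pair := range strings.Split(arg, ",") {
		key, value, ok := strings.Cut(pair, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid key=value pair %q", pair)
		}
		if full, found := shortcuts[key]; found {
			key = full
		}
		env[key] = strings.TrimSpace(value)
	}
	return env, nil
}

// envfile renders the variables as sorted KEY=VALUE lines.
func envfile(env map[string]string) string {
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s\n", k, env[k])
	}
	return b.String()
}

func main() {
	env, err := parseVars("p=gke,k=1.23,s=8.6.0,t=TestSmoke")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(envfile(env))
}
```

Sorting the keys keeps the generated file stable between runs, which makes it easier to diff two local reproductions.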

Test suite runs can be organized in groups. Each group has an optional label, `fixed` environment variables shared by all combinations in the group, and `mixed` environment variables, a list with one set of environment variables per combination. Specifying everything as environment variables can be verbose, but it is very flexible. The pipeline generator does not validate the uniqueness of test combinations or verify the environment variables. A variable defined in the `fixed` field can be overridden in the `mixed` field. The only required variable for each combination is `E2E_PROVIDER`.

```yaml
- label: stack
  fixed:
    E2E_PROVIDER: gke
  mixed:
    - E2E_STACK_VERSION: "8.6.1"
    - E2E_STACK_VERSION: "8.7.0-SNAPSHOT"
      BUILD_LICENSE_PUBKEY: dev

- label: kind
  fixed:
    E2E_PROVIDER: kind
    TESTS_MATCH: TestSmoke
  mixed:
    - DEPLOYER_KIND_NODE_IMAGE: kindest/node:v1.21.12@sha256:f316b33dd88f8196379f38feb80545ef3ed44d9197dca1bfd48bcb1583210207
    - DEPLOYER_KIND_NODE_IMAGE: kindest/node:v1.22.9@sha256:8135260b959dfe320206eb36b3aeda9cffcb262f4b44cda6b33f7bb73f453105
    - DEPLOYER_KIND_NODE_IMAGE: kindest/node:v1.22.9@sha256:8135260b959dfe320206eb36b3aeda9cffcb262f4b44cda6b33f7bb73f453105
      DEPLOYER_KIND_IP_FAMILY: ipv6
```
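
The way a group expands into individual runs can be sketched as follows. This is a minimal illustration under stated assumptions, not the generator's real implementation: the `Group` type and the `expand` function are hypothetical, and treating a group without `mixed` entries as a single run is an assumption based on the fixed-only example used in the test commands.

```go
package main

import (
	"fmt"
	"sort"
)

// Group mirrors the YAML structure: a label, fixed vars, and a list of mixed vars.
// The type and field names are hypothetical.
type Group struct {
	Label string
	Fixed map[string]string
	Mixed []map[string]string
}

// expand returns one environment per combination. Values in a mixed entry
// override values with the same key in fixed.
func expand(g Group) ([]map[string]string, error) {
	mixed := g.Mixed
	if len(mixed) == 0 {
		// ASSUMPTION: a group with only fixed vars yields exactly one run.
		mixed = []map[string]string{{}}
	}
	runs := make([]map[string]string, 0, len(mixed))
	for i, m := range mixed {
		env := make(map[string]string, len(g.Fixed)+len(m))
		for k, v := range g.Fixed {
			env[k] = v
		}
		for k, v := range m {
			env[k] = v // mixed wins over fixed
		}
		if env["E2E_PROVIDER"] == "" {
			return nil, fmt.Errorf("group %q, combination %d: E2E_PROVIDER is required", g.Label, i)
		}
		runs = append(runs, env)
	}
	return runs, nil
}

func main() {
	g := Group{
		Label: "stack",
		Fixed: map[string]string{"E2E_PROVIDER": "gke"},
		Mixed: []map[string]string{
			{"E2E_STACK_VERSION": "8.6.1"},
			{"E2E_STACK_VERSION": "8.7.0-SNAPSHOT", "BUILD_LICENSE_PUBKEY": "dev"},
		},
	}
	runs, err := expand(g)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, env := range runs {
		keys := make([]string, 0, len(env))
		for k := range env {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		fmt.Printf("run %d:\n", i)
		for _, k := range keys {
			fmt.Printf("  %s=%s\n", k, env[k])
		}
	}
}
```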

I've added 'p,k,s,t' shortcuts for the most used variables to make PR comments more usable.

```
buildkite test this -f p=gke,k=1.23,s=8.6.0,t=TestSmoke
```

This is the final piece to stop relying on `setenvconfig`. Its logic is now separated into 3 parts:
- build vars used by steps in `.buildkite/build/setenv.sh` (introduced in elastic#6377)
- tests runner vars are directly managed in the pipeline generator
- deployer vars are in `.buildkite/scripts/test/set-deployer-config.sh`

The only variable shared by the tests runner and the deployer is `CLUSTER_NAME` and is now generated by the pipeline generator instead of being set by us.

There are two differences when running outside CI: `make (operator|e2e)-image` is used instead of Buildkite metadata to set the images to use, and test license secrets are written to disk instead of being held in memory.

The generation and upload of the JUnit XML report is enabled for each e2e test run.

Some commands to test:
```sh
cat ../nightly-main-matrix.yaml | go run main.go | tee pipeline-nightly.yml

echo '- label: "kind/TestSmoke"
  fixed:
    E2E_PROVIDER: kind
    TESTS_MATCH: TestSmoke
' | go run main.go | tee pipeline-pr.yml

go run main.go -o envfile -f p=kind,t=TestSmoke
```
…lastic#6588)

For EKS, we set our e2e storage class to use local volumes instead of depending on the default storage class, which uses
network storage. Starting with k8s 1.23, network storage requires the installation of the Amazon EBS CSI driver, and the
deployer does not yet support this. See elastic#6515.
- Update kind to v0.17.0 and node images for this version (see https://github.com/kubernetes-sigs/kind/releases)
- Add k8s v1.25.3 and v1.26.0
Indirectly related to this:
- Keep only GKE, updated to k8s v1.25 in the deployer plans
Looks for `BUILD_LICENSE_PUBKEY` in the env of the test to run (mixed and fixed env vars) instead of the current env to know if the operator needs to be suffixed.
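
The lookup described above can be sketched like this. The function names are hypothetical, and treating any non-empty value as "needs the suffix" is an assumption; the point is only that the check reads the test's own variables (with `mixed` taking precedence over `fixed`) rather than the current process environment.

```go
package main

import "fmt"

// lookup returns the value of key for one test combination.
// A value in mixed takes precedence over the same key in fixed.
func lookup(key string, fixed, mixed map[string]string) (string, bool) {
	if v, ok := mixed[key]; ok {
		return v, true
	}
	v, ok := fixed[key]
	return v, ok
}

// needsSuffix reports whether the test's own env sets BUILD_LICENSE_PUBKEY.
// ASSUMPTION: any non-empty value counts; the real condition may differ.
// Note that os.Getenv is deliberately not consulted here.
func needsSuffix(fixed, mixed map[string]string) bool {
	v, ok := lookup("BUILD_LICENSE_PUBKEY", fixed, mixed)
	return ok && v != ""
}

func main() {
	fixed := map[string]string{"E2E_PROVIDER": "gke"}
	release := map[string]string{"E2E_STACK_VERSION": "8.6.1"}
	snapshot := map[string]string{"E2E_STACK_VERSION": "8.7.0-SNAPSHOT", "BUILD_LICENSE_PUBKEY": "dev"}

	fmt.Println(needsSuffix(fixed, release))
	fmt.Println(needsSuffix(fixed, snapshot))
}
```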
This fixes an issue when creating a Tanzu cluster where the installer state directory is not persisted in the Azure storage container.

- Use a predictable name for the installer state directory, so that it doesn't change between an upload and a download made in different containers
- Use `az blob upload --recursive` instead of `azcopy blob sync`
- Set workdir to home for az container to work with relative paths
- Use the directory basename to get a relative path that works in the tanzu cli container and the az cli container
Currently the Helm charts are published before the operator images because it's faster. This is not a problem at the moment, but I think it is more logical to publish the Helm charts after the operator images. It avoids the very rare case where you update the Helm repository during the release and end up with an operator pod that won't start because the image hasn't been published yet.
Remove last migrated Jenkins jobs, only keep the release job.
)

- Quick fix for the issue that occurs when an env var contains spaces: stop using the .env file in the set-deployer-script.sh script.
- Also switch to a comma-separated list for the Go build tags, as the space-separated list is deprecated.

Current limitation: E2E_TAGS with multiple tags does not work with the pipeline-gen used with flags because the comma is already used to separate the k=v tuples in --mixed and --fixed.
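
A small sketch of why the limitation arises, assuming the flag value is split on commas before each piece is read as `k=v`. The helper names and the tag values (`es`, `kb`) are hypothetical and only serve the illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// commaTags converts a space-separated tag list into the comma-separated form.
func commaTags(tags string) string {
	return strings.Join(strings.Fields(tags), ",")
}

// splitPairs splits a flag value on commas, the separator used between k=v tuples.
func splitPairs(arg string) []string {
	return strings.Split(arg, ",")
}

func main() {
	tags := commaTags("es  kb") // hypothetical tag names
	fmt.Println(tags)

	// The comma inside the tag list collides with the tuple separator,
	// so the second tag is split off as a piece without a key.
	for _, p := range splitPairs("p=gke,E2E_TAGS=" + tags) {
		fmt.Printf("%q has key=value form: %t\n", p, strings.Contains(p, "="))
	}
}
```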
This avoids having to maintain two nearly identical Dockerfiles in two different places.
…ic#6603)

This allows us to manage our pipelines ourselves in this repository instead of having to do PRs in another repository.

---------
Co-authored-by: Michael Montgomery <mmontg1@gmail.com>
* Remove failfast from the e2e tests.

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Co-authored-by: Michael Morello <michael.morello@gmail.com>
Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com>
Co-authored-by: Thibault Richard <thbkrkr@users.noreply.github.com>
* Remove disabling of agent e2e tests because of 6331.

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
This makes sure e2e tests reports are uploaded on failure with 2 changes:
* the JSON go test results are uploaded regardless of the status of the make e2e-run command
* a new step is run to download all test results and convert them to JUnit XML reports
JUnit XML reports are uploaded as a temporary measure until we build a Buildkite annotation to summarize the test failures.
This adds a `build.message` to trigger in one call the release of all (eck-operator and eck-resources) Helm charts,
and enables the automatic release of all Helm charts for final tags.
@thbkrkr thbkrkr added backport For backport PRs v2.7.0 labels Mar 29, 2023
@botelastic botelastic bot added the triage label Mar 29, 2023
@thbkrkr thbkrkr added >bug Something isn't working >enhancement Enhancement of existing functionality and removed triage labels Mar 29, 2023

@naemono naemono left a comment


Wow! There sure were a lot of changes here recently.

@thbkrkr thbkrkr merged commit 0ef8d5e into elastic:2.7 Mar 29, 2023
@thbkrkr thbkrkr deleted the backport-ci-to-2.7 branch April 4, 2023 07:39
Labels
backport For backport PRs >bug Something isn't working >enhancement Enhancement of existing functionality v2.7.0

2 participants