Consume the reusable workflows from upbound/uptest@standard-runners #1180

Merged: 2 commits, Mar 12, 2024

@ulucinar ulucinar commented Feb 28, 2024

Description of your changes

Relevant PR: upbound/official-providers-ci#184

This PR replaces the larger runners we currently use for the:

  • e2e and publish-service-artifacts workflows
  • lint and publish-artifacts jobs

with the standard ubuntu-22.04 workflow runners. This will also allow us to check whether any CI jobs in this repo need larger runners.

Unfortunately, the e2e workflow (uptest) running on the standard runner cannot be tested with this PR, so I'm planning to test it on a separate repository. We have switched to the family providers in uptest runs, so I hope the larger runners are no longer needed there. Whether that holds also depends on the time allotted to the e2e workflow and on the resource providers a test needs.

Another workflow we need to test is publish-service-artifacts, which we use to build the family packages and push them to the Upbound package registry. Building and pushing a larger number of provider packages from a single job requires more resources on the runner. Package size also affects the compute resources we consume on the workflow runners, and we have recently reduced the resource provider package size considerably by removing the Terraform CLI and the Terraform provider. We control how many provider packages are processed per CI child action, but increasing the child action count increases the load on the package registry because these children currently run in parallel. If needed, we could run the child actions sequentially, at the expense of longer release build & push times. We would like to experiment with these parameters on the standard runners once this PR is merged.
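
To make the trade-off above concrete, here is a minimal Python sketch of how the per-child package count interacts with parallel versus sequential execution. It is not how the actual workflow is implemented: the function names, the package list, and the 4-minute per-package figure are all hypothetical, and it assumes every package takes the same time to build and push.

```python
def plan_child_actions(packages, per_child):
    """Split package names into batches, one batch per CI child action."""
    if per_child < 1:
        raise ValueError("per_child must be at least 1")
    return [packages[i:i + per_child] for i in range(0, len(packages), per_child)]


def estimate_wall_time(batches, minutes_per_package, sequential):
    """Rough wall-clock estimate, assuming a uniform per-package cost."""
    per_batch = [len(batch) * minutes_per_package for batch in batches]
    if not per_batch:
        return 0
    # Sequential children add up; parallel children are bounded by the slowest one.
    return sum(per_batch) if sequential else max(per_batch)


# Illustrative package names only; the real list is much longer.
packages = [f"provider-aws-{name}" for name in ("ec2", "s3", "networkfirewall", "eks", "iam")]
batches = plan_child_actions(packages, per_child=2)

print(len(batches))                                       # 3 child actions
print(estimate_wall_time(batches, 4, sequential=False))   # 8 (minutes, hypothetical)
print(estimate_wall_time(batches, 4, sequential=True))    # 20 (minutes, hypothetical)
```

Fewer packages per child lowers the resource demand on each runner but raises the number of children hitting the registry at once. Running them sequentially removes that concurrency at the cost of the summed time.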

Another job of interest is local-deploy, as it builds and deploys the monolith on a kind cluster. The monolith is heavy on the CPU and on the memory consumed by the API server process. We will be able to test it with this PR.

This PR also enables the Cleanup Disk step for the jobs in e2e & CI workflows.

Btw, the lint job is already failing on the larger Ubuntu-Jumbo-Runner workflow runner because it currently consumes ~40GB of memory, so it is expected to fail as well on the smaller standard runners that we are switching to with this PR. We are working on this. This PR is for checking the rest of the CI jobs on the standard runners.

I have:

  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

Tested via the following CI run: https://github.com/upbound/provider-aws/actions/runs/8248588907/job/22559239426?pr=1180

@ulucinar

So, from the output of https://github.com/upbound/provider-aws/actions/runs/8084771313, the remaining unknowns are the publish-service-artifacts and the e2e workflows. We also need to work on the lint job, which cannot even fit on an Ubuntu-Jumbo-Runner runner as of now.

@ulucinar

ulucinar commented Feb 28, 2024

A relatively heavyweight uptest is running here, with 4 providers installed:

  • The AWS provider family config package
  • 3 resource providers: provider-aws-{networkfirewall, ec2, s3},
    with ec2 being the largest (in terms of CRDs installed) AWS resource provider.

Update: This uptest run has failed due to insufficient disk space while installing the s3 resource provider. The other 3 provider packages were successfully installed.

This could imply we will hit similar issues for the publish-service-artifacts job...
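
One way to surface this kind of failure earlier would be a pre-flight free-space check before installing providers. The following is only a sketch: `has_free_space` and the 10 GiB threshold are hypothetical and are not part of uptest or of the workflows in this PR.

```python
import shutil

GIB = 1024 ** 3


def has_free_space(path: str, required_gib: float) -> bool:
    """Return True if the filesystem containing `path` has at least `required_gib` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * GIB


if __name__ == "__main__":
    # 10 GiB is an arbitrary illustrative threshold, not a measured requirement.
    if not has_free_space(".", 10):
        print("warning: less than 10 GiB free; provider installation may fail")
```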

@ulucinar

An uptest run for the Cluster.eks example manifest also failed with a disk space error.

@ulucinar

Looks like running the jlumbroso/free-disk-space action at the beginning of the e2e workflow helps with the disk space issue. Please see the following uptest runs:

@ulucinar

Looks like the jlumbroso/free-disk-space action also helped with the publish-artifacts job, which was previously failing without it:
https://github.com/ulucinar/upbound-provider-aws/actions/runs/8086192235/job/22095397427?pr=40

- This will also allow us to check whether any CI jobs need larger
  runners in the official provider repositories.

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>

@sergenyalcin sergenyalcin left a comment

Thanks @ulucinar LGTM!

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
@ulucinar ulucinar merged commit 72b444f into crossplane-contrib:main Mar 12, 2024
11 checks passed
@ulucinar ulucinar deleted the no-large-runners branch March 12, 2024 12:42