Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: automate performance benchmarking #185

Merged
merged 41 commits into from
Jun 5, 2023
Merged

feat: automate performance benchmarking #185

merged 41 commits into from
Jun 5, 2023

Conversation

galargh
Copy link
Contributor

@galargh galargh commented May 26, 2023

This is a continuation of mxinden#2

Diff since the initial review: galorgh/test-plans@perf-terraform...libp2p:test-plans:perf-automation

  • removed workflow_call trigger from the perf workflow (we can add it when we need it)
  • added git push to the perf workflow
  • removed s3 upload from the perf workflow
  • moved provisioning/deprovisioning from the Makefile to terraform config
  • made it possible to partially apply terraform (the available groups are common - IAM user + scale down lambda, region - VPC + launch template, ephemeral - EC2 instance)
  • changed the instructions for local setup back to the ones requiring only terraform apply + terraform destroy
  • created an override terraform config with remote terraform backend for long-lived infra
  • moved the lambda testing script to the directory where all the other lambda related files are
  • removed s3 from terraform
  • added instructions on how to move to ephemeral key pairs as a comment

I'll mark it as ready for review once I reply to the comments raised in the original PR.

.github/workflows/perf.yml Outdated Show resolved Hide resolved
perf/README.md Outdated Show resolved Hide resolved
perf/terraform/terraform.tf Outdated Show resolved Hide resolved
.github/workflows/perf.yml Outdated Show resolved Hide resolved
@galargh
Copy link
Contributor Author

galargh commented May 26, 2023

I think I now covered all the suggestions (including applying the ideas from Alternative approach 🙇) from mxinden#2. @mxinden let me know what you think :) Once we agree on the design, I'll test it again against IPDX's account (⚠️ WARN ⚠️ it's not tested yet).

@galargh galargh requested a review from mxinden May 26, 2023 09:50
@galargh galargh marked this pull request as ready for review May 26, 2023 09:50
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me. Thank you for the follow-ups! Feel free to ignore the naming suggestions.

Once we agree on the design, I'll test it again against IPDX's account (warning WARN warning it's not tested yet).

Agreed on the design. Mind doing a test run?

perf/terraform/region/main.tf Outdated Show resolved Hide resolved
perf/terraform/terraform.tf Outdated Show resolved Hide resolved
perf/terraform/terraform.tf Outdated Show resolved Hide resolved
perf/terraform/terraform.tf Outdated Show resolved Hide resolved
perf/terraform/terraform.tf Outdated Show resolved Hide resolved
perf/terraform/terraform.tf Outdated Show resolved Hide resolved
perf/terraform/terraform.tf Outdated Show resolved Hide resolved
perf/terraform/terraform.tf Outdated Show resolved Hide resolved
.github/workflows/perf.yml Outdated Show resolved Hide resolved
@mxinden
Copy link
Member

mxinden commented May 29, 2023

Thanks for the follow-ups. Let me know once you want me to take another look.

@galargh
Copy link
Contributor Author

galargh commented May 29, 2023

I renamed the modules according to suggestions, made the SSH keys ephemeral (without automated cleanup), skipped archiving in the workflow unless push is disabled and tested the new setup against IPDX AWS account.

Here's an example workflow run: https://github.com/galorgh/test-plans/actions/runs/5111435405/jobs/9188390217

And here's a commit created by the automation: galorgh@d787a35

Let me know if there's anything else you'd want me to look at 😄

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two smaller comments. Otherwise this is ready from my side.

Feel free to merge into perf branch.

perf/terraform/configs/remote/terraform_override.tf Outdated Show resolved Hide resolved
perf/terraform/configs/local/terraform.tf Outdated Show resolved Hide resolved
Comment on lines 33 to 49
variable "ci_enabled" {
type = bool
description = "Whether or not to create resources required to automate the setup in CI (e.g. IAM user, cleanup Lambda)"
default = false
}

variable "long_lived_enabled" {
type = bool
description = "Whether or not to create long lived resources (in CI, used across runs; e.g. VPCs)"
default = true
}

variable "short_lived_enabled" {
type = bool
description = "Whether or not to create short lived resources (in CI, specific to each run; e.g. EC2 instances)"
default = true
}
Copy link
Member

@mxinden mxinden May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Playing with this, I am still a bit confused.

If I understand correctly the .github/workflows/perf workflow will run the perf/terraform/local config, see above. That config has long_lived_enabled and short_lived_enabled set to true. The latter makes sense. Why does it set the former? Shouldn't it use the long-lived resources created outside of the workflow?

Looking at it from a different perspective, don't we really need 3 configs?

  1. configs/local for local development with ci_enabled false, long_lived_enabled true, short_lived_enabled true and no S3 backend.
  2. configs/ci_long_lived run once outside of the GitHub workflow with ci_enabled true, long_lived_enabled true, short_lived_enabled false and S3 backend.
  3. configs/ci_short_lived run within the workflow on each run with ci_enabled false, long_lived_enabled false, short_lived_enabled true and no S3 backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very valid question because it's not immediately obvious why we have different configs in the first place. And the answer lies within terraform internals which we definitely shouldn't require users to know.

So the terraform specific thing I'm referring to here is that terraform backend (i.e. config of where the state is stored) cannot be defined dynamically. That's the reason why we need 2 separate configs - one where state is stored locally, and the other where the state is stored remotely (in this case, on S3). On the other hand, what modules are created during terraform apply can be configured dynamically through terraform variables. That's why we only need 2 configs.

So how are we running tf and where?

  1. We use config/local during local development. config/local defaults are set so that only long_lived and short_lived resources are created. Because of that, the user doesn't have to specify any additional input when doing terraform apply or destroy. The state is stored locally.
  2. We use config/remote when we want to do the initial set up of the resources required for the CI. By default, this config creates only ci and long_lived resources. Again, no additional input is required when we use this config. The state is stored remotely, in a S3 bucket.
  3. We use config/local when we want to start instances for CI runs. We use config/local because we want the state to be stored locally but config/local, by default, creates long_lived and short_lived resources. This is too much so we have to override the defaults. This is fine because we don't put the burden of overriding defaults on users. Instead, we just configure it once in the workflow YAML via TF_VAR_* env vars, see
    TF_VAR_ci_enabled: false
    TF_VAR_long_lived_enabled: false
    TF_VAR_short_lived_enabled: true

Because it's a bit tricky, I included a README next to the configs: https://github.com/libp2p/test-plans/blob/020d5097fecb4c6e2c271d8748bb5841fe0d9e1a/perf/terraform/configs/README.md

terraform {
# TODO: Uncomment to start using remote backend
# backend "s3" {
# bucket = "terraform-tfstate"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked and we already have a bucket with the name libp2p-terraform-state in the libp2p AWS account. What is the best practice? A single bucket with many .tfstate files differentiated by key, or multiple buckets, one per project, e.g. in this case one separate bucket for the libp2p-perf project.

Copy link
Contributor Author

@galargh galargh May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, if you already have libp2p-terraform-state bucket, I'd use this one. I've done bucket per project in some places at PL and it feels like an unnecessary overhead.

Could you check what's the region of the bucket?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check what's the region of the bucket?

us-west-2 if I recall correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers :) I uncommented the bucket name now - 4179aa6

Comment on lines +8 to +11
# 5. Go to https://github.com/libp2p/test-plans/settings/secrets/actions
# 6. Click 'New repository secret', set the name to 'PERF_AWS_SECRET_ACCESS_KEY', and paste the secret access key from step 5
# 7. Go to https://github.com/libp2p/test-plans/settings/variables/actions
# 8. Click 'New repository variable', set the name to 'PERF_AWS_ACCESS_KEY_ID', and paste the access key ID from step 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checking my understanding, given that this is quite sensitive.

An external contributor can not steal these by e.g. printing the base64 of them to stdout, because:

  • An external contributor can not create a branch on the libp2p/test-plans repository and thus can not trigger a CI run with access to the repository secrets.
  • A CI run of a pull request from a fork does not have access to the repository secrets.

Is that understanding correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right!

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galargh following up on out-of-band discussion, adjusting the default values to targeting both CI and personal run in the libp2p AWS acccount with long-lived infrastructure pre-provisioned. Looks good to you?

.github/workflows/perf.yml Outdated Show resolved Hide resolved
perf/terraform/configs/local/terraform.tf Outdated Show resolved Hide resolved
perf/terraform/configs/README.md Outdated Show resolved Hide resolved
perf/terraform/configs/README.md Outdated Show resolved Hide resolved
@mxinden
Copy link
Member

mxinden commented Jun 2, 2023

perf/terraform/config/remote is now deployed with the .tfstate in the corresponding S3 bucket. //CC @galargh

@mxinden mxinden merged commit 285149d into perf Jun 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants