-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: automate performance benchmarking #185
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good to me. Thank you for the follow-ups! Feel free to ignore the naming suggestions.
Once we agree on the design, I'll test it again against IPDX's account (warning WARN warning it's not tested yet).
Agreed on the design. Mind doing a test run?
This reverts commit d20bfdb.
Thanks for the follow-ups. Let me know once you want me to take another look. |
I renamed the modules according to suggestions, made the SSH keys ephemeral (without automated cleanup), skipped archiving in the workflow unless push is disabled and tested the new setup against IPDX AWS account. Here's an example workflow run: https://github.com/galorgh/test-plans/actions/runs/5111435405/jobs/9188390217 And here's a commit created by the automation: galorgh@d787a35 Let me know if there's anything else you'd want me to look at 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two smaller comments. Otherwise this is ready from my side.
Feel free to merge into perf
branch.
variable "ci_enabled" { | ||
type = bool | ||
description = "Whether or not to create resources required to automate the setup in CI (e.g. IAM user, cleanup Lambda)" | ||
default = false | ||
} | ||
|
||
variable "long_lived_enabled" { | ||
type = bool | ||
description = "Whether or not to create long lived resources (in CI, used across runs; e.g. VPCs)" | ||
default = true | ||
} | ||
|
||
variable "short_lived_enabled" { | ||
type = bool | ||
description = "Whether or not to create short lived resources (in CI, specific to each run; e.g. EC2 instances)" | ||
default = true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing with this, I am still a bit confused.
If I understand correctly the .github/workflows/perf
workflow will run the perf/terraform/local
config, see above. That config has long_lived_enabled
and short_lived_enabled
set to true
. The latter makes sense. Why does it set the former? Shouldn't it use the long-lived resources created outside of the workflow?
Looking at it from a different perspective, don't we really need 3 configs?
configs/local
for local development withci_enabled
false
,long_lived_enabled
true
,short_lived_enabled
true
and no S3 backend.configs/ci_long_lived
run once outside of the GitHub workflow withci_enabled
true
,long_lived_enabled
true
,short_lived_enabled
false
and S3 backend.configs/ci_short_lived
run within the workflow on each run withci_enabled
false
,long_lived_enabled
false
,short_lived_enabled
true
and no S3 backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very valid question because it's not immediately obvious why we have different configs in the first place. And the answer lies within terraform internals which we definitely shouldn't require users to know.
So the terraform specific thing I'm referring to here is that terraform backend (i.e. config of where the state is stored) cannot be defined dynamically. That's the reason why we need 2 separate configs - one where state is stored locally, and the other where the state is stored remotely (in this case, on S3). On the other hand, what modules are created during terraform apply can be configured dynamically through terraform variables. That's why we only need 2 configs.
So how are we running tf and where?
- We use
config/local
during local development.config/local
defaults are set so that onlylong_lived
andshort_lived
resources are created. Because of that, the user doesn't have to specify any additional input when doing terraform apply or destroy. The state is stored locally. - We use
config/remote
when we want to do the initial set up of the resources required for the CI. By default, this config creates onlyci
andlong_lived
resources. Again, no additional input is required when we use this config. The state is stored remotely, in a S3 bucket. - We use
config/local
when we want to start instances for CI runs. We useconfig/local
because we want the state to be stored locally butconfig/local
, by default, createslong_lived
andshort_lived
resources. This is too much so we have to override the defaults. This is fine because we don't put the burden of overriding defaults on users. Instead, we just configure it once in the workflow YAML viaTF_VAR_*
env vars, seetest-plans/.github/workflows/perf.yml
Lines 35 to 37 in 020d509
TF_VAR_ci_enabled: false TF_VAR_long_lived_enabled: false TF_VAR_short_lived_enabled: true
Because it's a bit tricky, I included a README next to the configs: https://github.com/libp2p/test-plans/blob/020d5097fecb4c6e2c271d8748bb5841fe0d9e1a/perf/terraform/configs/README.md
terraform { | ||
# TODO: Uncomment to start using remote backend | ||
# backend "s3" { | ||
# bucket = "terraform-tfstate" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked and we already have a bucket with the name libp2p-terraform-state
in the libp2p AWS account. What is the best practice? A single bucket with many .tfstate
files differentiated by key
, or multiple buckets, one per project, e.g. in this case one separate bucket for the libp2p-perf project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, if you already have libp2p-terraform-state
bucket, I'd use this one. I've done bucket per project in some places at PL and it feels like an unnecessary overhead.
Could you check what's the region of the bucket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you check what's the region of the bucket?
us-west-2
if I recall correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers :) I uncommented the bucket name now - 4179aa6
# 5. Go to https://github.com/libp2p/test-plans/settings/secrets/actions | ||
# 6. Click 'New repository secret', set the name to 'PERF_AWS_SECRET_ACCESS_KEY', and paste the secret access key from step 5 | ||
# 7. Go to https://github.com/libp2p/test-plans/settings/variables/actions | ||
# 8. Click 'New repository variable', set the name to 'PERF_AWS_ACCESS_KEY_ID', and paste the access key ID from step 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double checking my understanding, given that this is quite sensitive.
An external contributor can not steal these by e.g. printing the base64 of them to stdout, because:
- An external contributor can not create a branch on the
libp2p/test-plans
repository and thus can not trigger a CI run with access to the repository secrets. - A CI run of a pull request from a fork does not have access to the repository secrets.
Is that understanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@galargh following up on out-of-band discussion, adjusting the default values to targeting both CI and personal run in the libp2p AWS acccount with long-lived infrastructure pre-provisioned. Looks good to you?
|
Co-authored-by: Max Inden <mail@max-inden.de>
This is a continuation of mxinden#2
Diff since the initial review: galorgh/test-plans@perf-terraform...libp2p:test-plans:perf-automation
workflow_call
trigger from theperf
workflow (we can add it when we need it)git push
to theperf
workflows3 upload
from theperf
workflowMakefile
to terraform configcommon
- IAM user + scale down lambda,region
- VPC + launch template,ephemeral
- EC2 instance)terraform apply
+terraform destroy
I'll mark it as ready for review once I reply to the comments raised in the original PR.