Magic Castle EESSI 2023 09 21
Kenneth Hoste edited this page Sep 21, 2023 · 2 revisions
attending: Kenneth, Alan, Thomas, Lara
- we want to set up Magic Castle (MC) cluster(s) on AWS & Azure
- to replace CitC cluster on AWS, which is becoming difficult to maintain
- new home for the build-and-deploy bot
- tradeoffs
- one cluster per cloud provider vs multiple (one per CPU family: aarch64 + x86_64)
- not clear whether MC has good support for heterogeneous clusters
- bot can be spread across 4 clusters, but will be more difficult to keep them in sync and know which cluster to look at when build jobs fail
- which OS?
- Rocky 8?
- requirements for worker nodes
- Python
- Slurm
- shared filesystems (NFS)
- Apptainer
- nodejs (for smee used by bot)
- probably not, we use that via a container
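The worker node requirements above could be checked with a small script along these lines. This is only a sketch: the command names (`python3`, `sbatch`, `apptainer`) are assumptions about how Python, Slurm and Apptainer show up on a node, and the NFS check simply looks for `nfs`/`nfs4` entries in the mount table. nodejs is left out since smee is used via a container.

```python
import shutil

# Assumed command names for each requirement (not taken from the meeting notes).
REQUIRED_COMMANDS = {
    "Python": "python3",
    "Slurm": "sbatch",
    "Apptainer": "apptainer",
}


def missing_requirements(commands=REQUIRED_COMMANDS, which=shutil.which):
    """Return the sorted names of requirements whose command is not on PATH."""
    return sorted(name for name, cmd in commands.items() if which(cmd) is None)


def nfs_mount_points(mounts_text):
    """Return mount points with an NFS filesystem type.

    Expects text in /proc/mounts format: device, mount point, type, options, ...
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith("nfs"):
            points.append(fields[1])
    return points


if __name__ == "__main__":
    print("missing requirements:", missing_requirements())
    try:
        with open("/proc/mounts") as fh:
            print("NFS mounts:", nfs_mount_points(fh.read()))
    except OSError as err:
        # /proc/mounts only exists on Linux
        print("could not read mount table:", err)
```

Running this on a freshly provisioned worker node would give a quick indication of whether anything from the list is missing before the bot is moved over.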
- questions:
- how to set up a new MC cluster, what's required?
- Azure account or API key
- private GitHub repo for each cluster
- under github.com/EESSI
- naming scheme
- mc_azure_rocky8_x86_64_202309
- mc_azure_rocky8_aarch64_202309
- mc_aws_rocky8_x86_64_202309
- mc_aws_rocky8_aarch64_202309
- for now, use test tag: mc_azure_rocky8_x86_64_202309_test
- Terraform Cloud (TC) account
- see https://developer.hashicorp.com/terraform/cloud-docs
- only needed to make sure that cluster can be managed via private GitHub repo (and for managing credentials)
- one workspace in TC per cluster
- need to make sure that GitHub app for TC is sufficiently locked down, so it can only access private repos that it needs access to
- security credentials
- for AWS
- see https://github.com/ComputeCanada/magic_castle/blob/main/docs/terraform_cloud.md#aws
- use magic-castle user in EESSI subscription
- for Azure
- asking Martin...
- add to Terraform Cloud via variable sets in EESSI org in AWS
- steps:
- create GitHub repo first
- register it in Terraform Cloud GitHub app in EESSI org
- make sure your Terraform Cloud account has a GitHub OAuth token, see https://app.terraform.io/app/settings/tokens
- then connect TC to it
- create new workspace in default project via https://app.terraform.io/app/EESSI/workspaces/new
- apply variable set so AWS or Azure credentials are available
- populate repo
- download appropriate release tarball (see https://github.com/ComputeCanada/magic_castle/releases)
- only people who have access to the TC workspace can confirm plans that are generated based on commits to the GitHub repo
- docs: https://github.com/ComputeCanada/magic_castle/blob/main/docs/terraform_cloud.md
- inspiration: old MC clusters created by Alan
- beware: ancient MC version
- https://github.com/EESSI/mc_cluster_aws
- https://github.com/EESSI/mc_cluster_azure
- how to update the cluster (node images?)
- how to add user accounts (bot, contributors, etc.)
- who is granted admin access to help manage the clusters
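The naming scheme for the per-cluster GitHub repos listed above (mc, cloud provider, OS, CPU family, year and month, optional test tag) can be generated with a small helper. This is just an illustration: the function name `cluster_repo_name` and its parameters are hypothetical, and `rocky8` as default OS assumes the "Rocky 8?" question gets answered with yes.

```python
from datetime import date

PROVIDERS = ("aws", "azure")
ARCHS = ("x86_64", "aarch64")


def cluster_repo_name(provider, arch, created, os_name="rocky8", test=False):
    """Build a repo name like mc_azure_rocky8_x86_64_202309 (hypothetical helper)."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown cloud provider: {provider}")
    if arch not in ARCHS:
        raise ValueError(f"unknown CPU family: {arch}")
    # date objects support strftime-style format specs in f-strings
    name = f"mc_{provider}_{os_name}_{arch}_{created:%Y%m}"
    return f"{name}_test" if test else name


if __name__ == "__main__":
    for provider in ("azure", "aws"):
        for arch in ARCHS:
            print(cluster_repo_name(provider, arch, date(2023, 9, 21)))
```

Passing `test=True` appends the `_test` tag that is used for now, for example `mc_azure_rocky8_x86_64_202309_test`.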
- plan
- try setting up Magic Castle clusters using info above
- AWS
- x86_64: Thomas
- aarch64: Kenneth
- Azure
- x86_64: ...
- aarch64: ...
- next meeting
- Tue 26 Sept'23 at 10:00 CEST