From 1b839ae08bd1d2eba61bf95bd9b467e05e085bb0 Mon Sep 17 00:00:00 2001
From: Matthias Kay
Date: Thu, 5 Oct 2023 11:37:53 +0200
Subject: [PATCH 1/2] rewrite docs

---
 CONTRIBUTING.md  |  18 +-
 README.md        | 463 +++++--------------------------------------------
 docs/pitfalls.md |  37 ++++
 docs/usage.md    | 331 +++++++++++++++++++++++++++++++++
 4 files changed, 406 insertions(+), 443 deletions(-)
 create mode 100644 docs/pitfalls.md
 create mode 100644 docs/usage.md

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 487ecf086..8d0f82d5d 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,14 +1,15 @@
# Contribution guide

-We appreciate your thought to contribute to open source. :heart: We want to make contributing as easy as possible. You are welcome to:
+We appreciate your interest in contributing to open source. :heart: We want to make contributing as easy as possible.
+You are welcome to:

-- Reporting a bug
-- Discussing the current state of the code
-- Submitting a fix
-- Proposing new features
+- Report a bug
+- Discuss the current state of the code
+- Submit a fix
+- Propose new features

-We Use [Github Flow](https://guides.github.com/introduction/flow/index.html), So All Code Changes Happen Through Pull Requests
-Pull requests are the best way to propose changes to the codebase (we use [Github Flow](https://guides.github.com/introduction/flow/index.html)). We actively welcome your pull requests:
+We use [Github Flow](https://guides.github.com/introduction/flow/index.html), so all code changes happen through pull requests.
+We actively welcome your pull requests:

1. Fork the repo and create your branch from `main`.
2. If you've added code, check one of the examples.
@@ -25,7 +26,8 @@ We use the [Terraform Style conventions](https://www.terraform.io/docs/configura

## Documentation

-We use [pre-commit](https://pre-commit.com/) to update the Terraform inputs and outputs in the documentation via [terraform-docs](https://github.com/terraform-docs/terraform-docs). Ensure you have installed those components.
+We use [pre-commit](https://pre-commit.com/) to update the Terraform inputs and outputs in the documentation via
+[terraform-docs](https://github.com/terraform-docs/terraform-docs). Ensure you have installed those components.
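+
+A minimal sketch of that workflow (assuming `pre-commit` is installed via `pip` and the hooks are defined in the repository's
+`.pre-commit-config.yaml`):
+
+```sh
+pip install pre-commit
+pre-commit install          # register the git hooks once
+pre-commit run --all-files  # run all configured hooks, including terraform-docs
+```
+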
## Testing

diff --git a/README.md b/README.md
index a4345a8c0..6ad7f7761 100644
--- a/README.md
+++ b/README.md
@@ -3,467 +3,58 @@
[![Terraform registry](https://img.shields.io/github/v/release/cattle-ops/terraform-aws-gitlab-runner?label=Terraform%20Registry)](https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/)
[![Gitter](https://badges.gitter.im/terraform-aws-gitlab-runner/Lobby.svg)](https://gitter.im/terraform-aws-gitlab-runner/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)
[![Actions](https://github.com/cattle-ops/terraform-aws-gitlab-runner/workflows/CI/badge.svg)](https://github.com/cattle-ops/terraform-aws-gitlab-runner/actions)
-[![Renovate][1]](https://www.mend.io/renovate/)
+[![Renovate](https://img.shields.io/badge/renovate-enabled-brightgreen?logo=renovate)](https://www.mend.io/renovate/)

-# Terraform module for GitLab auto scaling runners on AWS spot instances
-
-- [The module](#the-module)
-- [Prerequisites](#prerequisites)
-- [Usage](#usage)
-- [Examples](#examples)
-- [Contributors ✨](#contributors-)
-- [Requirements](#requirements)
-- [Providers](#providers)
-- [Modules](#modules)
-- [Resources](#resources)
-- [Inputs](#inputs)
-- [Outputs](#outputs)
--
+# Terraform module for GitLab auto-scaling runners on AWS spot instances
+
+💥 See [issue 819](https://github.com/cattle-ops/terraform-aws-gitlab-runner/issues/819) on how to migrate to v7 smoothly.
+
## The module

-This [Terraform](https://www.terraform.io/) modules creates a [GitLab CI runner](https://docs.gitlab.com/runner/). A blog post
+This [Terraform](https://www.terraform.io/) module creates a [GitLab Runner](https://docs.gitlab.com/runner/). A blog post
describes the original version of the runner. See the post at [040code](https://040code.github.io/2017/12/09/runners-on-the-spot/).
The original setup of the module is based on the blog post: [Auto scale GitLab CI runners and save 90% on EC2 costs](https://about.gitlab.com/2017/11/23/autoscale-ci-runners/).

-> 💥 BREAKING CHANGE AHEAD: Version 7 of the module rewrites the whole variable section to
-> - harmonize the variable names
-> - harmonize the documentation
-> - remove deprecated variables
-> - gain a better overview of the features provided
->
-> And it also adds
-> - all possible Docker settings
-> - the `idle_scale_factor`
->
-> We know that this is a breaking change causing some pain, but we think it is worth it. We hope you agree. And to make the
-> transition as smooth as possible, we have added a migration script to the `migrations` folder. It will cover almost all cases,
-> but some minor rework might still be possible.
->
-> Checkout [issue 819](https://github.com/cattle-ops/terraform-aws-gitlab-runner/issues/819)
-
The runners created by the module use spot instances by default for running the builds using the `docker+machine` executor.

- Shared cache in S3 with life cycle management to clear objects after x days.
- Logs streamed to CloudWatch.
- Runner agents registered automatically.

-The name of the runner agent and runner is set with the overrides variable. Adding an agent runner name tag does not work.
-
-```hcl
-# ...
-runner_instance = {
-  name = "Gitlab Runner connecting to GitLab"
-}
-
-# this doesn't work
-agent_tags = merge(local.my_tags, map("Name", "Gitlab Runner connecting to GitLab"))
-```
-
The runner supports 3 main scenarios:

-### GitLab CI docker-machine runner - one runner agent
-
-In this scenario the runner agent is running on a single EC2 node and runners are created by [docker machine](https://docs.gitlab.com/runner/configuration/autoscale.html)
-using spot instances. Runners will scale automatically based on the configuration. The module creates a S3 cache by default,
-which is shared across runners (spot instances).
-
-![runners-default](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-default.png)
-
-### GitLab CI docker-machine runner - multiple runner agents
-
-In this scenario the multiple runner agents can be created with different configuration by instantiating the module multiple times.
-Runners will scale automatically based on the configuration. The S3 cache can be shared across runners by managing the cache
-outside of the module.
-
-![runners-cache](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-cache.png)
-
-### GitLab Ci docker runner
-
-In this scenario _not_ docker machine is used but docker to schedule the builds. Builds will run on the same EC2 instance as the
-agent. No auto scaling is supported.
-
-![runners-docker](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-docker.png)
-
-## Prerequisites
-
-### Terraform
-
-Ensure you have Terraform installed. The module is based on Terraform 1.3, see `.terraform-version` for the used version. A handy
-tool to mange your Terraform version is [tfenv](https://github.com/kamatama41/tfenv).
-
-### AWS
-
-Ensure you have setup your AWS credentials. The module requires access to IAM, EC2, CloudWatch, S3 and SSM.
-
-### JQ & AWS CLI
-
-In order to be able to destroy the module, you will need to run from a host with both `jq` and `aws` installed and accessible in
-the environment.
-
-On macOS it is simple to install them using `brew`.
-
-```sh
-brew install jq awscli
-```
-
-### Service linked roles
-
-The GitLab runner EC2 instance requires the following service linked roles:
-
-- AWSServiceRoleForAutoScaling
-- AWSServiceRoleForEC2Spot
+1. GitLab CI docker-machine runner - one runner agent

-By default the EC2 instance is allowed to create the required roles, but this can be disabled by setting the option
-`allow_iam_service_linked_role_creation` to `false`. If disabled you must ensure the roles exist. You can create them manually or
-via Terraform.
+   In this scenario the runner agent is running on a single EC2 node and runners are created by [docker machine](https://docs.gitlab.com/runner/configuration/autoscale.html)
+   using spot instances. Runners will scale automatically based on the configuration. The module creates an S3 cache by default,
+   which is shared across runners (spot instances).

-```hcl
-resource "aws_iam_service_linked_role" "spot" {
-  aws_service_name = "spot.amazonaws.com"
-}
+   ![runners-default](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-default.png)

-resource "aws_iam_service_linked_role" "autoscaling" {
-  aws_service_name = "autoscaling.amazonaws.com"
-}
-```
+2.
GitLab CI docker-machine runner - multiple runner agents

-### KMS keys
+   In this scenario multiple runner agents can be created with different configurations by instantiating the module multiple times.
+   Runners will scale automatically based on the configuration. The S3 cache can be shared across runners by managing the cache
+   outside of the module.

-If a KMS key is set via `kms_key_id`, make sure that you also give proper access to the key. Otherwise, you might
-get errors, e.g. the build cache can't be decrypted or logging via CloudWatch is not possible. For a CloudWatch
-example checkout [kms-policy.json](https://github.com/cattle-ops/terraform-aws-gitlab-runner/blob/main/policies/kms-policy.json)
+   ![runners-cache](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-cache.png)

-### GitLab runner token configuration
+3. GitLab CI docker runner

-By default the runner is registered on initial deployment. In previous versions of this module this was a manual process. The
-manual process is still supported but will be removed in future releases. The runner token will be stored in the AWS SSM parameter
-store. See [example](examples/runner-pre-registered/) for more details.
+   In this scenario docker, _not_ docker machine, is used to schedule the builds. Builds will run on the same EC2 instance as the

-To register the runner automatically set the variable `gitlab_runner_registration_config["registration_token"]`. This token value
-can be found in your GitLab project, group, or global settings. For a generic runner you can find the token in the admin section.
-By default the runner will be locked to the target project, not run untagged. Below is an example of the configuration map.
+   agent. No auto scaling is supported.

-```hcl
-runner_gitlab_registration_config = {
-  registration_token = ""
-  tag_list           = ""
-  description        = ""
-  locked_to_project  = "true"
-  run_untagged       = "false"
-  maximum_timeout    = "3600"
-  # ref_protected runner will only run on pipelines triggered on protected branches. Defaults to not_protected
-  access_level       = ""
-}
-```
+   ![runners-docker](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-docker.png)

-The registration token can also be read in via SSM parameter store. If no registration token is passed in, the module
-will look up the token in the SSM parameter store at the location specified by `secure_parameter_store_gitlab_runner_registration_token_name`.
-
-For migration to the new setup simply add the runner token to the parameter store. Once the runner is started it will lookup the
-required values via the parameter store. If the value is `null` a new runner will be registered and a new token created/stored.
-
-```sh
-# set the following variables, look up the variables in your Terraform config.
-# see your Terraform variables to fill in the vars below.
-aws-region=<${var.aws_region}>
-token=
-parameter-name=<${var.environment}>-<${var.secure_parameter_store_runner_token_key}>
-
-aws ssm put-parameter --overwrite --type SecureString --name "${parameter-name}" --value ${token} --region "${aws-region}"
-```
-
-Once you have created the parameter, you must remove the variable `runners_token` from your config. The next time your GitLab
-runner instance is created it will look up the token from the SSM parameter store.
-
-Finally, the runner still supports the manual runner creation. No changes are required. Please keep in mind that this setup will be
-removed in future releases.
- -### Auto Scaling Group - -#### Scheduled scaling - -When `enable_schedule=true`, the `schedule_config` variable can be used to scale the Auto Scaling group. - -Scaling may be defined with one `scale_out` scheduled action and/or one `scale_in` scheduled action. - -For example: - -```hcl - module "runner" { - # ... - runner_schedule_enable = true - runner_schedule_config = { - # Configure optional scale_out scheduled action - scale_out_recurrence = "0 8 * * 1-5" - scale_out_count = 1 # Default for min_size, desired_capacity and max_size - # Override using: scale_out_min_size, scale_out_desired_capacity, scale_out_max_size - - # Configure optional scale_in scheduled action - scale_in_recurrence = "0 18 * * 1-5" - scale_in_count = 0 # Default for min_size, desired_capacity and max_size - # Override using: scale_out_min_size, scale_out_desired_capacity, scale_out_max_size - } - } -``` - -#### Instance Termination - -The Auto Scaling Group may be configured with a [lifecycle hook](https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html) -that executes a provided Lambda function when the runner is terminated to terminate additional instances that were spawned. - -The use of the termination lifecycle can be toggled using the `asg_termination_lifecycle_hook_create` variable. - -When using this feature, a `builds/` directory relative to the root module will persist that contains the packaged Lambda function. - -### Access runner instance - -A few option are provided to access the runner instance: - -1. Access via the Session Manager (SSM) by setting `enable_runner_ssm_access` to `true`. The policy to allow access via SSM is not - very restrictive. -2. By setting none of the above, no keys or extra policies will be attached to the instance. You can still configure you own - policies by attaching them to `runner_agent_role_arn`. - -### GitLab runner cache - -By default the module creates a cache for the runner in S3. Old objects are automatically removed via a configurable life cycle -policy on the bucket. - -Creation of the bucket can be disabled and managed outside this module. A good use case is for sharing the cache across multiple -runners. For this purpose the cache is implemented as a sub module. For more details see the -[cache module](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/modules/cache). An example implementation of -this use case can be found in the [runner-public](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-public) -example. - -In case you enable the access logging for the S3 cache bucket, you have to add the following statement to your S3 logging bucket -policy. - -```json -{ - "Sid": "Allow access logging", - "Effect": "Allow", - "Principal": { - "Service": "logging.s3.amazonaws.com" - }, - "Action": "s3:PutObject", - "Resource": "/*" -} -``` - -In case you manage the S3 cache bucket yourself it might be necessary to apply the cache before applying the runner module. A -typical error message looks like: - -```text -Error: Invalid count argument -on .terraform/modules/gitlab_runner/main.tf line 400, in resource "aws_iam_role_policy_attachment" "docker_machine_cache_instance": - count = var.cache_bucket["create"] || length(lookup(var.cache_bucket, "policy", "")) > 0 ? 1 : 0 -The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many -instances will be created. 
To work around this, use the -target argument to first apply only the resources that the count -depends on. -``` - -The workaround is to use a `terraform apply -target=module.cache` followed by a `terraform apply` to apply everything else. This is -a one time effort needed at the very beginning. - -## Usage - -### Configuration - -Update the variables in `terraform.tfvars` according to your needs and add the following variables. See the previous step for -instructions on how to obtain the token. - -```hcl -runner_name = "NAME_OF_YOUR_RUNNER" -gitlab_url = "GITLAB_URL" -runner_token = "RUNNER_TOKEN" -``` - -The base image used to host the GitLab Runner agent is the latest available Amazon Linux 2 HVM EBS AMI. In previous versions of -this module a hard coded list of AMIs per region was provided. This list has been replaced by a search filter to find the latest -AMI. Setting the filter to `amzn2-ami-hvm-2.0.20200207.1-x86_64-ebs` will allow you to version lock the target AMI. - -### Scenario: Basic usage - -Below is a basic examples of usages of the module. Regarding the dependencies such as a VPC, have a look at the [default example](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-default). - -```hcl -module "runner" { - # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/ - source = "cattle-ops/gitlab-runner/aws" - - environment = "basic" - - vpc_id = module.vpc.vpc_id - subnet_id = element(module.vpc.private_subnets, 0) - - runner_gitlab = { - url = "https://gitlab.com" - } - - runner_gitlab_registration_config = { - registration_token = "my-token" - tag_list = "docker" - description = "runner default" - locked_to_project = "true" - run_untagged = "false" - maximum_timeout = "3600" - } - - runner_worker_docker_machine_instance = { - subnet_ids = module.vpc.private_subnets - } -} -``` - -### Removing the module - -As the module creates a number of resources during runtime (key pairs and spot instance requests), it needs a special -procedure to remove them. - -1. Use the AWS Console to set the desired capacity of all auto scaling groups to 0. To find the correct ones use the - `var.environment` as search criteria. Setting the desired capacity to 0 prevents AWS from creating new instances - which will in turn create new resources. -2. Kill all agent ec2 instances on the via AWS Console. This triggers a Lambda function in the background which removes - all resources created during runtime of the EC2 instances. -3. Wait 3 minutes so the Lambda function has enough time to delete the key pairs and spot instance requests. -4. Run a `terraform destroy` or `terraform apply` (depends on your setup) to remove the module. - -If you don't follow the above procedure key pairs and spot instance requests might survive the removal and might cause -additional costs. But I have never seen that. You should also be fine by executing step 4 only. - -### Scenario: Multi-region deployment - -Name clashes due to multi-region deployments for global AWS resources create by this module (IAM, S3) can be avoided by including a -distinguishing region specific prefix via the _cache_bucket_prefix_ string respectively via _name_iam_objects_ in the _overrides_ -map. A simple example for this would be to set _region-specific-prefix_ to the AWS region the module is deployed to. 
- -```hcl -module "runner" { - # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/ - source = "cattle-ops/gitlab-runner/aws" - - environment = "multi-region-1" - iam_object_prefix = "-gitlab-runner-iam" - - vpc_id = module.vpc.vpc_id - subnet_id = element(module.vpc.private_subnets, 0) - - runner_gitlab = { - url = "https://gitlab.com" - } - - runner_gitlab_registration_config = { - registration_token = "my-token" - tag_list = "docker" - description = "runner default" - locked_to_project = "true" - run_untagged = "false" - maximum_timeout = "3600" - } - - runner_worker_cache = { - bucket_prefix = "" - } - - runner_worker_docker_machine_instance = { - subnet_ids = module.vpc.private_subnets - } -} -``` - -### Scenario: Use of Spot Fleet - -Since spot instances can be taken over by AWS depending on the instance type and AZ you are using, you may want multiple instances -types in multiple AZs. This is where spot fleets come in, when there is no capacity on one instance type and one AZ, AWS will take -the next instance type and so on. This update has been possible since the -[fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine supports spot fleets. - -We have seen that the [fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine this -module is using consume more RAM using spot fleets. For comparison, if you launch 50 machines in the same time, it consumes -~1.2GB of RAM. In our case, we had to change the `instance_type` of the runner from `t3.micro` to `t3.small`. - -#### Configuration example - -```hcl -module "runner" { - # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/ - source = "cattle-ops/gitlab-runner/aws" - - environment = "spot-fleet" - - vpc_id = module.vpc.vpc_id - subnet_id = element(module.vpc.private_subnets, 0) - - runner_gitlab = { - url = "https://gitlab.com" - } - - runner_gitlab_registration_config = { - registration_token = "my-token" - tag_list = "docker" - description = "runner default" - locked_to_project = "true" - run_untagged = "false" - maximum_timeout = "3600" - } - - runner_worker = { - type = "docker+machine" - } - - runner_worker_docker_machine_fleet = { - enable = true - } - - runner_worker_docker_machine_instance = { - types = ["t3a.medium", "t3.medium", "t2.medium"] - subnet_ids = module.vpc.private_subnets - } -} -``` - -## Examples - -A few [examples](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/) are provided. Use the -following steps to deploy. Ensure your AWS and Terraform environment is set up correctly. All commands below should be -run from the `terraform-aws-gitlab-runner/examples/` directory. Don't forget to remove the runners -manually from your Gitlab instance as soon as your are done. - -### Versions - -The version of Terraform is locked down via tfenv, see the `.terraform-version` file for the expected versions. -Providers are locked down as well in the `providers.tf` file. - -### Configure - -The examples are configured with defaults that should work in general. The examples are in general configured for the -region Ireland `eu-west-1`. The only parameter that needs to be provided is the GitLab registration token. The token can be -found in GitLab in the runner section (global, group or repo scope). Create a file `terraform.tfvars` and the registration token. - -```hcl - registration_token = "MY_TOKEN" -``` - -### Run - -Run `terraform init` to initialize Terraform. 
Next you can run `terraform plan` to inspect the resources that will be created.
-
-To create the runner, run:
-
-```sh
-  terraform apply
-```
-
-To destroy the runner, run:
-
-```sh
-  terraform destroy
-```
+For detailed concepts and usage, please refer to [usage](docs/usage.md).

## Contributors ✨

-This project exists thanks to all the people who contribute.
+PRs are welcome! Please see the [contributing guide](CONTRIBUTING.md) for more details.
+
+Thanks to all the people who already contributed!

@@ -474,6 +65,10 @@ This project exists thanks to all the people who contribute.

Made with [contributors-img](https://contrib.rocks).

+## License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
## Module Documentation

@@ -640,5 +235,3 @@ Made with [contributors-img](https://contrib.rocks).

-
-[1]: https://img.shields.io/badge/renovate-enabled-brightgreen?logo=renovate
diff --git a/docs/pitfalls.md b/docs/pitfalls.md
new file mode 100644
index 000000000..9c0f26c39
--- /dev/null
+++ b/docs/pitfalls.md
@@ -0,0 +1,37 @@
+# Common Pitfalls
+
+## Setting the name of the instances via name tag
+
+Setting the name via the `Name` tag does not work. Instead, the module supports the `runner_instance.name` and
+`runner_worker_docker_machine_instance` variables. Set them to any value to adjust the name of the EC2 instance(s).
+
+```hcl
+# working
+runner_instance = {
+  name = "my-gitlab-runner-name"
+}
+
+# not working
+runner_instance = {
+  addtional_tags = {
+    Name = ""my-gitlab-runner-name""
+  }
+}
+```
+
+## Apply with shared cache bucket fails
+
+In case you manage the S3 cache bucket yourself it might be necessary to apply the cache before applying the runner module. A
+typical error message looks like:
+
+```text
+Error: Invalid count argument
+on .terraform/modules/gitlab_runner/main.tf line 400, in resource "aws_iam_role_policy_attachment" "docker_machine_cache_instance":
+  count = var.cache_bucket["create"] || length(lookup(var.cache_bucket, "policy", "")) > 0 ? 1 : 0
+The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many
+instances will be created. To work around this, use the -target argument to first apply only the resources that the count
+depends on.
+```
+
+The workaround is to use a `terraform apply -target=module.cache` followed by a `terraform apply` to apply everything else. This is
+a one-time effort needed at the very beginning.
diff --git a/docs/usage.md b/docs/usage.md
new file mode 100644
index 000000000..04bbdcdd8
--- /dev/null
+++ b/docs/usage.md
@@ -0,0 +1,331 @@
+# Usage
+
+Common pitfalls are documented in [pitfalls.md](pitfalls.md).
+
+## Configuration
+
+The examples are configured with defaults that should work in general and target the region Ireland (`eu-west-1`). The only
+parameters that need to be provided are the registration token and the URL of your GitLab instance. The token can be found in
+GitLab in the runner section (global, group or repo scope). Create a file `terraform.tfvars` and add the registration token and
+the URL.
+
+```hcl
+registration_token = "MY_TOKEN"
+gitlab_url         = "https://my.gitlab.instance/"
+```
+
+The base image used to host the GitLab Runner agent is the latest available Amazon Linux 2 HVM EBS AMI. In previous versions of
+this module a hard-coded list of AMIs per region was provided. This list has been replaced by a search filter to find the latest
+AMI. Setting the filter to `amzn2-ami-hvm-2.0.20200207.1-x86_64-ebs` will allow you to version lock the target AMI if needed.
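+
+As a sketch, the AWS CLI can show which AMI such a name filter currently resolves to (assuming a configured AWS CLI; the filter
+pattern below is illustrative):
+
+```sh
+aws ec2 describe-images \
+  --owners amazon \
+  --filters "Name=name,Values=amzn2-ami-hvm-2.0.*-x86_64-ebs" \
+  --query 'sort_by(Images, &CreationDate)[-1].{Name: Name, ImageId: ImageId}'
+```
+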
+## Install the module
+
+Run `terraform init` to initialize Terraform. Next you can run `terraform plan` to inspect the resources that will be created.
+
+To create the runner, run:
+
+```sh
+terraform apply
+```
+
+To destroy the runner, run:
+
+```sh
+terraform destroy
+```
+
+## Scenarios
+
+### Scenario: Basic usage
+
+Below is a basic example of how to use the module. Regarding dependencies such as the VPC, have a look at the [default example](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-default).
+
+```hcl
+module "runner" {
+  # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/
+  source = "cattle-ops/gitlab-runner/aws"
+
+  environment = "basic"
+
+  vpc_id    = module.vpc.vpc_id
+  subnet_id = element(module.vpc.private_subnets, 0)
+
+  runner_gitlab = {
+    url = "https://gitlab.com"
+  }
+
+  runner_gitlab_registration_config = {
+    registration_token = "my-token"
+    tag_list           = "docker"
+    description        = "runner default"
+    locked_to_project  = "true"
+    run_untagged       = "false"
+    maximum_timeout    = "3600"
+  }
+
+  runner_worker_docker_machine_instance = {
+    subnet_ids = module.vpc.private_subnets
+  }
+}
+```
+
+### Scenario: Multi-region deployment
+
+Name clashes due to multi-region deployments for global AWS resources created by this module (IAM, S3) can be avoided by including
+a distinguishing region-specific prefix via `iam_object_prefix` and via `bucket_prefix` in `runner_worker_cache` (see the example
+below). A simple choice for the region-specific prefix is the AWS region the module is deployed to.
+
+```hcl
+module "runner" {
+  # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/
+  source = "cattle-ops/gitlab-runner/aws"
+
+  environment       = "multi-region-1"
+  iam_object_prefix = "<region-specific-prefix>-gitlab-runner-iam"
+
+  vpc_id    = module.vpc.vpc_id
+  subnet_id = element(module.vpc.private_subnets, 0)
+
+  runner_gitlab = {
+    url = "https://gitlab.com"
+  }
+
+  runner_gitlab_registration_config = {
+    registration_token = "my-token"
+    tag_list           = "docker"
+    description        = "runner default"
+    locked_to_project  = "true"
+    run_untagged       = "false"
+    maximum_timeout    = "3600"
+  }
+
+  runner_worker_cache = {
+    bucket_prefix = "<region-specific-prefix>"
+  }
+
+  runner_worker_docker_machine_instance = {
+    subnet_ids = module.vpc.private_subnets
+  }
+}
+```
+
+### Scenario: Use of Spot Fleet
+
+Since spot instances can be taken over by AWS depending on the instance type and AZ you are using, you may want multiple instance
+types in multiple AZs. This is where spot fleets come in: when there is no capacity for one instance type in one AZ, AWS will take
+the next instance type and so on. This is possible because the
+[fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine supports spot fleets.
+
+We have seen that the [fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine this
+module is using consumes more RAM when using spot fleets. For comparison, if you launch 50 machines at the same time, it consumes
+~1.2GB of RAM. In our case, we had to change the `instance_type` of the runner from `t3.micro` to `t3.small`.
+
+#### Configuration example
+
+```hcl
+module "runner" {
+  # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/
+  source = "cattle-ops/gitlab-runner/aws"
+
+  environment = "spot-fleet"
+
+  vpc_id    = module.vpc.vpc_id
+  subnet_id = element(module.vpc.private_subnets, 0)
+
+  runner_gitlab = {
+    url = "https://gitlab.com"
+  }
+
+  runner_gitlab_registration_config = {
+    registration_token = "my-token"
+    tag_list           = "docker"
+    description        = "runner default"
+    locked_to_project  = "true"
+    run_untagged       = "false"
+    maximum_timeout    = "3600"
+  }
+
+  runner_worker = {
+    type = "docker+machine"
+  }
+
+  runner_worker_docker_machine_fleet = {
+    enable = true
+  }
+
+  runner_worker_docker_machine_instance = {
+    types      = ["t3a.medium", "t3.medium", "t2.medium"]
+    subnet_ids = module.vpc.private_subnets
+  }
+}
+```
+
+## Examples
+
+A few [examples](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/) are provided. To deploy them,
+ensure your AWS and Terraform environment is set up correctly and follow the steps above. All commands should be run from the
+`terraform-aws-gitlab-runner/examples/` directory. Don't forget to remove the runners manually from your GitLab instance as soon
+as you are done.
+
+## Concepts
+
+### Service linked roles
+
+The GitLab runner EC2 instance requires the following service linked roles:
+
+- AWSServiceRoleForAutoScaling
+- AWSServiceRoleForEC2Spot
+
+By default, the EC2 instance is allowed to create the required roles, but this can be disabled by setting the option
+`allow_iam_service_linked_role_creation` to `false`. If disabled you must ensure the roles exist. You can create them manually or
+via Terraform.
+
+```hcl
+resource "aws_iam_service_linked_role" "spot" {
+  aws_service_name = "spot.amazonaws.com"
+}
+
+resource "aws_iam_service_linked_role" "autoscaling" {
+  aws_service_name = "autoscaling.amazonaws.com"
+}
+```
+
+### KMS keys
+
+If a KMS key is set via `kms_key_id`, make sure that you also give proper access to the key. Otherwise, you might
+get errors, e.g. the build cache can't be decrypted or logging via CloudWatch is not possible. For a CloudWatch
+example, check out [kms-policy.json](https://github.com/cattle-ops/terraform-aws-gitlab-runner/blob/main/policies/kms-policy.json).
+
+### GitLab runner token configuration
+
+By default, the runner is registered on initial deployment. In previous versions of this module this was a manual process. The
+manual process is still supported but will be removed in future releases. The runner token will be stored in the AWS SSM parameter
+store. See [example](examples/runner-pre-registered/) for more details.
+
+To register the runner automatically, set the variable `runner_gitlab_registration_config["registration_token"]`. This token value
+can be found in your GitLab project, group, or global settings. For a generic runner you can find the token in the admin section.
+By default, the runner will be locked to the target project and not run untagged jobs. Below is an example of the configuration map.
+
+```hcl
+runner_gitlab_registration_config = {
+  registration_token = ""
+  tag_list           = ""
+  description        = ""
+  locked_to_project  = "true"
+  run_untagged       = "false"
+  maximum_timeout    = "3600"
+  # ref_protected runner will only run on pipelines triggered on protected branches. Defaults to not_protected
+  access_level       = ""
+}
+```
+
+The registration token can also be read in via SSM parameter store.
+If no registration token is passed in, the module will look up the token in the SSM parameter store at the location specified by
+`runner_gitlab_registration_token_secure_parameter_store_name`.
+
+For migration to the new setup simply add the runner token to the parameter store. Once the runner is started it will look up the
+required values via the parameter store. If the value is `null` a new runner will be registered and a new token created/stored.
+
+```sh
+# set the following variables, look up the variables in your Terraform config.
+# see your Terraform variables to fill in the vars below.
+# note: shell variable names must not contain hyphens
+aws_region=<${var.aws_region}>
+token=
+parameter_name=<${var.environment}>-<${var.secure_parameter_store_runner_token_key}>
+
+aws ssm put-parameter --overwrite --type SecureString --name "${parameter_name}" --value "${token}" --region "${aws_region}"
+```
+
+Once you have created the parameter, you must remove the variable `runner_gitlab.registration_token` from your config. The next
+time your GitLab runner instance is created it will look up the token from the SSM parameter store.
+
+Finally, the runner still supports manual runner creation. No changes are required. Please keep in mind that this setup will be
+removed in future releases.
+
+### Auto Scaling Group
+
+#### Scheduled scaling
+
+When `runner_schedule_enable=true`, the `runner_schedule_config` block can be used to scale the Auto Scaling group.
+
+Scaling may be defined with one `scale_out_*` scheduled action and/or one `scale_in_*` scheduled action.
+
+For example:
+
+```hcl
+module "runner" {
+  # ...
+  runner_schedule_enable = true
+  runner_schedule_config = {
+    # Configure optional scale_out scheduled action
+    scale_out_recurrence = "0 8 * * 1-5"
+    scale_out_count      = 1 # Default for min_size, desired_capacity and max_size
+    # Override using: scale_out_min_size, scale_out_desired_capacity, scale_out_max_size
+
+    # Configure optional scale_in scheduled action
+    scale_in_recurrence = "0 18 * * 1-5"
+    scale_in_count      = 0 # Default for min_size, desired_capacity and max_size
+    # Override using: scale_in_min_size, scale_in_desired_capacity, scale_in_max_size
+  }
+}
+```
+
+#### Instance Termination
+
+The Auto Scaling Group may be configured with a [lifecycle hook](https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html)
+that executes a provided Lambda function when the runner is terminated, which in turn terminates the additional instances that
+were spawned.
+
+The use of the termination lifecycle can be toggled using the `runner_enable_asg_recreation` variable.
+
+When using this feature, a `builds/` directory relative to the root module will persist that contains the packaged Lambda function.
+
+### Access the Runner instance
+
+A few options are provided to access the runner instance:
+
+1. Access via the Session Manager (SSM) by setting `runner_worker.ssm_access` to `true`. The policy to allow access via SSM is not
+   very restrictive. See the sketch below for opening a session.
+2. If you set none of the above, no keys or extra policies will be attached to the instance. You can still configure your own
+   policies by attaching them to `runner_role`.
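+
+A minimal sketch for option 1 (assuming the AWS CLI plus its Session Manager plugin are installed; the instance ID is a
+placeholder):
+
+```sh
+aws ssm start-session --target i-0123456789abcdef0
+```
+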
+### GitLab runner cache
+
+By default the module creates a cache for the runner in S3. Old objects are automatically removed via a configurable life cycle
+policy on the bucket.
+
+Creation of the bucket can be disabled and managed outside this module. A good use case is sharing the cache across multiple
+runners. For this purpose the cache is implemented as a sub module. For more details see the
+[cache module](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/modules/cache). An example implementation of
+this use case can be found in the [runner-public](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-public)
+example.
+
+In case you enable access logging for the S3 cache bucket, you have to add the following statement to your S3 logging bucket
+policy.
+
+```json
+{
+  "Sid": "Allow access logging",
+  "Effect": "Allow",
+  "Principal": {
+    "Service": "logging.s3.amazonaws.com"
+  },
+  "Action": "s3:PutObject",
+  "Resource": "/*"
+}
+```
+
+## Removing the module
+
+As the module creates a number of resources during runtime (key pairs and spot instance requests), it needs a special
+procedure to remove them.
+
+1. Use the AWS Console to set the desired capacity of all auto-scaling groups to 0 (or script it, see the sketch below). To find
+   the correct ones use the `var.environment` as search criteria. Setting the desired capacity to 0 prevents AWS from creating
+   new instances which will in turn create new resources.
+2. Kill all agent EC2 instances via the AWS Console. This triggers a Lambda function in the background which removes
+   all resources created during runtime of the EC2 instances.
+3. Wait 3 minutes so the Lambda function has enough time to delete the key pairs and spot instance requests.
+4. Run a `terraform destroy` or `terraform apply` (depends on your setup) to remove the module.
+
+If you don't follow the above procedure, key pairs and spot instance requests might survive the removal and might cause
+additional costs, though we have never seen that happen. You should also be fine by executing step 4 only.
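+
+Step 1 can also be scripted via the AWS CLI (a sketch; the Auto Scaling group name is a placeholder you have to look up first):
+
+```sh
+aws autoscaling update-auto-scaling-group \
+  --auto-scaling-group-name "<runner-asg-name>" \
+  --min-size 0 --max-size 0 --desired-capacity 0
+```
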
From c4cb4062cd81d826d6fe5d4fafe7f151cc64131f Mon Sep 17 00:00:00 2001
From: Matthias Kay
Date: Thu, 5 Oct 2023 11:50:06 +0200
Subject: [PATCH 2/2] fix spelling

---
 README.md        | 11 ++++-------
 docs/pitfalls.md |  4 ++--
 docs/usage.md    |  2 +-
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 6ad7f7761..a49f6d147 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-
+
[![Terraform registry](https://img.shields.io/github/v/release/cattle-ops/terraform-aws-gitlab-runner?label=Terraform%20Registry)](https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/)
[![Gitter](https://badges.gitter.im/terraform-aws-gitlab-runner/Lobby.svg)](https://gitter.im/terraform-aws-gitlab-runner/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)
[![Actions](https://github.com/cattle-ops/terraform-aws-gitlab-runner/workflows/CI/badge.svg)](https://github.com/cattle-ops/terraform-aws-gitlab-runner/actions)
@@ -10,8 +10,6 @@

💥 See [issue 819](https://github.com/cattle-ops/terraform-aws-gitlab-runner/issues/819) on how to migrate to v7 smoothly.

-## The module
-
This [Terraform](https://www.terraform.io/) module creates a [GitLab Runner](https://docs.gitlab.com/runner/). A blog post
describes the original version of the runner. See the post at [040code](https://040code.github.io/2017/12/09/runners-on-the-spot/).
The original setup of the module is based on the blog post: [Auto scale GitLab CI runners and save 90% on EC2 costs](https://about.gitlab.com/2017/11/23/autoscale-ci-runners/).
@@ -36,18 +34,17 @@ The runner supports 3 main scenarios:

   In this scenario multiple runner agents can be created with different configurations by instantiating the module multiple times.
   Runners will scale automatically based on the configuration. The S3 cache can be shared across runners by managing the cache
-   outside of the module.
+   outside the module.

   ![runners-cache](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-cache.png)

3. GitLab CI docker runner

   In this scenario docker, _not_ docker machine, is used to schedule the builds. Builds will run on the same EC2 instance as the
-   agent. No auto scaling is supported.
+   agent. No auto-scaling is supported.

   ![runners-docker](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-docker.png)

-
For detailed concepts and usage, please refer to [usage](docs/usage.md).

## Contributors ✨

@@ -60,7 +57,7 @@ Thanks to all the people who already contributed!

-
+
contributors

Made with [contributors-img](https://contrib.rocks).

diff --git a/docs/pitfalls.md b/docs/pitfalls.md
index 9c0f26c39..2d69239c4 100644
--- a/docs/pitfalls.md
+++ b/docs/pitfalls.md
@@ -13,8 +13,8 @@ runner_instance = {

# not working
runner_instance = {
-  addtional_tags = {
-    Name = ""my-gitlab-runner-name""
+  additional_tags = {
+    Name = "my-gitlab-runner-name"
  }
}
```

diff --git a/docs/usage.md b/docs/usage.md
index 04bbdcdd8..54b766f61 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -200,7 +200,7 @@ example, check out [kms-policy.json](https://github.com/cattle-ops/terraform-aws-g

By default, the runner is registered on initial deployment. In previous versions of this module this was a manual process. The
manual process is still supported but will be removed in future releases. The runner token will be stored in the AWS SSM parameter
-store. See [example](examples/runner-pre-registered/) for more details.
+store. See [example](../examples/runner-pre-registered) for more details.

To register the runner automatically, set the variable `runner_gitlab_registration_config["registration_token"]`. This token value
can be found in your GitLab project, group, or global settings. For a generic runner you can find the token in the admin section.