From 8348f80404c3d6d411942be1ab2aaf1bc4657471 Mon Sep 17 00:00:00 2001
From: Matthias Kay
Date: Thu, 5 Oct 2023 11:56:00 +0200
Subject: [PATCH] docs: restructure and streamline documentation (#1003)

## Description

This PR restructures the whole documentation and adapts it to the changed variable names introduced with the v7 release.

- shorten the main document
- underline that contributions are welcome
- create a pitfall document
- move the detailed concepts into a separate file

---
 CONTRIBUTING.md  |  18 +-
 README.md        | 470 +++--------------------------------------------
 docs/pitfalls.md |  37 ++++
 docs/usage.md    | 331 +++++++++++++++++++++++++++++
 4 files changed, 408 insertions(+), 448 deletions(-)
 create mode 100644 docs/pitfalls.md
 create mode 100644 docs/usage.md

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 487ecf086..8d0f82d5d 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,14 +1,15 @@
 # Contribution guide

-We appreciate your thought to contribute to open source. :heart: We want to make contributing as easy as possible. You are welcome to:
+We appreciate your interest in contributing to open source. :heart: We want to make contributing as easy as possible.
+You are welcome to:

-- Reporting a bug
-- Discussing the current state of the code
-- Submitting a fix
-- Proposing new features
+- Report a bug
+- Discuss the current state of the code
+- Submit a fix
+- Propose new features

-We Use [Github Flow](https://guides.github.com/introduction/flow/index.html), So All Code Changes Happen Through Pull Requests
-Pull requests are the best way to propose changes to the codebase (we use [Github Flow](https://guides.github.com/introduction/flow/index.html)). We actively welcome your pull requests:
+We use [Github Flow](https://guides.github.com/introduction/flow/index.html), so all code changes happen through pull requests.
+We actively welcome your pull requests:

 1. Fork the repo and create your branch from `main`.
 2. If you've added code, check one of the examples.
@@ -25,7 +26,8 @@ We use the [Terraform Style conventions](https://www.terraform.io/docs/configuration/style.html)

 ## Documentation

-We use [pre-commit](https://pre-commit.com/) to update the Terraform inputs and outputs in the documentation via [terraform-docs](https://github.com/terraform-docs/terraform-docs). Ensure you have installed those components.
+We use [pre-commit](https://pre-commit.com/) to update the Terraform inputs and outputs in the documentation via
+[terraform-docs](https://github.com/terraform-docs/terraform-docs). Ensure you have installed those components.
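+
+If you are unsure about the flow, the following is a minimal sketch. It assumes `pre-commit` and `terraform-docs` are already
+installed and available on your `PATH`:
+
+```sh
+# install the git hooks once per clone
+pre-commit install
+
+# run all configured hooks (including terraform-docs) against the whole repository
+pre-commit run --all-files
+```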
 ## Testing

diff --git a/README.md b/README.md
index a4345a8c0..a49f6d147 100644
--- a/README.md
+++ b/README.md
@@ -1,479 +1,71 @@
-
+
 [![Terraform registry](https://img.shields.io/github/v/release/cattle-ops/terraform-aws-gitlab-runner?label=Terraform%20Registry)](https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/)
 [![Gitter](https://badges.gitter.im/terraform-aws-gitlab-runner/Lobby.svg)](https://gitter.im/terraform-aws-gitlab-runner/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)
 [![Actions](https://github.com/cattle-ops/terraform-aws-gitlab-runner/workflows/CI/badge.svg)](https://github.com/cattle-ops/terraform-aws-gitlab-runner/actions)
-[![Renovate][1]](https://www.mend.io/renovate/)
+[![Renovate](https://img.shields.io/badge/renovate-enabled-brightgreen?logo=renovate)](https://www.mend.io/renovate/)

-# Terraform module for GitLab auto scaling runners on AWS spot instances
-
-- [The module](#the-module)
-- [Prerequisites](#prerequisites)
-- [Usage](#usage)
-- [Examples](#examples)
-- [Contributors ✨](#contributors-)
-- [Requirements](#requirements)
-- [Providers](#providers)
-- [Modules](#modules)
-- [Resources](#resources)
-- [Inputs](#inputs)
-- [Outputs](#outputs)
-
-## The module
-
-This [Terraform](https://www.terraform.io/) modules creates a [GitLab CI runner](https://docs.gitlab.com/runner/). A blog post
+# Terraform module for GitLab auto-scaling runners on AWS spot instances
+
+💥 See [issue 819](https://github.com/cattle-ops/terraform-aws-gitlab-runner/issues/819) on how to migrate to v7 smoothly.
+
+This [Terraform](https://www.terraform.io/) module creates a [GitLab Runner](https://docs.gitlab.com/runner/). A blog post
 describes the original version of the runner. See the post at [040code](https://040code.github.io/2017/12/09/runners-on-the-spot/).
 The original setup of the module is based on the blog post: [Auto scale GitLab CI runners and save 90% on EC2 costs](https://about.gitlab.com/2017/11/23/autoscale-ci-runners/).

-> 💥 BREAKING CHANGE AHEAD: Version 7 of the module rewrites the whole variable section to
-> - harmonize the variable names
-> - harmonize the documentation
-> - remove deprecated variables
-> - gain a better overview of the features provided
->
-> And it also adds
-> - all possible Docker settings
-> - the `idle_scale_factor`
->
-> We know that this is a breaking change causing some pain, but we think it is worth it. We hope you agree. And to make the
-> transition as smooth as possible, we have added a migration script to the `migrations` folder. It will cover almost all cases,
-> but some minor rework might still be possible.
->
-> Checkout [issue 819](https://github.com/cattle-ops/terraform-aws-gitlab-runner/issues/819)
-
 The runners created by the module use spot instances by default for running the builds using the `docker+machine` executor.

 - Shared cache in S3 with life cycle management to clear objects after x days.
 - Logs streamed to CloudWatch.
 - Runner agents registered automatically.

-The name of the runner agent and runner is set with the overrides variable. Adding an agent runner name tag does not work.
-
-```hcl
-# ...
-runner_instance = {
-  name = "Gitlab Runner connecting to GitLab"
-}
-
-# this doesn't work
-agent_tags = merge(local.my_tags, map("Name", "Gitlab Runner connecting to GitLab"))
-```
-
 The runner supports 3 main scenarios:

-### GitLab CI docker-machine runner - one runner agent
-
-In this scenario the runner agent is running on a single EC2 node and runners are created by [docker machine](https://docs.gitlab.com/runner/configuration/autoscale.html)
-using spot instances. Runners will scale automatically based on the configuration. The module creates a S3 cache by default,
-which is shared across runners (spot instances).
-
-![runners-default](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-default.png)
-
-### GitLab CI docker-machine runner - multiple runner agents
-
-In this scenario the multiple runner agents can be created with different configuration by instantiating the module multiple times.
-Runners will scale automatically based on the configuration. The S3 cache can be shared across runners by managing the cache
-outside of the module.
-
-![runners-cache](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-cache.png)
-
-### GitLab Ci docker runner
-
-In this scenario _not_ docker machine is used but docker to schedule the builds. Builds will run on the same EC2 instance as the
-agent. No auto scaling is supported.
-
-![runners-docker](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-docker.png)
-
-## Prerequisites
-
-### Terraform
-
-Ensure you have Terraform installed. The module is based on Terraform 1.3, see `.terraform-version` for the used version. A handy
-tool to mange your Terraform version is [tfenv](https://github.com/kamatama41/tfenv).
-
-### AWS
-
-Ensure you have setup your AWS credentials. The module requires access to IAM, EC2, CloudWatch, S3 and SSM.
-
-### JQ & AWS CLI
-
-In order to be able to destroy the module, you will need to run from a host with both `jq` and `aws` installed and accessible in
-the environment.
-
-On macOS it is simple to install them using `brew`.
-
-```sh
-brew install jq awscli
-```
-
-### Service linked roles
-
-The GitLab runner EC2 instance requires the following service linked roles:
-
-- AWSServiceRoleForAutoScaling
-- AWSServiceRoleForEC2Spot
-
-By default the EC2 instance is allowed to create the required roles, but this can be disabled by setting the option
-`allow_iam_service_linked_role_creation` to `false`. If disabled you must ensure the roles exist. You can create them manually or
-via Terraform.
+1. GitLab CI docker-machine runner - one runner agent

-```hcl
-resource "aws_iam_service_linked_role" "spot" {
-  aws_service_name = "spot.amazonaws.com"
-}
+   In this scenario the runner agent is running on a single EC2 node and runners are created by [docker machine](https://docs.gitlab.com/runner/configuration/autoscale.html)
+   using spot instances. Runners will scale automatically based on the configuration. The module creates an S3 cache by default,
+   which is shared across runners (spot instances).

-resource "aws_iam_service_linked_role" "autoscaling" {
-  aws_service_name = "autoscaling.amazonaws.com"
-}
-```
+   ![runners-default](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-default.png)

-### KMS keys
+2. GitLab CI docker-machine runner - multiple runner agents

-If a KMS key is set via `kms_key_id`, make sure that you also give proper access to the key.
Otherwise, you might
-get errors, e.g. the build cache can't be decrypted or logging via CloudWatch is not possible. For a CloudWatch
-example checkout [kms-policy.json](https://github.com/cattle-ops/terraform-aws-gitlab-runner/blob/main/policies/kms-policy.json)
+   In this scenario multiple runner agents can be created with different configurations by instantiating the module multiple times.
+   Runners will scale automatically based on the configuration. The S3 cache can be shared across runners by managing the cache
+   outside the module.

-### GitLab runner token configuration
+   ![runners-cache](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-cache.png)

-By default the runner is registered on initial deployment. In previous versions of this module this was a manual process. The
-manual process is still supported but will be removed in future releases. The runner token will be stored in the AWS SSM parameter
-store. See [example](examples/runner-pre-registered/) for more details.
+3. GitLab CI docker runner

-To register the runner automatically set the variable `gitlab_runner_registration_config["registration_token"]`. This token value
-can be found in your GitLab project, group, or global settings. For a generic runner you can find the token in the admin section.
-By default the runner will be locked to the target project, not run untagged. Below is an example of the configuration map.
+   In this scenario docker is used to schedule the builds, _not_ docker machine. Builds will run on the same EC2 instance as the
+   agent. No auto-scaling is supported.

-```hcl
-runner_gitlab_registration_config = {
-  registration_token = ""
-  tag_list           = ""
-  description        = ""
-  locked_to_project  = "true"
-  run_untagged       = "false"
-  maximum_timeout    = "3600"
-  # ref_protected runner will only run on pipelines triggered on protected branches. Defaults to not_protected
-  access_level       = ""
-}
-```
+   ![runners-docker](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-docker.png)

-The registration token can also be read in via SSM parameter store. If no registration token is passed in, the module
-will look up the token in the SSM parameter store at the location specified by `secure_parameter_store_gitlab_runner_registration_token_name`.
-
-For migration to the new setup simply add the runner token to the parameter store. Once the runner is started it will lookup the
-required values via the parameter store. If the value is `null` a new runner will be registered and a new token created/stored.
-
-```sh
-# set the following variables, look up the variables in your Terraform config.
-# see your Terraform variables to fill in the vars below.
-aws-region=<${var.aws_region}>
-token=
-parameter-name=<${var.environment}>-<${var.secure_parameter_store_runner_token_key}>
-
-aws ssm put-parameter --overwrite --type SecureString --name "${parameter-name}" --value ${token} --region "${aws-region}"
-```
-
-Once you have created the parameter, you must remove the variable `runners_token` from your config. The next time your GitLab
-runner instance is created it will look up the token from the SSM parameter store.
-
-Finally, the runner still supports the manual runner creation. No changes are required. Please keep in mind that this setup will be
-removed in future releases.
-
-### Auto Scaling Group
-
-#### Scheduled scaling
-
-When `enable_schedule=true`, the `schedule_config` variable can be used to scale the Auto Scaling group.
- -Scaling may be defined with one `scale_out` scheduled action and/or one `scale_in` scheduled action. - -For example: - -```hcl - module "runner" { - # ... - runner_schedule_enable = true - runner_schedule_config = { - # Configure optional scale_out scheduled action - scale_out_recurrence = "0 8 * * 1-5" - scale_out_count = 1 # Default for min_size, desired_capacity and max_size - # Override using: scale_out_min_size, scale_out_desired_capacity, scale_out_max_size - - # Configure optional scale_in scheduled action - scale_in_recurrence = "0 18 * * 1-5" - scale_in_count = 0 # Default for min_size, desired_capacity and max_size - # Override using: scale_out_min_size, scale_out_desired_capacity, scale_out_max_size - } - } -``` - -#### Instance Termination - -The Auto Scaling Group may be configured with a [lifecycle hook](https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html) -that executes a provided Lambda function when the runner is terminated to terminate additional instances that were spawned. - -The use of the termination lifecycle can be toggled using the `asg_termination_lifecycle_hook_create` variable. - -When using this feature, a `builds/` directory relative to the root module will persist that contains the packaged Lambda function. - -### Access runner instance - -A few option are provided to access the runner instance: - -1. Access via the Session Manager (SSM) by setting `enable_runner_ssm_access` to `true`. The policy to allow access via SSM is not - very restrictive. -2. By setting none of the above, no keys or extra policies will be attached to the instance. You can still configure you own - policies by attaching them to `runner_agent_role_arn`. - -### GitLab runner cache - -By default the module creates a cache for the runner in S3. Old objects are automatically removed via a configurable life cycle -policy on the bucket. - -Creation of the bucket can be disabled and managed outside this module. A good use case is for sharing the cache across multiple -runners. For this purpose the cache is implemented as a sub module. For more details see the -[cache module](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/modules/cache). An example implementation of -this use case can be found in the [runner-public](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-public) -example. - -In case you enable the access logging for the S3 cache bucket, you have to add the following statement to your S3 logging bucket -policy. - -```json -{ - "Sid": "Allow access logging", - "Effect": "Allow", - "Principal": { - "Service": "logging.s3.amazonaws.com" - }, - "Action": "s3:PutObject", - "Resource": "/*" -} -``` - -In case you manage the S3 cache bucket yourself it might be necessary to apply the cache before applying the runner module. A -typical error message looks like: - -```text -Error: Invalid count argument -on .terraform/modules/gitlab_runner/main.tf line 400, in resource "aws_iam_role_policy_attachment" "docker_machine_cache_instance": - count = var.cache_bucket["create"] || length(lookup(var.cache_bucket, "policy", "")) > 0 ? 1 : 0 -The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many -instances will be created. To work around this, use the -target argument to first apply only the resources that the count -depends on. 
-``` - -The workaround is to use a `terraform apply -target=module.cache` followed by a `terraform apply` to apply everything else. This is -a one time effort needed at the very beginning. - -## Usage - -### Configuration - -Update the variables in `terraform.tfvars` according to your needs and add the following variables. See the previous step for -instructions on how to obtain the token. - -```hcl -runner_name = "NAME_OF_YOUR_RUNNER" -gitlab_url = "GITLAB_URL" -runner_token = "RUNNER_TOKEN" -``` - -The base image used to host the GitLab Runner agent is the latest available Amazon Linux 2 HVM EBS AMI. In previous versions of -this module a hard coded list of AMIs per region was provided. This list has been replaced by a search filter to find the latest -AMI. Setting the filter to `amzn2-ami-hvm-2.0.20200207.1-x86_64-ebs` will allow you to version lock the target AMI. - -### Scenario: Basic usage - -Below is a basic examples of usages of the module. Regarding the dependencies such as a VPC, have a look at the [default example](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-default). - -```hcl -module "runner" { - # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/ - source = "cattle-ops/gitlab-runner/aws" - - environment = "basic" - - vpc_id = module.vpc.vpc_id - subnet_id = element(module.vpc.private_subnets, 0) - - runner_gitlab = { - url = "https://gitlab.com" - } - - runner_gitlab_registration_config = { - registration_token = "my-token" - tag_list = "docker" - description = "runner default" - locked_to_project = "true" - run_untagged = "false" - maximum_timeout = "3600" - } - - runner_worker_docker_machine_instance = { - subnet_ids = module.vpc.private_subnets - } -} -``` - -### Removing the module - -As the module creates a number of resources during runtime (key pairs and spot instance requests), it needs a special -procedure to remove them. - -1. Use the AWS Console to set the desired capacity of all auto scaling groups to 0. To find the correct ones use the - `var.environment` as search criteria. Setting the desired capacity to 0 prevents AWS from creating new instances - which will in turn create new resources. -2. Kill all agent ec2 instances on the via AWS Console. This triggers a Lambda function in the background which removes - all resources created during runtime of the EC2 instances. -3. Wait 3 minutes so the Lambda function has enough time to delete the key pairs and spot instance requests. -4. Run a `terraform destroy` or `terraform apply` (depends on your setup) to remove the module. - -If you don't follow the above procedure key pairs and spot instance requests might survive the removal and might cause -additional costs. But I have never seen that. You should also be fine by executing step 4 only. - -### Scenario: Multi-region deployment - -Name clashes due to multi-region deployments for global AWS resources create by this module (IAM, S3) can be avoided by including a -distinguishing region specific prefix via the _cache_bucket_prefix_ string respectively via _name_iam_objects_ in the _overrides_ -map. A simple example for this would be to set _region-specific-prefix_ to the AWS region the module is deployed to. 
- -```hcl -module "runner" { - # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/ - source = "cattle-ops/gitlab-runner/aws" - - environment = "multi-region-1" - iam_object_prefix = "-gitlab-runner-iam" - - vpc_id = module.vpc.vpc_id - subnet_id = element(module.vpc.private_subnets, 0) - - runner_gitlab = { - url = "https://gitlab.com" - } - - runner_gitlab_registration_config = { - registration_token = "my-token" - tag_list = "docker" - description = "runner default" - locked_to_project = "true" - run_untagged = "false" - maximum_timeout = "3600" - } - - runner_worker_cache = { - bucket_prefix = "" - } - - runner_worker_docker_machine_instance = { - subnet_ids = module.vpc.private_subnets - } -} -``` - -### Scenario: Use of Spot Fleet - -Since spot instances can be taken over by AWS depending on the instance type and AZ you are using, you may want multiple instances -types in multiple AZs. This is where spot fleets come in, when there is no capacity on one instance type and one AZ, AWS will take -the next instance type and so on. This update has been possible since the -[fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine supports spot fleets. - -We have seen that the [fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine this -module is using consume more RAM using spot fleets. For comparison, if you launch 50 machines in the same time, it consumes -~1.2GB of RAM. In our case, we had to change the `instance_type` of the runner from `t3.micro` to `t3.small`. - -#### Configuration example - -```hcl -module "runner" { - # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/ - source = "cattle-ops/gitlab-runner/aws" - - environment = "spot-fleet" - - vpc_id = module.vpc.vpc_id - subnet_id = element(module.vpc.private_subnets, 0) - - runner_gitlab = { - url = "https://gitlab.com" - } - - runner_gitlab_registration_config = { - registration_token = "my-token" - tag_list = "docker" - description = "runner default" - locked_to_project = "true" - run_untagged = "false" - maximum_timeout = "3600" - } - - runner_worker = { - type = "docker+machine" - } - - runner_worker_docker_machine_fleet = { - enable = true - } - - runner_worker_docker_machine_instance = { - types = ["t3a.medium", "t3.medium", "t2.medium"] - subnet_ids = module.vpc.private_subnets - } -} -``` - -## Examples - -A few [examples](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/) are provided. Use the -following steps to deploy. Ensure your AWS and Terraform environment is set up correctly. All commands below should be -run from the `terraform-aws-gitlab-runner/examples/` directory. Don't forget to remove the runners -manually from your Gitlab instance as soon as your are done. - -### Versions - -The version of Terraform is locked down via tfenv, see the `.terraform-version` file for the expected versions. -Providers are locked down as well in the `providers.tf` file. - -### Configure - -The examples are configured with defaults that should work in general. The examples are in general configured for the -region Ireland `eu-west-1`. The only parameter that needs to be provided is the GitLab registration token. The token can be -found in GitLab in the runner section (global, group or repo scope). Create a file `terraform.tfvars` and the registration token. - -```hcl - registration_token = "MY_TOKEN" -``` - -### Run - -Run `terraform init` to initialize Terraform. 
Next you can run `terraform plan` to inspect the resources that will be created.
-
-To create the runner, run:
-
-```sh
-  terraform apply
-```
-
-To destroy the runner, run:
-
-```sh
-  terraform destroy
-```
+For detailed concepts and usage, please refer to the [usage guide](docs/usage.md).

## Contributors ✨

-This project exists thanks to all the people who contribute.
+PRs are welcome! Please see the [contributing guide](CONTRIBUTING.md) for more details.
+
+Thanks to all the people who already contributed!

-
+
 contributors

Made with [contributors-img](https://contrib.rocks).

+## License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
## Module Documentation


@@ -640,5 +232,3 @@ Made with [contributors-img](https://contrib.rocks).

-
-[1]: https://img.shields.io/badge/renovate-enabled-brightgreen?logo=renovate
diff --git a/docs/pitfalls.md b/docs/pitfalls.md
new file mode 100644
index 000000000..2d69239c4
--- /dev/null
+++ b/docs/pitfalls.md
@@ -0,0 +1,37 @@
# Common Pitfalls

## Setting the name of the instances via name tag

Setting the instance name via a `Name` tag does not work. Instead, the module supports the `runner_instance.name` and
`runner_worker_docker_machine_instance` variables. Set them to any value to adjust the name of the EC2 instance(s).

```hcl
# working
runner_instance = {
  name = "my-gitlab-runner-name"
}

# not working
runner_instance = {
  additional_tags = {
    Name = "my-gitlab-runner-name"
  }
}
```

## Apply with shared cache bucket fails

In case you manage the S3 cache bucket yourself, it might be necessary to apply the cache before applying the runner module. A
typical error message looks like:

```text
Error: Invalid count argument
on .terraform/modules/gitlab_runner/main.tf line 400, in resource "aws_iam_role_policy_attachment" "docker_machine_cache_instance":
  count = var.cache_bucket["create"] || length(lookup(var.cache_bucket, "policy", "")) > 0 ? 1 : 0
The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many
instances will be created. To work around this, use the -target argument to first apply only the resources that the count
depends on.
```

The workaround is to use a `terraform apply -target=module.cache` followed by a `terraform apply` to apply everything else. This is
a one-time effort needed at the very beginning.
diff --git a/docs/usage.md b/docs/usage.md
new file mode 100644
index 000000000..54b766f61
--- /dev/null
+++ b/docs/usage.md
@@ -0,0 +1,331 @@
# Usage

Common pitfalls are documented in [pitfalls.md](pitfalls.md).

## Configuration

The examples are configured with defaults that should work in general and target the region Ireland (`eu-west-1`). The only
parameters that need to be provided are the registration token and the URL of your GitLab instance. The token can be found in
GitLab in the runner section (global, group or repo scope). Create a file `terraform.tfvars` and add the registration token and
the URL.

```hcl
registration_token = "MY_TOKEN"
gitlab_url         = "https://my.gitlab.instance/"
```

The base image used to host the GitLab Runner agent is the latest available Amazon Linux 2 HVM EBS AMI. In previous versions of
this module a hard-coded list of AMIs per region was provided. This list has been replaced by a search filter to find the latest
AMI. Setting the filter to `amzn2-ami-hvm-2.0.20200207.1-x86_64-ebs` allows you to version-lock the target AMI if needed.
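A minimal sketch of such a version lock, assuming the v7 input `runner_ami_filter` is passed through to the underlying `aws_ami`
data source (check the module's inputs for the exact name and shape):

```hcl
module "runner" {
  source = "cattle-ops/gitlab-runner/aws"
  # ... remaining configuration as in the scenarios below ...

  # assumed input: pins the agent AMI instead of tracking the latest Amazon Linux 2 image
  runner_ami_filter = {
    name = ["amzn2-ami-hvm-2.0.20200207.1-x86_64-ebs"]
  }
}
```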
## Install the module

Run `terraform init` to initialize Terraform. Next you can run `terraform plan` to inspect the resources that will be created.

To create the runner, run:

```sh
terraform apply
```

To destroy the runner, run:

```sh
terraform destroy
```

## Scenarios

### Scenario: Basic usage

Below is a basic example of using the module. Regarding the dependencies such as a VPC, have a look at the [default example](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-default).

```hcl
module "runner" {
  # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/
  source = "cattle-ops/gitlab-runner/aws"

  environment = "basic"

  vpc_id    = module.vpc.vpc_id
  subnet_id = element(module.vpc.private_subnets, 0)

  runner_gitlab = {
    url = "https://gitlab.com"
  }

  runner_gitlab_registration_config = {
    registration_token = "my-token"
    tag_list           = "docker"
    description        = "runner default"
    locked_to_project  = "true"
    run_untagged       = "false"
    maximum_timeout    = "3600"
  }

  runner_worker_docker_machine_instance = {
    subnet_ids = module.vpc.private_subnets
  }
}
```

### Scenario: Multi-region deployment

Name clashes due to multi-region deployments for global AWS resources created by this module (IAM, S3) can be avoided by including
a distinguishing region-specific prefix via `runner_worker_cache.bucket_prefix` and `iam_object_prefix`, as shown in the example
below. A simple choice for the prefix is the AWS region the module is deployed to.

```hcl
module "runner" {
  # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/
  source = "cattle-ops/gitlab-runner/aws"

  environment       = "multi-region-1"
  iam_object_prefix = "<region>-gitlab-runner-iam"

  vpc_id    = module.vpc.vpc_id
  subnet_id = element(module.vpc.private_subnets, 0)

  runner_gitlab = {
    url = "https://gitlab.com"
  }

  runner_gitlab_registration_config = {
    registration_token = "my-token"
    tag_list           = "docker"
    description        = "runner default"
    locked_to_project  = "true"
    run_untagged       = "false"
    maximum_timeout    = "3600"
  }

  runner_worker_cache = {
    bucket_prefix = "<region>"
  }

  runner_worker_docker_machine_instance = {
    subnet_ids = module.vpc.private_subnets
  }
}
```

### Scenario: Use of Spot Fleet

Since spot instances can be reclaimed by AWS depending on the instance type and AZ you are using, you may want multiple instance
types in multiple AZs. This is where spot fleets come in: when there is no capacity for one instance type in one AZ, AWS will take
the next instance type and so on. This is possible because the
[fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine supports spot fleets.

We have seen that the [fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine this
module is using consumes more RAM when using spot fleets. For comparison, if you launch 50 machines at the same time, it consumes
~1.2GB of RAM. In our case, we had to change the `instance_type` of the runner from `t3.micro` to `t3.small`.
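In v7 terms that is the agent's instance block. A minimal sketch, assuming the `runner_instance.type` attribute controls the
agent's EC2 instance type (verify the attribute name against the module's inputs):

```hcl
# sketch only: raise the agent size when large spot fleets exhaust its memory
runner_instance = {
  name = "gitlab-runner-agent" # hypothetical name, adjust to your setup
  type = "t3.small"            # assumed attribute for the agent's instance type
}
```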
#### Configuration example

```hcl
module "runner" {
  # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/
  source = "cattle-ops/gitlab-runner/aws"

  environment = "spot-fleet"

  vpc_id    = module.vpc.vpc_id
  subnet_id = element(module.vpc.private_subnets, 0)

  runner_gitlab = {
    url = "https://gitlab.com"
  }

  runner_gitlab_registration_config = {
    registration_token = "my-token"
    tag_list           = "docker"
    description        = "runner default"
    locked_to_project  = "true"
    run_untagged       = "false"
    maximum_timeout    = "3600"
  }

  runner_worker = {
    type = "docker+machine"
  }

  runner_worker_docker_machine_fleet = {
    enable = true
  }

  runner_worker_docker_machine_instance = {
    types      = ["t3a.medium", "t3.medium", "t2.medium"]
    subnet_ids = module.vpc.private_subnets
  }
}
```

## Examples

A few [examples](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/) are provided. Use the
following steps to deploy. Ensure your AWS and Terraform environment is set up correctly. All commands below should be
run from the `terraform-aws-gitlab-runner/examples/` directory. Don't forget to remove the runners
manually from your GitLab instance as soon as you are done.

## Concepts

### Service linked roles

The GitLab runner EC2 instance requires the following service linked roles:

- AWSServiceRoleForAutoScaling
- AWSServiceRoleForEC2Spot

By default, the EC2 instance is allowed to create the required roles, but this can be disabled by setting the option
`allow_iam_service_linked_role_creation` to `false`. If disabled, you must ensure that the roles exist. You can create them
manually or via Terraform.

```hcl
resource "aws_iam_service_linked_role" "spot" {
  aws_service_name = "spot.amazonaws.com"
}

resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
}
```

### KMS keys

If a KMS key is set via `kms_key_id`, make sure that you also give proper access to the key. Otherwise, you might
get errors, e.g. the build cache can't be decrypted or logging via CloudWatch is not possible. For a CloudWatch
example check out [kms-policy.json](https://github.com/cattle-ops/terraform-aws-gitlab-runner/blob/main/policies/kms-policy.json).

### GitLab runner token configuration

By default, the runner is registered on initial deployment. In previous versions of this module this was a manual process. The
manual process is still supported but will be removed in future releases. The runner token will be stored in the AWS SSM parameter
store. See the [example](../examples/runner-pre-registered) for more details.

To register the runner automatically, set the variable `runner_gitlab_registration_config["registration_token"]`. This token value
can be found in your GitLab project, group, or global settings. For a generic runner you can find the token in the admin section.
By default, the runner will be locked to the target project and not run untagged jobs. Below is an example of the configuration
map.

```hcl
runner_gitlab_registration_config = {
  registration_token = ""
  tag_list           = ""
  description        = ""
  locked_to_project  = "true"
  run_untagged       = "false"
  maximum_timeout    = "3600"
  # ref_protected runner will only run on pipelines triggered on protected branches. Defaults to not_protected
  access_level       = ""
}
```

The registration token can also be read in via the SSM parameter store.
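During a migration it can be handy to verify what is currently stored. A sketch using the standard AWS CLI call; the placeholder
parameter name must be adapted to your configuration:

```sh
# read the stored runner token back (SecureString parameters need --with-decryption)
aws ssm get-parameter \
  --name "<environment>-<runner-token-key>" \
  --with-decryption \
  --query 'Parameter.Value' --output text \
  --region "<aws-region>"
```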
If no registration token is passed in, the module will look up the token in the SSM parameter store at the location specified by
`runner_gitlab_registration_token_secure_parameter_store_name`.

To migrate to the new setup, simply add the runner token to the parameter store. Once the runner is started it will look up the
required values via the parameter store. If the value is `null` a new runner will be registered and a new token created/stored.

```sh
# set the following variables, look up the variables in your Terraform config.
# see your Terraform variables to fill in the vars below.
aws-region=<${var.aws_region}>
token=
parameter-name=<${var.environment}>-<${var.secure_parameter_store_runner_token_key}>

aws ssm put-parameter --overwrite --type SecureString --name "${parameter-name}" --value ${token} --region "${aws-region}"
```

Once you have created the parameter, you must remove the variable `runner_gitlab.registration_token` from your config. The next
time your GitLab runner instance is created it will look up the token from the SSM parameter store.

Finally, manual runner creation is still supported; no changes are required. Please keep in mind that this setup will be removed
in future releases.

### Auto Scaling Group

#### Scheduled scaling

When `runner_schedule_enable = true`, the `runner_schedule_config` block can be used to scale the Auto Scaling group.

Scaling may be defined with one `scale_out_*` scheduled action and/or one `scale_in_*` scheduled action.

For example:

```hcl
module "runner" {
  # ...
  runner_schedule_enable = true
  runner_schedule_config = {
    # Configure optional scale_out scheduled action
    scale_out_recurrence = "0 8 * * 1-5"
    scale_out_count      = 1 # Default for min_size, desired_capacity and max_size
    # Override using: scale_out_min_size, scale_out_desired_capacity, scale_out_max_size

    # Configure optional scale_in scheduled action
    scale_in_recurrence = "0 18 * * 1-5"
    scale_in_count      = 0 # Default for min_size, desired_capacity and max_size
    # Override using: scale_in_min_size, scale_in_desired_capacity, scale_in_max_size
  }
}
```

#### Instance Termination

The Auto Scaling Group may be configured with a [lifecycle hook](https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html)
that executes a provided Lambda function when the runner is terminated, which in turn terminates any additional instances that
were spawned.

The use of the termination lifecycle can be toggled using the `runner_enable_asg_recreation` variable.

When using this feature, a `builds/` directory relative to the root module will persist. It contains the packaged Lambda function.

### Access the Runner instance

A few options are provided to access the runner instance:

1. Access via the Session Manager (SSM) by setting `runner_worker.ssm_access` to `true`. The policy to allow access via SSM is
   not very restrictive.
2. By setting none of the above, no keys or extra policies will be attached to the instance. You can still configure your own
   policies by attaching them to `runner_role`.

### GitLab runner cache

By default, the module creates a cache for the runner in S3. Old objects are automatically removed via a configurable lifecycle
policy on the bucket.

Creation of the bucket can be disabled and managed outside this module. A good use case is sharing the cache across multiple
runners. For this purpose the cache is implemented as a submodule.
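A sketch of such a shared setup. The submodule path follows from this repository's layout, but the `runner_worker_cache`
attributes and the cache module's output names used here are assumptions; check the linked cache module and the runner-public
example below for the real interface:

```hcl
# shared cache bucket, created once
module "cache" {
  source      = "cattle-ops/gitlab-runner/aws//modules/cache"
  environment = "shared-cache"
}

# each runner reuses the bucket instead of creating its own
module "runner_a" {
  source = "cattle-ops/gitlab-runner/aws"
  # ... other configuration as in the basic example ...

  runner_worker_cache = {
    create = false                # assumed flag: do not create a bucket per runner
    bucket = module.cache.bucket  # assumed output exposing the bucket name
  }
}
```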
For more details see the
[cache module](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/modules/cache). An example implementation of
this use case can be found in the [runner-public](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-public)
example.

In case you enable access logging for the S3 cache bucket, you have to add the following statement to your S3 logging bucket
policy.

```json
{
  "Sid": "Allow access logging",
  "Effect": "Allow",
  "Principal": {
    "Service": "logging.s3.amazonaws.com"
  },
  "Action": "s3:PutObject",
  "Resource": "<s3 logging bucket ARN>/*"
}
```

## Removing the module

As the module creates a number of resources during runtime (key pairs and spot instance requests), it needs a special
procedure to remove them.

1. Use the AWS Console to set the desired capacity of all auto-scaling groups to 0. To find the correct ones, use
   `var.environment` as the search criterion. Setting the desired capacity to 0 prevents AWS from creating new instances,
   which would in turn create new resources.
2. Kill all agent EC2 instances via the AWS Console. This triggers a Lambda function in the background which removes
   all resources created during the runtime of the EC2 instances.
3. Wait 3 minutes so the Lambda function has enough time to delete the key pairs and spot instance requests.
4. Run a `terraform destroy` or `terraform apply` (depends on your setup) to remove the module.

If you don't follow the above procedure, key pairs and spot instance requests might survive the removal and might cause
additional costs. That said, we have never seen it happen; executing step 4 only should also be fine.
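For steps 1 and 2, the AWS CLI works as well. A hedged sketch: it assumes the auto-scaling groups and the agent instance carry an
`Environment` tag holding `var.environment` (verify the tag key in your deployment):

```sh
ENVIRONMENT="my-environment" # value of var.environment

# step 1: scale every matching auto-scaling group down to 0
for asg in $(aws autoscaling describe-auto-scaling-groups \
    --filters "Name=tag:Environment,Values=${ENVIRONMENT}" \
    --query 'AutoScalingGroups[].AutoScalingGroupName' --output text); do
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$asg" --min-size 0 --desired-capacity 0
done

# step 2: terminate the agent instances so the cleanup Lambda runs
aws ec2 terminate-instances --instance-ids $(aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=${ENVIRONMENT}" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId' --output text)
```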