From 8348f80404c3d6d411942be1ab2aaf1bc4657471 Mon Sep 17 00:00:00 2001
From: Matthias Kay
Date: Thu, 5 Oct 2023 11:56:00 +0200
Subject: [PATCH] docs: restructure and streamline documentation (#1003)

## Description

This PR restructures the whole documentation and adapts it to the changed variable names introduced with the v7 release.

- shorten the main document
- underline that contributions are welcome
- create a pitfall document
- move the detailed concepts into a separate file

---
 CONTRIBUTING.md  |  18 +-
 README.md        | 470 +++--------------------------------------------
 docs/pitfalls.md |  37 ++++
 docs/usage.md    | 331 +++++++++++++++++++++++++++++
 4 files changed, 408 insertions(+), 448 deletions(-)
 create mode 100644 docs/pitfalls.md
 create mode 100644 docs/usage.md

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 487ecf086..8d0f82d5d 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,14 +1,15 @@
 # Contribution guide

-We appreciate your thought to contribute to open source. :heart: We want to make contributing as easy as possible. You are welcome to:
+We appreciate your interest in contributing to open source. :heart: We want to make contributing as easy as possible.
+You are welcome to:

-- Reporting a bug
-- Discussing the current state of the code
-- Submitting a fix
-- Proposing new features
+- Report a bug
+- Discuss the current state of the code
+- Submit a fix
+- Propose new features

-We Use [Github Flow](https://guides.github.com/introduction/flow/index.html), So All Code Changes Happen Through Pull Requests
-Pull requests are the best way to propose changes to the codebase (we use [Github Flow](https://guides.github.com/introduction/flow/index.html)). We actively welcome your pull requests:
+We use [Github Flow](https://guides.github.com/introduction/flow/index.html), so all code changes happen through pull requests.
+We actively welcome your pull requests:

 1. Fork the repo and create your branch from `main`.
 2. If you've added code, check one of the examples.
@@ -25,7 +26,8 @@ We use the [Terraform Style conventions](https://www.terraform.io/docs/configuration/style.html)

 ## Documentation

-We use [pre-commit](https://pre-commit.com/) to update the Terraform inputs and outputs in the documentation via [terraform-docs](https://github.com/terraform-docs/terraform-docs). Ensure you have installed those components.
+We use [pre-commit](https://pre-commit.com/) to update the Terraform inputs and outputs in the documentation via
+[terraform-docs](https://github.com/terraform-docs/terraform-docs). Ensure you have installed those components.
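+
+If you are unsure about the flow, the following is a minimal sketch. It assumes `pre-commit` and `terraform-docs` are already
+installed and available on your `PATH`:
+
+```sh
+# install the git hooks once per clone
+pre-commit install
+
+# run all configured hooks (including terraform-docs) against the whole repository
+pre-commit run --all-files
+```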
 ## Testing

diff --git a/README.md b/README.md
index a4345a8c0..a49f6d147 100644
--- a/README.md
+++ b/README.md
@@ -1,479 +1,71 @@
-
+
 [![Terraform registry](https://img.shields.io/github/v/release/cattle-ops/terraform-aws-gitlab-runner?label=Terraform%20Registry)](https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/)
 [![Gitter](https://badges.gitter.im/terraform-aws-gitlab-runner/Lobby.svg)](https://gitter.im/terraform-aws-gitlab-runner/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)
 [![Actions](https://github.com/cattle-ops/terraform-aws-gitlab-runner/workflows/CI/badge.svg)](https://github.com/cattle-ops/terraform-aws-gitlab-runner/actions)
-[![Renovate][1]](https://www.mend.io/renovate/)
+[![Renovate](https://img.shields.io/badge/renovate-enabled-brightgreen?logo=renovate)](https://www.mend.io/renovate/)

-# Terraform module for GitLab auto scaling runners on AWS spot instances
-
-- [The module](#the-module)
-- [Prerequisites](#prerequisites)
-- [Usage](#usage)
-- [Examples](#examples)
-- [Contributors ✨](#contributors-)
-- [Requirements](#requirements)
-- [Providers](#providers)
-- [Modules](#modules)
-- [Resources](#resources)
-- [Inputs](#inputs)
-- [Outputs](#outputs)
-
-## The module
-
-This [Terraform](https://www.terraform.io/) modules creates a [GitLab CI runner](https://docs.gitlab.com/runner/). A blog post
+# Terraform module for GitLab auto-scaling runners on AWS spot instances
+
+💥 See [issue 819](https://github.com/cattle-ops/terraform-aws-gitlab-runner/issues/819) on how to migrate to v7 smoothly.
+
+This [Terraform](https://www.terraform.io/) module creates a [GitLab Runner](https://docs.gitlab.com/runner/). A blog post
 describes the original version of the runner. See the post at [040code](https://040code.github.io/2017/12/09/runners-on-the-spot/).
 The original setup of the module is based on the blog post: [Auto scale GitLab CI runners and save 90% on EC2 costs](https://about.gitlab.com/2017/11/23/autoscale-ci-runners/).

-> 💥 BREAKING CHANGE AHEAD: Version 7 of the module rewrites the whole variable section to
-> - harmonize the variable names
-> - harmonize the documentation
-> - remove deprecated variables
-> - gain a better overview of the features provided
->
-> And it also adds
-> - all possible Docker settings
-> - the `idle_scale_factor`
->
-> We know that this is a breaking change causing some pain, but we think it is worth it. We hope you agree. And to make the
-> transition as smooth as possible, we have added a migration script to the `migrations` folder. It will cover almost all cases,
-> but some minor rework might still be possible.
->
-> Checkout [issue 819](https://github.com/cattle-ops/terraform-aws-gitlab-runner/issues/819)
-
 The runners created by the module use spot instances by default for running the builds using the `docker+machine` executor.

 - Shared cache in S3 with life cycle management to clear objects after x days.
 - Logs streamed to CloudWatch.
 - Runner agents registered automatically.

-The name of the runner agent and runner is set with the overrides variable. Adding an agent runner name tag does not work.
-
-```hcl
-# ...
-runner_instance = {
-  name = "Gitlab Runner connecting to GitLab"
-}
-
-# this doesn't work
-agent_tags = merge(local.my_tags, map("Name", "Gitlab Runner connecting to GitLab"))
-```
-
 The runner supports 3 main scenarios:

-### GitLab CI docker-machine runner - one runner agent
-
-In this scenario the runner agent is running on a single EC2 node and runners are created by [docker machine](https://docs.gitlab.com/runner/configuration/autoscale.html)
-using spot instances. Runners will scale automatically based on the configuration. The module creates a S3 cache by default,
-which is shared across runners (spot instances).
-
-![runners-default](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-default.png)
-
-### GitLab CI docker-machine runner - multiple runner agents
-
-In this scenario the multiple runner agents can be created with different configuration by instantiating the module multiple times.
-Runners will scale automatically based on the configuration. The S3 cache can be shared across runners by managing the cache
-outside of the module.
-
-![runners-cache](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-cache.png)
-
-### GitLab Ci docker runner
-
-In this scenario _not_ docker machine is used but docker to schedule the builds. Builds will run on the same EC2 instance as the
-agent. No auto scaling is supported.
-
-![runners-docker](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-docker.png)
-
-## Prerequisites
-
-### Terraform
-
-Ensure you have Terraform installed. The module is based on Terraform 1.3, see `.terraform-version` for the used version. A handy
-tool to mange your Terraform version is [tfenv](https://github.com/kamatama41/tfenv).
-
-### AWS
-
-Ensure you have setup your AWS credentials. The module requires access to IAM, EC2, CloudWatch, S3 and SSM.
-
-### JQ & AWS CLI
-
-In order to be able to destroy the module, you will need to run from a host with both `jq` and `aws` installed and accessible in
-the environment.
-
-On macOS it is simple to install them using `brew`.
-
-```sh
-brew install jq awscli
-```
-
-### Service linked roles
-
-The GitLab runner EC2 instance requires the following service linked roles:
-
-- AWSServiceRoleForAutoScaling
-- AWSServiceRoleForEC2Spot
-
-By default the EC2 instance is allowed to create the required roles, but this can be disabled by setting the option
-`allow_iam_service_linked_role_creation` to `false`. If disabled you must ensure the roles exist. You can create them manually or
-via Terraform.
+1. GitLab CI docker-machine runner - one runner agent

-```hcl
-resource "aws_iam_service_linked_role" "spot" {
-  aws_service_name = "spot.amazonaws.com"
-}
+   In this scenario the runner agent is running on a single EC2 node and runners are created by [docker machine](https://docs.gitlab.com/runner/configuration/autoscale.html)
+   using spot instances. Runners will scale automatically based on the configuration. The module creates an S3 cache by default,
+   which is shared across runners (spot instances).

-resource "aws_iam_service_linked_role" "autoscaling" {
-  aws_service_name = "autoscaling.amazonaws.com"
-}
-```
+   ![runners-default](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-default.png)

-### KMS keys
+2. GitLab CI docker-machine runner - multiple runner agents

-If a KMS key is set via `kms_key_id`, make sure that you also give proper access to the key.
Otherwise, you might
-get errors, e.g. the build cache can't be decrypted or logging via CloudWatch is not possible. For a CloudWatch
-example checkout [kms-policy.json](https://github.com/cattle-ops/terraform-aws-gitlab-runner/blob/main/policies/kms-policy.json)
+   In this scenario multiple runner agents can be created with different configurations by instantiating the module multiple times.
+   Runners will scale automatically based on the configuration. The S3 cache can be shared across runners by managing the cache
+   outside the module.

-### GitLab runner token configuration
+   ![runners-cache](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-cache.png)

-By default the runner is registered on initial deployment. In previous versions of this module this was a manual process. The
-manual process is still supported but will be removed in future releases. The runner token will be stored in the AWS SSM parameter
-store. See [example](examples/runner-pre-registered/) for more details.
+3. GitLab CI docker runner

-To register the runner automatically set the variable `gitlab_runner_registration_config["registration_token"]`. This token value
-can be found in your GitLab project, group, or global settings. For a generic runner you can find the token in the admin section.
-By default the runner will be locked to the target project, not run untagged. Below is an example of the configuration map.
+   In this scenario docker is used to schedule the builds, _not_ docker machine. Builds will run on the same EC2 instance as the
+   agent. No auto-scaling is supported.

-```hcl
-runner_gitlab_registration_config = {
-  registration_token = ""
-  tag_list           = ""
-  description        = ""
-  locked_to_project  = "true"
-  run_untagged       = "false"
-  maximum_timeout    = "3600"
-  # ref_protected runner will only run on pipelines triggered on protected branches. Defaults to not_protected
-  access_level       = ""
-}
-```
+   ![runners-docker](https://github.com/cattle-ops/terraform-aws-gitlab-runner/raw/main/assets/images/runner-docker.png)

-The registration token can also be read in via SSM parameter store. If no registration token is passed in, the module
-will look up the token in the SSM parameter store at the location specified by `secure_parameter_store_gitlab_runner_registration_token_name`.
-
-For migration to the new setup simply add the runner token to the parameter store. Once the runner is started it will lookup the
-required values via the parameter store. If the value is `null` a new runner will be registered and a new token created/stored.
-
-```sh
-# set the following variables, look up the variables in your Terraform config.
-# see your Terraform variables to fill in the vars below.
-aws-region=<${var.aws_region}>
-token=
-parameter-name=<${var.environment}>-<${var.secure_parameter_store_runner_token_key}>
-
-aws ssm put-parameter --overwrite --type SecureString --name "${parameter-name}" --value ${token} --region "${aws-region}"
-```
-
-Once you have created the parameter, you must remove the variable `runners_token` from your config. The next time your GitLab
-runner instance is created it will look up the token from the SSM parameter store.
-
-Finally, the runner still supports the manual runner creation. No changes are required. Please keep in mind that this setup will be
-removed in future releases.
-
-### Auto Scaling Group
-
-#### Scheduled scaling
-
-When `enable_schedule=true`, the `schedule_config` variable can be used to scale the Auto Scaling group.
- -Scaling may be defined with one `scale_out` scheduled action and/or one `scale_in` scheduled action. - -For example: - -```hcl - module "runner" { - # ... - runner_schedule_enable = true - runner_schedule_config = { - # Configure optional scale_out scheduled action - scale_out_recurrence = "0 8 * * 1-5" - scale_out_count = 1 # Default for min_size, desired_capacity and max_size - # Override using: scale_out_min_size, scale_out_desired_capacity, scale_out_max_size - - # Configure optional scale_in scheduled action - scale_in_recurrence = "0 18 * * 1-5" - scale_in_count = 0 # Default for min_size, desired_capacity and max_size - # Override using: scale_out_min_size, scale_out_desired_capacity, scale_out_max_size - } - } -``` - -#### Instance Termination - -The Auto Scaling Group may be configured with a [lifecycle hook](https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html) -that executes a provided Lambda function when the runner is terminated to terminate additional instances that were spawned. - -The use of the termination lifecycle can be toggled using the `asg_termination_lifecycle_hook_create` variable. - -When using this feature, a `builds/` directory relative to the root module will persist that contains the packaged Lambda function. - -### Access runner instance - -A few option are provided to access the runner instance: - -1. Access via the Session Manager (SSM) by setting `enable_runner_ssm_access` to `true`. The policy to allow access via SSM is not - very restrictive. -2. By setting none of the above, no keys or extra policies will be attached to the instance. You can still configure you own - policies by attaching them to `runner_agent_role_arn`. - -### GitLab runner cache - -By default the module creates a cache for the runner in S3. Old objects are automatically removed via a configurable life cycle -policy on the bucket. - -Creation of the bucket can be disabled and managed outside this module. A good use case is for sharing the cache across multiple -runners. For this purpose the cache is implemented as a sub module. For more details see the -[cache module](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/modules/cache). An example implementation of -this use case can be found in the [runner-public](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-public) -example. - -In case you enable the access logging for the S3 cache bucket, you have to add the following statement to your S3 logging bucket -policy. - -```json -{ - "Sid": "Allow access logging", - "Effect": "Allow", - "Principal": { - "Service": "logging.s3.amazonaws.com" - }, - "Action": "s3:PutObject", - "Resource": "/*" -} -``` - -In case you manage the S3 cache bucket yourself it might be necessary to apply the cache before applying the runner module. A -typical error message looks like: - -```text -Error: Invalid count argument -on .terraform/modules/gitlab_runner/main.tf line 400, in resource "aws_iam_role_policy_attachment" "docker_machine_cache_instance": - count = var.cache_bucket["create"] || length(lookup(var.cache_bucket, "policy", "")) > 0 ? 1 : 0 -The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many -instances will be created. To work around this, use the -target argument to first apply only the resources that the count -depends on. 
-``` - -The workaround is to use a `terraform apply -target=module.cache` followed by a `terraform apply` to apply everything else. This is -a one time effort needed at the very beginning. - -## Usage - -### Configuration - -Update the variables in `terraform.tfvars` according to your needs and add the following variables. See the previous step for -instructions on how to obtain the token. - -```hcl -runner_name = "NAME_OF_YOUR_RUNNER" -gitlab_url = "GITLAB_URL" -runner_token = "RUNNER_TOKEN" -``` - -The base image used to host the GitLab Runner agent is the latest available Amazon Linux 2 HVM EBS AMI. In previous versions of -this module a hard coded list of AMIs per region was provided. This list has been replaced by a search filter to find the latest -AMI. Setting the filter to `amzn2-ami-hvm-2.0.20200207.1-x86_64-ebs` will allow you to version lock the target AMI. - -### Scenario: Basic usage - -Below is a basic examples of usages of the module. Regarding the dependencies such as a VPC, have a look at the [default example](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-default). - -```hcl -module "runner" { - # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/ - source = "cattle-ops/gitlab-runner/aws" - - environment = "basic" - - vpc_id = module.vpc.vpc_id - subnet_id = element(module.vpc.private_subnets, 0) - - runner_gitlab = { - url = "https://gitlab.com" - } - - runner_gitlab_registration_config = { - registration_token = "my-token" - tag_list = "docker" - description = "runner default" - locked_to_project = "true" - run_untagged = "false" - maximum_timeout = "3600" - } - - runner_worker_docker_machine_instance = { - subnet_ids = module.vpc.private_subnets - } -} -``` - -### Removing the module - -As the module creates a number of resources during runtime (key pairs and spot instance requests), it needs a special -procedure to remove them. - -1. Use the AWS Console to set the desired capacity of all auto scaling groups to 0. To find the correct ones use the - `var.environment` as search criteria. Setting the desired capacity to 0 prevents AWS from creating new instances - which will in turn create new resources. -2. Kill all agent ec2 instances on the via AWS Console. This triggers a Lambda function in the background which removes - all resources created during runtime of the EC2 instances. -3. Wait 3 minutes so the Lambda function has enough time to delete the key pairs and spot instance requests. -4. Run a `terraform destroy` or `terraform apply` (depends on your setup) to remove the module. - -If you don't follow the above procedure key pairs and spot instance requests might survive the removal and might cause -additional costs. But I have never seen that. You should also be fine by executing step 4 only. - -### Scenario: Multi-region deployment - -Name clashes due to multi-region deployments for global AWS resources create by this module (IAM, S3) can be avoided by including a -distinguishing region specific prefix via the _cache_bucket_prefix_ string respectively via _name_iam_objects_ in the _overrides_ -map. A simple example for this would be to set _region-specific-prefix_ to the AWS region the module is deployed to. 
- -```hcl -module "runner" { - # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/ - source = "cattle-ops/gitlab-runner/aws" - - environment = "multi-region-1" - iam_object_prefix = "-gitlab-runner-iam" - - vpc_id = module.vpc.vpc_id - subnet_id = element(module.vpc.private_subnets, 0) - - runner_gitlab = { - url = "https://gitlab.com" - } - - runner_gitlab_registration_config = { - registration_token = "my-token" - tag_list = "docker" - description = "runner default" - locked_to_project = "true" - run_untagged = "false" - maximum_timeout = "3600" - } - - runner_worker_cache = { - bucket_prefix = "" - } - - runner_worker_docker_machine_instance = { - subnet_ids = module.vpc.private_subnets - } -} -``` - -### Scenario: Use of Spot Fleet - -Since spot instances can be taken over by AWS depending on the instance type and AZ you are using, you may want multiple instances -types in multiple AZs. This is where spot fleets come in, when there is no capacity on one instance type and one AZ, AWS will take -the next instance type and so on. This update has been possible since the -[fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine supports spot fleets. - -We have seen that the [fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine this -module is using consume more RAM using spot fleets. For comparison, if you launch 50 machines in the same time, it consumes -~1.2GB of RAM. In our case, we had to change the `instance_type` of the runner from `t3.micro` to `t3.small`. - -#### Configuration example - -```hcl -module "runner" { - # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/ - source = "cattle-ops/gitlab-runner/aws" - - environment = "spot-fleet" - - vpc_id = module.vpc.vpc_id - subnet_id = element(module.vpc.private_subnets, 0) - - runner_gitlab = { - url = "https://gitlab.com" - } - - runner_gitlab_registration_config = { - registration_token = "my-token" - tag_list = "docker" - description = "runner default" - locked_to_project = "true" - run_untagged = "false" - maximum_timeout = "3600" - } - - runner_worker = { - type = "docker+machine" - } - - runner_worker_docker_machine_fleet = { - enable = true - } - - runner_worker_docker_machine_instance = { - types = ["t3a.medium", "t3.medium", "t2.medium"] - subnet_ids = module.vpc.private_subnets - } -} -``` - -## Examples - -A few [examples](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/) are provided. Use the -following steps to deploy. Ensure your AWS and Terraform environment is set up correctly. All commands below should be -run from the `terraform-aws-gitlab-runner/examples/` directory. Don't forget to remove the runners -manually from your Gitlab instance as soon as your are done. - -### Versions - -The version of Terraform is locked down via tfenv, see the `.terraform-version` file for the expected versions. -Providers are locked down as well in the `providers.tf` file. - -### Configure - -The examples are configured with defaults that should work in general. The examples are in general configured for the -region Ireland `eu-west-1`. The only parameter that needs to be provided is the GitLab registration token. The token can be -found in GitLab in the runner section (global, group or repo scope). Create a file `terraform.tfvars` and the registration token. - -```hcl - registration_token = "MY_TOKEN" -``` - -### Run - -Run `terraform init` to initialize Terraform. 
Next you can run `terraform plan` to inspect the resources that will be created.
-
-To create the runner, run:
-
-```sh
-  terraform apply
-```
-
-To destroy the runner, run:
-
-```sh
-  terraform destroy
-```
+For detailed concepts and usage, please refer to the [usage guide](docs/usage.md).

## Contributors ✨

-This project exists thanks to all the people who contribute.
+PRs are welcome! Please see the [contributing guide](CONTRIBUTING.md) for more details.
+
+Thanks to all the people who already contributed!

-
+
 contributors

Made with [contributors-img](https://contrib.rocks).

+## License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
## Module Documentation


@@ -640,5 +232,3 @@ Made with [contributors-img](https://contrib.rocks).

-
-[1]: https://img.shields.io/badge/renovate-enabled-brightgreen?logo=renovate
diff --git a/docs/pitfalls.md b/docs/pitfalls.md
new file mode 100644
index 000000000..2d69239c4
--- /dev/null
+++ b/docs/pitfalls.md
@@ -0,0 +1,37 @@
# Common Pitfalls

## Setting the name of the instances via name tag

Setting the instance name via a `Name` tag does not work. Instead, the module supports the `runner_instance.name` and
`runner_worker_docker_machine_instance` variables. Set them to any value to adjust the name of the EC2 instance(s).

```hcl
# working
runner_instance = {
  name = "my-gitlab-runner-name"
}

# not working
runner_instance = {
  additional_tags = {
    Name = "my-gitlab-runner-name"
  }
}
```

## Apply with shared cache bucket fails

In case you manage the S3 cache bucket yourself, it might be necessary to apply the cache before applying the runner module. A
typical error message looks like:

```text
Error: Invalid count argument
on .terraform/modules/gitlab_runner/main.tf line 400, in resource "aws_iam_role_policy_attachment" "docker_machine_cache_instance":
  count = var.cache_bucket["create"] || length(lookup(var.cache_bucket, "policy", "")) > 0 ? 1 : 0
The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many
instances will be created. To work around this, use the -target argument to first apply only the resources that the count
depends on.
```

The workaround is to use a `terraform apply -target=module.cache` followed by a `terraform apply` to apply everything else. This is
a one-time effort needed at the very beginning.
diff --git a/docs/usage.md b/docs/usage.md
new file mode 100644
index 000000000..54b766f61
--- /dev/null
+++ b/docs/usage.md
@@ -0,0 +1,331 @@
# Usage

Common pitfalls are documented in [pitfalls.md](pitfalls.md).

## Configuration

The examples are configured with defaults that should work in general and target the region Ireland (`eu-west-1`). The only
parameters that need to be provided are the registration token and the URL of your GitLab instance. The token can be found in
GitLab in the runner section (global, group or repo scope). Create a file `terraform.tfvars` and add the registration token and
the URL.

```hcl
registration_token = "MY_TOKEN"
gitlab_url         = "https://my.gitlab.instance/"
```

The base image used to host the GitLab Runner agent is the latest available Amazon Linux 2 HVM EBS AMI. In previous versions of
this module a hard-coded list of AMIs per region was provided. This list has been replaced by a search filter to find the latest
AMI. Setting the filter to `amzn2-ami-hvm-2.0.20200207.1-x86_64-ebs` allows you to version-lock the target AMI if needed.
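A minimal sketch of such a version lock, assuming the v7 input `runner_ami_filter` is passed through to the underlying `aws_ami`
data source (check the module's inputs for the exact name and shape):

```hcl
module "runner" {
  source = "cattle-ops/gitlab-runner/aws"
  # ... remaining configuration as in the scenarios below ...

  # assumed input: pins the agent AMI instead of tracking the latest Amazon Linux 2 image
  runner_ami_filter = {
    name = ["amzn2-ami-hvm-2.0.20200207.1-x86_64-ebs"]
  }
}
```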
## Install the module

Run `terraform init` to initialize Terraform. Next you can run `terraform plan` to inspect the resources that will be created.

To create the runner, run:

```sh
terraform apply
```

To destroy the runner, run:

```sh
terraform destroy
```

## Scenarios

### Scenario: Basic usage

Below is a basic example of using the module. Regarding the dependencies such as a VPC, have a look at the [default example](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-default).

```hcl
module "runner" {
  # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/
  source = "cattle-ops/gitlab-runner/aws"

  environment = "basic"

  vpc_id    = module.vpc.vpc_id
  subnet_id = element(module.vpc.private_subnets, 0)

  runner_gitlab = {
    url = "https://gitlab.com"
  }

  runner_gitlab_registration_config = {
    registration_token = "my-token"
    tag_list           = "docker"
    description        = "runner default"
    locked_to_project  = "true"
    run_untagged       = "false"
    maximum_timeout    = "3600"
  }

  runner_worker_docker_machine_instance = {
    subnet_ids = module.vpc.private_subnets
  }
}
```

### Scenario: Multi-region deployment

Name clashes due to multi-region deployments for global AWS resources created by this module (IAM, S3) can be avoided by including
a distinguishing region-specific prefix via `runner_worker_cache.bucket_prefix` and `iam_object_prefix`, as shown in the example
below. A simple choice for the prefix is the AWS region the module is deployed to.

```hcl
module "runner" {
  # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/
  source = "cattle-ops/gitlab-runner/aws"

  environment       = "multi-region-1"
  iam_object_prefix = "<region>-gitlab-runner-iam"

  vpc_id    = module.vpc.vpc_id
  subnet_id = element(module.vpc.private_subnets, 0)

  runner_gitlab = {
    url = "https://gitlab.com"
  }

  runner_gitlab_registration_config = {
    registration_token = "my-token"
    tag_list           = "docker"
    description        = "runner default"
    locked_to_project  = "true"
    run_untagged       = "false"
    maximum_timeout    = "3600"
  }

  runner_worker_cache = {
    bucket_prefix = "<region>"
  }

  runner_worker_docker_machine_instance = {
    subnet_ids = module.vpc.private_subnets
  }
}
```

### Scenario: Use of Spot Fleet

Since spot instances can be reclaimed by AWS depending on the instance type and AZ you are using, you may want multiple instance
types in multiple AZs. This is where spot fleets come in: when there is no capacity for one instance type in one AZ, AWS will take
the next instance type and so on. This is possible because the
[fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine supports spot fleets.

We have seen that the [fork](https://gitlab.com/cki-project/docker-machine/-/tree/v0.16.2-gitlab.19-cki.2) of docker-machine this
module is using consumes more RAM when using spot fleets. For comparison, if you launch 50 machines at the same time, it consumes
~1.2GB of RAM. In our case, we had to change the `instance_type` of the runner from `t3.micro` to `t3.small`.
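In v7 terms that is the agent's instance block. A minimal sketch, assuming the `runner_instance.type` attribute controls the
agent's EC2 instance type (verify the attribute name against the module's inputs):

```hcl
# sketch only: raise the agent size when large spot fleets exhaust its memory
runner_instance = {
  name = "gitlab-runner-agent" # hypothetical name, adjust to your setup
  type = "t3.small"            # assumed attribute for the agent's instance type
}
```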
#### Configuration example

```hcl
module "runner" {
  # https://registry.terraform.io/modules/cattle-ops/gitlab-runner/aws/
  source = "cattle-ops/gitlab-runner/aws"

  environment = "spot-fleet"

  vpc_id    = module.vpc.vpc_id
  subnet_id = element(module.vpc.private_subnets, 0)

  runner_gitlab = {
    url = "https://gitlab.com"
  }

  runner_gitlab_registration_config = {
    registration_token = "my-token"
    tag_list           = "docker"
    description        = "runner default"
    locked_to_project  = "true"
    run_untagged       = "false"
    maximum_timeout    = "3600"
  }

  runner_worker = {
    type = "docker+machine"
  }

  runner_worker_docker_machine_fleet = {
    enable = true
  }

  runner_worker_docker_machine_instance = {
    types      = ["t3a.medium", "t3.medium", "t2.medium"]
    subnet_ids = module.vpc.private_subnets
  }
}
```

## Examples

A few [examples](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/) are provided. Use the
following steps to deploy. Ensure your AWS and Terraform environment is set up correctly. All commands below should be
run from the `terraform-aws-gitlab-runner/examples/` directory. Don't forget to remove the runners
manually from your GitLab instance as soon as you are done.

## Concepts

### Service linked roles

The GitLab runner EC2 instance requires the following service linked roles:

- AWSServiceRoleForAutoScaling
- AWSServiceRoleForEC2Spot

By default, the EC2 instance is allowed to create the required roles, but this can be disabled by setting the option
`allow_iam_service_linked_role_creation` to `false`. If disabled, you must ensure that the roles exist. You can create them
manually or via Terraform.

```hcl
resource "aws_iam_service_linked_role" "spot" {
  aws_service_name = "spot.amazonaws.com"
}

resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
}
```

### KMS keys

If a KMS key is set via `kms_key_id`, make sure that you also give proper access to the key. Otherwise, you might
get errors, e.g. the build cache can't be decrypted or logging via CloudWatch is not possible. For a CloudWatch
example check out [kms-policy.json](https://github.com/cattle-ops/terraform-aws-gitlab-runner/blob/main/policies/kms-policy.json).

### GitLab runner token configuration

By default, the runner is registered on initial deployment. In previous versions of this module this was a manual process. The
manual process is still supported but will be removed in future releases. The runner token will be stored in the AWS SSM parameter
store. See the [example](../examples/runner-pre-registered) for more details.

To register the runner automatically, set the variable `runner_gitlab_registration_config["registration_token"]`. This token value
can be found in your GitLab project, group, or global settings. For a generic runner you can find the token in the admin section.
By default, the runner will be locked to the target project and not run untagged jobs. Below is an example of the configuration
map.

```hcl
runner_gitlab_registration_config = {
  registration_token = ""
  tag_list           = ""
  description        = ""
  locked_to_project  = "true"
  run_untagged       = "false"
  maximum_timeout    = "3600"
  # ref_protected runner will only run on pipelines triggered on protected branches. Defaults to not_protected
  access_level       = ""
}
```

The registration token can also be read in via the SSM parameter store.
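During a migration it can be handy to verify what is currently stored. A sketch using the standard AWS CLI call; the placeholder
parameter name must be adapted to your configuration:

```sh
# read the stored runner token back (SecureString parameters need --with-decryption)
aws ssm get-parameter \
  --name "<environment>-<runner-token-key>" \
  --with-decryption \
  --query 'Parameter.Value' --output text \
  --region "<aws-region>"
```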
If no registration token is passed in, the module will look up the token in the SSM parameter store at the location specified by
`runner_gitlab_registration_token_secure_parameter_store_name`.

To migrate to the new setup, simply add the runner token to the parameter store. Once the runner is started it will look up the
required values via the parameter store. If the value is `null` a new runner will be registered and a new token created/stored.

```sh
# set the following variables, look up the variables in your Terraform config.
# see your Terraform variables to fill in the vars below.
aws-region=<${var.aws_region}>
token=
parameter-name=<${var.environment}>-<${var.secure_parameter_store_runner_token_key}>

aws ssm put-parameter --overwrite --type SecureString --name "${parameter-name}" --value ${token} --region "${aws-region}"
```

Once you have created the parameter, you must remove the variable `runner_gitlab.registration_token` from your config. The next
time your GitLab runner instance is created it will look up the token from the SSM parameter store.

Finally, manual runner creation is still supported; no changes are required. Please keep in mind that this setup will be removed
in future releases.

### Auto Scaling Group

#### Scheduled scaling

When `runner_schedule_enable = true`, the `runner_schedule_config` block can be used to scale the Auto Scaling group.

Scaling may be defined with one `scale_out_*` scheduled action and/or one `scale_in_*` scheduled action.

For example:

```hcl
module "runner" {
  # ...
  runner_schedule_enable = true
  runner_schedule_config = {
    # Configure optional scale_out scheduled action
    scale_out_recurrence = "0 8 * * 1-5"
    scale_out_count      = 1 # Default for min_size, desired_capacity and max_size
    # Override using: scale_out_min_size, scale_out_desired_capacity, scale_out_max_size

    # Configure optional scale_in scheduled action
    scale_in_recurrence = "0 18 * * 1-5"
    scale_in_count      = 0 # Default for min_size, desired_capacity and max_size
    # Override using: scale_in_min_size, scale_in_desired_capacity, scale_in_max_size
  }
}
```

#### Instance Termination

The Auto Scaling Group may be configured with a [lifecycle hook](https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html)
that executes a provided Lambda function when the runner is terminated, which in turn terminates any additional instances that
were spawned.

The use of the termination lifecycle can be toggled using the `runner_enable_asg_recreation` variable.

When using this feature, a `builds/` directory relative to the root module will persist. It contains the packaged Lambda function.

### Access the Runner instance

A few options are provided to access the runner instance:

1. Access via the Session Manager (SSM) by setting `runner_worker.ssm_access` to `true`. The policy to allow access via SSM is
   not very restrictive.
2. By setting none of the above, no keys or extra policies will be attached to the instance. You can still configure your own
   policies by attaching them to `runner_role`.

### GitLab runner cache

By default, the module creates a cache for the runner in S3. Old objects are automatically removed via a configurable lifecycle
policy on the bucket.

Creation of the bucket can be disabled and managed outside this module. A good use case is sharing the cache across multiple
runners. For this purpose the cache is implemented as a submodule.
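A sketch of such a shared setup. The submodule path follows from this repository's layout, but the `runner_worker_cache`
attributes and the cache module's output names used here are assumptions; check the linked cache module and the runner-public
example below for the real interface:

```hcl
# shared cache bucket, created once
module "cache" {
  source      = "cattle-ops/gitlab-runner/aws//modules/cache"
  environment = "shared-cache"
}

# each runner reuses the bucket instead of creating its own
module "runner_a" {
  source = "cattle-ops/gitlab-runner/aws"
  # ... other configuration as in the basic example ...

  runner_worker_cache = {
    create = false                # assumed flag: do not create a bucket per runner
    bucket = module.cache.bucket  # assumed output exposing the bucket name
  }
}
```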
For more details see the
[cache module](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/modules/cache). An example implementation of
this use case can be found in the [runner-public](https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-public)
example.

In case you enable access logging for the S3 cache bucket, you have to add the following statement to your S3 logging bucket
policy.

```json
{
  "Sid": "Allow access logging",
  "Effect": "Allow",
  "Principal": {
    "Service": "logging.s3.amazonaws.com"
  },
  "Action": "s3:PutObject",
  "Resource": "<s3 logging bucket ARN>/*"
}
```

## Removing the module

As the module creates a number of resources during runtime (key pairs and spot instance requests), it needs a special
procedure to remove them.

1. Use the AWS Console to set the desired capacity of all auto-scaling groups to 0. To find the correct ones, use
   `var.environment` as the search criterion. Setting the desired capacity to 0 prevents AWS from creating new instances,
   which would in turn create new resources.
2. Kill all agent EC2 instances via the AWS Console. This triggers a Lambda function in the background which removes
   all resources created during the runtime of the EC2 instances.
3. Wait 3 minutes so the Lambda function has enough time to delete the key pairs and spot instance requests.
4. Run a `terraform destroy` or `terraform apply` (depends on your setup) to remove the module.

If you don't follow the above procedure, key pairs and spot instance requests might survive the removal and might cause
additional costs. That said, we have never seen it happen; executing step 4 only should also be fine.
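For steps 1 and 2, the AWS CLI works as well. A hedged sketch: it assumes the auto-scaling groups and the agent instance carry an
`Environment` tag holding `var.environment` (verify the tag key in your deployment):

```sh
ENVIRONMENT="my-environment" # value of var.environment

# step 1: scale every matching auto-scaling group down to 0
for asg in $(aws autoscaling describe-auto-scaling-groups \
    --filters "Name=tag:Environment,Values=${ENVIRONMENT}" \
    --query 'AutoScalingGroups[].AutoScalingGroupName' --output text); do
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$asg" --min-size 0 --desired-capacity 0
done

# step 2: terminate the agent instances so the cleanup Lambda runs
aws ec2 terminate-instances --instance-ids $(aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=${ENVIRONMENT}" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId' --output text)
```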