chore(docs): update documentation for tfrun
corrieriluca committed Oct 9, 2023
1 parent 20ca3cb commit 4d70a9d
Showing 11 changed files with 88 additions and 31 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/docs.yaml
@@ -1,4 +1,4 @@
name: Documentation
name: Documentation
on:
push:
branches:
@@ -9,7 +9,7 @@ jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
with:
python-version: 3.x
@@ -19,5 +19,5 @@ jobs:
path: .cache
restore-keys: |
mkdocs-material-
- run: pip install mkdocs-material
- run: pip install mkdocs-material
- run: mkdocs gh-deploy --force
6 changes: 6 additions & 0 deletions .gitignore
@@ -31,3 +31,9 @@ Dockerfile.cross
*.swp
*.swo
*~

# Python virtual environment (for mkdocs)
.env
.venv
env/
venv/
Binary file not shown.
Binary file added docs/assets/design/architecture-overview.png
1 change: 1 addition & 0 deletions docs/operator-manual/advanced-configuration.md
@@ -19,6 +19,7 @@ You can configure `burrito` with environment variables.
| `BURRITO_CONTROLLER_TIMERS_ONERROR` | period between two runners launch when an error occurred in the controllers | `1m` |
| `BURRITO_CONTROLLER_TIMERS_WAITACTION` | period between two runners launch when a layer is locked | `1m` |
| `BURRITO_CONTROLLER_TIMERS_FAILUREGRACEPERIOD` | initial delay before a retry; it grows exponentially with the number of failures | `15s` |
| `BURRITO_CONTROLLER_TERRAFORMMAXRETRIES` | default number of retries for terraform runs (can be overridden in CRDs) | `5` |
| `BURRITO_CONTROLLER_LEADERELECTION_ENABLED` | whether leader election is enabled or not | `true` |
| `BURRITO_CONTROLLER_LEADERELECTION_ID` | lease id used for leader election | `6d185457.terraform.padok.cloud` |
| `BURRITO_CONTROLLER_HEALTHPROBEBINDADDRESS` | address to bind the health probe server embedded in the controllers | `:8081` |
55 changes: 41 additions & 14 deletions docs/operator-manual/architecture.md
@@ -1,6 +1,6 @@
# Architectural Overview

<p align="center"><img src="../../assets/design/architecture-overview.excalidraw.png" width="1000px" /></p>
<p align="center"><img src="../../assets/design/architecture-overview.png" width="1000px" /></p>

## Components

@@ -14,16 +14,22 @@ Other features will be implemented when the Web UI will be in development.

### The repository Controller

The repository controller is a Kubernetes Controller which is only used to register `TerraformRepository` ressources.
The repository controller is a Kubernetes Controller which is only used to register `TerraformRepository` resources.

### The layer Controller

The layer controller is a Kubernetes Controller which continuously monitors declared `TerraformLayer` ressources.
It regularly starts runner pods which runs a `terraform plan` for each of your layer to check if a drift has been introduced.
If so, it has the possibility to run a `terraform apply`.
The layer controller is a Kubernetes Controller which continuously monitors declared `TerraformLayer` resources.
It regularly creates `TerraformRun` resources which run a `terraform plan` for each of your layers to check whether drift has been introduced.
If so, it can create a `TerraformRun` that does a `terraform apply`.

It is also responsible for running your Terraform `plan` and `apply` if there is a new commit on your layer.

### The run Controller

The run controller is a Kubernetes Controller which continuously monitors declared `TerraformRun` resources.

It is responsible for running the `terraform plan` and `terraform apply` commands by creating runner pods. It handles failure and retries of the runner pods.

It also generates [`Leases`](https://kubernetes.io/docs/concepts/architecture/leases/) to make sure no concurrent terraform commands will be launched on the same layer at the same time.

### The Redis instance
@@ -40,34 +46,55 @@ The CLI used to start the different components is implemented using [`cobra`](ht

The status of a `TerraformLayer` is defined using the [conditions standards defined by the community](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties).

4 conditions are defined for a layer:
3 conditions are defined for a layer:

- `IsPlanArtifactUpToDate`. This condition is used for drift detection. It is evaluated by comparing the timestamp of the last `terraform plan` run with the current date. The timestamp of the last plan is "stored" in an annotation.
- `IsApplyUpToDate`. This condition is used to check if an `apply` needs to run after the last `plan`. It is evaluated by comparing a checksum of the last planned binary with a checksum of the last applied binary stored in the annotations.
- `IsLastRelevantCommitPlanned`. This condition is used to check if a new commit has been made to the layer and needs to be applied. It is evaluated by comparing the commit used for the last `plan`, the last commit which introduced changes to the layer and the last commit made to the same branch of the repository. Those commits are "stored" as annotations.
- `IsInfailureGracePeriod`. This condition is used to check if a Terraform workflow has already failed. If so, we use an exponential backoff strategy before restarting a runner on the given layer.

!!! info
We use annotations to store information because we do not want to rely too heavily on the uptime of the Redis instance.

With those 4 conditions, we defined 4 states:
With those 3 conditions, we defined 3 states:

- `Idle`. This is the state of a layer if no runner needs to be started
- `PlanNeeded`. This is the state of a layer if burrito needs to start a `plan` runner
- `ApplyNeeded`. This is the state of a layer if burrito needs to start an `apply` runner
- `FailureGracePeriod`. This is the state of a layer if a `plan` or `apply` runner has failed

!!! info
If you use the [`dry` remediation strategy](../user-guide/remediation-strategy.md) and an apply is needed, the layer will stay in the `ApplyNeeded` state as long as it does not need to enter the `PlanNeeded` state.
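
As a rough illustration of how the three layer conditions could map to the three layer states, here is a minimal Python sketch. It is not burrito's implementation: the `LayerAnnotations` container, its field names, the `drift_period` parameter and the order of the checks are all assumptions made for this example, and the commit comparison is simplified to two commits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LayerAnnotations:
    """Hypothetical view of the values the docs say are stored as annotations."""
    last_plan_date: Optional[datetime]
    last_planned_checksum: Optional[str]
    last_applied_checksum: Optional[str]
    last_planned_commit: Optional[str]
    last_relevant_commit: Optional[str]


def layer_state(a: LayerAnnotations, now: datetime, drift_period: timedelta) -> str:
    """Derive a layer state from the three conditions (assumed evaluation order)."""
    # IsPlanArtifactUpToDate: the last plan is recent enough.
    plan_up_to_date = (
        a.last_plan_date is not None and now - a.last_plan_date < drift_period
    )
    # IsApplyUpToDate: the last planned and last applied checksums match.
    apply_up_to_date = a.last_planned_checksum == a.last_applied_checksum
    # IsLastRelevantCommitPlanned: simplified here to a two-commit comparison.
    commit_planned = a.last_planned_commit == a.last_relevant_commit

    if not plan_up_to_date or not commit_planned:
        return "PlanNeeded"
    if not apply_up_to_date:
        return "ApplyNeeded"
    return "Idle"
```

For instance, a layer whose last plan is recent and whose planned and applied checksums match would be `Idle` under these assumptions, while a stale plan would yield `PlanNeeded` regardless of the checksums.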

The layer controller also generates the Kubernetes leases to avoid concurrent use of Terraform on the same layer.
### The TerraformRun Controller

!!! info
N.B. We use lease objects in order to not have to rely on the Redis instance for layer locking.
The status of a `TerraformRun` is also defined using the same [conditions standards defined by the community](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties).

5 conditions are defined for a run:

- `HasStatus`. This condition is used to check if a `TerraformRun` has already been reconciled by the controller.
- `HasReachedRetryLimit`. Used to check if a `TerraformRun` has reached the maximum number of retries.
- `HasSucceeded`. Used to check if a `TerraformRun` has already succeeded (runner pod exited successfully).
- `IsRunning`. Used to check if a `TerraformRun` is currently running by checking the current phase of its associated pod.
- `IsInfailureGracePeriod`. This condition is used to check if a Terraform workflow has already failed. If so, we use an exponential backoff strategy before restarting a runner on the given layer.
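
The exponential backoff mentioned for `IsInfailureGracePeriod` can be sketched as follows. This is only an illustration: the `15s` base matches the documented default of `BURRITO_CONTROLLER_TIMERS_FAILUREGRACEPERIOD`, but the doubling factor, the one-hour cap and the function name are assumptions, not burrito's actual formula.

```python
from datetime import timedelta


def failure_grace_period(
    failures: int,
    base: timedelta = timedelta(seconds=15),
    cap: timedelta = timedelta(hours=1),
) -> timedelta:
    """Return how long to wait before restarting a runner after `failures` failures.

    Assumption: the delay doubles with each failure and never exceeds `cap`.
    """
    if failures <= 0:
        return timedelta(0)
    delay = min(base, cap)
    # Double the delay once per additional failure, stopping at the cap.
    for _ in range(failures - 1):
        delay = min(delay * 2, cap)
    return delay
```

With these assumed parameters, the first failure waits 15 seconds, the second 30 seconds, the third 60 seconds, and so on until the cap is reached.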

With those 5 conditions, we defined 6 states:

- `Initial`. This is the state of a run when it has just been created and has launched its first runner pod.
- `Running`. This is the state of a run if a runner pod is currently running.
- `FailureGracePeriod`. This is the state of a run if a `plan` or `apply` runner has failed.
- `Retrying`. This is an intermediate state of a run if a runner pod has failed and is being restarted (not in failure grace period anymore).
- `Succeeded`. This is one of the two final states a run can have. It means that the runner pod has exited successfully.
- `Failed`. This is the other final state a run can have. It means that the run has failed multiple times and has reached the maximum number of retries.
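
To make the relationship between the five run conditions and the six run states more concrete, here is a small Python sketch. The `RunConditions` class and the order in which the conditions are checked are assumptions for illustration; the actual controller may evaluate them differently.

```python
from dataclasses import dataclass


@dataclass
class RunConditions:
    """Hypothetical snapshot of the five conditions of a `TerraformRun`."""
    has_status: bool
    has_reached_retry_limit: bool
    has_succeeded: bool
    is_running: bool
    is_in_failure_grace_period: bool


def run_state(c: RunConditions) -> str:
    """Map the five conditions to one of the six states (assumed precedence)."""
    if not c.has_status:
        return "Initial"  # never reconciled yet
    if c.has_succeeded:
        return "Succeeded"  # final state
    if c.is_running:
        return "Running"
    if c.has_reached_retry_limit:
        return "Failed"  # final state
    if c.is_in_failure_grace_period:
        return "FailureGracePeriod"
    return "Retrying"  # failed, grace period over, retries left
```

In this sketch the two final states, `Succeeded` and `Failed`, are checked before the backoff-related states, so a run that has exhausted its retries is not reported as waiting in its grace period.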

The layer controller is also responsible for registering runner pods to the Kubernetes API. We decided to use dynamic runners in order to be able to associate specific service accounts for each layers (each layer does not need the same access right to be planned and applied).
The `TerraformRun` controller also creates and deletes the [Kubernetes leases](https://kubernetes.io/docs/concepts/architecture/leases/) to avoid concurrent use of Terraform on the same layer.

!!! info
N.B.: We use lease objects in order to not have to rely on the Redis instance for layer locking.
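
The locking idea behind the leases can be illustrated with an in-memory stand-in. This sketch does not call the Kubernetes API: the `LeaseStore` class and its methods are hypothetical, and it only models the "create the lease to lock, delete it to unlock" behaviour described above.

```python
class LeaseStore:
    """In-memory stand-in for per-layer lease objects (illustration only)."""

    def __init__(self) -> None:
        self._holders = {}  # layer name -> name of the run holding the lease

    def acquire(self, layer: str, run: str) -> bool:
        """Create the lease for `layer` if nobody holds it; return True on success."""
        if layer in self._holders:
            return False  # another run already locks this layer
        self._holders[layer] = run
        return True

    def release(self, layer: str, run: str) -> None:
        """Delete the lease, but only if `run` is the current holder."""
        if self._holders.get(layer) == run:
            del self._holders[layer]
```

A run would acquire the lease before launching its runner pod and release it once the pod has finished, so a second run on the same layer has to wait until the lease is gone.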

### The runners

The runner image implementation heavily relies on golang libraries provided by hashicorp such as `tfexec`, `releases` and `product` which allows us to dynamically download and use any version of the Terraform binary.
The runner image implementation heavily relies on Golang libraries provided by HashiCorp such as [`tfexec`](https://github.com/hashicorp/terraform-exec) and [`hc-install`](https://github.com/hashicorp/hc-install), which allow us to dynamically download and use any version of the Terraform binary.
Thus, we support any existing version of Terraform.

The runners also support any existing version of [Terragrunt](https://terragrunt.gruntwork.io/).

The runner is responsible for updating the annotations of the layer it is associated with, to store which commit was planned or applied and when.
3 changes: 1 addition & 2 deletions docs/operator-manual/multi-tenant-architecture.md
@@ -7,7 +7,7 @@ With our [Helm chart](./install/with-helm.md) we provide a way to setup multi-te
The setup is split across multiple Kubernetes namespaces:

- `burrito-system` is where burrito's components live and operate (controllers, server, Redis)
- the other namespaces (`tenant-namespace-[1-3]` on the schema) where `TerraformRepository`, `TerraformLayer` and `TerraformPullRequest` resources live and where burrito spawns runner pods for Terraform `plan` and `apply` actions.
- the other namespaces (`tenant-namespace-[1-3]` on the schema) where `TerraformRepository`, `TerraformLayer`, `TerraformRun` and `TerraformPullRequest` resources live and where burrito spawns runner pods for Terraform `plan` and `apply` actions.

Thanks to the Kubernetes native RBAC system, you can restrict your users' access to only the namespaces where their burrito resources live.

@@ -93,7 +93,6 @@ metadata:
spec:
terraform:
version: "1.5.3"
remediationStrategy: dry
path: "infra/layers/aws/production"
branch: "main"
repository:
6 changes: 4 additions & 2 deletions docs/user-guide/additionnal-trigger-path.md
@@ -19,7 +19,8 @@ spec:
terragrunt:
enabled: true
version: "0.45.4"
remediationStrategy: autoApply
remediationStrategy:
autoApply: true
path: "terragrunt/random-pets/test"
branch: "main"
repository:
@@ -44,7 +45,8 @@ spec:
terragrunt:
enabled: true
version: "0.45.4"
remediationStrategy: autoApply
remediationStrategy:
autoApply: true
path: "terragrunt/random-pets/test"
branch: "main"
repository:
6 changes: 4 additions & 2 deletions docs/user-guide/private-modules.md
@@ -51,7 +51,8 @@ spec:
terragrunt:
enabled: true
version: "0.45.4"
remediationStrategy: autoApply
remediationStrategy:
autoApply: true
path: "terragrunt/random-pets-private-module/test"
branch: main
repository:
@@ -113,7 +114,8 @@ spec:
terragrunt:
enabled: true
version: "0.45.4"
remediationStrategy: autoApply
remediationStrategy:
autoApply: true
path: "terragrunt/random-pets-private-module-ssh/test"
branch: main
repository:
33 changes: 26 additions & 7 deletions docs/user-guide/remediation-strategy.md
@@ -1,15 +1,34 @@
# Choose a remediation strategy

Currently, 2 remediation strategies are handled.

| Strategy | Effect |
| :---------: | :-----------------------------------------------------------------: |
| `dry` | The operator will only run the `plan`. This is the default strategy |
| `autoApply` | If a `plan` is not up to date, it will run an `apply` |
The remediation strategy tells Burrito how to remediate drift on your Terraform layers.

As with the [runner spec override](./override-runner.md), you can specify a `spec.remediationStrategy` on either the `TerraformRepository` or the `TerraformLayer`.

The configuration of the `TerraformLayer` takes precedence.

## `spec.remediationStrategy` API reference

| Field | Type | Default | Effect |
| :------------------: | :-----: | :-------------------------------------------: | :-----------------------------------------------------------------------: |
| `autoApply` | Boolean | `false` | If `true`, Burrito runs an `apply` when a `plan` shows drift. |
| `onError.maxRetries` | Integer | `5` or value defined in Burrito configuration | How many times Burrito should retry a `plan`/`apply` when a runner fails. |
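
The precedence between the `TerraformRepository`, the `TerraformLayer` and the defaults in the table can be sketched in Python as below. This is an assumption-laden illustration: the function names are hypothetical, specs are modelled as plain dictionaries, and the field-by-field merge (rather than replacing the whole `remediationStrategy` object) is a guess at the behaviour, not a documented guarantee.

```python
def _lookup(strategy, *keys):
    """Return the nested value at `keys`, or None if any level is missing."""
    node = strategy
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def effective_remediation_strategy(repository_spec, layer_spec, controller_max_retries=5):
    """Resolve each field: layer first, then repository, then the default."""
    repo = repository_spec.get("remediationStrategy") or {}
    layer = layer_spec.get("remediationStrategy") or {}

    def resolve(default, *keys):
        # The layer's value wins; an explicit `false` still counts as set.
        for source in (layer, repo):
            value = _lookup(source, *keys)
            if value is not None:
                return value
        return default

    return {
        "autoApply": resolve(False, "autoApply"),
        "onError": {
            "maxRetries": resolve(controller_max_retries, "onError", "maxRetries"),
        },
    }
```

Here `controller_max_retries` stands for the value configured on the controller (for example through `BURRITO_CONTROLLER_TERRAFORMMAXRETRIES`), which applies only when neither resource sets `onError.maxRetries`.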

!!! warning
This operator is still experimental. Use `spec.remediationStrategy: "autoApply"` at your own risk.
This operator is still experimental. Use `spec.remediationStrategy.autoApply: true` at your own risk.

## Example

With this example configuration, Burrito will create `apply` runs for this layer, with a maximum of 3 retries.

```yaml
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
metadata:
  name: random-pets-terragrunt
spec:
  remediationStrategy:
    autoApply: true
    onError:
      maxRetries: 3
  # ... snipped ...
```
3 changes: 2 additions & 1 deletion docs/user-guide/terraform-version.md
@@ -23,7 +23,8 @@ spec:
terragrunt:
enabled: true
version: "0.44.5"
remediationStrategy: dry
remediationStrategy:
autoApply: false
path: "internal/e2e/testdata/terragrunt/random-pets/prod"
branch: "feat/handle-terragrunt"
repository: