Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document migration from v1 to v2 #42

Merged
merged 3 commits into from
Dec 9, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion .editorconfig
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
# top-most EditorConfig file
root = true

# Unix-style newlines with a newline ending every file
[*]
charset = utf-8
Expand All @@ -13,7 +16,9 @@ indent_style = space

[*.md]
max_line_length = 0
trim_trailing_whitespace = false
trim_trailing_whitespace = true
intent_style = space
indent_size = 2

# Override for Makefile
[{Makefile, makefile, GNUmakefile, Makefile.*}]
Expand Down
1 change: 0 additions & 1 deletion .github/auto-release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,6 @@ version-resolver:
- 'bugfix'
- 'bug'
- 'hotfix'
- 'no-release'
default: 'minor'

categories:
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/validate-codeowners.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ jobs:
steps:
- name: "Checkout source code at current commit"
uses: actions/checkout@v2
# Leave pinned at 0.7.1 until https://github.com/mszostok/codeowners-validator/issues/173 is resolved
- uses: mszostok/codeowners-validator@v0.7.1
if: github.event.pull_request.head.repo.full_name == github.repository
name: "Full check of CODEOWNERS"
Expand Down
92 changes: 63 additions & 29 deletions README.md

Large diffs are not rendered by default.

85 changes: 59 additions & 26 deletions README.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -71,12 +71,12 @@ usage: |-

### Avoiding Service Interruptions

It is desirable to avoid having service interruptions when updating a security group. This is not always possible
due to a combination of the way Terraform organizes its activities and the fact that AWS will reject
an attempt to create a duplicate of an existing security group rule. There is also the issue that while many
AWS resources can be associated with and disassociated from security groups at any time, some
may not have their security group association changed, and an attempt to change the security group will
cause Terraform to delete and recreate the resource.
It is desirable to avoid having service interruptions when updating a security group. This is not always
possible due to the way Terraform organizes its activities and the fact that AWS will reject an attempt
to create a duplicate of an existing security group rule. There is also the issue that while most AWS
resources can be associated with and disassociated from security groups at any time, there remain some
that may not have their security group association changed, and an attempt to change their security group
will cause Terraform to delete and recreate the resource.

#### The 2 Ways Security Group Changes Cause Service Interruptions

Expand All @@ -95,20 +95,21 @@ usage: |-
to update the rule to reference the new security group.

The key question you need to answer to decide which configuration to use is "will anything break
if the security group ID changes". If no, then use the defaults `create_before_destroy = true` and
if the security group ID changes". If not, then use the defaults `create_before_destroy = true` and
`preserve_security_group_id = false` and do not worry about providing "keys" for
security group rules. This is the default because it is the easiest and safest solution when
the way the security group is being used allows it.

If things will break when the security group ID changes, then set `preserve_security_group_id`
to `true` and read the information about keys and single source Terraform rules if you want
to mitigate against service interruptions caused by rule changes. Note that even in this case,
you probably want to keep `create_before_destroy = true` because otherwise, if some change
requires the security group to be replaced, Terraform will likely succeed in deleting all the
security group rules but fail to delete the security group itself, leaving the associated resources
completely inaccessible. At least with `create_before_destroy = true`, the new security group
will be created and will be used where Terraform is able to make the changes, even though
the old security group will still fail to be deleted.
to `true`. Also read and follow the guidance below about [keys](#the-importance-of-keys) and
[limiting Terraform security group rules to a single AWS security group rule](#terraform-rules-vs-aws-rules)
if you want to mitigate against service interruptions caused by rule changes.
Note that even in this case, you probably want to keep `create_before_destroy = true` because otherwise,
if some change requires the security group to be replaced, Terraform will likely succeed
in deleting all the security group rules but fail to delete the security group itself,
leaving the associated resources completely inaccessible. At least with `create_before_destroy = true`,
the new security group will be created and used where Terraform can make the changes,
even though the old security group will still fail to be deleted.

#### The 3 Ways to Mitigate Against Service Interruptions

Expand Down Expand Up @@ -230,17 +231,18 @@ usage: |-
then you will have merely recreated the initial problem with using a plain list. If you cannot attach
meaningful keys to the rules, there is no advantage to specifying keys at all.

**But wait, there's more!***
#### Terraform Rules vs AWS Rules

A single security group rule input can actually specify multiple security group rules. For example,
A single security group rule input can actually specify multiple AWS security group rules. For example,
`ipv6_cidr_blocks` takes a list of CIDRs. However, AWS security group rules do not allow for a list
of CIDRs, so the AWS Terraform provider converts that list of CIDRs into a list of AWS security group rules,
one for each CIDR. (This is the underlying cause of several AWS Terraform provider bugs,
such as [#25173](https://github.com/hashicorp/terraform-provider-aws/issues/25173).)
As of this writing, any change to any such element of a rule will cause
As of this writing, any change to any element of such a rule will cause
all the AWS rules specified by the Terraform rule to be deleted and recreated, causing the same kind of
service interruption we sought to avoid by providing keys for the rules. To guard against this issue,
when using "destroy before create" behavior, you should avoid the convenience of specifying multiple AWS rules
service interruption we sought to avoid by providing keys for the rules, or, when create_before_destroy = true,
causing a complete failure as Terraform tries to create duplicate rules which AWS rejects. To guard against this issue,
when not using the default behavior, you should avoid the convenience of specifying multiple AWS rules
in a single Terraform rule and instead create a separate Terraform rule for each source or destination specification.

##### `rules` and `rules_map` inputs
Expand Down Expand Up @@ -347,12 +349,39 @@ usage: |-
above in "Why the input is so complex", each object in the list must be exactly the same type. To use multiple types,
you must put them in separate lists and put the lists in a map with distinct keys.

Example:

```hcl
rules_map = {
ingress = [{
key = "ingress"
type = "ingress"
from_port = 0
to_port = 2222
protocol = "tcp"
cidr_blocks = module.subnets.nat_gateway_public_ips
self = null
description = "2222"
}],
egress = [{
key = "egress"
type = "egress"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
self = null
description = "All output traffic"
}]
}
```

###### Definition of a Rule

For this module, a rule is defined as an object.
- The attributes and values of the rule objects are fully compatible (have the same keys and accept the same values) as the
Terraform [aws_security_group_rule resource](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group_rule),
except
except:
- The `security_group_id` will be ignored, if present
- You can include an optional `key` attribute. If present, its value must be unique among all security group rules in the
security group, and it must be known in the Terraform "plan" phase, meaning it cannot depend on anything being
Expand All @@ -366,9 +395,13 @@ usage: |-


##### `rule_matrix` Input

The other way to set rules is via the `rule_matrix` input. This splits the attributes of the `aws_security_group_rule`
resource into two sets: one set defines the rule and description, the other set defines the subjects of the rule.
Again, optional "key" values can provide stability, but cannot contain derived values.
Again, optional "key" values can provide stability, but cannot contain derived values. This input is an attempt
at convenience, and should not be used unless you are using the default settings of `create_before_destroy = true` and
`preserve_security_group_id = false`, or else a number of failure modes or service interruptions are possible: use
`rules_map` instead.

As with `rules` and explained above in "Why the input is so complex", all elements of the list must be the exact same type.
This also holds for all the elements of the `rules_matrix.rules` list. Because `rule_matrix` is already
Expand Down Expand Up @@ -425,11 +458,11 @@ usage: |-
can make a small change look like a big one when viewing the output of Terraform plan,
and will likely cause a brief (seconds) service interruption.

You can avoid this for the most part by providing the optional keys, and limiting each rule
to a single source or destination. Rules with keys will not be
You can avoid this for the most part by providing the optional keys, and [limiting each rule
to a single source or destination](#terraform-rules-vs-aws-rules). Rules with keys will not be
changed if their keys do not change and the rules themselves do not change, except in the case of
`rule_matrix`, where the rules are still dependent on the order of the security groups in
`source_security_group_ids`. You can avoid this by using `rules` instead of `rule_matrix` when you have
`source_security_group_ids`. You can avoid this by using `rules` or `rules_map` instead of `rule_matrix` when you have
more than one security group in the list. You cannot avoid this by sorting the
`source_security_group_ids`, because that leads to the "Invalid `for_each` argument" error
because of [terraform#31035](https://github.com/hashicorp/terraform/issues/31035).
Expand Down Expand Up @@ -542,7 +575,7 @@ examples: |-
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
self = null
self = null # preferable to self = false
description = "Allow SSH from anywhere"
},
{
Expand Down
125 changes: 125 additions & 0 deletions docs/migration-v1-v2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
# Migration Notes for Security Group v2.0

## Key changes in v2.0
- `create_before_destory` default changed from `false` to `true`
- `preserve_security_group_id` added, defaults to `false`
- Terraform version 1.0.0 or later required

## Migration Guide

The defaults under v1 were the equivalent of the v2
`create_before_destroy = false` and `preserve_security_group_id = true`.
This combination is not allowed under v2 (`preserve_security_group_id` is ignored
when `create_before_destory` is `false`), because it causes Terraform to fail
by trying to create duplicate security group rules. Therefore, something must
change. Asses your tolerance for change and choose one of the following options.

Note: This migration guide is for the case where you are using this module,
perhaps indirectly, as a component of a larger Terraform configuration that is
all managed by a single Terraform state file. If you are using this module in
some other way, you will need to extrapolate the instructions to fit your situation.

### Adjust your timeout

At least during migration, you may want to shorten `security_group_delete_timeout`
to something like 3 minutes. This is because there is a high likelihood that
Terraform will want to delete the existing security group (and create a new one)
before removing everything in the group. This will fail, and there is no point
in waiting 15 minutes for it to fail.

Alternately, you may want to use `terraform state mv` to move the existing
`create_before_destroy = false` security group to the new
`create_before_destroy = true` Terraform state address. Terraform will still
want to delete the old security group because its name has changed,
but it will create a new one first. You might want to lengthen the timeout
so that you can manually move resources to the new security group and remove
them from the old group so that the delete will succeed before it times out.

### Assess your situation

Please read the [README](https://github.com/cloudposse/terraform-aws-security-group/#avoiding-service-interruptions) for this module,
at least the section titled "Avoiding Service Interruptions", and determine your desired final configuration.
For the purposes of migration, we are mainly concerned with the settings for `create_before_destroy` and `preserve_security_group_id`.

Three key questions for you to answer:

1. Did you already set `create_before_destroy = true` in your configuration?
2. Do you need to preserve the security group ID?
3. Are there resources outside this Terraform plan that reference the security group?
1. Can you tolerate an interruption in network access to your resources?

#### Did you already set `create_before_destroy = true` in your configuration?

##### Was `true`, staying `true` is the best case
If you did, then migration will be a lot easier. If you are comfortable with
the default `preserve_security_group_id` setting of `false`, then the
upgrade will probably succeed without a service outage without need
for any special action on your part.

##### Was `false`, staying `false` is discouraged

If you did not previously set `create_before_destroy = true`, and want to
preserve the previous default by now explicitly setting `create_before_destroy = false`,
the security group rules will be deleted and recreated. This will cause a service
interruption, as will any future change to the security group rules, because
current rules will be deleted before new ones are created. Changes
necessitating a new security group will cause longer service interruptions,
because the security group will be deleted before the new one is created,
and before it can be deleted it will be disassociated from all resources,
leaving them without network access during the process.

##### Was `false`, switching to `true` is what most people are facing

If you did not previously set `create_before_destroy = true`, and want
to switch to that setting now (highly recommended), then the
existing security group will be destroyed. (This is a requirement because
security group names cannot be modified and must be unique, so
in order to support `create_before_destroy` the name must include a generated suffix
so that the new security group has a different name than the existing one.) Without
some intervention on your part, Terraform will fail, because it will try to delete
the existing security group before it has disassociated all the resources from it.
There is no avoiding this, but you can mitigate the impact by running
`terraform plan` to find the Terraform state addresses of the old and new
security groups, and then use `terraform state mv` to move the old security group
to the new address. This will cause Terraform to create the new security group
before deleting the old one. You can then manually move resources to the new
security group and remove them from the old one, so that the delete will succeed.



#### Do you need to preserve the security group ID?

If the security group ID is referenced by resources (such as security group rules
in other security groups) outside this Terraform plan, then you want to
preserve the security group ID where possible. In that case, you should set

```hcl
create_before_destroy = true
preserve_security_group_id = true
```

Setting `preserve_security_group_id` to `true` will cause a service
interruption, as will any future change to the security group rules, because
current rules will be deleted before new ones are created.
This is a limitation of the AWS provider: it is not smart enough to
know to leave in place (rather than delete and recreate) security group
rules, and attempts to create a duplicate security group rule will fail,
so existing rules are deleted and then new ones are created.


#### Use the default configuration if you can

If:

1. The security group ID is **_NOT_** referenced by resources (such as security group rules
in other security groups) outside this Terraform plan, _and_
2. the resources associated with the security group allow the associations to be changed without requiring
the resources themselves to be destroyed and recreated
3. you can tolerate an interruption in network access to your resources one time during the upgrade process

Then we recommend explicitly configuring this module with its defaults:

```hcl
create_before_destroy = true
preserve_security_group_id = false
```
Loading