-
Notifications
You must be signed in to change notification settings - Fork 9.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve handling of security group updates #4726
Conversation
Is it possible to add an acceptance test that simulates the problem (and fails under the existing code) to ensure this works and also prevents future regressions that reintroduce the issue? |
Our test accounts have the default limits of 50 rules, but it may be easier to simulate with another error than over limit |
The skipped test I wrote ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be really nice to get unit test coverage on collapseRules
, copyRule
, expandRules
, and idCollapseHash
.
@@ -1599,6 +1601,13 @@ func testAccCheckAWSSecurityGroupAttributesChanged(group *ec2.SecurityGroup) res | |||
group.IpPermissions[0], group.IpPermissions[1] | |||
} | |||
|
|||
if len(group.IpPermissions[1].IpRanges) > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could probably just be rewritten as a sort on the IpRanges
if its necessary instead of this very explicit swap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I really just copied the block above this one that does exactly same thing with IpPermissions. Although sort might be more universal, it does not add much value here and may be more expensive computationally.
@paultyng I added a test in a73fe74 The test will create a security group with one ingress block and two rules in it. It will then try to add a malformed security group to that block to trigger error on AWS side which normally causes all rules to be dropped. The test will then run plan again, and verify that it's empty after API error. |
Linking a previous issue here for context: hashicorp/terraform#5100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting this @noggi -- I know you've been working with @paultyng here, but I also left some initial feedback as well. Mainly this is surrounding the new collapse/copy/expand rule functions to start.
I think we also should have some reservations about what impact this might have against existing Terraform configurations and potential differences when folks upgrade the provider, especially with multiple parts to their ingress/egress configuration. Historically speaking, working with the EC2 API and especially with regards to security group rule logic is a non-trivial exercise to ensure compatibility.
While we have some of the existing acceptance tests looking specifically for the schema.TypeSet
hashes (e.g. the ingress.3629188364.X
attributes in TestAccAWSSecurityGroup_basic
), I think we should fully ensure we have covered all the tests with that level of checking before proceeding to ensure this is not a breaking change for anyone. I'll make a task for myself to get those acceptance test hash checks added for the existing testing by end of this week so this is not held up by that and provide a reference link to that pull request when its ready.
aws/resource_aws_security_group.go
Outdated
@@ -1143,6 +1143,118 @@ func matchRules(rType string, local []interface{}, remote []map[string]interface | |||
return saves | |||
} | |||
|
|||
func collapseRules(ruleset string, rules []interface{}) []interface{} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new functions are quite complicated to understand with quick reading. Can we please:
- Add Go documentation explaining why each function exists and what they are intended to accomplish
- Add unit testing to explain the usage a little better and prevent future regressions
One other minor nitpick: the provider currently uses a very flat Go package structure, which means function names span all AWS services -- could we rename these new functions to be more specific as other services also have "rules", e.g. collapaseEc2SecurityGroupRules
Sorry for the slight delay, I have submitted #4891 to bolster the other parts of the acceptance testing of the resource. I still believe the unit testing and Go documentation should be added here. |
This PR is now ready for rebase with master that has additional acceptance testing. 👍 |
Currently, when any single rule is added or removed to/from a security group, Terraform will revoke and re-authorize all rules in the given ingress/egress block. This leads to a couple of undesirable effects: 1. Rules that didn't change are always removed and re-added back on apply, causing temporary packet loss, and potentially - outage. 2. Due to EC2 API specifics, if any single rule in the given ingress/egress block cannot be authorized because of an error (e.g. wrong netmask, non-existent security group, etc), no rules are added at all after revoke, and the security group may be left blank causing permanent packet loss, outage, and manual intervention is then required to recover. This patch improves the way security groups are updated so that only rules that actually changed, are authorized/revoked, which avoids unnecessary revocation of entire ingress/egress blocks and reduces the possibility of an outage.
@bflad Hey sorry I'm currently busy working on some other project, I can add docs/more tests early next week if that works. Just rebased the branch for now. |
@bflad I added comments and test coverage for the new functions. Also renamed as suggested. Let me know if that addresses your concerns. |
Thanks @noggi! I'll be taking a look at this first thing tomorrow morning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to go 🚀
Tests passed: 57, ignored: 1
=== RUN TestAccAWSSecurityGroupRule_ExpectInvalidTypeError
--- PASS: TestAccAWSSecurityGroupRule_ExpectInvalidTypeError (2.75s)
=== RUN TestAccAWSSecurityGroupRule_ExpectInvalidCIDR
--- PASS: TestAccAWSSecurityGroupRule_ExpectInvalidCIDR (2.84s)
=== RUN TestAccAWSSecurityGroupRule_Issue5310
--- PASS: TestAccAWSSecurityGroupRule_Issue5310 (16.77s)
=== RUN TestAccAWSSecurityGroupRule_Ingress_Classic
--- PASS: TestAccAWSSecurityGroupRule_Ingress_Classic (17.70s)
=== RUN TestAccAWSSecurityGroupRule_Egress
--- PASS: TestAccAWSSecurityGroupRule_Egress (19.92s)
=== RUN TestAccAWSSecurityGroupRule_Ingress_Protocol
--- PASS: TestAccAWSSecurityGroupRule_Ingress_Protocol (21.98s)
=== RUN TestAccAWSSecurityGroup_importIPRangesWithSameRules
--- PASS: TestAccAWSSecurityGroup_importIPRangesWithSameRules (31.90s)
=== RUN TestAccAWSSecurityGroupRule_MultiIngress
--- PASS: TestAccAWSSecurityGroupRule_MultiIngress (34.26s)
=== RUN TestAccAWSSecurityGroup_importBasic
--- PASS: TestAccAWSSecurityGroup_importBasic (35.03s)
=== RUN TestAccAWSSecurityGroupRule_IngressDescription
--- PASS: TestAccAWSSecurityGroupRule_IngressDescription (21.18s)
=== RUN TestAccAWSSecurityGroupRule_PartialMatching_basic
--- PASS: TestAccAWSSecurityGroupRule_PartialMatching_basic (38.12s)
=== RUN TestAccAWSSecurityGroupRule_EgressDescription
--- PASS: TestAccAWSSecurityGroupRule_EgressDescription (20.62s)
=== RUN TestAccAWSSecurityGroupRule_Ingress_Ipv6
--- PASS: TestAccAWSSecurityGroupRule_Ingress_Ipv6 (44.38s)
=== RUN TestAccAWSSecurityGroupRule_IngressDescription_updates
--- PASS: TestAccAWSSecurityGroupRule_IngressDescription_updates (25.49s)
=== RUN TestAccAWSSecurityGroupRule_SelfSource
--- PASS: TestAccAWSSecurityGroupRule_SelfSource (42.74s)
=== RUN TestAccAWSSecurityGroup_importSelf
--- PASS: TestAccAWSSecurityGroup_importSelf (50.49s)
=== RUN TestAccAWSSecurityGroupRule_PrefixListEgress
--- PASS: TestAccAWSSecurityGroupRule_PrefixListEgress (53.80s)
=== RUN TestAccAWSSecurityGroup_namePrefix
--- PASS: TestAccAWSSecurityGroup_namePrefix (12.76s)
=== RUN TestAccAWSSecurityGroup_basic
--- PASS: TestAccAWSSecurityGroup_basic (23.97s)
=== RUN TestAccAWSSecurityGroup_importIpv6
--- PASS: TestAccAWSSecurityGroup_importIpv6 (58.90s)
=== RUN TestAccAWSSecurityGroup_importIPRangeAndSecurityGroupWithSameRules
--- PASS: TestAccAWSSecurityGroup_importIPRangeAndSecurityGroupWithSameRules (61.24s)
=== RUN TestAccAWSSecurityGroup_vpcNegOneIngress
--- PASS: TestAccAWSSecurityGroup_vpcNegOneIngress (20.50s)
=== RUN TestAccAWSSecurityGroup_ruleGathering
--- PASS: TestAccAWSSecurityGroup_ruleGathering (38.44s)
=== RUN TestAccAWSSecurityGroup_ipv6
--- PASS: TestAccAWSSecurityGroup_ipv6 (37.57s)
=== RUN TestAccAWSSecurityGroup_vpc
--- PASS: TestAccAWSSecurityGroup_vpc (30.69s)
=== RUN TestAccAWSSecurityGroup_importPrefixList
--- PASS: TestAccAWSSecurityGroup_importPrefixList (76.53s)
=== RUN TestAccAWSSecurityGroup_invalidCIDRBlock
--- PASS: TestAccAWSSecurityGroup_invalidCIDRBlock (1.03s)
=== RUN TestAccAWSSecurityGroupRule_SelfReference
--- PASS: TestAccAWSSecurityGroupRule_SelfReference (79.39s)
=== RUN TestAccAWSSecurityGroup_vpcProtoNumIngress
--- PASS: TestAccAWSSecurityGroup_vpcProtoNumIngress (27.31s)
=== RUN TestAccAWSSecurityGroup_DefaultEgress_Classic
--- PASS: TestAccAWSSecurityGroup_DefaultEgress_Classic (12.22s)
=== RUN TestAccAWSSecurityGroup_drift
--- PASS: TestAccAWSSecurityGroup_drift (10.88s)
=== RUN TestAccAWSSecurityGroup_DefaultEgress_VPC
--- PASS: TestAccAWSSecurityGroup_DefaultEgress_VPC (17.70s)
=== RUN TestAccAWSSecurityGroup_MultiIngress
--- PASS: TestAccAWSSecurityGroup_MultiIngress (32.11s)
=== RUN TestAccAWSSecurityGroup_importSourceSecurityGroup
--- PASS: TestAccAWSSecurityGroup_importSourceSecurityGroup (100.06s)
=== RUN TestAccAWSSecurityGroup_generatedName
--- PASS: TestAccAWSSecurityGroup_generatedName (40.29s)
=== RUN TestAccAWSSecurityGroup_ruleLimitCidrBlockExceededAppend
=== RUN TestAccAWSSecurityGroupRule_EgressDescription_updates
--- PASS: TestAccAWSSecurityGroupRule_EgressDescription_updates (80.08s)
=== RUN TestAccAWSSecurityGroup_Change
--- PASS: TestAccAWSSecurityGroup_Change (46.31s)
=== RUN TestAccAWSSecurityGroup_ingressWithCidrAndSGs_classic
--- PASS: TestAccAWSSecurityGroup_ingressWithCidrAndSGs_classic (20.51s)
=== RUN TestAccAWSSecurityGroupRule_Ingress_VPC
--- PASS: TestAccAWSSecurityGroupRule_Ingress_VPC (111.73s)
=== RUN TestAccAWSSecurityGroup_CIDRandGroups
--- PASS: TestAccAWSSecurityGroup_CIDRandGroups (33.65s)
=== RUN TestAccAWSSecurityGroupRule_PartialMatching_Source
--- PASS: TestAccAWSSecurityGroupRule_PartialMatching_Source (113.03s)
=== RUN TestAccAWSSecurityGroup_failWithDiffMismatch
--- PASS: TestAccAWSSecurityGroup_failWithDiffMismatch (25.31s)
=== RUN TestAccAWSSecurityGroup_ingressWithCidrAndSGs
--- PASS: TestAccAWSSecurityGroup_ingressWithCidrAndSGs (37.05s)
=== RUN TestAccAWSSecurityGroup_ipv4andipv6Egress
--- PASS: TestAccAWSSecurityGroup_ipv4andipv6Egress (33.93s)
=== RUN TestAccAWSSecurityGroup_drift_complex
--- PASS: TestAccAWSSecurityGroup_drift_complex (48.73s)
=== RUN TestAccAWSSecurityGroup_egressWithPrefixList
--- PASS: TestAccAWSSecurityGroup_egressWithPrefixList (47.82s)
=== RUN TestAccAWSSecurityGroup_ruleLimitExceededAppend
--- PASS: TestAccAWSSecurityGroup_ruleLimitExceededAppend (40.67s)
=== RUN TestAccAWSSecurityGroup_rulesDropOnError
--- PASS: TestAccAWSSecurityGroup_rulesDropOnError (45.38s)
=== RUN TestAccAWSSecurityGroup_ruleLimitExceededAllNew
--- PASS: TestAccAWSSecurityGroup_ruleLimitExceededAllNew (50.86s)
=== RUN TestAccAWSSecurityGroup_RuleDescription
--- PASS: TestAccAWSSecurityGroup_RuleDescription (99.34s)
=== RUN TestAccAWSSecurityGroup_ruleLimitExceededPrepend
--- PASS: TestAccAWSSecurityGroup_ruleLimitExceededPrepend (57.03s)
=== RUN TestAccAWSSecurityGroup_self
--- PASS: TestAccAWSSecurityGroup_self (119.86s)
=== RUN TestAccAWSSecurityGroupRule_MultiDescription
--- PASS: TestAccAWSSecurityGroupRule_MultiDescription (151.48s)
=== RUN TestAccAWSSecurityGroupRule_Race
--- PASS: TestAccAWSSecurityGroupRule_Race (191.24s)
=== RUN TestAccAWSSecurityGroup_tags
--- PASS: TestAccAWSSecurityGroup_tags (127.70s)
=== RUN TestAccAWSSecurityGroup_forceRevokeRules_false
--- PASS: TestAccAWSSecurityGroup_forceRevokeRules_false (648.72s)
=== RUN TestAccAWSSecurityGroup_forceRevokeRules_true
--- PASS: TestAccAWSSecurityGroup_forceRevokeRules_true (672.23s)
This has been released in version 1.27.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks! |
Currently, when any single rule is added or removed to/from a security group, Terraform revokes and re-authorizes all the rules in the given ingress/egress block. This leads to a couple of undesirable effects:
This patch improves the way security groups are updated so that only rules that actually changed are authorized/revoked, which avoids unnecessary revocation of entire ingress/egress blocks and reduces the possibility of an outage.
Note: this PR also integrates tests from #4665
Output from acceptance testing:
@paultyng