Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASG Instance Security Group AllowOutbound Bug #987

Closed
moofish32 opened this issue Oct 22, 2018 · 18 comments · Fixed by #1021 or #998
Closed

ASG Instance Security Group AllowOutbound Bug #987

moofish32 opened this issue Oct 22, 2018 · 18 comments · Fixed by #1021 or #998
Labels
bug This issue is a bug.

Comments

@moofish32
Copy link
Contributor

If you create an ASG with the L2 in version 0.13.0 and set allowOutbound: true (also the default) and later set up rules that can not be inlined, the outbound access is removed. The CFN template does not remove it, but it also does not work.

This EKS Example is a working a reproduction of the bug: https://github.com/moofish32/cdk-eks-example/blob/test/lib/eks-node-group.ts#L46-L94

You can see the synth output has the outbound but if you build this stack (you'll need to build the cluster first).

This might be solve-able with a documentation update, but it seems we should likely fix this? @rix0rrr I think you have an open doc task.

@rix0rrr rix0rrr added the bug This issue is a bug. label Oct 22, 2018
@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 22, 2018

If this really behaves that way that's certainly a bug.

The doc task I have open is about the default behavior of Security Groups, which behaves by allowing all outbound if you don't add any other rules, but I believe that's default CloudFormation behavior.

Wondering if we should work around that, but that's another issue. This must be about an explicit addEgressRule() that would have to be dropped on the floor.

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 22, 2018

I don't see the allowAllOutbound: true in your example though?

@moofish32
Copy link
Contributor Author

I can set it that way too, sorry testing locally getting ready to push. The template output and failing userdata are pretty good indications. I don't believe you can combine egress rules and inline egress. Other tools have prevented this due to inconsistent behaviors. Let me get another push and I'll validate.

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 22, 2018

/me expresses surprise

@moofish32
Copy link
Contributor Author

moofish32 commented Oct 22, 2018

Without pasting the whole YAML file, here is the key snippet:

    WorkersInstanceSecurityGroup65472717:
        Type: 'AWS::EC2::SecurityGroup'
        Properties:
            GroupDescription: EksWorkers/Workers/InstanceSecurityGroup
            SecurityGroupEgress:
                -
                    CidrIp: 0.0.0.0/0
                    Description: Outbound traffic allowed by default
                    FromPort: -1
                    IpProtocol: '-1'
                    ToPort: -1
            SecurityGroupIngress: []
            Tags:
                -
                    Key: Name
                    Value: EksWorkers/Workers
                -
                    Key: kubernetes.io/cluster/EksExample
                    Value: owned
                -
                    Key: NodeType
                    Value: Worker
            VpcId:
                'Fn::ImportValue': 'EksCluster:EksVpcVpcId48D9E242'

This screen shot has the only outbound rule.

screen shot 2018-10-22 at 10 34 54 am

@moofish32
Copy link
Contributor Author

@rix0rrr these were the changes I had to make moofish32/cdk-eks-example#2

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 22, 2018

Maybe inline rules and object rules cannot be combined or somesuch

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 22, 2018

Well yeah the extra sg makes it work because empty sg's come with all outbound traffic enabled by default

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 22, 2018

Maybe it's the '-1' protocol string that doesn't work

@moofish32
Copy link
Contributor Author

Maybe inline rules and object rules cannot be combined or somesuch

Perhaps we should isolate this to test.

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 23, 2018

Oh I bet I know what the issue is.

In order to achieve the behavior "empty SGs start out with everything enabled by default" but then lose that default once rules are added, I bet it works like this:

  1. Security Group is created, gets created with "all outbound traffic" rule if there are no static egress rules.
  2. Dynamic egress rule is added, if there is an "all outbound traffic" rule, it gets removed.

It just so happens that the second step does not make a distinction between an "all outbound" rule added as a default, vs an "all outbound" rule added by us on purpose. It deletes it in either case.

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 23, 2018

Confirmed. The following template shows the behavior:

If the Egress rule is absent, the SG shows the outbound rule just fine, but if you add the egress rule the default rule disappears.

---
Resources:
    WorkersInstanceSecurityGroup65472717:
        Type: 'AWS::EC2::SecurityGroup'
        Properties:
            GroupDescription: EksWorkers/Workers/InstanceSecurityGroup
            SecurityGroupEgress:
                -
                    CidrIp: 0.0.0.0/0
                    Description: Outbound traffic allowed by default
                    FromPort: -1
                    IpProtocol: '-1'
                    ToPort: -1
            SecurityGroupIngress: []
            Tags:
                -
                    Key: Name
                    Value: EksWorkers/Workers
                -
                    Key: kubernetes.io/cluster/EksExample
                    Value: owned
                -
                    Key: NodeType
                    Value: Worker
            VpcId: vpc-60900905
    WorkersInstanceSecurityGrouptoEksWorkersControlPlaneSG070CB121102565535E8766439:
        Type: 'AWS::EC2::SecurityGroupEgress'
        Properties:
            GroupId:
                'Fn::GetAtt':
                    - WorkersInstanceSecurityGroup65472717
                    - GroupId
            IpProtocol: tcp
            Description: 'to somewhere else'
            FromPort: 1234
            ToPort: 1234
            CidrIp: '1.2.3.4/32'

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 23, 2018

It's actually EC2 itself that's doing this.

Just moving the egress rule to a separate Rule object doesn't help, because if the ordering is "just-so", the rule still gets removed:

---
Resources:
    Group111:
        Type: 'AWS::EC2::SecurityGroup'
        Properties:
            GroupDescription: Test security group
            SecurityGroupEgress: []
            SecurityGroupIngress: []
            VpcId: vpc-60900905
    Everything111:
        Type: 'AWS::EC2::SecurityGroupEgress'
        Properties:
            GroupId:
                'Fn::GetAtt':
                    - Group111
                    - GroupId
            IpProtocol: '-1'
            Description: 'Yeah'
            FromPort: -1
            ToPort: -1
            CidrIp: 0.0.0.0/0
    Specific111:
        Type: 'AWS::EC2::SecurityGroupEgress'
        Properties:
            GroupId:
                'Fn::GetAtt':
                    - Group111
                    - GroupId
            IpProtocol: tcp
            Description: 'to somewhere else'
            FromPort: 1234
            ToPort: 1234
            CidrIp: '1.2.3.4/32'
        DependsOn:  [Everything111]

Still leaves the rule nonexistent (because the specific rule gets applied later on).

The only solution is to apply the "all outbound" rule last, or make a situation in which the "default delete" doesn't apply anymore, probably by making more rules or by changing the format ever so slightly (get rid of some of the -1s, for example).

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 23, 2018

Looks like it's the CIDR 0.0.0.0/0 and the protocol -1 that are the identifiers for this rule.

Changing the CIDR to 0.0.0.1/0 doesn't help, because it gets normalized back to 0.0.0.0/0.

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 23, 2018

It's not about the default rule being the only one either, because if I add a bogus rule in (so that there are 2 at the moment that AuthorizeSecurityGroupEgress gets called) the all traffic rule still gets picked out and deleted.

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 23, 2018

A colleague helpfully pointed out that any other rule in addition to "allow all outbound traffic" is useless anyway. Which is true.

But to solve that, we need to lift this concept into the SecurityGroup class, and what's worse: we then also need to allow disabling it, which requires adding a bogus rule that matches NO traffic (so that the default "all traffic" rule gets cleaned up).

rix0rrr pushed a commit that referenced this issue Oct 23, 2018
Make "allowAllOutbound" traffic a property of a SecurityGroup. By making
the SecurityGroup aware of this configuration property, we can make sure
that future egress rules don't get added to the SecurityGroup. There's
no need to, because all traffic is allowed by default anyway.

If we don't do this, our behavior of adding reciprocal rules between
SecurityGroups conflicts with CloudFormation's behavior to strip the
"default outbound" rule when another rule gets added, and it becomes
impossible to configure "all outbound" traffic on a
SecurityGroup.

Fixes #987.
rix0rrr added a commit that referenced this issue Oct 25, 2018
Fix the issue where "all outbound traffic allowed" rules would be
overwritten if any other egress rules are added to the Security Group.

To solve this, we make `allowAllOutbound` a property of a SecurityGroup
(defaults to `true`). By making the SecurityGroup aware of this
configuration property, we can make sure that future egress rules don't
get added to the SecurityGroup. There's no need to, and adding them 
would only make CloudFormation delete the "all traffic allowed" rule.

Also update documentation on Security Groups in the `README`.

Fixes #987.
rix0rrr pushed a commit that referenced this issue Oct 26, 2018
__IMPORTANT NOTE__: when upgrading to this version of the CDK framework, you must also upgrade
your installation the CDK Toolkit to the matching version:

```shell
$ npm i -g aws-cdk
$ cdk --version
0.14.0 (build ...)
```

Bug Fixes
=========

* remove CloudFormation property renames ([#973](#973)) ([3f86603](3f86603)), closes [#852](#852)
* **aws-ec2:** fix retention of all egress traffic rule ([#998](#998)) ([b9d5b43](b9d5b43)), closes [#987](#987)
* **aws-s3-deployment:** avoid deletion during update using physical ids ([#1006](#1006)) ([bca99c6](bca99c6)), closes [#981](#981) [#981](#981)
* **cloudformation-diff:** ignore changes to DependsOn ([#1005](#1005)) ([3605f9c](3605f9c)), closes [#274](#274)
* **cloudformation-diff:** track replacements ([#1003](#1003)) ([a83ac5f](a83ac5f)), closes [#1001](#1001)
* **docs:** fix EC2 readme for "natgatway" configuration ([#994](#994)) ([0b1e7cc](0b1e7cc))
* **docs:** updates to contribution guide ([#997](#997)) ([b42e742](b42e742))
* **iam:** Merge multiple principals correctly ([#983](#983)) ([3fc5c8c](3fc5c8c)), closes [#924](#924) [#916](#916) [#958](#958)

Features
=========

* add construct library for Application AutoScaling ([#933](#933)) ([7861c6f](7861c6f)), closes [#856](#856) [#861](#861) [#640](#640) [#644](#644)
* add HostedZone context provider ([#823](#823)) ([1626c37](1626c37))
* **assert:** haveResource lists failing properties ([#1016](#1016)) ([7f6f3fd](7f6f3fd))
* **aws-cdk:** add CDK app version negotiation ([#988](#988)) ([db4e718](db4e718)), closes [#891](#891)
* **aws-codebuild:** Introduce a CodePipeline test Action. ([#873](#873)) ([770f9aa](770f9aa))
* **aws-sqs:** Add grantXxx() methods ([#1004](#1004)) ([8c90350](8c90350))
* **core:** Pre-concatenate Fn::Join ([#967](#967)) ([33c32a8](33c32a8)), closes [#916](#916) [#958](#958)

BREAKING CHANGES
=========

* DynamoDB AutoScaling: Instead of `addReadAutoScaling()`, call `autoScaleReadCapacity()`, and similar for write scaling.
* CloudFormation resource usage: If you use L1s, you may need to change some `XxxName` properties back into `Name`. These will match the CloudFormation property names.
* You must use the matching `aws-cdk` toolkit when upgrading to this version, or context providers will cease to work. All existing cached context values in `cdk.json` will be invalidated and refreshed.
@rix0rrr rix0rrr mentioned this issue Oct 26, 2018
rix0rrr added a commit that referenced this issue Oct 26, 2018
__IMPORTANT NOTE__: when upgrading to this version of the CDK framework, you must also upgrade
your installation the CDK Toolkit to the matching version:

```shell
$ npm i -g aws-cdk
$ cdk --version
0.14.0 (build ...)
```

Bug Fixes
=========

* remove CloudFormation property renames ([#973](#973)) ([3f86603](3f86603)), closes [#852](#852)
* **aws-ec2:** fix retention of all egress traffic rule ([#998](#998)) ([b9d5b43](b9d5b43)), closes [#987](#987)
* **aws-s3-deployment:** avoid deletion during update using physical ids ([#1006](#1006)) ([bca99c6](bca99c6)), closes [#981](#981) [#981](#981)
* **cloudformation-diff:** ignore changes to DependsOn ([#1005](#1005)) ([3605f9c](3605f9c)), closes [#274](#274)
* **cloudformation-diff:** track replacements ([#1003](#1003)) ([a83ac5f](a83ac5f)), closes [#1001](#1001)
* **docs:** fix EC2 readme for "natgatway" configuration ([#994](#994)) ([0b1e7cc](0b1e7cc))
* **docs:** updates to contribution guide ([#997](#997)) ([b42e742](b42e742))
* **iam:** Merge multiple principals correctly ([#983](#983)) ([3fc5c8c](3fc5c8c)), closes [#924](#924) [#916](#916) [#958](#958)

Features
=========

* add construct library for Application AutoScaling ([#933](#933)) ([7861c6f](7861c6f)), closes [#856](#856) [#861](#861) [#640](#640) [#644](#644)
* add HostedZone context provider ([#823](#823)) ([1626c37](1626c37))
* **assert:** haveResource lists failing properties ([#1016](#1016)) ([7f6f3fd](7f6f3fd))
* **aws-cdk:** add CDK app version negotiation ([#988](#988)) ([db4e718](db4e718)), closes [#891](#891)
* **aws-codebuild:** Introduce a CodePipeline test Action. ([#873](#873)) ([770f9aa](770f9aa))
* **aws-sqs:** Add grantXxx() methods ([#1004](#1004)) ([8c90350](8c90350))
* **core:** Pre-concatenate Fn::Join ([#967](#967)) ([33c32a8](33c32a8)), closes [#916](#916) [#958](#958)

BREAKING CHANGES
=========

* DynamoDB AutoScaling: Instead of `addReadAutoScaling()`, call `autoScaleReadCapacity()`, and similar for write scaling.
* CloudFormation resource usage: If you use L1s, you may need to change some `XxxName` properties back into `Name`. These will match the CloudFormation property names.
* You must use the matching `aws-cdk` toolkit when upgrading to this version, or context providers will cease to work. All existing cached context values in `cdk.json` will be invalidated and refreshed.
jonparker pushed a commit to jonparker/aws-cdk that referenced this issue Oct 29, 2018
__IMPORTANT NOTE__: when upgrading to this version of the CDK framework, you must also upgrade
your installation the CDK Toolkit to the matching version:

```shell
$ npm i -g aws-cdk
$ cdk --version
0.14.0 (build ...)
```

Bug Fixes
=========

* remove CloudFormation property renames ([aws#973](aws#973)) ([3f86603](aws@3f86603)), closes [aws#852](aws#852)
* **aws-ec2:** fix retention of all egress traffic rule ([aws#998](aws#998)) ([b9d5b43](aws@b9d5b43)), closes [aws#987](aws#987)
* **aws-s3-deployment:** avoid deletion during update using physical ids ([aws#1006](aws#1006)) ([bca99c6](aws@bca99c6)), closes [aws#981](aws#981) [aws#981](aws#981)
* **cloudformation-diff:** ignore changes to DependsOn ([aws#1005](aws#1005)) ([3605f9c](aws@3605f9c)), closes [aws#274](aws#274)
* **cloudformation-diff:** track replacements ([aws#1003](aws#1003)) ([a83ac5f](aws@a83ac5f)), closes [aws#1001](aws#1001)
* **docs:** fix EC2 readme for "natgatway" configuration ([aws#994](aws#994)) ([0b1e7cc](aws@0b1e7cc))
* **docs:** updates to contribution guide ([aws#997](aws#997)) ([b42e742](aws@b42e742))
* **iam:** Merge multiple principals correctly ([aws#983](aws#983)) ([3fc5c8c](aws@3fc5c8c)), closes [aws#924](aws#924) [aws#916](aws#916) [aws#958](aws#958)

Features
=========

* add construct library for Application AutoScaling ([aws#933](aws#933)) ([7861c6f](aws@7861c6f)), closes [aws#856](aws#856) [aws#861](aws#861) [aws#640](aws#640) [aws#644](aws#644)
* add HostedZone context provider ([aws#823](aws#823)) ([1626c37](aws@1626c37))
* **assert:** haveResource lists failing properties ([aws#1016](aws#1016)) ([7f6f3fd](aws@7f6f3fd))
* **aws-cdk:** add CDK app version negotiation ([aws#988](aws#988)) ([db4e718](aws@db4e718)), closes [aws#891](aws#891)
* **aws-codebuild:** Introduce a CodePipeline test Action. ([aws#873](aws#873)) ([770f9aa](aws@770f9aa))
* **aws-sqs:** Add grantXxx() methods ([aws#1004](aws#1004)) ([8c90350](aws@8c90350))
* **core:** Pre-concatenate Fn::Join ([aws#967](aws#967)) ([33c32a8](aws@33c32a8)), closes [aws#916](aws#916) [aws#958](aws#958)

BREAKING CHANGES
=========

* DynamoDB AutoScaling: Instead of `addReadAutoScaling()`, call `autoScaleReadCapacity()`, and similar for write scaling.
* CloudFormation resource usage: If you use L1s, you may need to change some `XxxName` properties back into `Name`. These will match the CloudFormation property names.
* You must use the matching `aws-cdk` toolkit when upgrading to this version, or context providers will cease to work. All existing cached context values in `cdk.json` will be invalidated and refreshed.
@mhuebner
Copy link

mhuebner commented May 22, 2019

I'm wondering why this SecurityGroup is added to the LaunchConfiguration. Wouldn't it be better to just be able to set a securityGroup to have better control?
Currently I'm using AutoScalingGroup.addSecurityGroup to add my own SecurityGroup (with outbound traffic allowed) to I don't want a default group to be added. Would it be possible to just not add any default groups to the LaunchConfiguration (but support setting you own)?

@erikdebruin
Copy link

+1 on that suggestion, @mhuebner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants