This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Generated ELB Security Group caused API to be inaccessible due to blocking ICMP #214

Closed
whereisaaron opened this issue Jan 8, 2017 · 13 comments

Comments

@whereisaaron
Contributor

kube-aws version v0.9.3-rc.2

After bringing up the cluster I could not contact the API server, the error in the controller log was:

API server.http: TLS handshake error from 1.1.1.1:12345: EOF

After investigating, the culprit turned out to be the Security Group for ElbAPIServer that kube-aws creates. It blocks ICMP and so breaks Path MTU Discovery for TCP/IP: the client side was sending an ICMP packet to negotiate the MTU, but the ELB was blocking it, preventing a successful HTTPS connection.

ICMP ip-2.2.2.2.ap-somewhere.compute.internal unreachable - need to frag (mtu 1406), length 556

After adding ICMP to the ELB Security Group I could contact the API server no problem.

The problem is here in the stack template; an ICMP rule needs to be added:

    "SecurityGroupElbAPIServer" : {
      "Properties": {
        "GroupDescription": {
          "Ref": "AWS::StackName"
        },
        "SecurityGroupIngress": [
          {
            "CidrIp": "0.0.0.0/0",
            "FromPort": 443,
            "IpProtocol": "tcp",
            "ToPort": 443
          }
        ],
...
@whereisaaron
Contributor Author

I applied this patch to my stack-template.json to add ICMP to the API ELB and now the API is accessible as soon as the controller is up and running.

*** stack-template.json-orig	2017-01-07 19:52:41.597408095 -0500
--- stack-template.json	2017-01-07 19:53:52.577408095 -0500
***************
*** 628,633 ****
--- 628,639 ----
              "FromPort": 443,
              "IpProtocol": "tcp",
              "ToPort": 443
+           },
+           {
+             "CidrIp": "0.0.0.0/0",
+             "FromPort": -1,
+             "IpProtocol": "icmp",
+             "ToPort": -1
            }
          ],
          "Tags": [

@redbaron
Contributor

redbaron commented Jan 8, 2017

Thanks for looking into it! Would you mind preparing a PR? To enable just MTU discovery, this rule will be sufficient:

           {
             "CidrIp": "0.0.0.0/0",
             "FromPort": 3,
             "IpProtocol": "icmp",
             "ToPort": 4
           }
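
For context the thread leaves implicit: in EC2 security group rules with "IpProtocol": "icmp", FromPort holds the ICMP type and ToPort the ICMP code (with -1 meaning all), so the rule above permits only type 3 / code 4 (Destination Unreachable / Fragmentation Needed), the one message Path MTU Discovery depends on. Merged into the generated template, the narrower ingress list would presumably read:

        "SecurityGroupIngress": [
          {
            "CidrIp": "0.0.0.0/0",
            "FromPort": 443,
            "IpProtocol": "tcp",
            "ToPort": 443
          },
          {
            "CidrIp": "0.0.0.0/0",
            "FromPort": 3,
            "IpProtocol": "icmp",
            "ToPort": 4
          }
        ],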

whereisaaron added a commit to whereisaaron/kube-aws that referenced this issue Jan 9, 2017
As discussed in issue kubernetes-retired#214, stop blocking ICMP for the API ELB. Among other things, ICMP is required for MTU discovery for TCP / HTTPS connections with the API server.
@whereisaaron
Contributor Author

Thanks @redbaron, I have prepared PR #220 to unblock ICMP. I unblocked ICMP in general, as I don't see a need to block the other benefits of ICMP, like ping or flow control.

I was right there in the '90s blocking or munging various ICMP types on my routers because of the vulnerabilities back then, ping-of-death included. But unless you believe AWS ELBs to be currently vulnerable to ICMP attacks, I don't think we need that any more.

In my case I am not mapping public IPs, so these API ELBs are only internal to my VPCs, so I am even less worried about, and more inconvenienced by, the lack of ICMP. If you do feel the need to narrow the patch to just MTU discovery, then perhaps it could be different for public vs. internal ELBs?

@mumoshu
Contributor

mumoshu commented Jan 10, 2017

@whereisaaron @redbaron Thanks for raising this issue 👍 Sounds like an issue which couldn't be resolved without support from great people like you.

One thing I'm unsure about is why this has never been an issue before.
Would you mind sharing your knowledge/thoughts on why blocking ICMP causes the issue for only some users and not everyone?

@whereisaaron
Contributor Author

Hi @mumoshu, this issue has come up a couple of times previously. It was a problem when there was a single controller node and no ELB, but that was fixed. Then the ELB was added in front of the controller to support HA controllers, and ICMP got blocked again. 😢

June 2016 "kube-aws: allow all ICMP traffic": 9650ba8

Dec 2015 "multi-node/aws: allow icmp destination unreachable in security groups": ce4f423

If the whole world were one big mesh of public Ethernet networks, then the MTU would always be the same. Currently most of the Internet is like that (or appears to be). However, if you have any layer 2 network technology other than Ethernet, or layer 2/3 tunneling (VPN, IPsec, etc.), then the MTU for a path across the network can be different and needs to be worked out via ICMP. In my use case we secure our traffic with IPsec, which adds an encapsulation/encryption header and so makes the MTU slightly smaller than that of unsecured Ethernet.
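
To put numbers on that (the 1406 figure comes from the tcpdump output above; the exact IPsec overhead varies with mode and cipher, so this is only illustrative):

    path MTU = Ethernet MTU - tunnel overhead
    1406     = 1500         - 94

A sender that assumes a 1500-byte MTU produces segments the tunnel hop cannot forward; the hop reports this with ICMP type 3 / code 4, and if a security group drops that report, the sender never learns to shrink its segments and the connection stalls.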

If you just hang your cluster nodes and API server out on the public Internet, you are much less likely to hit this issue. At least while Ethernet reigns supreme (Token Ring, anyone? 😃).

@mumoshu
Contributor

mumoshu commented Jan 10, 2017

@whereisaaron Thanks a lot for your detailed answer. It really helped me understand what's going on. Also, as a maintainer, let me say sorry to you for breaking your use-case multiple times 😢

I'll merge #220 asap after sorting out #199

@mumoshu
Contributor

mumoshu commented Jan 16, 2017

Closing this as #220 is merged and v0.9.3-rc.5 is released with the fix, but feel free to reopen if the issue still persists.

@mumoshu mumoshu closed this as completed Jan 16, 2017
@mikedeltalima

I'm using v0.9.3-rc.5, and I think I've hit the same issue. This doc mentions that the Destination Unreachable Custom ICMP rule is what's necessary, rather than the general ICMP rule that's in place: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html#path_mtu_discovery

@mumoshu
Contributor

mumoshu commented Mar 2, 2017

Hi @mikedeltalima!

According to the doc about IpProtocol and the change introduced by #220, I assume the security group should have been properly configured, allowing all ICMP messages.
Could you share how the security group assigned to your ELB is configured, so that we can investigate further?

Also, I'm not sure, but are you using IPv6? If that's the case, should we also allow protocol 58 (ICMPv6)?
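
For reference, such a rule would look something like the sketch below (an assumption on my part, applicable only if the VPC has an IPv6 CIDR; security groups accept "icmpv6" as an IpProtocol value, with FromPort and ToPort again being the ICMP type and code, -1 for all):

          {
            "CidrIpv6": "::/0",
            "FromPort": -1,
            "IpProtocol": "icmpv6",
            "ToPort": -1
          }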

@mumoshu mumoshu reopened this Mar 2, 2017
@mikedeltalima

mikedeltalima commented Mar 2, 2017

Hi @mumoshu - thanks for re-opening the issue. I didn't see any improvement when I changed the rule, so my guess didn't pan out. Here's a screenshot of the relevant inbound rules on the instance that is "OutOfService". They are the same as in my previous deployment, which is still working.
[Screenshot: inbound security group rules, 2017-03-02]

I have the same health checks running as here -- #295. Switching to SSL didn't seem to help.

@mumoshu
Contributor

mumoshu commented Mar 21, 2017

@mikedeltalima Sorry for being late in replying.
AFAIK, this issue affects the ELB, not controller nodes as in your case.
Probably running journalctl on one of your OutOfService controller nodes and finding errors would help!

@mikedeltalima

@mumoshu, no worries! I ended up using a newer version of kube-aws.

@mumoshu
Contributor

mumoshu commented Mar 22, 2017

Good to hear it worked for you @mikedeltalima!

@mumoshu mumoshu closed this as completed Mar 22, 2017
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue Mar 27, 2018
As discussed in issue kubernetes-retired#214, stop blocking ICMP for the API ELB. Among other things, ICMP is required for MTU discovery for TCP / HTTPS connections with the API server.