Weave with AWS EKS is not working #3335

redi-vinogradov · 2018-06-21T18:45:49Z

What you expected to happen?

EKS pods are able to interact with each other via the Weave network.

What happened?

Deployed Weave daemonset on a new AWS EKS cluster with updated CIDR range. Pods are able to get proper IP from Weave but cannot interact with each other.

How to reproduce it?

Create an AWS EKS cluster and set the IPALLOC_RANGE environment variable in your daemonset file to 172.20.0.0/16. Apply the Weave daemonset. Pods now get an IP but can't interact.

Anything else we need to know?

AWS EKS, no internet access, using proxy for external access.

Versions:

EKS v1.10

$ weave version
2.3.0
$ docker version
18.03.1-ce
$ uname -a
CentOS 7.4 3.10.0-862.3.2.el7.x86_64
$ kubectl version
1.10.4

Logs:

$ sudo docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
c47e8d7383b6        bridge              bridge              local
9c8e7fa1202a        host                host                local
f8d1debbe4bb        none                null                local

$ sudo docker network inspect f8d1debbe4bb
[
    {
        "Name": "none",
        "Id": "f8d1debbe4bbbfdb5ba3e81a731288b9349c455b64f8287c100b42fead9c2988",
        "Created": "2018-06-21T09:45:41.456125078-04:00",
        "Scope": "local",
        "Driver": "null",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": []
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "a955616beadefd7359a3c1d4cc65ce5971fa1d5c9e7fad164ac860eb7e583218": {
                "Name": "k8s_POD_kube-dns-64b69465b4-mpddk_kube-system_78830a80-7581-11e8-ae49-0a87435622a6_0",
                "EndpointID": "e0d77a64dfe2ba5b592d54deb4d53c51955ddebaf172fef153d25d281679df6a",
                "MacAddress": "",
                "IPv4Address": "",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

Network:

$ ip route
default via 10.182.208.1 dev ens3
10.182.208.0/20 dev ens3 proto kernel scope link src 10.182.211.149
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.20.0.0/16 dev weave proto kernel scope link src 172.20.128.0

$ ip -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: ens3    inet 10.182.211.149/20 brd 10.182.223.255 scope global dynamic ens3\       valid_lft 2315sec preferred_lft 2315sec
3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
6: weave    inet 172.20.128.0/16 brd 172.20.255.255 scope global weave\       valid_lft forever preferred_lft forever

bboreham · 2018-06-25T10:14:27Z

Thanks for this report, @redi-vinogradov .

Could you add a little more detail to "cannot interact with each other" ? What did you try?

One thing I'm aware of is that pods on the Weave network cannot talk to the Kubernetes api-server, because it is on an EKS-specific network. And since kube-dns is in that set, it cannot resolve any service addresses, which will break lots of things.

Happy to receive tips or PRs to let us connect in the api-server(s).

redi-vinogradov · 2018-06-28T17:38:46Z

A few examples:
Service status:

$ kubectl -n kube-system describe svc kubernetes-dashboard
Name:              kubernetes-dashboard
Namespace:         kube-system
Labels:            k8s-app=kubernetes-dashboard
Annotations:       kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"k8s-app":"kubernetes-dashboard"},"name":"kubernetes-dashboard","namespace":...
Selector:          k8s-app=kubernetes-dashboard
Type:              ClusterIP
IP:                172.20.218.38
Port:              <unset>  443/TCP
TargetPort:        8443/TCP
Endpoints:         172.20.224.3:8443
Session Affinity:  None
Events:            <none>

Pod is trying to connect to kubernetes-dashboard service via port 443:

$ nc -v 172.20.218.38 443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: No route to host.

Pod is trying to connect to kubernetes-dashboard directly (via pod's IP):

$ nc -v 172.20.224.3 8443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.

Not sure why EKS-specific network should be an issue since you have to define it first (before EKS cluster creation) and in our case worker nodes are on the same network as EKS master nodes.
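
One thing that may be worth ruling out with the addresses shown above: the IPALLOC_RANGE from the report (172.20.0.0/16) contains both the service ClusterIP (172.20.218.38) and the endpoint IP (172.20.224.3). If the cluster's service range and the Weave allocation range overlap, a pod may treat a service IP as on-link on the Weave bridge instead of routing it, which would be consistent with "No route to host". That explanation is an assumption, not something confirmed in this thread. A minimal Python sketch of the check, using only the standard library (the function names are hypothetical):

```python
import ipaddress


def address_in_range(address: str, cidr: str) -> bool:
    """Return True if the IP address falls inside the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


def ranges_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Values copied from the report above.
WEAVE_RANGE = "172.20.0.0/16"   # IPALLOC_RANGE set in the daemonset
SERVICE_IP = "172.20.218.38"    # ClusterIP of kubernetes-dashboard
ENDPOINT_IP = "172.20.224.3"    # pod IP behind the service
NODE_IP = "10.182.211.149"      # ens3 address of the worker node

if __name__ == "__main__":
    for label, ip in [("service", SERVICE_IP), ("endpoint", ENDPOINT_IP), ("node", NODE_IP)]:
        print(label, ip, "inside Weave range:", address_in_range(ip, WEAVE_RANGE))
```

If the service IP is reported as inside the Weave range, choosing an IPALLOC_RANGE that does not overlap the service range is a cheap thing to try.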

brb · 2018-07-09T08:35:50Z

> One thing I'm aware of is that pods on the Weave network cannot talk to the Kubernetes api-server, because it is on an EKS-specific network.

@errordeveloper Any idea how to access the api-server?

murali-reddy · 2018-07-16T08:36:43Z

> Not sure why EKS-specific network should be an issue since you have to define it first (before EKS cluster creation) and in our case worker nodes are on the same network as EKS master nodes.

@redi-vinogradov I'm wondering how you bypass the default use of amazon-vpc-cni-k8s as the CNI for EKS. I could not find a way in the guides.

brb · 2018-07-16T12:20:24Z

Related eksctl-io/eksctl#109

redi-vinogradov · 2018-07-16T12:33:29Z

@murali-reddy Not sure if that was the correct way of doing this, but basically: kubectl delete ds --namespace=kube-system aws-node

errordeveloper · 2018-07-17T08:08:28Z

@redi-vinogradov @murali-reddy yes, deleting kube-system:DaemonSet/aws-node is the only way to go, but you also must make sure you recycle all the existing pods (as described in eksctl-io/eksctl#109).

errordeveloper · 2018-07-17T08:16:04Z

> One thing I'm aware of is that pods on the Weave network cannot talk to the Kubernetes api-server, because it is on an EKS-specific network.
> @errordeveloper Any idea how to access the api-server?

@brb @bboreham so from my most recent findings (eksctl-io/eksctl#109), that doesn't seem to be an issue any more. Although I only did limited testing, so there may be conditions under which the API server becomes inaccessible (we should be able to clarify this by talking to the AWS team).

As you can see from eksctl-io/eksctl#109, what is certainly broken is the DNS, so I suggest we should get to the bottom of that first, and then test more extensively. To be clear, I'd like to see Weave Net as eksctl add-on, otherwise it seems like the process of swapping out the network is not simple enough for someone to do manually.

murali-reddy · 2018-07-17T09:42:24Z

> To be clear, I'd like to see Weave Net as eksctl add-on, otherwise it seems like the process of swapping out the network is not simple enough for someone to do manually.

+1

While deleting the DS will result in the deletion of the amazon-vpc-cni-k8s pods, it does not necessarily mean things will work after you install a new CNI daemonset. Switching CNIs is usually trial and error. My guess is that amazon-vpc-cni-k8s leaves behind iptables rules and policy-based routing (PBR) rules that may interfere later.

errordeveloper · 2018-07-17T20:28:53Z

I agree, it is very likely that the iptables rules linger around after the native network is removed.

redi-vinogradov · 2018-07-23T13:17:36Z

Thank you, gents. I had to reset the iptables rules and reboot the worker nodes. I can now confirm that pods are able to communicate with each other; however, pods are still not able to communicate with services (no route to host error).

TigerC10 · 2018-11-26T02:00:50Z

> Thank you, gents. I had to reset iptables rules and reboot worker nodes. Now I can confirm that pods are able to communicate successfully however, pods are still not able (no route to host error) to communicate with services.

How exactly did you reset the iptables?

murali-reddy · 2018-11-26T05:23:14Z

I am able to consistently run Weave on EKS by following the steps below. Can someone please try them and confirm whether they work?

eksctl create cluster
kubectl delete ds aws-node -n kube-system
delete /etc/cni/net.d/10-aws.conflist on each of the nodes
edit the instance security group to allow UDP and TCP on ports 6783 and 6784
flush the iptables nat, mangle and filter tables
restart kube-proxy pods
apply the weave-net daemonset
delete existing pods so they get recreated in the Weave pod CIDR's address space

I am able to test below scenarios:

pod-to-pod connectivity within the same node and across nodes
pod-to-node connectivity
node-to-pod connectivity
pod-to-service-IP-to-pod connectivity

EDIT: Note that the api-server for your cluster will not be connected to Weave Net (it runs elsewhere, managed by EKS) so will not be able to connect to pods.
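
The kubectl and per-node parts of the steps listed above can be sketched as a dry-run command list in Python. This is only an illustration under assumptions: the function names are hypothetical, the `rm` and `iptables -F` invocations are one plausible reading of "delete" and "flush" (the thread does not give exact commands), and the `k8s-app=kube-proxy` label selector is assumed rather than taken from this thread. The security group change, applying the Weave manifest, and recycling workload pods are left out. Nothing is executed; the sketch only renders strings for review.

```python
import shlex
from typing import List

# File written by the AWS VPC CNI, as named in the steps above.
AWS_CNI_CONFLIST = "/etc/cni/net.d/10-aws.conflist"


def cluster_commands() -> List[List[str]]:
    """kubectl commands run once against the cluster."""
    return [
        # Remove the default AWS VPC CNI daemonset (command from this thread).
        ["kubectl", "delete", "ds", "aws-node", "-n", "kube-system"],
        # Restart kube-proxy so it recreates its rules after the flush below.
        # Assumption: kube-proxy pods carry the label k8s-app=kube-proxy.
        ["kubectl", "-n", "kube-system", "delete", "pod", "-l", "k8s-app=kube-proxy"],
    ]


def node_commands() -> List[List[str]]:
    """Commands to run as root on every existing worker node."""
    return [
        ["rm", "-f", AWS_CNI_CONFLIST],
        # Flushing removes ALL rules in each table, not only the CNI's.
        ["iptables", "-t", "nat", "-F"],
        ["iptables", "-t", "mangle", "-F"],
        ["iptables", "-t", "filter", "-F"],
    ]


def render(commands: List[List[str]]) -> List[str]:
    """Turn argv lists into shell-quoted strings for printing."""
    return [shlex.join(argv) for argv in commands]


if __name__ == "__main__":
    print("# once, against the cluster")
    print("\n".join(render(cluster_commands())))
    print("# on each worker node, as root")
    print("\n".join(render(node_commands())))
```

Because the flush wipes every rule in the nat, mangle and filter tables, it is probably safest to run the node commands before restarting kube-proxy, in the order given in the list above.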

errordeveloper · 2018-11-27T06:35:10Z

@murali-reddy thanks a lot, great to see it didn't require too crazy work-arounds! As your list seems to exclude it, I have to ask - did you test egress to internet, and have you looked into whether pods can connect to the API server using default in-cluster endpoint (KUBERNETES_SERVICE_HOST)?

murali-reddy · 2018-11-27T07:15:23Z

I should have added that. Yes, the connection to the API server works fine. Both kube-dns and the weave pods, which use the service cluster IP of kubernetes, are working fine for me. Egress from pods is working as well.

To me everything seems to be working fine. I just need someone to try it out and either confirm or report any issue.

redi-vinogradov · 2018-11-27T21:57:50Z

Thank you @murali-reddy! I followed your steps and can confirm that the following is working:

pod-to-pod connectivity within the same node and across nodes
pod to Kubernetes API (via service IP)
pod-to-service connectivity
node-to-pod connectivity

I'm not sure how you test pod-service to pod-IP connectivity, but presumably it works.
Now it's only a matter of automation of these steps in order to get it working with AWS ASGs.

murali-reddy · 2018-11-28T03:34:56Z

@redi-vinogradov thanks for confirming it works

Note that these steps are fairly easy to automate. If you start with only the master nodes provisioned by EKS and perform the cluster-level steps once (skipping the per-node steps, as there are no nodes yet), then nothing needs to be done on nodes added later, because new nodes start straight away with Weave as the CNI.

errordeveloper · 2018-11-28T06:36:16Z

Is the /etc/cni/net.d/10-aws.conflist written by the default driver at runtime?

murali-reddy · 2018-11-28T06:56:31Z

Yes, EKS's aws-node daemonset, which runs the AWS VPC CNI, creates 10-aws.conflist. It should be removed so that Weave's CNI config file, 10-weave.conflist, takes effect.

christopherhein · 2018-12-03T22:47:45Z

@murali-reddy what source did you specify for the TCP and UDP SG changes? I was testing with access from a node in the SG itself and it didn't appear to like that. I resorted to anywhere access, which is too open IMO. Any tips?

murali-reddy · 2018-12-04T03:57:40Z

> edit instance security group to allow UDP, TCP on 6783, 6784 ports

@christopherhein you mean this step? I picked the security group that applies to the nodes and added an inbound rule of type Custom TCP with port range 6783-6784, with the source set to Custom and the same security group as its value.

Yes, anywhere would be too open, restrict to the nodes only.
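
A self-referencing rule like the one described above can be sketched as AWS CLI commands built in Python. This uses the ports from the Weave Net documentation, TCP 6783 and UDP 6783-6784; adjust if your deployment differs. The security group ID is a placeholder and the function name is hypothetical. The sketch only renders the command strings and does not call AWS.

```python
import shlex
from typing import List


def weave_ingress_commands(group_id: str) -> List[str]:
    """Build CLI commands that let members of a security group reach each
    other on the Weave Net ports (source = the same security group)."""
    rules = [
        ("tcp", "6783"),       # Weave Net control traffic
        ("udp", "6783-6784"),  # Weave Net data traffic
    ]
    commands = []
    for protocol, port in rules:
        argv = [
            "aws", "ec2", "authorize-security-group-ingress",
            "--group-id", group_id,
            "--protocol", protocol,
            "--port", port,
            # Restrict the source to the nodes' own group, not 0.0.0.0/0.
            "--source-group", group_id,
        ]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Placeholder ID; substitute the security group attached to your workers.
    for command in weave_ingress_commands("sg-0123456789abcdef0"):
        print(command)
```

Note that a TCP-only rule would not cover the UDP data ports, which may be worth checking if the rule appears not to work.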

christopherhein · 2018-12-04T04:00:08Z

Interesting, that's what I had. I'll give it another shot. Thanks @murali-reddy

ilkkatoje · 2019-01-04T09:18:54Z

Hi,

I'm setting up an EKS cluster with the Terraform EKS module. Pod communication seems to work nicely after removing the aws-node ds, applying the Weave ds and recycling nodes. However, I cannot access services using kubectl proxy, like http://localhost:8001/api/v1/namespaces/mynamespace/services/my-nginx/proxy/. I get the error below:

Error: 'Address is not allowed'
Trying to reach: 'http://10.32.0.3:80/'

The problem seems to be that the AWS-managed API server cannot access pods in the overlay network. Should I be able to access the pods with this proxy method at all when using Weave as the CNI?

murali-reddy · 2019-01-04T09:31:04Z

I am afraid you won't be able to. The Weave overlay network only extends to the non-master nodes.

bboreham · 2019-01-04T09:41:11Z

+- we don't know a way to tell the api-server how to route to the Weave Network.

("Address is not allowed" is interesting - I haven't seen that before. Could it be an ICMPv6 Destination Unreachable code 5?)

ilkkatoje · 2019-01-04T09:44:40Z

Ok, thanks! I believe this proxy access works only with native aws-cni currently, and not with any other cnis.

mkva · 2019-01-04T09:50:26Z

> +- we don't know a way to tell the api-server how to route to the Weave Network.
>
> ("Address is not allowed" is interesting - I haven't seen that before. Could it be an ICMPv6 Destination Unreachable code 5?)

I tried to get to the root of this message, and the closest I found was part of the Kubernetes proxy implementation: https://github.com/kubernetes/kubernetes/blob/a3ccea9d8743f2ff82e41b6c2af6dc2c41dc7b10/staging/src/k8s.io/apimachinery/pkg/util/proxy/transport.go#L103

which calls the Go net/http RoundTripper interface (https://github.com/golang/go/blob/fdefabadf0a2cb99accb2afe49eafce0eaeb53a7/src/net/http/roundtrip.go).

But I could not find any trace of the message in the Go library or in the Linux kernel sources.

marklz · 2019-01-16T15:16:55Z

Thanks to @murali-reddy - everything works, and especially multicast, which is key for us!

christopherhein · 2019-01-16T19:43:20Z

> I am afraid you wont be able to. Weave overlay network only extend on the non-master nodes.

So this means if you deploy Net on EKS you just won't have access to port-forward, exec, logs and any of the other requests that go through the x-account ENI?

murali-reddy · 2019-01-17T07:15:57Z

> So this means if you deploy Net on EKS you just won't have access to port-forward, exec, logs and any of the other requests that go through the x-account ENI?

@christopherhein not really. You will only have a problem in cases where the API server/master node needs to access a pod IP directly. I am not sure in which cases direct access to pod IPs is needed.

For port-forward, exec, logs, etc., requests go through the kubelet on the node, so there should not be any problem. I am able to successfully perform exec, logs, port-forward, etc. with Weave CNI on EKS.

alec-v · 2019-01-28T15:42:11Z

@murali-reddy it seems I hit this issue when deploying metrics-server on top of EKS + Weave Net. The API server tries to access the metrics-server pod IP directly instead of the metrics-server cluster IP:

$ kubectl get services --all-namespaces
NAMESPACE     NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       kubernetes       ClusterIP   172.20.0.1      <none>        443/TCP         6h
kube-system   kube-dns         ClusterIP   172.20.0.10     <none>        53/UDP,53/TCP   6h
kube-system   metrics-server   ClusterIP   172.20.43.14    <none>        443/TCP         2h
kube-system   tiller-deploy    ClusterIP   172.20.215.80   <none>        44134/TCP       6h

and this is what I have in the logs:

'no response from https://10.32.0.5:443: Get https://10.32.0.5:443: Address is not allowed'

10.32.0.5 is the pod address of metrics-server. Is there any way to change the API server's behaviour?

murali-reddy · 2019-01-29T06:44:10Z

@alec-v IMO, the Kubernetes control plane/master not being able to reach pod IPs is not necessarily a bad thing from a security perspective, and it makes sense for a hosted Kubernetes solution. But it does seem to affect any extension API that uses the aggregation layer.

Please see this comment; you should be able to use hostNetwork for the metrics-server pod to make it work.
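
As a rough illustration of the hostNetwork suggestion, the sketch below builds a patch that sets `hostNetwork: true` on a Deployment's pod template, plus the matching `kubectl patch` argument list. The Deployment name and namespace are assumptions based on the service listing in this thread, and the function names are hypothetical. With hostNetwork the pod uses the node's address, so the port it listens on must be free on the node.

```python
import json
from typing import List


def host_network_patch() -> str:
    """JSON patch body that enables hostNetwork on the pod template."""
    patch = {"spec": {"template": {"spec": {"hostNetwork": True}}}}
    return json.dumps(patch)


def patch_command(name: str = "metrics-server", namespace: str = "kube-system") -> List[str]:
    """Argument list for kubectl patch; name and namespace are assumptions."""
    return [
        "kubectl", "-n", namespace,
        "patch", "deployment", name,
        "--patch", host_network_patch(),
    ]


if __name__ == "__main__":
    # Printed for review only; nothing is sent to the cluster.
    print(patch_command())
```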

murali-reddy · 2019-01-29T06:46:49Z

I am closing this bug. The instructions provided seem to work for most cases. Please reopen if you feel this issue needs to be addressed.

BlackBsd · 2019-07-19T17:21:31Z

Hello, this is still an issue and the instructions are not fully working.

kubectl -n kube-system logs weave-net-x57lt -c weave-npc 
Error from server: Get https://172.17.172.29:10250/containerLogs/kube-system/weave-net-x57lt/weave-npc: dial tcp 172.17.172.29:10250: connect: no route to host

But if I rerun the same command, it works after about 30 tries. Then, after about 5 minutes, it stops working again.

It seems to work, then not, then work again, and so on.

In my case I created a fresh EKS 1.13 cluster with no workers attached at first,
kubectl -n kube-system delete daemonset aws-node
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=192.168.0.0/16"
kubectl apply -f ./EKS-Worker-Auth-ConfigMap.yaml

errordeveloper · 2019-07-24T17:33:51Z

@BlackBsd it looks like you are having a stability problem, please open another issue with details of logs that you are seeing and make sure to check if the weave-net pod is restarting.

jwenz723 · 2019-08-03T04:40:38Z

It seems that I am having issues with the API server being able to talk to pods in my EKS cluster since I installed Weave Net. The issue I am seeing is:

I start a proxy:

kubectl proxy

I try to access the installed kubernetes-dashboard instance at the following url: http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

I get the following displayed on the page:

Error: 'Address is not allowed'
Trying to reach: 'https://10.44.0.1:8443/'

It seems the issue is that the API server does not know how to access 10.44.0.1, which makes sense because 10.X.X.X is traditionally a private address range and the API server is not running within the overlay network, so it wouldn't know how to route to that IP address. Does anyone have suggestions for how to resolve this?

The strange thing is that I am able to access the dashboard (and other services) if I use kubectl port-forward instead of kubectl proxy. I do not understand the underlying differences between these commands, but seems like they would both need to pass through the API server?

Install details:

Create EKS cluster with 0 nodes
Delete aws-node daemonset: kubectl delete ds -n kube-system aws-node
Install weave net: kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
Join the worker nodes

The install works and all of my pods are running as I would expect. I know that they are able to communicate out of the cluster and that pod to pod connectivity is functioning. I do not have any pods trying to access the API server, so I don't know if that works or not.

murali-reddy · 2019-08-05T09:24:38Z

> It seems that I am having issues with the API server being able to talk to pods in my EKS cluster

@jwenz723 yes, this does not work. As you figured, the master nodes are not in the Weave overlay, so they cannot connect.

> The strange thing is that I am able to access the dashboard (and other services) if I use kubectl port-forward instead of kubectl proxy.

https://kubernetes.io/docs/concepts/architecture/master-node-communication/#master-to-cluster

API server goes through kubelet and then to the pod in case of port-forward
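
To tell the two paths apart when reading an error message, one option is to check whether the target address falls inside the Weave allocation range. The sketch below assumes the default range, 10.32.0.0/12, which applies when IPALLOC_RANGE is not set; the function name is hypothetical and the URLs are the ones quoted in this thread. A target inside the range is a pod IP, which the EKS-managed API server cannot reach; a kubelet address on the node network is outside it.

```python
import ipaddress
from urllib.parse import urlsplit

# Weave Net's default allocation range (used when IPALLOC_RANGE is unset).
WEAVE_DEFAULT_RANGE = ipaddress.ip_network("10.32.0.0/12")


def target_is_weave_pod(url: str, pod_range=WEAVE_DEFAULT_RANGE) -> bool:
    """Return True if the URL's host is an IP inside the pod range."""
    host = urlsplit(url).hostname
    return ipaddress.ip_address(host) in pod_range


if __name__ == "__main__":
    urls = [
        "https://10.44.0.1:8443/",   # kubectl proxy target (dashboard pod)
        "http://10.32.0.3:80/",      # kubectl proxy target (nginx pod)
        "https://10.32.0.5:443",     # metrics-server pod
        "https://172.17.172.29:10250/containerLogs/kube-system/weave-net-x57lt/weave-npc",  # kubelet
    ]
    for url in urls:
        print(url, "-> pod IP in Weave range:", target_is_weave_pod(url))
```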

jwenz723 · 2019-08-08T04:55:58Z

Is it expected that a service of type LoadBalancer will not work when using weave net on EKS?

I am getting the following error on my service when I try to set the type to LoadBalancer:

Error creating load balancer (will retry): failed to ensure load balancer for service promop/promop-grafana: could not find any suitable subnets for creating the ELB

murali-reddy · 2019-08-08T07:02:34Z

This particular error has nothing to do with Weave Net, and service type LoadBalancer should work when using EKS.

Please check if you are following the guidelines https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html

redi-vinogradov · 2019-08-08T12:59:08Z

@jwenz723 This issue has nothing to do with your error.

P.S. You should have selected some public subnets for your EKS control plane, or you can use a special annotation in your service definition to create an internal LB: https://docs.aws.amazon.com/eks/latest/userguide/load-balancing.html

jwenz723 · 2019-08-08T18:35:12Z

> This particular error has nothing to do with Weave Net, and service type LoadBalancer should work when using EKS.
>
> Please check if you are following the guidelines https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html

Thanks, adding the proper tags on my subnets and vpc fixed my loadbalancer issue.

anupash147 · 2020-03-18T23:14:32Z

For the most part this works, but a problem arises when Istio is installed; take a look at istio/istio#16434.

bboreham added the feature label Aug 10, 2018

errordeveloper mentioned this issue Nov 5, 2018

Design Proposal #002: Existing VPC eksctl-io/eksctl#303

Merged

murali-reddy mentioned this issue Nov 26, 2018

Option to use Weave Net CNI eksctl-io/eksctl#109

Closed

murali-reddy mentioned this issue Dec 4, 2018

Instructions for running on GKE #3466

Open

murali-reddy closed this as completed Jan 29, 2019

alec-v mentioned this issue Jan 29, 2019

Allow metrcs-server to start in hostNetwork mode helm/charts#10967

Merged

murali-reddy mentioned this issue Oct 14, 2019

Possible issue with kube-proxy on AWS EKS #3719

Closed

murali-reddy mentioned this issue Feb 26, 2020

add steps to install weave-net on EKS #3777

Merged

mumoshu mentioned this issue Dec 14, 2021

Creating Runners fails webhook with 'Address is not allowed' actions/actions-runner-controller#1005

Closed
