This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Weave with AWS EKS is not working #3335

Closed
redi-vinogradov opened this issue Jun 21, 2018 · 42 comments

@redi-vinogradov

What you expected to happen?

EKS pods are able to communicate with each other via the Weave network.

What happened?

Deployed the Weave daemonset on a new AWS EKS cluster with an updated CIDR range. Pods get a proper IP from Weave but cannot communicate with each other.

How to reproduce it?

Create an AWS EKS cluster and set the IPALLOC_RANGE environment variable in your daemonset file to 172.20.0.0/16. Apply the Weave daemonset. Pods now get an IP but cannot communicate.

Anything else we need to know?

AWS EKS, no internet access, using proxy for external access.

Versions:

EKS v1.10

$ weave version
2.3.0
$ docker version
18.03.1-ce
$ uname -a
CentOS 7.4 3.10.0-862.3.2.el7.x86_64
$ kubectl version
1.10.4

Logs:

$ sudo docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
c47e8d7383b6        bridge              bridge              local
9c8e7fa1202a        host                host                local
f8d1debbe4bb        none                null                local
$ sudo docker network inspect f8d1debbe4bb
[
    {
        "Name": "none",
        "Id": "f8d1debbe4bbbfdb5ba3e81a731288b9349c455b64f8287c100b42fead9c2988",
        "Created": "2018-06-21T09:45:41.456125078-04:00",
        "Scope": "local",
        "Driver": "null",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": []
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "a955616beadefd7359a3c1d4cc65ce5971fa1d5c9e7fad164ac860eb7e583218": {
                "Name": "k8s_POD_kube-dns-64b69465b4-mpddk_kube-system_78830a80-7581-11e8-ae49-0a87435622a6_0",
                "EndpointID": "e0d77a64dfe2ba5b592d54deb4d53c51955ddebaf172fef153d25d281679df6a",
                "MacAddress": "",
                "IPv4Address": "",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

Network:

$ ip route
default via 10.182.208.1 dev ens3
10.182.208.0/20 dev ens3 proto kernel scope link src 10.182.211.149
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.20.0.0/16 dev weave proto kernel scope link src 172.20.128.0

$ ip -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: ens3    inet 10.182.211.149/20 brd 10.182.223.255 scope global dynamic ens3\       valid_lft 2315sec preferred_lft 2315sec
3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
6: weave    inet 172.20.128.0/16 brd 172.20.255.255 scope global weave\       valid_lft forever preferred_lft forever
@bboreham
Contributor

Thanks for this report, @redi-vinogradov .

Could you add a little more detail to "cannot interact with each other" ? What did you try?

One thing I'm aware of is that pods on the Weave network cannot talk to the Kubernetes api-server, because it is on an EKS-specific network. And since kube-dns is in that set, it cannot resolve any service addresses, which will break lots of things.

Happy to receive tips or PRs to let us connect in the api-server(s).

@redi-vinogradov
Author

A few examples:
Service status:

$ kubectl -n kube-system describe svc kubernetes-dashboard
Name:              kubernetes-dashboard
Namespace:         kube-system
Labels:            k8s-app=kubernetes-dashboard
Annotations:       kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"k8s-app":"kubernetes-dashboard"},"name":"kubernetes-dashboard","namespace":...
Selector:          k8s-app=kubernetes-dashboard
Type:              ClusterIP
IP:                172.20.218.38
Port:              <unset>  443/TCP
TargetPort:        8443/TCP
Endpoints:         172.20.224.3:8443
Session Affinity:  None
Events:            <none>

Pod is trying to connect to kubernetes-dashboard service via port 443:

$ nc -v 172.20.218.38 443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: No route to host.

Pod is trying to connect to kubernetes-dashboard directly (via pod's IP):

$ nc -v 172.20.224.3 8443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.

Not sure why EKS-specific network should be an issue since you have to define it first (before EKS cluster creation) and in our case worker nodes are on the same network as EKS master nodes.
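
One thing that may be worth ruling out here (an observation from the output above, not something confirmed in this thread): the service ClusterIP 172.20.218.38 falls inside the 172.20.0.0/16 range given to Weave as IPALLOC_RANGE, so the pod range and the service range appear to overlap. Below is a minimal POSIX shell sketch for checking whether an address lies in a CIDR block. The helper names ip_to_int and in_cidr are made up for this illustration.

```shell
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP NET/BITS: succeed if IP lies inside the CIDR block.
in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  size=$(( 1 << (32 - bits) ))   # number of addresses in the block
  [ $(( ip / size )) -eq $(( net / size )) ]
}

# Values taken from the output above.
if in_cidr 172.20.218.38 172.20.0.0/16; then
  echo "service IP is inside IPALLOC_RANGE"
fi
```

Whether this overlap is related to the "No route to host" error is an open question; the check only shows that the two ranges are not disjoint.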

@brb
Contributor

brb commented Jul 9, 2018

One thing I'm aware of is that pods on the Weave network cannot talk to the Kubernetes api-server, because it is on an EKS-specific network.

@errordeveloper Any idea how to access the api-server?

@murali-reddy
Contributor

murali-reddy commented Jul 16, 2018

Not sure why EKS-specific network should be an issue since you have to define it first (before EKS cluster creation) and in our case worker nodes are on the same network as EKS master nodes.

@redi-vinogradov How did you bypass the default use of amazon-vpc-cni-k8s as the CNI for EKS? I could not find a way in the guides.

@brb
Contributor

brb commented Jul 16, 2018

Related eksctl-io/eksctl#109

@redi-vinogradov
Author

@murali-reddy Not sure if that was the correct way of doing it, but basically: kubectl delete ds --namespace=kube-system aws-node

@errordeveloper
Contributor

@redi-vinogradov @murali-reddy yes, deleting kube-system:DaemonSet/aws-node is the only way to go, but you also must make sure you recycle all the existing pods (as described in eksctl-io/eksctl#109).

@errordeveloper
Contributor

One thing I'm aware of is that pods on the Weave network cannot talk to the Kubernetes api-server, because it is on an EKS-specific network.
@errordeveloper Any idea how to access the api-server?

@brb @bboreham From my most recent findings (eksctl-io/eksctl#109), that doesn't seem to be an issue any more. I only did limited testing, though, so there may be conditions under which the API server becomes inaccessible (we should be able to clarify that by talking to the AWS team).

As you can see from eksctl-io/eksctl#109, what is certainly broken is the DNS, so I suggest we should get to the bottom of that first, and then test more extensively. To be clear, I'd like to see Weave Net as eksctl add-on, otherwise it seems like the process of swapping out the network is not simple enough for someone to do manually.

@murali-reddy
Contributor

To be clear, I'd like to see Weave Net as eksctl add-on, otherwise it seems like the process of swapping out the network is not simple enough for someone to do manually.

+1

While deleting the DS will delete the amazon-vpc-cni-k8s pods, it does not necessarily mean things will work after you install a new CNI daemonset. Switching CNIs is usually trial and error. My guess is that amazon-vpc-cni-k8s leaves behind iptables rules and PBR (policy-based routing) rules that may interfere later.

@errordeveloper
Contributor

I agree, it is very likely that the iptables rules linger after the native network is removed.

@redi-vinogradov
Author

redi-vinogradov commented Jul 23, 2018

Thank you, gents. I had to reset the iptables rules and reboot the worker nodes. I can now confirm that pods are able to communicate with each other; however, pods still cannot communicate with services (no route to host error).

@TigerC10

Thank you, gents. I had to reset the iptables rules and reboot the worker nodes. I can now confirm that pods are able to communicate with each other; however, pods still cannot communicate with services (no route to host error).

How exactly did you reset the iptables?

@murali-reddy
Contributor

murali-reddy commented Nov 26, 2018

I am able to run Weave on EKS consistently by following the steps below. Can someone please try them and confirm whether they work?

  • eksctl create cluster
  • kubectl delete ds aws-node -n kube-system
  • delete /etc/cni/net.d/10-aws.conflist on each node
  • edit the instance security group to allow TCP and UDP on ports 6783 and 6784
  • flush the iptables nat, mangle and filter tables
  • restart the kube-proxy pods
  • apply the weave-net daemonset
  • delete existing pods so they get recreated in the Weave pod CIDR's address space
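
The steps above can be sketched as a small script. This is only an illustration under assumptions, not a tested procedure. It prints the commands by default (set APPLY=1 to execute them). The helper names are invented here, the k8s-app=kube-proxy label selector and the namespace list are assumptions about the cluster, and the manifest URL is the one quoted later in this thread. The security group change is left out because it is made in AWS rather than on the cluster.

```shell
# Print commands by default; execute them only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Step on the cluster: remove the AWS VPC CNI daemonset.
remove_aws_cni() {
  run kubectl delete ds aws-node -n kube-system
}

# Steps on every worker node (as root): drop the AWS CNI config
# and flush the nat, mangle and filter tables.
clean_node() {
  run rm -f /etc/cni/net.d/10-aws.conflist
  for table in nat mangle filter; do
    run iptables -t "$table" -F
  done
}

# Steps on the cluster: restart kube-proxy, install Weave Net,
# then recreate pods so they get addresses from the Weave range.
install_weave() {
  run kubectl -n kube-system delete pod -l k8s-app=kube-proxy
  if [ "${APPLY:-0}" = "1" ]; then
    kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
  else
    echo "+ kubectl apply -f <Weave Net manifest URL>"
  fi
  # NAMESPACES is a hypothetical knob; adjust to the namespaces in use.
  for ns in ${NAMESPACES:-kube-system default}; do
    run kubectl delete pods --all -n "$ns"
  done
}
```

Calling remove_aws_cni, then clean_node on each node, then install_weave follows the order of the list. In dry-run mode each call only prints the commands it would run.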

I was able to test the following scenarios:

  • pod-to-pod connectivity with in same node and across nodes
  • pod-to-node connectivity
  • node-to-pod connectivity
  • pod-service ip-pod connectivity

EDIT: Note that the api-server for your cluster will not be connected to Weave Net (it runs elsewhere, managed by EKS) so will not be able to connect to pods.

@errordeveloper
Contributor

@murali-reddy thanks a lot, great to see it didn't require too crazy work-arounds! As your list seems to exclude it, I have to ask - did you test egress to internet, and have you looked into whether pods can connect to the API server using default in-cluster endpoint (KUBERNETES_SERVICE_HOST)?

@murali-reddy
Contributor

I should have added that. Yes, connection to the API server works fine. Both kube-dns and the weave pods, which use the cluster IP of the kubernetes service, are working fine for me. Egress from pods is working as well.

To me everything seems to be working fine. I just need someone to try it out and either confirm or report any issue.

@redi-vinogradov
Author

Thank you @murali-reddy! I followed your steps and can confirm that the following is working:

  • pod-to-pod connectivity within the same node and across nodes
  • pod to Kubernetes API (via service IP)
  • pod-to-service connectivity
  • node-to-pod connectivity

I am not sure how to run the pod-service IP-pod connectivity test, but presumably it works too.
Now it's only a matter of automating these steps in order to get it working with AWS ASGs.

@murali-reddy
Contributor

@redi-vinogradov thanks for confirming it works

Note that these steps are fairly easy to automate. If you start with only the master nodes provisioned by EKS and perform the cluster-level steps once (skipping the per-node steps, since there are no nodes yet), then nothing needs to be done on nodes added later: new nodes start straight away with Weave as the CNI.

@errordeveloper
Contributor

errordeveloper commented Nov 28, 2018 via email

@murali-reddy
Contributor

Yes, EKS's aws-node daemonset, which runs the AWS VPC CNI, creates 10-aws.conflist. It should be removed in order for Weave's CNI config file 10-weave.conflist to take effect.
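
The reason the leftover file matters: when several CNI configuration files exist in /etc/cni/net.d, the one that comes first in lexical order is used, and 10-aws.conflist sorts ahead of 10-weave.conflist. A quick way to see the ordering is shown below. The function name first_cni_config is made up for this sketch, which only compares file names and does not read a real node.

```shell
# Print the name that sorts first among the given CNI config file names,
# which is the one that would be picked up.
first_cni_config() {
  printf '%s\n' "$@" | LC_ALL=C sort | head -n 1
}

first_cni_config 10-weave.conflist 10-aws.conflist
# prints: 10-aws.conflist
```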

@christopherhein

@murali-reddy what source did you specify for the TCP and UDP SG changes? I was testing with access from a node in the SG itself and it didn't appear to like that. I resorted to allowing access from anywhere, which is too open IMO. Any tips?

@murali-reddy
Contributor

edit instance security group to allow UDP, TCP on 6783, 6784 ports

@christianberg you mean this step? I picked the security group that applies to the node and added an inbound rule of type Custom TCP with port range 6783-6784 and the source set to Custom, with the same security group as its value.

Yes, anywhere would be too open, restrict to the nodes only.
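
For anyone scripting this, the self-referencing rule described above could look roughly like the following with the AWS CLI. This is a sketch only: the security group ID is a placeholder, the function name is invented, the ports are Weave Net's documented ones (TCP 6783 for control, UDP 6783 and 6784 for data), and commands are printed unless APPLY=1 is set.

```shell
# Print commands by default; execute them only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# allow_weave_ports SG_ID: let members of the node security group
# reach each other on the Weave Net ports.
allow_weave_ports() {
  sg=$1
  run aws ec2 authorize-security-group-ingress \
    --group-id "$sg" --protocol tcp --port 6783 --source-group "$sg"
  run aws ec2 authorize-security-group-ingress \
    --group-id "$sg" --protocol udp --port 6783-6784 --source-group "$sg"
}

allow_weave_ports sg-0123456789abcdef0   # placeholder ID
```

Using the group itself as the source keeps the rule limited to the nodes instead of opening the ports to anywhere.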

@christopherhein

Interesting, that's what I had. I'll give it another shot. Thanks @murali-reddy

@ilkkatoje

Hi,

I'm setting up an EKS cluster with the Terraform EKS module. Pod communication seems to work nicely after removing the aws-node ds, applying the Weave ds and recycling the nodes. However, I cannot access services through kubectl proxy, e.g. http://localhost:8001/api/v1/namespaces/mynamespace/services/my-nginx/proxy/. I get the error below:

Error: 'Address is not allowed'
Trying to reach: 'http://10.32.0.3:80/'

The problem seems to be that the AWS-managed API server cannot reach pods in the overlay network. Should I be able to access the pods with this proxy method at all when using Weave as the CNI?

@murali-reddy
Contributor

I am afraid you won't be able to. The Weave overlay network only extends to the non-master nodes.

@bboreham
Contributor

bboreham commented Jan 4, 2019

+- we don't know a way to tell the api-server how to route to the Weave Network.

("Address is not allowed" is interesting - I haven't seen that before. Could it be an ICMPv6 Destination Unreachable code 5?)

@ilkkatoje

Ok, thanks! I believe this proxy access currently works only with the native aws-cni, and not with any other CNI.

@mkva

mkva commented Jan 4, 2019

+- we don't know a way to tell the api-server how to route to the Weave Network.

("Address is not allowed" is interesting - I haven't seen that before. Could it be an ICMPv6 Destination Unreachable code 5?)

I tried to get to the root of this message, and the closest I found was part of the Kubernetes proxy implementation: https://github.com/kubernetes/kubernetes/blob/a3ccea9d8743f2ff82e41b6c2af6dc2c41dc7b10/staging/src/k8s.io/apimachinery/pkg/util/proxy/transport.go#L103

which calls the RoundTripper interface of Go's net/http package (https://github.com/golang/go/blob/fdefabadf0a2cb99accb2afe49eafce0eaeb53a7/src/net/http/roundtrip.go).

But I could not find a trace of the message in the Go library or in the Linux kernel sources.

@marklz

marklz commented Jan 16, 2019

Thanks to @murali-reddy - everything works, and especially multicast, which is key for us!

@christopherhein

I am afraid you won't be able to. The Weave overlay network only extends to the non-master nodes.

So this means if you deploy Net on EKS you just won't have access to port-forward, exec, logs and any of the other requests that go through the x-account ENI?

@murali-reddy
Contributor

murali-reddy commented Jan 17, 2019

So this means if you deploy Net on EKS you just won't have access to port-forward, exec, logs and any of the other requests that go through the x-account ENI?

@christopherhein not really. You will only have a problem in cases where the API server/master node needs to reach a pod IP directly. I am not sure which cases need direct access to pod IPs.

For port-forward, exec, logs etc., requests go through the kubelet on the node, so there should not be any problem. I am able to successfully perform exec, logs, port-forward etc. with Weave CNI on EKS.

@alec-v

alec-v commented Jan 28, 2019

@murali-reddy It seems I hit this issue when deploying metrics-server on top of EKS + Weave Net. The API server tries to access the metrics-server pod IP directly instead of the metrics-server cluster IP:
kubectl get services --all-namespaces
NAMESPACE     NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       kubernetes       ClusterIP   172.20.0.1      <none>        443/TCP         6h
kube-system   kube-dns         ClusterIP   172.20.0.10     <none>        53/UDP,53/TCP   6h
kube-system   metrics-server   ClusterIP   172.20.43.14    <none>        443/TCP         2h
kube-system   tiller-deploy    ClusterIP   172.20.215.80   <none>        44134/TCP       6h

and this is what I have in the logs:

'no response from https://10.32.0.5:443: Get https://10.32.0.5:443: Address is not allowed'

10.32.0.5 is the pod address of metrics-server. Is there any way to change the API server's behaviour?

@murali-reddy
Contributor

murali-reddy commented Jan 29, 2019

@alec-v IMO the Kubernetes control plane/master not being able to reach pod IPs is not necessarily a bad thing from a security perspective, and it makes sense for a hosted Kubernetes solution. But it does seem to affect any extension API that uses the aggregation layer.

Please see this comment; you should be able to use hostNetwork for the metrics-server pod to make it work.
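
A sketch of the hostNetwork workaround mentioned above, assuming metrics-server runs as a Deployment named metrics-server in kube-system. The Service listing suggests the namespace, but the Deployment name is an assumption. The command is printed unless APPLY=1 is set.

```shell
# Print commands by default; execute them only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Put the metrics-server pod on the node's network so the API server
# can reach it at a VPC address instead of a Weave overlay address.
PATCH='{"spec":{"template":{"spec":{"hostNetwork":true}}}}'
use_host_network() {
  run kubectl -n kube-system patch deployment metrics-server \
    --type merge -p "$PATCH"
}

use_host_network
```

With hostNetwork the pod shares the node's ports, so a port clash with something else on the node is possible and worth checking.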

@murali-reddy
Contributor

I am closing this bug. The instructions provided seem to work for most cases. Please reopen if you feel this issue needs to be addressed.

@BlackBsd

BlackBsd commented Jul 19, 2019

Hello, this is still an issue and the instructions are not fully working.

kubectl -n kube-system logs weave-net-x57lt -c weave-npc 
Error from server: Get https://172.17.172.29:10250/containerLogs/kube-system/weave-net-x57lt/weave-npc: dial tcp 172.17.172.29:10250: connect: no route to host

But if I rerun the same command, it works after about 30 tries. Then, after about 5 minutes, it stops working again.

It seems to alternate between working and not working.

  • In my case I created a fresh EKS 1.13 cluster with no workers attached at first,
  • kubectl -n kube-system delete daemonset aws-node
  • kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=192.168.0.0/16"
  • kubectl apply -f ./EKS-Worker-Auth-ConfigMap.yaml

@errordeveloper
Contributor

errordeveloper commented Jul 24, 2019

@BlackBsd it looks like you are having a stability problem, please open another issue with details of logs that you are seeing and make sure to check if the weave-net pod is restarting.
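
A quick way to check for restarts, assuming the Weave Net pods carry the name=weave-net label and a container named weave (both are assumptions about this particular install). The pod name is the one from the report above. Commands are printed unless APPLY=1 is set.

```shell
# Print commands by default; execute them only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# check_weave POD: list the weave-net pods and fetch earlier logs of one.
check_weave() {
  # The RESTARTS column shows whether any weave-net pod keeps restarting.
  run kubectl -n kube-system get pods -l name=weave-net -o wide
  # Logs of the previous weave container instance, useful after a restart.
  run kubectl -n kube-system logs "$1" -c weave --previous
}

check_weave weave-net-x57lt
```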

@jwenz723

jwenz723 commented Aug 3, 2019

It seems that I am having issues with the API server being able to talk to pods in my EKS cluster since I installed Weave Net. The issue I am seeing is:

I start a proxy:

kubectl proxy

I try to access the installed kubernetes-dashboard instance at the following url: http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

I get the following displayed on the page:

Error: 'Address is not allowed'
Trying to reach: 'https://10.44.0.1:8443/'

Seems that the issue is that the API server does not know how to access 10.44.0.1 which makes sense because 10.X.X.X is traditionally a private schema and the API server is not running within the overlay network so it wouldn't know how to route to that ip address. Anyone have any suggestions of how to resolve this?

The strange thing is that I am able to access the dashboard (and other services) if I use kubectl port-forward instead of kubectl proxy. I do not understand the underlying differences between these commands, but seems like they would both need to pass through the API server?

Install details:

  1. Create EKS cluster with 0 nodes
  2. Delete aws-node daemonset: kubectl delete ds -n kube-system aws-node
  3. Install weave net: kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
  4. Join the worker nodes

The install works and all of my pods are running as I would expect. I know that they are able to communicate out of the cluster and that pod to pod connectivity is functioning. I do not have any pods trying to access the API server, so I don't know if that works or not.

@murali-reddy
Contributor

It seems that I am having issues with the API server being able to talk to pods in my EKS cluster

@jwenz723 yes, this does not work. As you figured, the master nodes are not in the Weave overlay, so they cannot connect.

The strange thing is that I am able to access the dashboard (and other services) if I use kubectl port-forward instead of kubectl proxy.

https://kubernetes.io/docs/concepts/architecture/master-node-communication/#master-to-cluster

In the case of port-forward, the API server goes through the kubelet and then to the pod.
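
To make the difference concrete, here are the two access paths side by side. With kubectl proxy, the API server itself opens the connection to the pod IP, which fails when the pod IP is on the overlay. With kubectl port-forward, the API server talks to the kubelet on the node, which it can reach. The namespace and service name come from the URL quoted above; the service port 443 and local port 8443 are assumptions. Commands are printed unless APPLY=1 is set.

```shell
# Print commands by default; execute them only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

NS=kubernetes-dashboard
SVC=kubernetes-dashboard

# Path 1 (needs "kubectl proxy" running): API server -> pod IP.
# This is the path that returns 'Address is not allowed'.
via_proxy() {
  run curl "http://localhost:8001/api/v1/namespaces/$NS/services/https:$SVC:/proxy/"
}

# Path 2: API server -> kubelet -> pod, which works with Weave Net on EKS.
via_port_forward() {
  run kubectl -n "$NS" port-forward "svc/$SVC" 8443:443
}

via_proxy
via_port_forward
```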

@jwenz723

jwenz723 commented Aug 8, 2019

Is it expected that a service of type LoadBalancer will not work when using weave net on EKS?

I am getting the following error on my service when I try to set the type to LoadBalancer:

Error creating load balancer (will retry): failed to ensure load balancer for service promop/promop-grafana: could not find any suitable subnets for creating the ELB

@murali-reddy
Contributor

This particular error has nothing to do with Weave Net; a service of type LoadBalancer should work when using EKS.

Please check if you are following the guidelines https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html

@redi-vinogradov
Author

@jwenz723 This issue has nothing to do with your error.

P.S. You should have selected some public subnets for your EKS control plane, or use the special annotation in your service definition to create an internal LB: https://docs.aws.amazon.com/eks/latest/userguide/load-balancing.html

@jwenz723

jwenz723 commented Aug 8, 2019

This particular error has nothing to do with Weave Net; a service of type LoadBalancer should work when using EKS.

Please check if you are following the guidelines https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html

Thanks, adding the proper tags on my subnets and vpc fixed my loadbalancer issue.
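
For reference, the subnet tags that the EKS documentation describes for load balancer subnet discovery can be applied along these lines. This is a sketch with placeholder values: the cluster name and subnet ID are hypothetical, the function name is invented, and commands are printed unless APPLY=1 is set.

```shell
# Print commands by default; execute them only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# tag_subnet CLUSTER SUBNET_ID ROLE, where ROLE is "elb" for public
# subnets or "internal-elb" for private ones.
tag_subnet() {
  run aws ec2 create-tags --resources "$2" --tags \
    "Key=kubernetes.io/cluster/$1,Value=shared" \
    "Key=kubernetes.io/role/$3,Value=1"
}

tag_subnet my-cluster subnet-0123456789abcdef0 elb   # placeholder values
```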

@anupash147

For the most part it works, but a problem comes up when Istio is installed; take a look at istio/istio#16434.
