This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Instructions for running on GKE #3466

Open
scottohara opened this issue Dec 4, 2018 · 3 comments

@scottohara

What you expected to happen?

As a novice Kubernetes user installing Weave Net on GKE per the instructions, I read in the docs:

"After a few seconds, a Weave Net pod should be running on each Node and any further pods you create will be automatically attached to the Weave network."

I am assuming (and I could well be wrong here) that this means any new pods created should be allocated an IP within Weave's default range (10.32.0.0/12).

What happened?

New pods are being allocated a range in the default cluster CIDR range (10.20.0.0/14).

(Background: I'm attempting to run Atlassian Confluence Data Center on GKE. Confluence Data Center uses Hazelcast, which needs multicast support for the nodes to find each other. This led me to install Weave Net for its multicast support, but so far I have not been able to get it working.)

How to reproduce it?

gcloud container clusters create confluence-data-center
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user [my email]
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
kubectl create -f [pod spec].yml
kubectl get pods -o wide

(the pod created by [pod spec].yml has an IP in the 10.20.x.x range)
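The mismatch is easy to confirm programmatically. A minimal sketch (the address 10.20.7.5 is a hypothetical example in the reported 10.20.x.x range) checking which network the pod IP belongs to:

```python
import ipaddress

# Hypothetical pod IP in the reported 10.20.x.x range
pod_ip = ipaddress.ip_address("10.20.7.5")

weave_range = ipaddress.ip_network("10.32.0.0/12")   # Weave's default IPAM range
cluster_cidr = ipaddress.ip_network("10.20.0.0/14")  # the GKE cluster CIDR seen here

print(pod_ip in weave_range)   # False: the address was not allocated by Weave
print(pod_ip in cluster_cidr)  # True: GKE's own CNI handed out the address
```

If Weave were handling pod networking, the first check would be True: 10.32.0.0/12 covers 10.32.0.0 through 10.47.255.255.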

Anything else we need to know?

Not really, just a vanilla GKE cluster with Weave installed as per instructions.

I'm certain the problem is simply a lack of understanding on my part, but as far as I can tell I've followed the "Weave Net can be installed onto your CNI-enabled Kubernetes cluster with a single command" instructions to the letter.

kubectl get pods -n kube-system -l name=weave-net

NAME              READY     STATUS    RESTARTS   AGE
weave-net-ghp9p   2/2       Running   0          1h
weave-net-qdrgd   2/2       Running   0          1h
weave-net-vcrhd   2/2       Running   0          1h

Versions:

$ kubectl exec -n kube-system weave-net-ghp9p -c weave -- /home/weave/weave --local status

        Version: 2.5.0 (up to date; next check at 2018/12/04 09:20:33)

        Service: router
       Protocol: weave 1..2
           Name: 72:1c:d5:08:dc:69(gke-confluence-data-cent-default-pool-1f7ba6a2-ncnr)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 3
    Connections: 3 (2 established, 1 failed)
          Peers: 3 (with 6 established connections)
 TrustedSubnets: none

        Service: ipam
         Status: ready
          Range: 10.32.0.0/12
  DefaultSubnet: 10.32.0.0/12

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.7-gke.11", GitCommit:"dc4f6dda6a08aae2108d7a7fdc2a44fa23900f4c", GitTreeState:"clean", BuildDate:"2018-11-10T20:22:02Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

Logs:

$ kubectl logs -n kube-system <weave-net-pod> weave
INFO: 2018/12/04 03:54:22.408086 Command line options: map[nickname:gke-confluence-data-cent-default-pool-1f7ba6a2-ncnr no-dns:true db-prefix:/weavedb/weave-net expect-npc:true port:6783 docker-api: metrics-addr:0.0.0.0:6782 http-addr:127.0.0.1:6784 host-root:/host ipalloc-init:consensus=3 ipalloc-range:10.32.0.0/12 name:72:1c:d5:08:dc:69 conn-limit:100 datapath:datapath]
INFO: 2018/12/04 03:54:22.408157 weave  2.5.0
WARN: 2018/12/04 03:54:22.422541 Skipping bridge creation of "bridged_fastdp" due to: : bridge not supported
INFO: 2018/12/04 03:54:22.611872 Bridge type is bridge
INFO: 2018/12/04 03:54:22.611893 Communication between peers is unencrypted.
INFO: 2018/12/04 03:54:22.620254 Our name is 72:1c:d5:08:dc:69(gke-confluence-data-cent-default-pool-1f7ba6a2-ncnr)
INFO: 2018/12/04 03:54:22.620302 Launch detected - using supplied peer list: [10.128.0.4 10.128.0.2 10.128.0.3]
INFO: 2018/12/04 03:54:22.644300 Unable to fetch ConfigMap kube-system/weave-net to infer unique cluster ID
INFO: 2018/12/04 03:54:22.644339 Checking for pre-existing addresses on weave bridge
INFO: 2018/12/04 03:54:22.730939 [allocator 72:1c:d5:08:dc:69] No valid persisted data
INFO: 2018/12/04 03:54:22.766043 [allocator 72:1c:d5:08:dc:69] Initialising via deferred consensus
INFO: 2018/12/04 03:54:22.766322 Sniffing traffic on vethwe-pcap (via pcap)
INFO: 2018/12/04 03:54:22.857902 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2018/12/04 03:54:22.858031 Listening for metrics requests on 0.0.0.0:6782
INFO: 2018/12/04 03:54:22.887401 ->[10.128.0.3:6783] attempting connection
INFO: 2018/12/04 03:54:22.887706 ->[10.128.0.4:6783] attempting connection
INFO: 2018/12/04 03:54:22.887766 ->[10.128.0.2:6783] attempting connection
INFO: 2018/12/04 03:54:22.887953 ->[10.128.0.3:38864] connection accepted
INFO: 2018/12/04 03:54:22.888071 ->[10.128.0.2:6783] error during connection attempt: dial tcp4 :0->10.128.0.2:6783: connect: connection refused
INFO: 2018/12/04 03:54:22.888089 ->[10.128.0.4:6783] error during connection attempt: dial tcp4 :0->10.128.0.4:6783: connect: connection refused
INFO: 2018/12/04 03:54:22.892914 ->[10.128.0.3:38864|72:1c:d5:08:dc:69(gke-confluence-data-cent-default-pool-1f7ba6a2-ncnr)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/12/04 03:54:22.894318 ->[10.128.0.3:6783|72:1c:d5:08:dc:69(gke-confluence-data-cent-default-pool-1f7ba6a2-ncnr)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/12/04 03:54:23.081041 ->[10.128.0.4:56665] connection accepted
INFO: 2018/12/04 03:54:23.082725 ->[10.128.0.4:56665|c2:34:3d:7d:7f:66(gke-confluence-data-cent-default-pool-1f7ba6a2-1z9n)]: connection ready; using protocol version 2
INFO: 2018/12/04 03:54:23.082772 overlay_switch ->[c2:34:3d:7d:7f:66(gke-confluence-data-cent-default-pool-1f7ba6a2-1z9n)] using sleeve
INFO: 2018/12/04 03:54:23.082803 ->[10.128.0.4:56665|c2:34:3d:7d:7f:66(gke-confluence-data-cent-default-pool-1f7ba6a2-1z9n)]: connection added (new peer)
INFO: 2018/12/04 03:54:23.128451 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/12/04 03:54:23.128838 ->[10.128.0.4:56665|c2:34:3d:7d:7f:66(gke-confluence-data-cent-default-pool-1f7ba6a2-1z9n)]: connection fully established
INFO: 2018/12/04 03:54:23.169292 sleeve ->[10.128.0.4:6783|c2:34:3d:7d:7f:66(gke-confluence-data-cent-default-pool-1f7ba6a2-1z9n)]: Effective MTU verified at 1398
mkdir: can't create directory '/host/opt/cni/': Read-only file system
INFO: 2018/12/04 03:54:23.381569 [kube-peers] Added myself to peer list &{[{72:1c:d5:08:dc:69 gke-confluence-data-cent-default-pool-1f7ba6a2-ncnr}]}
DEBU: 2018/12/04 03:54:23.396737 [kube-peers] Nodes that have disappeared: map[]
DEBU: 2018/12/04 03:54:23.562229 registering for updates for node delete events
INFO: 2018/12/04 03:54:23.764991 Discovered remote MAC c2:34:3d:7d:7f:66 at c2:34:3d:7d:7f:66(gke-confluence-data-cent-default-pool-1f7ba6a2-1z9n)
10.32.0.1
10.128.0.4
10.128.0.2
10.128.0.3
INFO: 2018/12/04 03:54:24.961315 ->[10.128.0.2:6783] attempting connection
INFO: 2018/12/04 03:54:24.962139 ->[10.128.0.2:6783] error during connection attempt: dial tcp4 :0->10.128.0.2:6783: connect: connection refused
INFO: 2018/12/04 03:54:27.504708 ->[10.128.0.2:6783] attempting connection
INFO: 2018/12/04 03:54:27.505189 ->[10.128.0.2:6783] error during connection attempt: dial tcp4 :0->10.128.0.2:6783: connect: connection refused
INFO: 2018/12/04 03:54:32.809611 ->[10.128.0.2:6783] attempting connection
INFO: 2018/12/04 03:54:32.810408 ->[10.128.0.2:6783] error during connection attempt: dial tcp4 :0->10.128.0.2:6783: connect: connection refused
INFO: 2018/12/04 03:54:42.512272 ->[10.128.0.2:6783] attempting connection
INFO: 2018/12/04 03:54:42.513092 ->[10.128.0.2:6783] error during connection attempt: dial tcp4 :0->10.128.0.2:6783: connect: connection refused
INFO: 2018/12/04 03:54:49.579989 ->[10.128.0.2:35888] connection accepted
INFO: 2018/12/04 03:54:49.592855 ->[10.128.0.2:35888|76:c8:ed:b9:04:b2(gke-confluence-data-cent-default-pool-1f7ba6a2-frs2)]: connection ready; using protocol version 2
INFO: 2018/12/04 03:54:49.592929 overlay_switch ->[76:c8:ed:b9:04:b2(gke-confluence-data-cent-default-pool-1f7ba6a2-frs2)] using sleeve
INFO: 2018/12/04 03:54:49.592947 ->[10.128.0.2:35888|76:c8:ed:b9:04:b2(gke-confluence-data-cent-default-pool-1f7ba6a2-frs2)]: connection added (new peer)
INFO: 2018/12/04 03:54:49.605351 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/12/04 03:54:49.605438 ->[10.128.0.2:35888|76:c8:ed:b9:04:b2(gke-confluence-data-cent-default-pool-1f7ba6a2-frs2)]: connection fully established
INFO: 2018/12/04 03:54:49.605822 sleeve ->[10.128.0.2:6783|76:c8:ed:b9:04:b2(gke-confluence-data-cent-default-pool-1f7ba6a2-frs2)]: Effective MTU verified at 1398
INFO: 2018/12/04 03:54:50.815866 Discovered remote MAC 76:c8:ed:b9:04:b2 at 76:c8:ed:b9:04:b2(gke-confluence-data-cent-default-pool-1f7ba6a2-frs2)
@murali-reddy
Copy link
Contributor

@scottohara AFAIK there are no recommended steps that reliably work for running Weave on GKE.

The reason is that GKE, as a managed service, uses a different CNI by default (last I checked it was kubenet). I believe it is possible to run Weave on GKE, but it requires configuration hacks while provisioning the cluster.

See e.g. #3111.

Also, I believe GKE uses a different path (/home/kubernetes/bin) for the CNI binaries and configuration, hence this error in your weave logs:

INFO: 2018/12/04 03:54:23.169292 sleeve ->[10.128.0.4:6783|c2:34:3d:7d:7f:66(gke-confluence-data-cent-default-pool-1f7ba6a2-1z9n)]: Effective MTU verified at 1398
mkdir: can't create directory '/host/opt/cni/': Read-only file system

Not ideal, but if you are comfortable trying hacks, you can find workarounds scattered across CNI projects from users who got things running on GKE (e.g. cilium/cilium#4794).
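For background on why the read-only mount matters: kubelet loads CNI config files from its config directory in lexical order and uses the first valid one, so Weave's 10-weave.conflist only takes effect if Weave can actually write it. With /host/opt/cni read-only, that never happens and the pre-existing config keeps winning. A small simulation (the 90-gke.conf name is a made-up stand-in for GKE's own config file):

```python
import json
import pathlib
import tempfile

# Simulate a CNI config directory: kubelet picks the lexically-first valid
# file, so Weave's 10-weave.conflist would win -- if it can be written at all.
with tempfile.TemporaryDirectory() as d:
    confs = {
        "10-weave.conflist": {"name": "weave", "plugins": [{"type": "weave-net"}]},
        "90-gke.conf": {"name": "gke", "type": "ptp"},  # hypothetical GKE config
    }
    for name, body in confs.items():
        (pathlib.Path(d) / name).write_text(json.dumps(body))
    chosen = sorted(pathlib.Path(d).iterdir())[0]
    print("kubelet would pick:", chosen.name)  # 10-weave.conflist
```

In the failing case above, the mkdir error means 10-weave.conflist is never created, so only GKE's own config exists to be picked.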

@scottohara
Author

Thanks very much @murali-reddy, I’ll try some of those hacks.

Do you think the install docs should include something about the limitations of GKE?

Currently there is nothing there to suggest it won’t work, and the presence of the paragraph regarding the RBAC requirements for GKE actually suggests the opposite.

I think it would save a lot of wasted time and effort if the docs indicated that GKE is not supported by default.

@murali-reddy
Contributor

Agreed. The docs should include the limitations with managed Kubernetes services. There is a similar issue with EKS (#3335), which needs special steps to enable Weave as the CNI.
