
GKE Pod Failed to create cluster with "failed to init node with kubeadm" #997

Closed
axsaucedo opened this issue Oct 22, 2019 · 5 comments
Labels: kind/support (Categorizes issue or PR as a support question.)


@axsaucedo

What happened:
We are trying to run our end-to-end tests in a GKE Kubernetes cluster, creating the kind cluster from within a pod and then running all the tests. This has been unsuccessful so far due to the error shown below. I provide the pod spec used, the output of the command (including the error), and the versions of the K8s cluster and Docker server.

I also looked at related issues such as #982 (fixed by updating) and #928 (which doesn't appear to be the same problem).

The command fails both with and without a config file; the config file is as follows:

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker
  extraPortMappings:
  - containerPort: 30080
    hostPort: 8003

This is the Deployment used to run the pod (I tried multiple different combinations):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: k8s-builder
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: k8s-101
    spec:
      containers:
      - name: k8s-docker-builder
        image: seldonio/core-builder:0.4
        imagePullPolicy: Always
        command:
        - tail
        args:
        - -f
        - /dev/null
        volumeMounts:
        - mountPath: /var/run/docker.sock
          name: docker-socket-volume
        - mountPath: /etc/docker/
          name: docker-path-volume
        securityContext:
          privileged: true
      volumes:
      - name: docker-socket-volume
        hostPath:
          path: /var/run/docker.sock
      - name: docker-path-volume
        hostPath:
          path: /etc/docker/

This is the output of the command kind create cluster --loglevel debug:

root@k8s-builder-b96c7d69b-qt7l2:/work# kind create cluster --loglevel debug
DEBU[18:30:19] Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\t{{.Label "io.k8s.sigs.kind.cluster"}}]
Creating cluster "kind" ...
DEBU[18:30:19] Running: /usr/bin/docker [docker inspect --type=image kindest/node:v1.15.3]
INFO[18:30:19] Image: kindest/node:v1.15.3 present locally
 ✓ Ensuring node image (kindest/node:v1.15.3) 🖼
DEBU[18:30:19] Running: /usr/bin/docker [docker info --format '{{json .SecurityOptions}}']
DEBU[18:30:19] Running: /usr/bin/docker [docker run --detach --tty --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro --hostname kind-control-plane --name kind-control-plane --label io.k8s.sigs.kind.cluster=kind --label io.k8s.sigs.kind.role=control-plane --expose 40607 --publish=127.0.0.1:40607:6443/TCP kindest/node:v1.15.3@sha256:27e388752544890482a86b90d8ac50fcfa63a2e8656a96ec5337b902ec8e5157]
 ✓ Preparing nodes 📦
DEBU[18:30:27] Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\t{{.Label "io.k8s.sigs.kind.cluster"}} --filter label=io.k8s.sigs.kind.cluster=kind]
DEBU[18:30:27] Running: /usr/bin/docker [docker inspect -f {{index .Config.Labels "io.k8s.sigs.kind.role"}} kind-control-plane]
DEBU[18:30:28] Running: /usr/bin/docker [docker exec --privileged kind-control-plane cat /kind/version]
DEBU[18:30:28] Running: /usr/bin/docker [docker inspect -f {{range .NetworkSettings.Networks}}{{.IPAddress}},{{.GlobalIPv6Address}}{{end}} kind-control-plane]
DEBU[18:30:28] Running: /usr/bin/docker [docker inspect -f {{range .NetworkSettings.Networks}}{{.IPAddress}},{{.GlobalIPv6Address}}{{end}} kind-control-plane]
DEBU[18:30:28] Configuration Input data: {kind v1.15.3 169.254.123.2:6443 6443 127.0.0.1 true 169.254.123.2 abcdef.0123456789abcdef 10.244.0.0/16 10.96.0.0/12 false {}}
DEBU[18:30:28] Configuration generated:
 # config generated by kind
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
metadata:
  name: config
kubernetesVersion: v1.15.3
clusterName: "kind"
controlPlaneEndpoint: "169.254.123.2:6443"
# on docker for mac we have to expose the api server via port forward,
# so we need to ensure the cert is valid for localhost so we can talk
# to the cluster after rewriting the kubeconfig to point to localhost
apiServer:
  certSANs: [localhost, "127.0.0.1"]
controllerManager:
  extraArgs:
    enable-hostpath-provisioner: "true"
    # configure ipv6 default addresses for IPv6 clusters

scheduler:
  extraArgs:
    # configure ipv6 default addresses for IPv6 clusters

networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
metadata:
  name: config
# we use a well know token for TLS bootstrap
bootstrapTokens:
- token: "abcdef.0123456789abcdef"
# we use a well know port for making the API server discoverable inside docker network.
# from the host machine such port will be accessible via a random local port instead.
localAPIEndpoint:
  advertiseAddress: "169.254.123.2"
  bindPort: 6443
nodeRegistration:
  criSocket: "/run/containerd/containerd.sock"
  kubeletExtraArgs:
    fail-swap-on: "false"
    node-ip: "169.254.123.2"
---
# no-op entry that exists solely so it can be patched
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
metadata:
  name: config
controlPlane:
  localAPIEndpoint:
    advertiseAddress: "169.254.123.2"
    bindPort: 6443
nodeRegistration:
  criSocket: "/run/containerd/containerd.sock"
  kubeletExtraArgs:
    fail-swap-on: "false"
    node-ip: "169.254.123.2"
discovery:
  bootstrapToken:
    apiServerEndpoint: "169.254.123.2:6443"
    token: "abcdef.0123456789abcdef"
    unsafeSkipCAVerification: true
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
metadata:
  name: config
# configure ipv6 addresses in IPv6 mode

# disable disk resource management by default
# kubelet will see the host disk that the inner container runtime
# is ultimately backed by and attempt to recover disk space. we don't want that.
imageGCHighThresholdPercent: 100
evictionHard:
  nodefs.available: "0%"
  nodefs.inodesFree: "0%"
  imagefs.available: "0%"
---
# no-op entry that exists solely so it can be patched
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metadata:
  name: config
DEBU[18:30:28] Using kubeadm config:
apiServer:
  certSANs:
  - localhost
  - 127.0.0.1
apiVersion: kubeadm.k8s.io/v1beta2
clusterName: kind
controlPlaneEndpoint: 169.254.123.2:6443
controllerManager:
  extraArgs:
    enable-hostpath-provisioner: "true"
kind: ClusterConfiguration
kubernetesVersion: v1.15.3
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler:
  extraArgs: null
---
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- token: abcdef.0123456789abcdef
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 169.254.123.2
  bindPort: 6443
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  kubeletExtraArgs:
    fail-swap-on: "false"
    node-ip: 169.254.123.2
---
apiVersion: kubeadm.k8s.io/v1beta2
controlPlane:
  localAPIEndpoint:
    advertiseAddress: 169.254.123.2
    bindPort: 6443
discovery:
  bootstrapToken:
    apiServerEndpoint: 169.254.123.2:6443
    token: abcdef.0123456789abcdef
    unsafeSkipCAVerification: true
kind: JoinConfiguration
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  kubeletExtraArgs:
    fail-swap-on: "false"
    node-ip: 169.254.123.2
---
apiVersion: kubelet.config.k8s.io/v1beta1
evictionHard:
  imagefs.available: 0%
  nodefs.available: 0%
  nodefs.inodesFree: 0%
imageGCHighThresholdPercent: 100
kind: KubeletConfiguration
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
DEBU[18:30:28] Running: /usr/bin/docker [docker exec --privileged kind-control-plane mkdir -p /kind]
DEBU[18:30:28] Running: /usr/bin/docker [docker exec --privileged -i kind-control-plane cp /dev/stdin /kind/kubeadm.conf]
 ✓ Creating kubeadm config 📜
DEBU[18:30:28] Running: /usr/bin/docker [docker exec --privileged kind-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6]
DEBU[18:30:28] I1022 18:30:28.499461      39 initconfiguration.go:189] loading configuration from "/kind/kubeadm.conf"
[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta2, Kind=JoinConfiguration
I1022 18:30:28.504785      39 feature_gate.go:216] feature gates: &{map[]}
cannot use "169.254.123.2" as the bind address for the API Server
 ✗ Starting control-plane 🕹️
DEBU[18:30:28] Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\t{{.Label "io.k8s.sigs.kind.cluster"}} --filter label=io.k8s.sigs.kind.cluster=kind]
DEBU[18:30:28] Running: /usr/bin/docker [docker rm -f -v kind-control-plane]
Error: failed to create cluster: failed to init node with kubeadm: exit status 1

This is the version of the Kubernetes cluster:

Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.7-gke.10", GitCommit:"8cea5f8ae165065f0d35e5de5dfa2f73617f02d1", GitTreeState:"clean", BuildDate:"2019-10-05T00:08:10Z", GoVersion:"go1.12.9b4", Compiler:"gc", Platform:"linux/amd64"}

This is the version of my Docker server (from the docker info command within the pod):

Server:
 Containers: 37
  Running: 37
  Paused: 0
  Stopped: 0
 Images: 43
 Server Version: 17.03.2-ce
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host macvlan null overlay
  Log:
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 595e75c212d19a81d2b808a518fe1afc1391dad5 (expected: 4ab9917febca54791c5f071a9d1f404867857fcc)
 runc version: 54296cf (expected: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe)
 init version: v0.13.0 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.14.137+
 Operating System: Container-Optimized OS from Google
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 14.69GiB
 Name: gke-jx-production-cluster-pool-1-58ef625a-gd2f
 ID: WSDV:6UV5:4DRW:DDLD:7BFE:ARUJ:55TA:U2DP:4PWY:BQF3:EWU6:RQGY
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  10.0.0.0/8
  127.0.0.0/8
 Registry Mirrors:
  https://mirror.gcr.io/
  https://mirror.gcr.io/
 Live Restore Enabled: true
@axsaucedo added the kind/bug label on Oct 22, 2019
@BenTheElder
Member

please see: #303

@BenTheElder
Member

this will not work, your nodes will be out on the host in a different namespace. You'll have to follow the details in #303 to get this working.

additionally, while we do run kind on GKE for Kubernetes CI (because that is where Kubernetes's CI runs), we do not recommend it over VM-based CI. An additional layer of "containers in containers" is a bit of a footgun.

@BenTheElder added the kind/support label and removed the kind/bug label on Oct 22, 2019
@BenTheElder
Member

BenTheElder commented Oct 22, 2019

specifically, details are in #303 (comment)

unfortunately this "mount the docker socket" pattern is unlikely to ever work well for kind, due to the nature of how containers work, so this is not supported...

EDIT: dind however is supported, if tricky and non-ideal (see the link).

@axsaucedo
Author

Thank you very much @BenTheElder. This definitely felt like quite a big hack; the suggested approach for running in Kubernetes pods sounds much more sensible. Perfect. I will have a look at #303 and try that approach instead!

@BenTheElder
Member

I appreciate your understanding. #303 is not great either, but done carefully it can help prevent resource leakage and avoid some of the network namespace issues.

We use https://github.com/kubernetes/test-infra/tree/master/images/krte in Kubernetes CI, formerly (and for a few things still) https://github.com/kubernetes/test-infra/tree/master/images/kubekins-e2e

These are at gcr.io/k8s-testimages/${name}; they contain many extra things (and no kind binary at the moment!) but can be made to work. wrapper.sh in KRTE and runner.sh in kubekins-e2e contain the dind setup.
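
Roughly, that dind setup boils down to something like the sketch below. This is an illustration only, not the actual contents of wrapper.sh or runner.sh; see those scripts for the real flags and readiness handling:

# start an inner Docker daemon inside the privileged pod instead of
# pointing the Docker CLI at the host's /var/run/docker.sock
dockerd > /var/log/dockerd.log 2>&1 &

# wait until the inner daemon is answering
until docker info > /dev/null 2>&1; do
  sleep 1
done

# kind's node containers now run against the pod-local daemon,
# inside the pod's own namespaces rather than out on the GKE node
kind create cluster --loglevel debug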
