
Cluster not starting with DIND setup #625

Closed
fedepaol opened this issue Jun 18, 2019 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/support Categorizes issue or PR as a support question. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@fedepaol
Contributor

What happened:
A kind 0.3.0 cluster is not starting on Prow with the k8s test images and Docker-in-Docker enabled; kind 0.2.1 works fine.
We are running our CI in a Prow cluster, using gcr.io/k8s-testimages/bootstrap:latest as the base image with DinD enabled. The cluster does not start.

What you expected to happen:
The cluster to be up & running.

How to reproduce it (as minimally and precisely as possible):
docker run --privileged --rm -it -e DOCKER_IN_DOCKER_ENABLED='true' -v $(pwd):/workspace --entrypoint /usr/local/bin/runner.sh gcr.io/k8s-testimages/bootstrap:latest bash -c "wget https://dl.google.com/go/go1.12.6.linux-amd64.tar.gz && tar -C /usr/local -xzf go1.12.6.linux-amd64.tar.gz && GO111MODULE='on' /usr/local/go/bin/go get sigs.k8s.io/kind@v0.3.0 && ~/go/bin/kind create cluster --name=fede"

Anything else we need to know?:
The control plane node container starts correctly. If I bash into it and look at the kubelet logs, I can see a bunch of errors like these:

Jun 17 12:38:24 fede-control-plane kubelet[216]: E0617 12:38:24.141929     216 kuberuntime_manager.go:693] createPodSandbox for pod "kube-controller-manager-fede-control-plane_kube-system(ced6e7a763e96d1888013e32a44b1066)" failed: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: failed to mount rootfs component &{overlay overlay [workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/24/work upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/24/fs lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs]}: invalid argument: unknown
Jun 17 12:38:24 fede-control-plane kubelet[216]: E0617 12:38:24.141977     216 pod_workers.go:190] Error syncing pod ced6e7a763e96d1888013e32a44b1066 ("kube-controller-manager-fede-control-plane_kube-system(ced6e7a763e96d1888013e32a44b1066)"), skipping: failed to "CreatePodSandbox" for "kube-controller-manager-fede-control-plane_kube-system(ced6e7a763e96d1888013e32a44b1066)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-controller-manager-fede-control-plane_kube-system(ced6e7a763e96d1888013e32a44b1066)\" failed: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: failed to mount rootfs component &{overlay overlay [workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/24/work upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/24/fs lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs]}: invalid argument: unknown"

There are also a bunch of failures when hitting the API (which makes sense, since the apiserver is still down).
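
For reference, a quick way to check whether containerd's storage inside the node ends up stacked on top of another overlay mount (just a rough diagnostic sketch; the node container name comes from the --name=fede cluster above):

# show which filesystem backs containerd's snapshotter directory inside the node;
# overlayfs refuses an upperdir that itself lives on overlayfs, which would match
# the "invalid argument" mount errors above
docker exec fede-control-plane findmnt -T /var/lib/containerd
docker exec fede-control-plane df -hT /var/lib/containerd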

Environment:

  • kind version: (use kind version):
  • Kubernetes version: (use kubectl version):
  • Docker version: (use docker info):
  • OS (e.g. from /etc/os-release):
@fedepaol fedepaol added the kind/bug Categorizes issue or PR as related to a bug. label Jun 18, 2019
@BenTheElder
Member

Note #303, we run our official CI on a Prow cluster with that image 😅

@fedepaol
Contributor Author

Nice, I'll try to add the mounts and see if it works! Thanks!

@fedepaol
Contributor Author

Not working :-(
If I understood @BenTheElder's suggestion correctly, I should mount /lib/modules and /sys/fs/cgroup into the container.

I ran the following, but I'm still getting the same error:

docker run --privileged --rm -it -e DOCKER_IN_DOCKER_ENABLED="true" -v /lib/modules:/lib/modules -v /sys/fs/cgroup:/sys/fs/cgroup -v $(pwd):/workspace --entrypoint /usr/local/bin/runner.sh gcr.io/k8s-testimages/bootstrap:latest bash -c "wget https://dl.google.com/go/go1.12.6.linux-amd64.tar.gz && tar -C /usr/local -xzf go1.12.6.linux-amd64.tar.gz && GO111MODULE='on' /usr/local/go/bin/go get sigs.k8s.io/kind@v0.3.0 && ~/go/bin/kind create cluster --name=fede"
.
.
.
Error: failed to create cluster: failed to init node with kubeadm: exit status 1

Docker is pretty recent (18.09.6), and the OS I run the container from is Fedora 30.
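
To grab the kubelet logs again after the failed create, I keep the node container around and read the journal (a rough sketch; assuming --retain is available in this kind version):

# keep the node container after the failed create, then dump the kubelet journal
~/go/bin/kind create cluster --name=fede --retain --loglevel=debug
docker exec fede-control-plane journalctl -u kubelet --no-pager | tail -n 50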

@BenTheElder
Member

@fedepaol does this match your pod config fully? You need to have a volume mounted at /docker-graph (a remapped /var/lib/docker, kept for legacy reasons; the test-infra "bootstrap" image is full of cruft like that).

I see no issues with the following:

docker run --privileged --rm -it -e DOCKER_IN_DOCKER_ENABLED="true" -v /lib/modules:/lib/modules -v /sys/fs/cgroup:/sys/fs/cgroup -v $(pwd):/workspace --tmpfs /docker-graph --entrypoint /usr/local/bin/runner.sh gcr.io/k8s-testimages/bootstrap:latest bash -c "wget https://github.com/kubernetes-sigs/kind/releases/download/v0.3.0/kind-linux-amd64 && chmod +x ./kind-linux-amd64 && ./kind-linux-amd64 create cluster --name=fede --loglevel=debug"

NOTE:

  • downloading the kind binary instead of installing Go and then running go get, to save time
  • using --tmpfs /docker-graph to lazily simulate an emptyDir volume for docker's storage (NOTE: emptyDir is by default disk backed rather than memory backed, but you get the idea)
  • added --loglevel=debug to help see what happens

@BenTheElder
Member

Similarly, I see no issues with the following:

to sort of simulate an empty dir:
docker volume create fede
simulate the pod:
docker run --privileged --rm -it -e DOCKER_IN_DOCKER_ENABLED="true" -v /lib/modules:/lib/modules -v /sys/fs/cgroup:/sys/fs/cgroup -v $(pwd):/workspace -v fede:/docker-graph --entrypoint /usr/local/bin/runner.sh gcr.io/k8s-testimages/bootstrap:latest bash -c "wget https://github.com/kubernetes-sigs/kind/releases/download/v0.3.0/kind-linux-amd64 && chmod +x ./kind-linux-amd64 && ./kind-linux-amd64 create cluster --name=fede --loglevel=debug" # note the -v fede:/docker-graph
simulate cleaning up the emptyDir:
docker volume rm fede

@BenTheElder BenTheElder added triage/needs-information Indicates an issue needs more information in order to work on it. kind/support Categorizes issue or PR as a support question. labels Jun 24, 2019
@fedepaol
Contributor Author

Oh, I was missing the docker-graph dir. Works now!
Thanks and sorry for the interruption.

@BenTheElder
Member

Glad it works!! Interesting that it may have worked with kind 0.2 without that ... I would not have expected that to work reliably 😅
