KinD fails to start on GKE with DinD #677

Closed
howardjohn opened this issue Jul 1, 2019 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@howardjohn
Contributor

What happened:
I have two GKE nodes. I am running into two separate problems with them.

On node 1, KinD fails to start with Error: failed to create cluster: failed to apply overlay network: exit status 1. Full logs attached.
On node 2, the control plane seems to be in an unhealthy state:

NAMESPACE     NAME                                         READY   STATUS    RESTARTS   AGE
kube-system   etcd-test-control-plane                      1/1     Running   20         10m
kube-system   kube-apiserver-test-control-plane            1/1     Running   20         10m
kube-system   kube-controller-manager-test-control-plane   1/1     Running   20         10m
kube-system   kube-scheduler-test-control-plane            1/1     Running   20         10m

Note the 20 restarts in 10 minutes. The pods also don't seem to go into CrashLoopBackOff, though that may be intended.

The logs show:

E0701 14:41:23.412993       1 controller.go:148] Unable to remove old endpoints from kubernetes service: no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service
E0701 14:42:55.880979       1 autoregister_controller.go:193] v1alpha1.certmanager.k8s.io failed with : apiservices.apiregistration.k8s.io "v1alpha1.certmanager.k8s.io" already exists
E0701 14:42:55.881134       1 autoregister_controller.go:193] v1alpha3.networking.istio.io failed with : apiservices.apiregistration.k8s.io "v1alpha3.networking.istio.io" already exists
E0701 14:42:55.881221       1 autoregister_controller.go:193] v1alpha1.authentication.istio.io failed with : apiservices.apiregistration.k8s.io "v1alpha1.authentication.istio.io" already exists
E0701 14:42:55.881253       1 autoregister_controller.go:193] v1alpha2.config.istio.io failed with : apiservices.apiregistration.k8s.io "v1alpha2.config.istio.io" already exists

I consistently get the same errors on the same nodes. I only have 2 nodes in my cluster.

What you expected to happen:

KinD successfully creates a cluster.

How to reproduce it (as minimally and precisely as possible):

Not exactly sure. It seems related to the image we are using, to Prow, or to GKE. See http://prow.istio.io/?job=istio-kind-simpleTest-master for a number of failures, including the "failed to apply overlay network" one (https://k8s-gubernator.appspot.com/build/istio-prow/logs/istio-kind-simpleTest-master/358). Note that those runs are not mine; they come from a separate cluster attempting the same thing I am doing (running tests on Prow with KinD).

Anything else we need to know?:

Environment:

  • kind version: (use kind version): v0.3.0.
  • Kubernetes version: (use kubectl version): 1.15
  • Docker version: (use docker info): 18.06.1-ce
  • OS (e.g. from /etc/os-release): Ubuntu 16.04

It's running in image gcr.io/istio-testing/istio-builder:v20190628-31457b43 on a GKE cluster.

I still have the nodes up if you need any more debugging info.

  • node1.log shows the failure to start on node 1
  • node2.log shows a successful start on node 2
  • kube-node2.log shows some attempts at getting logs from the crashing API server on node 2. I'm not sure if there is a better way, since kubectl fetches logs through the API server, which is the component that is crashing.
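One way to read logs from the crashing API server without going through it is to ask the node's container runtime directly. The Python sketch below assumes the cluster is named `test` (so the node container is `test-control-plane`, matching the pod names above) and that the kind node image ships `crictl`; the helper names `list_cmd`, `logs_cmd` and `fetch_logs` are hypothetical. `crictl ps -a` also lists exited containers, so logs from earlier restarts can be captured as long as those containers have not been cleaned up yet.

```python
import subprocess
from typing import Dict, List


def list_cmd(node: str, name: str) -> List[str]:
    # `-a` includes exited containers, `--name` filters by container name,
    # `-q` prints only the container IDs.
    return ["docker", "exec", node, "crictl", "ps", "-a", "--name", name, "-q"]


def logs_cmd(node: str, container_id: str) -> List[str]:
    # Read a single container's logs via the runtime inside the node container.
    return ["docker", "exec", node, "crictl", "logs", container_id]


def fetch_logs(node: str, name: str) -> Dict[str, str]:
    """Return {container_id: log text} for every container matching `name`."""
    listed = subprocess.run(
        list_cmd(node, name),
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    logs = {}
    for container_id in listed.stdout.split():
        # Merge stderr into stdout so output written to either stream is kept.
        result = subprocess.run(
            logs_cmd(node, container_id),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        logs[container_id] = result.stdout
    return logs
```

Usage, under the same assumptions: `fetch_logs("test-control-plane", "kube-apiserver")` returns one entry per API server container instance, including ones that already exited.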
@howardjohn howardjohn added the kind/bug Categorizes issue or PR as related to a bug. label Jul 1, 2019
@howardjohn
Contributor Author

Got the kind logs from node 2:
kind-logs.zip

@howardjohn
Contributor Author

I tried with a simpler docker image and kind 0.4.0 and still see the same issue. Dockerfile:

# For DinD
FROM docker:latest as docker

# For golang
FROM golang:1.12.5 as golang
ENV GO111MODULE=on
RUN go get -u sigs.k8s.io/kind@v0.4.0

FROM debian:9-slim


# Copy from prior stages
COPY --from=docker /usr/local/bin/docker /usr/local/bin/docker

COPY --from=golang /go/bin/kind /usr/local/bin/kind

# Set CI variable which can be checked by test scripts to verify
# if running in the continuous integration environment.
ENV CI=prow

# Add entrypoint to start docker
COPY prow-runner.sh /usr/local/bin/entrypoint
RUN chmod +rx /usr/local/bin/entrypoint

RUN apt-get update && apt-get -qqy --no-install-recommends install \
    build-essential \
    ca-certificates \
    curl \
    git \
    && rm -rf /var/lib/apt/lists/*

# Add kubectl
RUN curl -Lo /tmp/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.15.0/bin/linux/amd64/kubectl && chmod +x /tmp/kubectl && mv /tmp/kubectl /usr/local/bin/

ENTRYPOINT ["entrypoint"]

@howardjohn
Contributor Author

I think it's related to #303; trying out the steps there.

@BenTheElder
Member

Quick comment: in addition to #303, ensure Docker's storage directory (typically /var/lib/docker) is a volume (e.g. an emptyDir).
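Docker's overlay2 storage driver generally does not work when /var/lib/docker itself sits on the outer container's overlay root filesystem, which is why a real volume is needed there. A minimal Python sketch for checking this from inside the CI container, assuming Linux and the `/proc/mounts` format; `fs_type_for` and `docker_root_on_overlay` are hypothetical helper names, and symlinks in the path are not resolved.

```python
import os
from typing import Optional


def _unescape(field: str) -> str:
    # /proc/mounts encodes space, tab, newline and backslash as octal escapes.
    return (field.replace("\\040", " ").replace("\\011", "\t")
                 .replace("\\012", "\n").replace("\\134", "\\"))


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount that contains `path`."""
    path = os.path.normpath(path)
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = _unescape(fields[1]), fields[2]
        # Match on whole path components, so /var/lib/dockerx is not
        # treated as being inside /var/lib/docker.
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # The longest matching mount point wins; on a tie the later line wins,
        # since a later mount on the same path shadows the earlier one.
        if inside and len(mount_point) >= best_len:
            best_len, best_type = len(mount_point), fs_type
    return best_type


def docker_root_on_overlay(path: str = "/var/lib/docker",
                           mounts_path: str = "/proc/mounts") -> bool:
    """True if `path` is backed by overlayfs, i.e. it is not a separate volume."""
    with open(mounts_path) as f:
        return fs_type_for(path, f.read()) == "overlay"


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    print("/var/lib/docker on overlay:", docker_root_on_overlay())
```

If the check reports `True`, the Docker storage directory is still on the container's root filesystem rather than on a mounted volume such as an emptyDir.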

howardjohn added a commit to howardjohn/test-infra that referenced this issue Jul 1, 2019
@howardjohn howardjohn changed the title KinD fails to start on GKE/Prow KinD fails to start on GKE with DinD Jul 1, 2019
@howardjohn
Contributor Author

Got it working with those, thanks! It seems to be running smoothly now.

Big +1 on #303 though

@BenTheElder
Member

I hear you on #303, just juggling priorities, glad you got it working! 😅

Will add a note about DinD to #303 as well.
