
Kind pods in kube-system namespace are being restarted every 30s and are using wrong namespace when run within a k8s based jenkins pipeline #621

Closed
timowuttke opened this issue Jun 17, 2019 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@timowuttke

Hi,

we are currently trying to get kind running in our Jenkins CI, which in turn is running in Kubernetes. It's basically Kubernetes in Docker in Jenkins in Kubernetes.

What happened:
Cluster creation with kind finishes without errors and (after setting kubeconfig) "kubectl cluster-info" returns

Kubernetes master is running at https://localhost:43349
KubeDNS is running at https://localhost:43349/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
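
For reference, with kind v0.3.x the kubeconfig is set roughly like this (a sketch of our pipeline steps; the cluster name `kind` is an assumption):

```shell
# create the cluster inside the pipeline container
kind create cluster --name kind

# kind <= v0.3.x writes a dedicated kubeconfig file; point kubectl at it
export KUBECONFIG="$(kind get kubeconfig-path --name=kind)"

kubectl cluster-info
```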

This seems to be fine. It's not possible to do anything with the cluster though as all commands either fail to execute or seem to not have any effect.

Observation 1:
Many commands are trying to use the namespace of the Jenkins k8s cluster, which obviously doesn't exist in kind, despite kubectl being configured to use kind:

kubectl create deployment hello-node --image=gcr.io/hello-minikube-zero-install/hello-node
Error from server (NotFound): namespaces "jenkins" not found

When run with "-n default", the above command does create a deployment in kind's default namespace.
Interestingly, "kubectl get po" correctly uses the kind default namespace and doesn't complain about a non-existent jenkins namespace (after setting the kubeconfig):

kubectl get po
No resources found.
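
A quick way to see where the unexpected namespace comes from (a sketch; both locations are standard Kubernetes behavior, not kind-specific):

```shell
# namespace configured in the active kubeconfig context (empty means "default")
kubectl config view --minify --output 'jsonpath={..namespace}'; echo

# inside a pod, the outer cluster's service account mounts its namespace here;
# kubectl falls back to this in-cluster config when no kubeconfig applies
cat /var/run/secrets/kubernetes.io/serviceaccount/namespace
```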

Observation 2:
Pods in the kube-system namespace are being restarted once every 30s; initial pod creation can be delayed by up to 20 min:

kubectl get po --all-namespaces
NAMESPACE     NAME                                         READY   STATUS              RESTARTS   AGE
default       hello-node-78cd77d68f-4tj5q                  0/1     ContainerCreating   0          14m
kube-system   coredns-fb8b8dccf-6hksq                      0/1     Unknown             13         14m
kube-system   coredns-fb8b8dccf-t4c4v                      0/1     Running             16         14m
kube-system   etcd-kind-control-plane                      1/1     Running             43         37m
kube-system   ip-masq-agent-2dtgr                          1/1     Running             16         14m
kube-system   kindnet-45q4g                                0/1     Unknown             19         14m
kube-system   kube-apiserver-kind-control-plane            1/1     Running             43         37m
kube-system   kube-controller-manager-kind-control-plane   1/1     Running             43         37m
kube-system   kube-proxy-q49wq                             0/1     Error               16         14m
kube-system   kube-scheduler-kind-control-plane            1/1     Running             43         37m

Observation 3:
Some of the pods in the kube-system namespace also appear to use the jenkins namespace:

kubectl describe po kube-proxy-q49wq
Error from server (NotFound): namespaces "jenkins" not found

Observation 4:
One in two kubectl commands outright fails with:
Unable to connect to the server: EOF

What you expected to happen:
The kind cluster should start fully without any interference from "higher-level" Kubernetes namespaces. Pod creation should work normally.

How to reproduce it (as minimally and precisely as possible):

I'm able to reproduce this locally when running Jenkins 2.177 in minikube v1.0.1 (I don't think the versions matter too much, though). This is the podTemplate I'm using:

podTemplate(label: podLabel, yaml: """
kind: Pod
spec:
  imagePullSecrets:
  - name: <secret>
  containers:
  - name: main
    image: <image with kind, docker client and kubectl>
    command: ['cat']
    tty: true
    env:
    - name: DOCKER_HOST
      value: tcp://localhost:2375
    volumeMounts:
    - name: workdir
      mountPath: /home/jenkins/work
  - name: dind
    image: docker:18-dind
    securityContext:
      privileged: true
    volumeMounts:
    - name: dind-storage 
      mountPath: /var/lib/docker
  volumes:
  - name: dind-storage
    emptyDir: {}  
  - name: workdir
    emptyDir: {}  
    
"""
 ){
  node(podLabel) {
    stage('Preparation') {
  <git checkout>
 }

    stage('Kind') {
      container('main') {
        <kind commands as described above>
      }
    }
  }
}

Anything else we need to know?:
We tried giving more resources to the pod, but that didn't help.

Environment:

  • kind version (kind version): 0.3.0
  • Kubernetes version (kubectl version):
    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2",
    Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2",
  • Docker version (docker info):
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 18.09.6
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.15.0
Operating System: Alpine Linux v3.9 (containerized)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 9.524GiB
Name: migration-verification-e4f9af59-0975-4c2c-900d-436379adca-47j88
ID: 7K7A:WYI4:7IVX:J3TW:632V:HGNH:KWNU:KSLS:2MKW:RVHR:QP2F:PQZT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: API is accessible on http://0.0.0.0:2375 without encryption.
         Access to the remote API is equivalent to root access on the host. Refer
         to the 'Docker daemon attack surface' section in the documentation for
         more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

We really would like to use kind for e2e testing and similar things, so any help is appreciated :)

@timowuttke timowuttke added the kind/bug Categorizes issue or PR as related to a bug. label Jun 17, 2019
@BenTheElder
Member

Many commands are trying to use the namespace of the jenkins k8s cluster which obviously doesn't exist in kind, despite configuring kubectl to use kind:

kind is not responsible for creating namespaces? you'll need to create these yourself

@BenTheElder
Member

kube-system kube-proxy-q49wq 0/1 Error 16 14m

can you show the logs?

also note: to run successfully on kubernetes in docker in docker see #303

@BenTheElder
Member

Observation 1:
Many commands are trying to use the namespace of the jenkins k8s cluster which obviously doesn't exist in kind, despite configuring kubectl to use kind:

Observation 3:
Some of the pods in kube-system namespace are also trying to use the jenkins namespace:

kubectl describe po kube-proxy-q49wq
Error from server (NotFound): namespaces "jenkins" not found

This observation is not quite correct: kubectl describe po kube-proxy-q49wq is missing --namespace=kube-system, which is why it can't see the pod. The pod isn't using the jenkins namespace; your kubectl is configured with jenkins as the default namespace, probably due to default access to the outer cluster from the pod running in that namespace. https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod

You can disable this on the outer cluster's pod with automountServiceAccountToken: false
https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/

Also note that typically the default namespace is default in which case without the --namespace flag you'd still get the same error trying to list a pod in kube-system.
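
For reference, disabling the token mount on the outer cluster's pod looks roughly like this (a sketch; automountServiceAccountToken is a standard pod spec field, and the pod name is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jenkins-agent   # hypothetical name
spec:
  # keep the outer cluster's service account (and its "jenkins" namespace)
  # from leaking into kubectl's in-cluster fallback config
  automountServiceAccountToken: false
  containers:
  - name: main
    image: <image with kind, docker client and kubectl>
```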


Observation 2:
Pods in kube-system namespace are being restarted once every 30s, initial pod creation can be delayed up to 20min:

Observation 4:
One in two kubectl commands outright fails with:
Unable to connect to the server: EOF

This sounds like a resource issue in your cluster. I can't tell what exactly without more details / logs, but ...

You should also see #303. Stacking multiple layers of Docker in Docker requires more things to work properly. Running kind inside a Kubernetes pod is something we do ourselves for CI, but it requires some special environmental setup that kind cannot do for you.

@BenTheElder
Member

I'm inclined to mark this as a duplicate of #303 with regards to issues running in a Kubernetes Pod.

The namespaces issue might be worth documenting somewhere, but it isn't really specific to kind. If you spun up a cluster in the cloud or with any other cluster deployment tool from your Jenkins pod, you'd see the same kubectl behavior. 😬 This is working as intended from the point of view of the outer cluster, and not controllable by kind (nor should it be).

@timowuttke
Author

timowuttke commented Jun 18, 2019

Hi, thanks for pointing me to #303, this issue does indeed sound like a duplicate. Is there any way to solve this without having to access host directories via hostPath? In our case that's not possible due to security concerns :/

@BenTheElder
Member

I haven't found one yet. In particular, you really do want the host's modules to be picked up by some things, and you want the first docker-in-docker daemon to get the host's cgroup mounts. It might be possible to work it out without those, but once we're running privileged pods we sorta treat that node as insecure and move on... 😬

@BenTheElder
Member

Those mounts should be on the dind container fwiw.
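
For context, the extra hostPath mounts discussed in #303 would go on the dind container of the podTemplate above, roughly like this (a sketch only, not a guaranteed-working setup; the exact paths depend on the host nodes):

```yaml
  # dind container from the podTemplate, with the host mounts added
  - name: dind
    image: docker:18-dind
    securityContext:
      privileged: true
    volumeMounts:
    - name: dind-storage
      mountPath: /var/lib/docker
    - name: modules            # host kernel modules
      mountPath: /lib/modules
      readOnly: true
    - name: cgroup             # host cgroup hierarchy
      mountPath: /sys/fs/cgroup
  volumes:
  - name: dind-storage
    emptyDir: {}
  - name: modules
    hostPath:
      path: /lib/modules
  - name: cgroup
    hostPath:
      path: /sys/fs/cgroup
```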

@timowuttke
Author

timowuttke commented Jun 18, 2019

Yeah, I now got it running fine in a minikube setup. Not sure how to handle the hostPath issue, but that has nothing to do with kind. Thanks for your help :)
