
document how to run kind in a kubernetes pod #303

Open
BenTheElder opened this issue Feb 15, 2019 · 50 comments
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/documentation Categorizes issue or PR as related to documentation. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@BenTheElder
Member

BenTheElder commented Feb 15, 2019

NOTE: We do NOT recommend doing this if it is at all avoidable. We don't have another option so we do it ourselves, but it has many footguns.

xref: #284
Additionally, these mounts are known to be needed:

    volumeMounts:
      # not strictly necessary in all cases
      - mountPath: /lib/modules
        name: modules
        readOnly: true
      - mountPath: /sys/fs/cgroup
        name: cgroup
    volumes:
      - name: modules
        hostPath:
          path: /lib/modules
          type: Directory
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
          type: Directory

thanks to @maratoid

/kind documentation
/priority important-longterm

We probably need a new page in the user guide for this.

EDIT: Additionally, for any docker-in-docker usage, the docker storage (typically /var/lib/docker) should be a volume. A lot of attempts at using kind in Kubernetes seem to miss this one. Typically an emptyDir is suitable for this.
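A minimal sketch of such a volume, in the same style as the fragment above (the volume name is illustrative):

    volumeMounts:
      - mountPath: /var/lib/docker
        name: docker-storage
    volumes:
      # an emptyDir keeps the nested docker storage off the container's own filesystem
      - name: docker-storage
        emptyDir: {}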

EDIT2: You also probably want to set a pod DNS config to some upstream resolvers, so as not to have your inner cluster pods trying to talk to the outer cluster's DNS, which is probably on a clusterIP and not necessarily reachable.

  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 1.1.1.1
      - 1.0.0.1

EDIT3: Loop devices are not namespaced; follow #1248 to find our current workaround.

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Feb 15, 2019
This was referenced Feb 23, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 16, 2019
@BenTheElder
Member Author

/remove-lifecycle stale

@BenTheElder
Member Author

This came up again in #677 and again today in another deployment.
/assign

@BenTheElder
Member Author

See this comment about possibly hitting inotify watch limits on the host and a workaround: #717 (comment)

This issue may also apply to other Linux hosts (non-Kubernetes).
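If the workaround in #717 amounts to raising the fs.inotify sysctls on the host (a common fix for this symptom), one way to do that from the outer cluster is a privileged init container; a minimal sketch with illustrative values, not taken from that comment:

    initContainers:
      - name: raise-inotify-limits
        image: busybox
        securityContext:
          privileged: true
        command:
          - sh
          - -c
          # these limits are generally host-wide, so raising them affects the whole node
          - sysctl -w fs.inotify.max_user_watches=524288 && sysctl -w fs.inotify.max_user_instances=512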

@radu-matei
Contributor

radu-matei commented Aug 6, 2019

For future reference, here's a working pod spec for running kind in a pod; see below (add your own image).
(cc @BenTheElder - is this a sane pod spec for kind?)

That being said, there should also be documentation for:

  • why kind needs the volume mounts and what impact they have on the underlying node infrastructure
  • what happens when the pod is terminated before deleting the cluster (in the context of #658 (comment))
  • configuring garbage collection for unused images to avoid node disk pressure (#663)
  • anything else?

The pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: dind-k8s
spec:
  containers:
    - name: dind
      image: <image>
      securityContext:
        privileged: true
      volumeMounts:
        - mountPath: /lib/modules
          name: modules
          readOnly: true
        - mountPath: /sys/fs/cgroup
          name: cgroup
        - name: dind-storage
          mountPath: /var/lib/docker
  volumes:
  - name: modules
    hostPath:
      path: /lib/modules
      type: Directory
  - name: cgroup
    hostPath:
      path: /sys/fs/cgroup
      type: Directory
  - name: dind-storage
    emptyDir: {}

@howardjohn
Contributor

Make sure you do kind delete cluster! See #759

@BenTheElder
Member Author

That's pretty sane. As @howardjohn notes, please make sure you clean up the top-level containers in that pod (i.e. kind delete cluster in an exit trap or similar). DNS may also give you issues.

why kind needs the volume mounts and what impact they have on the underlying node infrastructure

  • /lib/modules is not strictly necessary, but a number of things want to probe its contents, and it's harmless to mount it. For clarity I would make this mount read-only. No impact.
  • cgroups are mounted because cgroupsv1 containers don't exactly nest. If we were just doing docker in docker we wouldn't need this.

what happens when the pod is terminated before deleting the cluster (in the context of #658 (comment))

It depends on your setup; with these mounts, IIRC, the processes/containers can leak. Don't do this. Have an exit handler; deleting the containers should happen within the grace period.
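One way to wire up such an exit handler is a preStop hook plus a long enough termination grace period; a minimal sketch (container name, image, and grace period are illustrative, not from this thread):

    spec:
      terminationGracePeriodSeconds: 120  # leave room for cleanup to finish
      containers:
        - name: dind
          image: <image>
          securityContext:
            privileged: true
          lifecycle:
            preStop:
              exec:
                # best-effort: delete the nested cluster before the pod is torn down
                command: ["/bin/sh", "-c", "kind delete cluster || true"]

Note that a preStop hook only runs when the pod is being terminated, so a shell trap in the container's entrypoint is still useful for other exit paths.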

configuring garbage collection for unused images to avoid node disk pressure (#663)

You shouldn't need this in CI; kind clusters should be ephemeral. Please, please use them ephemerally. There are a number of ways kind is not optimized for production long-lived clusters. For temporary clusters used during a test this is a non-issue.

Also note that turning on disk eviction risks your pods being evicted based on the disk usage of the host. There's a reason this is off by default. Eventually we will ship an alternative to make long-lived clusters better, but for now it's best not to depend on long-lived clusters or image GC.

anything else?

DNS (see above). Your outer cluster's in-cluster DNS servers are typically on a clusterIP which won't necessarily be visible to the containers in the inner cluster. Ideally configure the "host machine" Pod's DNS to your preferred upstream DNS provider (see above).

@axsaucedo

@BenTheElder thank you for pointing me to this issue - I am trying to see how we would fit @radu-matei's example into the testing automation we are introducing for our Kubernetes project. Right now we want to trigger the creation of the cluster and the commands within that cluster from within a pod. I've tried creating a container that has Docker and kind installed.

I've tried creating a pod with the instructions provided above, but I still can't seem to run the kind create cluster command - I get the error:

root@k8s-builder-7b5cc87566-fnz5b:/work# kind create cluster
Error: could not list clusters: failed to list nodes: exit status 1

For testing I am currently creating the container, running kubectl exec into it and running kind create cluster.

The current pod specification I have is the following:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: k8s-builder
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: k8s-101
    spec:
      containers:
      - name: k8s-docker-builder
        image: seldonio/core-builder:0.4
        imagePullPolicy: Always
        command: 
        - tail 
        args:
        - -f 
        - /dev/null
        volumeMounts:
        - mountPath: /lib/modules
          name: modules
          readOnly: true
        - mountPath: /sys/fs/cgroup
          name: cgroup
        - name: dind-storage
          mountPath: /var/lib/docker
        securityContext:
          privileged: true
      volumes:
      - name: modules
        hostPath:
          path: /lib/modules
          type: Directory
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
          type: Directory
      - name: dind-storage
        emptyDir: {}

For explicitness, the way that I am installing Kind in the Dockerfile is as follows:

# Installing KIND
RUN wget https://github.com/kubernetes-sigs/kind/releases/download/v0.5.1/kind-linux-amd64 && \
    chmod +x kind-linux-amd64 && \
    mv ./kind-linux-amd64 /bin/kind

For explicitness, the way that I am installing Kubectl in the Dockerfile is as follows:

# Installing Kubectl
RUN wget https://storage.googleapis.com/kubernetes-release/release/v1.16.2/bin/linux/amd64/kubectl && \
    chmod +x ./kubectl && \
    mv ./kubectl /bin

For explicitness, the way that I am installing Docker in the Dockerfile is as follows:

# install docker
RUN \
    apt-get update && \
    apt-get install -y \
         apt-transport-https \
         ca-certificates \
         curl \
         gnupg2 \
         software-properties-common && \
    curl -fsSL https://download.docker.com/linux/$(. /etc/os-release; echo "$ID")/gpg | apt-key add - && \
    add-apt-repository \
       "deb [arch=amd64] https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") \
       $(lsb_release -cs) \
       stable" && \
    apt-get update && \
    apt-get install -y docker-ce

What should I take into account to make sure this works?

@felipecrs
Contributor

felipecrs commented Feb 22, 2022

Just wondering: won't mounting cgroups affect the pod's memoryRequests/memoryLimits?

@felipecrs
Contributor

felipecrs commented Feb 22, 2022

Also, would not dnsPolicy: Default be a better suggestion over hardcoded DNS servers?

https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy

UPDATE: I've been running with dnsPolicy: Default ever since, and it has indeed solved all the DNS-related issues.
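For reference, that's a one-line change in the pod spec:

    spec:
      dnsPolicy: Default  # inherit the node's resolver configuration instead of the outer cluster's DNS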

@felipecrs
Contributor

felipecrs commented Feb 22, 2022

Another question that I have: would cgroups v2 help alleviate these requirements, especially the need to mount /sys/fs/cgroup?

@felipecrs
Contributor

felipecrs commented Feb 23, 2022

Again, another question: would the regular docker cleanup be sufficient instead of kind delete cluster?

$ docker ps --all --quiet | xargs --no-run-if-empty -- docker rm --force
$ docker system prune --all --volumes --force

I'm planning to set this up as a post-hook for all my dind pods (because I can't control what people do inside them, so I can't force them to call kind delete cluster).

Also, running kind delete cluster myself as a post-hook may not be as effective, since users are free to run other inner containers in this dind.

EDIT: Yes, it should be sufficient, since this is what is being done in Kubernetes CI.
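A sketch of wiring that cleanup into the pod as a preStop hook (the container name and the choice of hook are illustrative, not from this thread):

    containers:
      - name: dind
        lifecycle:
          preStop:
            exec:
              command:
                - /bin/sh
                - -c
                # remove all containers, then prune images, volumes and networks
                - docker ps --all --quiet | xargs --no-run-if-empty docker rm --force && docker system prune --all --volumes --force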

@felipecrs
Contributor

FYI the cgroups changes (a modified / enhanced version of what is outlined in @jieyu's post above, thanks!) in the KIND entrypoint are in kind v0.10.0+ images, but you still have your docker-in-docker setup to contend with.

@BenTheElder by your last sentence, do you mean that even though the changes are in place for kindest/node, they are still needed in the dind image that kind is running inside?

@felipecrs
Contributor

felipecrs commented Feb 23, 2022

I would also mention another countermeasure, which is required in my case:

  • Unset all the environment variables that start with KUBERNETES_ during the pod initialization.
    • These variables are injected automatically by Kubernetes into the pod.
    • They can conflict, causing kubectl commands inside the pod to try to interact with the outer Kubernetes.

If using bash in the entrypoint, this can be achieved with:

unset "${!KUBERNETES_@}

Refs: felipecrs/docker-images@690a879

@BenTheElder
Member Author

DNS Default is a good choice.

cgroupsv2 should have actual nesting, but we've not had the opportunity to test for any kind issues when nesting this way. I still don't particularly recommend Kubernetes in Kubernetes.

We have layers, including a cleanup hook to delete Docker resources and an inner hook to delete the cluster. Deleting the clusters should generally be sufficient, but the other hook is more general than kind. You can test deletion behaviors locally and observe pod removal.

Yes re: dind.

We disable service account automount for CI pods. I highly recommend this.

@felipecrs
Contributor

@BenTheElder

So please add:

automountServiceAccountToken: false

to the issue description. And, still, these environment variables:

KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_PORT=tcp://10.96.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443

get injected no matter what (and there isn't a way to disable them):

https://github.com/kubernetes/kubernetes/blob/25ccc48c606f99d4d142093a84764fda9588ce1e/pkg/kubelet/kubelet_pods.go#L550-L551

So, I still wonder: can't they still cause conflicts?

@felipecrs
Contributor

felipecrs commented Feb 23, 2022

OK, funnily enough, if I add @jieyu's --cgroup-parent adjustment to my dind container, then whenever I try to spin up a pod with constrained resources I run into the following issue when trying to deploy another pod with constrained resources inside the kind cluster (which is inside the dind pod):

Error: failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: failed to write "200000": write /sys/fs/cgroup/cpu,cpuacct/kubelet/kubepods/burstable/pod82c385b7-d6c6-4333-b006-394c9552dad7/init/cpu.cfs_quota_us: invalid argument: unknown

If I understood correctly, as long as we properly clean up any leftover containers/clusters before deleting the main pod, we don't need such an adjustment. Right?

In either case, I would really appreciate hearing if anyone has anything to say about the error I mentioned. The exact same thing also happens if I don't mount /sys/fs/cgroup (and do not use --cgroup-parent).

I discarded kubernetes/kubernetes#72878 because I'm running kernel 5.4.

But there is kubernetes/kubeadm#2335 which mentions something about fixes included in K8s 1.21. It could be the case, as I'm running K8s 1.20 (but I'm using the default 1.21 with kind 0.11.1 inside of the pod).

felipecrs added a commit to felipecrs/docker-images that referenced this issue Feb 23, 2022
@BenTheElder
Member Author

To the issue description. And, still, these environment variables:

It's not necessary to run kind in Kubernetes; however, I recommend it for running any sort of CI in Kubernetes if you're going to do that. YMMV depending on the exact use case for running kind in Kubernetes.

So, I still wonder, can't them still cause conflicts?

No, not as long as the service account credentials are not mounted at the well-known path.

https://kubernetes.io/docs/reference/kubectl/overview/#in-cluster-authentication-and-namespace-overrides

https://cs.k8s.io/?q=POD_NAMESPACE&i=nope&files=staging%2Fsrc%2Fk8s.io%2Fclient-go&excludeFiles=&repos=kubernetes/kubernetes

felipecrs added a commit to felipecrs/docker-images that referenced this issue Feb 24, 2022
felipecrs added a commit to felipecrs/docker-images that referenced this issue Mar 4, 2022
- Use `s6-setuidgid` instead of building a custom entrypoint with `shc` to use suid
- By dropping the custom entrypoint, we also drop the automatic removal of `KUBERNETES_` variables, which [should not be needed](kubernetes-sigs/kind#303 (comment))
@howardjohn
Contributor

howardjohn commented Jun 28, 2023

Has anyone gotten this running on a host that uses cgroupsv2?

edit: will post more details in a few minutes but I think I have a solution.

@felipecrs
Contributor

I have to say I switched to k3d, which required no additional volume mounts or specific workarounds to work. I'm still using cgroupsv1 on the nodes, but maybe it's worth trying k3d on a cgroupsv2 one.

@BenTheElder
Member Author

It shouldn't require any additional mounts these days. This issue is quite old.

It still remains a huge pile of footguns to run any sort of nested Kubernetes (kind, k3d, or otherwise):

  • non-namespaced limits like inotify
  • DNS nesting

...

@jglick

jglick commented Jun 28, 2023

docker:*-dind seems to work well enough to run Kind in CI if you give it enough memory (otherwise you get opaque errors) and provide an emptyDir for /var/lib/docker.
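A minimal sketch of such a pod, assuming the stock docker:dind image (the memory numbers are illustrative, not a tested recommendation):

apiVersion: v1
kind: Pod
metadata:
  name: dind-ci
spec:
  dnsPolicy: Default                     # per the DNS advice earlier in this thread
  automountServiceAccountToken: false    # per the service account advice earlier in this thread
  containers:
    - name: dind
      image: docker:dind
      securityContext:
        privileged: true
      resources:
        requests:
          memory: 4Gi
        limits:
          memory: 8Gi
      volumeMounts:
        - name: docker-storage
          mountPath: /var/lib/docker
  volumes:
    - name: docker-storage
      emptyDir: {}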

@felipecrs
Contributor

felipecrs commented Jun 28, 2023

docker:*-dind seems to work well enough to run Kind in CI if you give it enough memory (otherwise you get opaque errors) and provide an emptyDir for /var/lib/docker.

That's what I do: https://github.com/felipecrs/jenkins-agent-dind#:~:text=Then%20you%20can%20use%20the%20following%20Pod%20Template%3A (in my case the docker data dir is remapped to the agent workspace).

Additionally I use https://github.com/cndoit18/lxcfs-on-kubernetes to fake /proc CPU and memory, so that the docker daemon within the pod will not "believe" it can use the entire host's CPUs and memory for its containers (yes, docker itself doesn't take cgroups into account for its own limits :P).

@BenTheElder
Member Author

/help

We should start putting together a new site under the docs where we can at least clearly keep this updated and just put a warning note at the top about the security implications aside from the footguns. We can iterate better on a markdown page than this thread; it's long overdue.

@k8s-ci-robot
Contributor

@BenTheElder:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

We should start putting together a new site under the docs where we can at least clearly keep this updated and just put a warning note at the top about the security implications aside from the footguns. We can iterate better on a markdown page than this thread; it's long overdue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jun 29, 2023
@danielfbm

I can give it a shot if you guys don't mind. I can start next week, and I will mostly need some help reviewing. Is that OK? @BenTheElder

@erichorwath

@howardjohn we found some problems with cgroupv2-only nodes: kubernetes/kubernetes#119853
