default the "cgroupDriver" setting of the kubelet to "systemd" #2376

Closed
3 of 4 tasks
neolit123 opened this issue Jan 13, 2021 · 35 comments · Fixed by kubernetes/kubernetes#102133
Labels
kind/design Categorizes issue or PR as related to design. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@neolit123
Member

neolit123 commented Jan 13, 2021

the kubelet from the official k8s package is run using systemd (as the service manager) and kubeadm depends on this assumption.
the drivers between the kubelet and the runtime should match and if the kubelet is run using systemd the driver should be "systemd":
https://kubernetes.io/docs/setup/production-environment/container-runtimes/

for Docker there is auto-detection of cgroup drivers, which we may move to be on the side of dockershim:
kubernetes/kubernetes#97764 (comment)
for runtimes other than Docker, currently kubeadm does not auto-detect or set the value.

this ticket is about the proposal to default KubeletConfiguration that kubeadm generates to the "systemd" driver unless the user is explicit about it - e.g.:

if kubeletConfig.cgroupDriver == "" {
  kubeletConfig.cgroupDriver = "systemd"
}

the container runtime docs already instruct users to use the systemd driver:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/

but the above will be a breaking change to users that are not configuring their CR to systemd and not matching that for the kubelet.
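
A minimal sketch of the defaulting rule and the mismatch it can cause. The helper names `resolveKubeletDriver` and `checkDriverMatch` are hypothetical and are not kubeadm's actual code; the sketch only assumes that an empty field falls back to "systemd" and that the kubelet and the CR must agree.

```go
package main

import "fmt"

// defaultCgroupDriver is the value proposed in this issue for an unset field.
const defaultCgroupDriver = "systemd"

// resolveKubeletDriver returns the user's explicit choice, or the proposed
// default when the field was left empty. (Hypothetical helper.)
func resolveKubeletDriver(userValue string) string {
	if userValue == "" {
		return defaultCgroupDriver
	}
	return userValue
}

// checkDriverMatch reports an error when the kubelet and the container
// runtime disagree on the cgroup driver. (Hypothetical helper.)
func checkDriverMatch(kubeletDriver, runtimeDriver string) error {
	if kubeletDriver != runtimeDriver {
		return fmt.Errorf("cgroup driver mismatch: kubelet=%q runtime=%q",
			kubeletDriver, runtimeDriver)
	}
	return nil
}

func main() {
	// The user did not set the field, but the CR was left on "cgroupfs".
	kubelet := resolveKubeletDriver("")
	if err := checkDriverMatch(kubelet, "cgroupfs"); err != nil {
		fmt.Println("breaking case:", err)
	}
}
```

The breaking case is exactly the one where the user never set either side: the kubelet moves to "systemd" while the CR stays on its own default.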

opening this ticket to gather feedback from users and higher level tools.

chat in the #kind channel from today:
https://kubernetes.slack.com/archives/CEKK1KTN2/p1610555169036600


1.21

1.22

@k8s-ci-robot k8s-ci-robot added kind/design Categorizes issue or PR as related to design. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jan 13, 2021
@neolit123
Member Author

@BenTheElder (kind) @afbjorklund (minikube) @randomvariable @fabriziopandini (cluster api)

@neolit123
Member Author

@floryut (kubespray)

@champtar

kubespray forces the configuration I think, for docker it's systemd, and for containerd it's still cgroupfs but should change soon

@aojea
Member

aojea commented Jan 13, 2021

ping @saschagrunert for crio feedback

@neolit123
Member Author

neolit123 commented Jan 13, 2021

i've noticed we are missing the cgroup_manager: systemd in our cri-o config / install instructions BTW:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/

this is something i can update this cycle (but would prefer if someone more familiar with cri-o does this).

@neolit123 neolit123 added this to the v1.21 milestone Jan 13, 2021
@afbjorklund

afbjorklund commented Jan 13, 2021

the container runtime docs already instruct users to use the systemd driver:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/

the docs say that you should use systemd as the driver, if your system is using systemd...

When systemd is chosen as the init system for a Linux distribution, [...]

the emphasis is that you shouldn't have different drivers, not that you must use systemd?

When there are two cgroup managers on a system, you end up with two views of those resources


minikube does some detection of what the host is using, and sets cgroupDriver to match

kubernetes/minikube#4172 (PR kubernetes/minikube#6287)

however, for the supported OS distribution (VM) the default has been changed to systemd

kubernetes/minikube#4770 (PR kubernetes/minikube#6651)

but the above will be a breaking change to users that are not configuring their CR to systemd and not matching that for the kubelet.

I think the vocal minority that does not use systemd could get upset about the runtime default changing...

But as far as I know, Kubernetes has recommended changing the Docker default driver for a long time ?

@afbjorklund

@neolit123

i've noticed we are missing the cgroup_manager: systemd in our cri-o config

the cri-o default changed in version 1.18, so most users would have systemd (by default)

cri-o/cri-o@9ec532c

cri-o/cri-o#3719

but it would be good to have it documented, since cgroupfs was the default for cri-o 1.17

@neolit123
Member Author

the docs say that you should use systemd as the driver, if your system is using systemd...

that is true, of course. but the official kubelet package uses systemd to manage the kubelet.

@neolit123
Member Author

neolit123 commented Jan 13, 2021

thanks for the details on minikube

I think the vocal minority that does not use systemd could get upset about the runtime default changing...
But as far as I know, Kubernetes has recommended changing the Docker default driver for a long time ?

indeed, we have explained to users that they should use the "systemd" driver for a number of releases (outside of the support skew).
if someone is using the official kubeadm / kubelet packages it makes sense for the driver to be set to "systemd" on both CR and kubelet sides.

@saschagrunert
Member

@neolit123 @afbjorklund I can give the CRI-O docs an update this cycle and mention the driver. 👍

@neolit123
Member Author

thanks @saschagrunert

@oldthreefeng

oldthreefeng commented Jan 16, 2021

and for containerd it's still cgroupfs but should change soon

so are there any docs on changing containerd's default cgroup driver to systemd? @champtar

@floryut

floryut commented Jan 16, 2021

There is no doc at the moment, but we plan to default to containerd (as the CR) in the next kubespray release, and while we do that we plan to move the cgroup driver to systemd.

@oldthreefeng

i think there should be a common method to check the cri cgroup driver, so that during kubeadm init, if kubeletConfig.cgroupDriver is blank, we check the cri cgroup driver by the common method. this check is done by default for docker now, but is missing for containerd/cri-o?

@afbjorklund

i think there should be a common method to check the cri cgroup driver, so that during kubeadm init, if kubeletConfig.cgroupDriver is blank, we check the cri cgroup driver by the common method. this check is done by default for docker now, but is missing for containerd/cri-o?

It was discussed in #844, when doing the current docker info -f "{{.CgroupDriver}}" hack

@neolit123
Member Author

we check the cri cgroup driver by the common method

this is currently not possible. some time ago, someone invested a lot of time investigating how to place cgroup driver detection inside the kubelet for all container runtimes but it never happened.

@fabriziopandini
Member

fabriziopandini commented Jan 19, 2021

@neolit123 I have some concern about changing the default mostly for upgrade workflow:

In the case of kubeadm upgrades (in-place upgrades):
I should double check if changing the default for cgroup impacts upgrades on an existing node; assuming the worst case (it impacts), this could lead to problems as documented here.

In the case of Cluster API (immutable upgrades), this requires documentation/coordination with the users, unless this happens by default in CAPBK; also AFAIK currently it is not possible to change the KubeletConfiguration (see kubernetes-sigs/cluster-api#1584), so we are still relying on ExtraArgs, which is not ideal.

@afbjorklund

afbjorklund commented Jan 19, 2021

We had some "interesting" issues in minikube, that turned out to be Docker configuration (or handling):

$ sudo systemctl stop docker.socket
$ docker info --format {{.CgroupDriver}}
panic: reflect: indirection through nil pointer to embedded struct [recovered]
	panic: reflect: indirection through nil pointer to embedded struct

The moral of the story is to always check the docker daemon status, before trying to look at server details...
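
A hedged sketch of that ordering in Go: confirm the daemon answers first, and only then ask for the server-side CgroupDriver field. The `runner` type, `errDaemonUnreachable`, and `dockerCgroupDriver` are hypothetical names introduced for illustration. The sketch assumes that `docker version` with a server-side template exits non-zero when the daemon is unreachable, and it accepts only the two drivers the kubelet supports.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// errDaemonUnreachable marks failures of the initial daemon check.
var errDaemonUnreachable = errors.New("docker daemon not reachable")

// runner executes a command and returns its combined output. It is
// injectable so the ordering can be exercised without Docker installed.
type runner func(name string, args ...string) (string, error)

// execRunner runs the command on the host.
func execRunner(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return string(out), err
}

// dockerCgroupDriver checks the daemon status before reading server details.
func dockerCgroupDriver(run runner) (string, error) {
	// Assumption: this command fails when the daemon is down.
	if _, err := run("docker", "version", "--format", "{{.Server.Version}}"); err != nil {
		return "", fmt.Errorf("%w: %v", errDaemonUnreachable, err)
	}
	out, err := run("docker", "info", "--format", "{{.CgroupDriver}}")
	if err != nil {
		return "", fmt.Errorf("docker info failed: %w", err)
	}
	driver := strings.TrimSpace(out)
	// The kubelet accepts only these two driver names.
	if driver != "systemd" && driver != "cgroupfs" {
		return "", fmt.Errorf("unexpected cgroup driver %q", driver)
	}
	return driver, nil
}

func main() {
	driver, err := dockerCgroupDriver(execRunner)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("docker cgroup driver:", driver)
}
```

With the check in front, a stopped daemon surfaces as an ordinary error instead of reaching the template evaluation that panicked above.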

@neolit123
Member Author

@fabriziopandini

In the case of kubeadm upgrades (in-place upgrades):

this will not be a problem for the kubeadm mutable upgrades as kubeadm no longer upgrades the KubeletConfiguration (part of Rosti's work). so the shared KubeletConfiguration for all nodes will remain on "cgroupfs" if the user did not set it.
for new clusters however (via init), kubeadm will default it to "systemd" which will then require all nodes to have the CR set to "systemd" too.

In the case of Cluster API (immutable upgrades),

if image-builder sets the CR to "systemd", kubeadm sets KubeletConfiguration to "systemd" (by default), and image-builder produces images strictly targeting a k8s version, this will not be a problem. the --cgroup-driver flag doesn't have to be set via image-builder, and CAPBK doesn't have to pass it via KubeletConfiguration.

(see kubernetes-sigs/cluster-api#1584)

with all kubelet flags being removed this will hit the users.
so ideally CAPBK should be ahead of time and support KubeletConfiguration soon.

@neolit123
Member Author

@afbjorklund

panic: reflect: indirection through nil pointer to embedded struct [recovered]

this seems like something that can be avoided even if the server is not running.

@afbjorklund

this seems like something that can be avoided even if the server is not running.

Indeed, but haven't opened a bug with docker (or moby?) yet. Still happens, though.

@detiber
Member

detiber commented Jan 20, 2021

I don't think that kubeadm should be modifying the configuration for the container runtime. It looks like crictl from cri-tools already supports exposing this information and is already being used by kubeadm, so it seems like it would be a good choice to use for this purpose as well.

output from running crictl info within a kind cluster:

{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": true,
        "reason": "",
        "message": ""
      }
    ]
  },
  "cniconfig": {
    "PluginDirs": [
      "/opt/cni/bin"
    ],
    "PluginConfDir": "/etc/cni/net.d",
    "PluginMaxConfNum": 1,
    "Prefix": "eth",
    "Networks": [
      {
        "Config": {
          "Name": "cni-loopback",
          "CNIVersion": "0.3.1",
          "Plugins": [
            {
              "Network": {
                "type": "loopback",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"type\":\"loopback\"}"
            }
          ],
          "Source": "{\n\"cniVersion\": \"0.3.1\",\n\"name\": \"cni-loopback\",\n\"plugins\": [{\n  \"type\": \"loopback\"\n}]\n}"
        },
        "IFName": "lo"
      },
      {
        "Config": {
          "Name": "multus-cni-network",
          "CNIVersion": "0.3.1",
          "Plugins": [
            {
              "Network": {
                "cniVersion": "0.3.1",
                "name": "multus-cni-network",
                "type": "multus",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"cniVersion\":\"0.3.1\",\"delegates\":[{\"cniVersion\":\"0.3.1\",\"name\":\"kindnet\",\"plugins\":[{\"ipMasq\":false,\"ipam\":{\"dataDir\":\"/run/cni-ipam-state\",\"ranges\":[[{\"subnet\":\"10.244.0.0/24\"}]],\"routes\":[{\"dst\":\"0.0.0.0/0\"}],\"type\":\"host-local\"},\"mtu\":1500,\"type\":\"ptp\"},{\"capabilities\":{\"portMappings\":true},\"type\":\"portmap\"}]}],\"kubeconfig\":\"/etc/cni/net.d/multus.d/multus.kubeconfig\",\"name\":\"multus-cni-network\",\"type\":\"multus\"}"
            }
          ],
          "Source": "{\"cniVersion\":\"0.3.1\",\"name\":\"multus-cni-network\",\"plugins\":[{\"cniVersion\":\"0.3.1\",\"delegates\":[{\"cniVersion\":\"0.3.1\",\"name\":\"kindnet\",\"plugins\":[{\"ipMasq\":false,\"ipam\":{\"dataDir\":\"/run/cni-ipam-state\",\"ranges\":[[{\"subnet\":\"10.244.0.0/24\"}]],\"routes\":[{\"dst\":\"0.0.0.0/0\"}],\"type\":\"host-local\"},\"mtu\":1500,\"type\":\"ptp\"},{\"capabilities\":{\"portMappings\":true},\"type\":\"portmap\"}]}],\"kubeconfig\":\"/etc/cni/net.d/multus.d/multus.kubeconfig\",\"name\":\"multus-cni-network\",\"type\":\"multus\"}]}"
        },
        "IFName": "eth0"
      }
    ]
  },
  "config": {
    "containerd": {
      "snapshotter": "overlayfs",
      "defaultRuntimeName": "runc",
      "defaultRuntime": {
        "runtimeType": "",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "ContainerAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false,
        "baseRuntimeSpec": ""
      },
      "untrustedWorkloadRuntime": {
        "runtimeType": "",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "ContainerAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false,
        "baseRuntimeSpec": ""
      },
      "runtimes": {
        "runc": {
          "runtimeType": "io.containerd.runc.v2",
          "runtimeEngine": "",
          "PodAnnotations": null,
          "ContainerAnnotations": null,
          "runtimeRoot": "",
          "options": null,
          "privileged_without_host_devices": false,
          "baseRuntimeSpec": ""
        },
        "test-handler": {
          "runtimeType": "io.containerd.runc.v2",
          "runtimeEngine": "",
          "PodAnnotations": null,
          "ContainerAnnotations": null,
          "runtimeRoot": "",
          "options": null,
          "privileged_without_host_devices": false,
          "baseRuntimeSpec": ""
        }
      },
      "noPivot": false,
      "disableSnapshotAnnotations": false,
      "discardUnpackedLayers": false
    },
    "cni": {
      "binDir": "/opt/cni/bin",
      "confDir": "/etc/cni/net.d",
      "maxConfNum": 1,
      "confTemplate": ""
    },
    "registry": {
      "mirrors": {
        "docker.io": {
          "endpoint": [
            "https://registry-1.docker.io"
          ]
        }
      },
      "configs": null,
      "auths": null,
      "headers": null
    },
    "imageDecryption": {
      "keyModel": ""
    },
    "disableTCPService": true,
    "streamServerAddress": "127.0.0.1",
    "streamServerPort": "0",
    "streamIdleTimeout": "4h0m0s",
    "enableSelinux": false,
    "selinuxCategoryRange": 1024,
    "sandboxImage": "k8s.gcr.io/pause:3.3",
    "statsCollectPeriod": 10,
    "systemdCgroup": false,
    "enableTLSStreaming": false,
    "x509KeyPairStreaming": {
      "tlsCertFile": "",
      "tlsKeyFile": ""
    },
    "maxContainerLogSize": 16384,
    "disableCgroup": false,
    "disableApparmor": false,
    "restrictOOMScoreAdj": false,
    "maxConcurrentDownloads": 3,
    "disableProcMount": false,
    "unsetSeccompProfile": "",
    "tolerateMissingHugetlbController": true,
    "disableHugetlbController": true,
    "ignoreImageDefinedVolumes": false,
    "containerdRootDir": "/var/lib/containerd",
    "containerdEndpoint": "/run/containerd/containerd.sock",
    "rootDir": "/var/lib/containerd/io.containerd.grpc.v1.cri",
    "stateDir": "/run/containerd/io.containerd.grpc.v1.cri"
  },
  "golang": "go1.13.15",
  "lastCNILoadStatus": "OK"
}
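
As a rough illustration of how that output could be consumed, the sketch below reads the `config.systemdCgroup` flag and maps it to a driver name. The type `criInfo` and the function `driverFromCRIInfo` are hypothetical. The field location is taken from the containerd output above; other runtimes or versions may report the driver elsewhere, and the assumption that this flag alone decides the driver may not hold for every containerd configuration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// criInfo models only the part of the "crictl info" output used here.
type criInfo struct {
	Config struct {
		// A pointer distinguishes "false" from "field not present".
		SystemdCgroup *bool `json:"systemdCgroup"`
	} `json:"config"`
}

// driverFromCRIInfo maps the systemdCgroup flag to a kubelet driver name.
func driverFromCRIInfo(raw []byte) (string, error) {
	var info criInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return "", fmt.Errorf("parsing crictl info output: %w", err)
	}
	if info.Config.SystemdCgroup == nil {
		return "", errors.New("config.systemdCgroup not present in output")
	}
	if *info.Config.SystemdCgroup {
		return "systemd", nil
	}
	return "cgroupfs", nil
}

func main() {
	// Trimmed-down version of the output above.
	sample := []byte(`{"config": {"systemdCgroup": false}}`)
	driver, err := driverFromCRIInfo(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("runtime cgroup driver:", driver)
}
```

Returning an error when the field is absent, rather than silently assuming "cgroupfs", keeps the caller from guessing on runtimes that do not expose the flag.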

@neolit123
Member Author

i saw no objections to the kubeadm change so i will send a WIP PR for this today to get some feedback.
code freeze for 1.21 is the 9th of March.

@neolit123
Member Author

i've sent the PR for moving to systemd in kubeadm 1.21:
kubernetes/kubernetes#99471

@neolit123
Member Author

cc @xphoniex @fcolista @oz123 for feedback on Alpine / OpenRC.

  • would this change be problematic for OpenRC?
  • my assumption is that Alpine container runtimes are defaulted to the "cgroupfs" driver?
  • are there plans for cgroupv2 support under Alpine which would need the "systemd" driver IIUC?

alternatively in the pending PR change we could only apply the "systemd" driver if the systemd init system is used.
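
One way that alternative could look, as a sketch only: sd_booted(3) decides whether systemd is the init system by checking that the directory /run/systemd/system exists, and the same test is easy to reproduce. The function names are hypothetical, and the "cgroupfs" fallback for non-systemd hosts is an assumption based on the question above, not something the PR defines.

```go
package main

import (
	"fmt"
	"os"
)

// systemdRunDir is the directory whose existence sd_booted(3) checks.
const systemdRunDir = "/run/systemd/system"

// isSystemdBooted reports whether dir exists and is a directory.
// The path is a parameter so the check can be exercised on any host.
func isSystemdBooted(dir string) bool {
	fi, err := os.Stat(dir)
	return err == nil && fi.IsDir()
}

// defaultDriverForHost applies "systemd" only on systemd hosts.
// Assumption: other init systems (for example OpenRC) stay on "cgroupfs".
func defaultDriverForHost(systemd bool) string {
	if systemd {
		return "systemd"
	}
	return "cgroupfs"
}

func main() {
	booted := isSystemdBooted(systemdRunDir)
	fmt.Printf("systemd init: %v, default driver: %s\n",
		booted, defaultDriverForHost(booted))
}
```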

@pacoxu
Member

pacoxu commented Feb 26, 2021

/cc

@oz123

oz123 commented Feb 26, 2021

I can't speak for alpine. However, I am fine if this can be changed with a configuration (as is the case).

@neolit123
Member Author

after some discussion on the PR the latest proposal is the following:

  • in 1.21 "kubeadm init" will start applying the "systemd" driver by default unless the user is explicit in the KubeletConfiguration, but it will not do that for other commands like "kubeadm upgrade".
  • we keep this issue open and in 1.22 all kubeadm commands will default to the "systemd" driver unless the user was explicit about it.

i have plans to write a small guide on how users can migrate to the "systemd" driver and this guide will be linked from https://kubernetes.io/docs/setup/production-environment/container-runtimes/

users that don't wish to migrate can be explicit and set "cgroupfs" as their driver, which means "upgrade" will not touch their setup in this regard.
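
The phased proposal above can be sketched as a small decision function. `chooseDriver` is a hypothetical name and does not mirror kubeadm's implementation; it returns an empty string for the case where kubeadm would leave the field untouched.

```go
package main

import "fmt"

// chooseDriver sketches the phased proposal from this comment:
// an explicit user value always wins, "init" defaults to "systemd" from
// 1.21, and every other command defaults to "systemd" from 1.22.
// An empty result means kubeadm would not set the field.
func chooseDriver(command string, minor int, userValue string) string {
	if userValue != "" {
		return userValue
	}
	if command == "init" && minor >= 21 {
		return "systemd"
	}
	if minor >= 22 {
		return "systemd"
	}
	return ""
}

func main() {
	cases := []struct {
		command   string
		minor     int
		userValue string
	}{
		{"init", 21, ""},
		{"upgrade", 21, ""},
		{"upgrade", 22, ""},
		{"upgrade", 22, "cgroupfs"},
	}
	for _, c := range cases {
		fmt.Printf("%-8s 1.%d user=%q -> %q\n",
			c.command, c.minor, c.userValue,
			chooseDriver(c.command, c.minor, c.userValue))
	}
}
```

The last case is the opt-out path: setting "cgroupfs" explicitly keeps "upgrade" from changing the driver.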

@neolit123 neolit123 added kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Mar 1, 2021
@neolit123
Member Author

website guide PR: kubernetes/website#26786

@neolit123
Member Author

added a new item for 1.22 to pin the cgroup drivers in kinder to "systemd", see this in the OP:

pin the cgroup driver for docker / containerd to "systemd" in kinder. should be done here; information about setting the driver for CRs is here

@pacoxu
Member

pacoxu commented May 19, 2021

/assign
kubernetes/kubernetes#102133 is for moving the defaulting to cmd/kubeadm/app/componentconfigs/kubelet.go#Default().

@BenTheElder
Member

kind switched in https://github.com/kubernetes-sigs/kind/releases/tag/v0.13.0, but delayed to Kubernetes v1.24.0+ (to minimize breaking change to our users)

@pacoxu
Member

pacoxu commented Jun 6, 2023

Starting with v1.22, when creating a cluster with kubeadm, if the user does not set the cgroupDriver field under KubeletConfiguration, kubeadm defaults it to systemd.

As kubernetes/enhancements#4034 is under discussion, kubeadm may stop setting the default value if the kubelet can detect and set the cgroup driver by using CRI to get the container runtime's cgroup driver status. After that KEP is merged, we may need a new issue to track it. (The KEP is not merged and not implemented yet, so it is still in progress.)
