efs-plugin error: Detected OS without systemd #1245

Closed
mohammadasim opened this issue Jan 14, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@mohammadasim

/kind bug

What happened?
I have deployed the efs-csi-driver to our kOps-managed Kubernetes clusters. The chart version is aws-efs-csi-driver-2.5.0. We use Ubuntu 22.04 for our worker and master nodes. I recently upgraded our clusters from Kubernetes 1.26.6 to 1.27.8. Since the upgrade, the efs-csi-node pods log the error above, Detected OS without systemd. Once this error appears, the pod can no longer mount EFS volumes, and pods that use EFS volumes are stuck in ContainerCreating.
As a temporary fix I manually delete the efs-csi-node pod, which resolves the issue, but after a few minutes the error reappears in the logs.
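For reference, the manual workaround is simply deleting the affected DaemonSet pod so that it gets recreated (a minimal sketch; the pod name is taken from the describe output further down):

# delete the affected node-plugin pod; the DaemonSet recreates it
kubectl delete pod -n kube-system efs-csi-node-plvxl

Mounts then work again until the error returns.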

What you expected to happen?
I expected the pods to start normally and the mounts to work without any issues.

How to reproduce it (as minimally and precisely as possible)?
Perhaps by running the same chart version on a kOps-managed Kubernetes 1.27.8 cluster with Ubuntu 22.04 nodes.
Anything else we need to know?:
We still have a kOps-managed Kubernetes 1.26.6 cluster, using the same AMI for worker and master nodes, running the same version of the efs-csi-driver chart without any of the above issues.
Environment

  • Kubernetes version (use kubectl version): 1.27.8
  • Driver version: chart version: aws-efs-csi-driver-2.5.0 App version: 1.7.0

Please also attach debug logs to help us better diagnose

  • Instructions to gather debug logs can be found here
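The output below was gathered with commands along these lines (a sketch, not the official log-collector instructions; pod name taken from the cluster):

kubectl describe pod -n kube-system efs-csi-node-plvxl
kubectl get pod -n kube-system efs-csi-node-plvxl
# on the affected node:
ls /var/log/amazon/efs
ls /var/run/efs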
Name:                 efs-csi-node-plvxl
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      efs-csi-node-sa
Node:                 i-0a6dfa7de7ffe8d72/10.10.119.234
Start Time:           Sun, 14 Jan 2024 18:06:23 +0000
Labels:               app=efs-csi-node
                      app.kubernetes.io/instance=aws-efs-csi-driver
                      app.kubernetes.io/name=aws-efs-csi-driver
                      controller-revision-hash=7d4d778d48
                      pod-template-generation=3
Annotations:          <none>
Status:               Running
IP:                   10.10.119.234
IPs:
  IP:           10.10.119.234
Controlled By:  DaemonSet/efs-csi-node
Containers:
  efs-plugin:
    Container ID:  containerd://a4350efe150fbad25c4be09b924f7075c36687db375d99560c94385f2d58101f
    Image:         amazon/aws-efs-csi-driver:v1.6.0
    Image ID:      docker.io/amazon/aws-efs-csi-driver@sha256:f8174f687776a29954aa8984145e9b0576369d6089fdc39003206682b2a4fe58
    Port:          9809/TCP
    Host Port:     9809/TCP
    Args:
      --endpoint=$(CSI_ENDPOINT)
      --logtostderr
      --v=2
      --vol-metrics-opt-in=false
      --vol-metrics-refresh-period=240
      --vol-metrics-fs-rate-limit=5
    State:          Running
      Started:      Sun, 14 Jan 2024 18:39:58 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Sun, 14 Jan 2024 18:33:55 +0000
      Finished:     Sun, 14 Jan 2024 18:34:55 +0000
    Ready:          True
    Restart Count:  10
    Limits:
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
    Liveness:  http-get http://:healthz/healthz delay=10s timeout=3s period=2s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:                 unix:/csi/csi.sock
      CSI_NODE_NAME:                 (v1:spec.nodeName)
      AWS_ROLE_ARN:                 arn:aws:iam::122666361821:role/dev-efs-csi-role
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /csi from plugin-dir (rw)
      /etc/amazon/efs-legacy from efs-utils-config-legacy (rw)
      /var/amazon/efs from efs-utils-config (rw)
      /var/lib/kubelet from kubelet-dir (rw)
      /var/run/efs from efs-state-dir (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7c99g (ro)
  csi-driver-registrar:
    Container ID:  containerd://ce436d59821462a66ead6250f9d5563e5eeb6cd08389db814c248ad34e4b70e3
    Image:         public.ecr.aws/eks-distro/kubernetes-csi/node-driver-registrar:v2.8.0-eks-1-27-3
    Image ID:      public.ecr.aws/eks-distro/kubernetes-csi/node-driver-registrar@sha256:74e13dfff1d73b0e39ae5883b5843d1672258b34f7d4757995c72d92a26bed1e
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
      --v=2
    State:          Running
      Started:      Sun, 14 Jan 2024 18:06:47 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      ADDRESS:                      /csi/csi.sock
      DRIVER_REG_SOCK_PATH:         /var/lib/kubelet/plugins/efs.csi.aws.com/csi.sock
      KUBE_NODE_NAME:                (v1:spec.nodeName)
      AWS_ROLE_ARN:                 arn:aws:iam::122666361821:role/dev-efs-csi-role
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /csi from plugin-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7c99g (ro)
  liveness-probe:
    Container ID:  containerd://620674ce4b01287c2f3d6be793ee300b284303d5c003762da30d29d97b989f79
    Image:         public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.10.0-eks-1-27-3
    Image ID:      public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe@sha256:25b4d3f9cf686ac464a742ead16e705da3adcfe574296dd75c5c05ec7473a513
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=/csi/csi.sock
      --health-port=9809
      --v=2
    State:          Running
      Started:      Sun, 14 Jan 2024 18:06:56 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      AWS_ROLE_ARN:                 arn:aws:iam::122666361821:role/dev-efs-csi-role
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /csi from plugin-dir (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7c99g (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  kubelet-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet
    HostPathType:  Directory
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/efs.csi.aws.com/
    HostPathType:  DirectoryOrCreate
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  Directory
  efs-state-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/efs
    HostPathType:  DirectoryOrCreate
  efs-utils-config:
    Type:          HostPath (bare host directory volume)
    Path:          /var/amazon/efs
    HostPathType:  DirectoryOrCreate
  efs-utils-config-legacy:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/amazon/efs
    HostPathType:  DirectoryOrCreate
  kube-api-access-7c99g:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  36m                    default-scheduler  Successfully assigned kube-system/efs-csi-node-plvxl to i-0a6dfa7de7ffe8d72
  Normal   Pulling    35m                    kubelet            Pulling image "amazon/aws-efs-csi-driver:v1.6.0"
  Normal   Pulled     35m                    kubelet            Successfully pulled image "amazon/aws-efs-csi-driver:v1.6.0" in 7.614163902s (15.821406651s including waiting)
  Normal   Pulling    35m                    kubelet            Pulling image "public.ecr.aws/eks-distro/kubernetes-csi/node-driver-registrar:v2.8.0-eks-1-27-3"
  Normal   Started    35m                    kubelet            Started container csi-driver-registrar
  Normal   Pulling    35m                    kubelet            Pulling image "public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.10.0-eks-1-27-3"
  Normal   Pulled     35m                    kubelet            Successfully pulled image "public.ecr.aws/eks-distro/kubernetes-csi/node-driver-registrar:v2.8.0-eks-1-27-3" in 2.80759201s (6.850662337s including waiting)
  Normal   Created    35m                    kubelet            Created container csi-driver-registrar
  Normal   Started    35m                    kubelet            Started container liveness-probe
  Normal   Pulled     35m                    kubelet            Successfully pulled image "public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.10.0-eks-1-27-3" in 2.154018932s (9.005039836s including waiting)
  Normal   Created    35m                    kubelet            Created container liveness-probe
  Normal   Created    33m (x3 over 35m)      kubelet            Created container efs-plugin
  Normal   Pulled     33m (x2 over 34m)      kubelet            Container image "amazon/aws-efs-csi-driver:v1.6.0" already present on machine
  Normal   Started    33m (x3 over 35m)      kubelet            Started container efs-plugin
  Warning  Unhealthy  33m (x3 over 34m)      kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff    5m55s (x111 over 34m)  kubelet            Back-off restarting failed container efs-plugin in pod efs-csi-node-plvxl_kube-system(5737bb58-13e3-4446-8f77-ebd8f5fdafab)

NAME                 READY   STATUS    RESTARTS        AGE
efs-csi-node-plvxl   3/3     Running   10 (9m3s ago)   37m

I couldn't find any logs in /var/log/amazon/efs. The content of /var/run/efs is:

ls /var/run/efs
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.09605787-e283-4a32-83fa-968c6d673194.volumes.kubernetes.io~csi.pvc-b12cd68a-64e5-4e8a-ac16-47d27d0b7987.mount.20188
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.09605787-e283-4a32-83fa-968c6d673194.volumes.kubernetes.io~csi.pvc-b12cd68a-64e5-4e8a-ac16-47d27d0b7987.mount.20188+
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.5bde4cd0-39b4-4c9e-bdec-8608cd0a36fa.volumes.kubernetes.io~csi.pvc-52287a6c-282e-4b8a-b4fd-f5ccfc2da896.mount.20356
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.5bde4cd0-39b4-4c9e-bdec-8608cd0a36fa.volumes.kubernetes.io~csi.pvc-52287a6c-282e-4b8a-b4fd-f5ccfc2da896.mount.20356+
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.5c033092-1c08-4840-9063-6635ad996329.volumes.kubernetes.io~csi.pvc-b12cd68a-64e5-4e8a-ac16-47d27d0b7987.mount.20293
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.5c033092-1c08-4840-9063-6635ad996329.volumes.kubernetes.io~csi.pvc-b12cd68a-64e5-4e8a-ac16-47d27d0b7987.mount.20293+
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.9066d50f-0354-420d-9a72-3e388f0d0eef.volumes.kubernetes.io~csi.pvc-52287a6c-282e-4b8a-b4fd-f5ccfc2da896.mount.20196
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.9066d50f-0354-420d-9a72-3e388f0d0eef.volumes.kubernetes.io~csi.pvc-52287a6c-282e-4b8a-b4fd-f5ccfc2da896.mount.20196+
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.de99316d-ce03-4252-8ea8-e34505db07a7.volumes.kubernetes.io~csi.pvc-52287a6c-282e-4b8a-b4fd-f5ccfc2da896.mount.20376
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.de99316d-ce03-4252-8ea8-e34505db07a7.volumes.kubernetes.io~csi.pvc-52287a6c-282e-4b8a-b4fd-f5ccfc2da896.mount.20376+
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.e80278a4-42b5-4a14-b6d7-ae518f92f2f7.volumes.kubernetes.io~csi.pvc-20851503-88f9-424d-acd4-320a56691154.mount.20326
fs-0e870fb95f8bf7867.var.lib.kubelet.pods.e80278a4-42b5-4a14-b6d7-ae518f92f2f7.volumes.kubernetes.io~csi.pvc-20851503-88f9-424d-acd4-320a56691154.mount.20326+
stunnel-config.fs-0e870fb95f8bf7867.var.lib.kubelet.pods.09605787-e283-4a32-83fa-968c6d673194.volumes.kubernetes.io~csi.pvc-b12cd68a-64e5-4e8a-ac16-47d27d0b7987.mount.20188
stunnel-config.fs-0e870fb95f8bf7867.var.lib.kubelet.pods.5bde4cd0-39b4-4c9e-bdec-8608cd0a36fa.volumes.kubernetes.io~csi.pvc-52287a6c-282e-4b8a-b4fd-f5ccfc2da896.mount.20356
stunnel-config.fs-0e870fb95f8bf7867.var.lib.kubelet.pods.5c033092-1c08-4840-9063-6635ad996329.volumes.kubernetes.io~csi.pvc-b12cd68a-64e5-4e8a-ac16-47d27d0b7987.mount.20293
stunnel-config.fs-0e870fb95f8bf7867.var.lib.kubelet.pods.9066d50f-0354-420d-9a72-3e388f0d0eef.volumes.kubernetes.io~csi.pvc-52287a6c-282e-4b8a-b4fd-f5ccfc2da896.mount.20196
stunnel-config.fs-0e870fb95f8bf7867.var.lib.kubelet.pods.de99316d-ce03-4252-8ea8-e34505db07a7.volumes.kubernetes.io~csi.pvc-52287a6c-282e-4b8a-b4fd-f5ccfc2da896.mount.20376
stunnel-config.fs-0e870fb95f8bf7867.var.lib.kubelet.pods.e80278a4-42b5-4a14-b6d7-ae518f92f2f7.volumes.kubernetes.io~csi.pvc-20851503-88f9-424d-acd4-320a56691154.mount.20326
cat stunnel-config.fs-0e870fb95f8bf7867.var.lib.kubelet.pods.e80278a4-42b5-4a14-b6d7-ae518f92f2f7.volumes.kubernetes.io~csi.pvc-20851503-88f9-424d-acd4-320a56691154.mount.20326
fips = no
foreground = quiet
socket = l:SO_REUSEADDR=yes
socket = a:SO_BINDTODEVICE=lo
pid = /var/run/efs/fs-0e870fb95f8bf7867.var.lib.kubelet.pods.e80278a4-42b5-4a14-b6d7-ae518f92f2f7.volumes.kubernetes.io~csi.pvc-20851503-88f9-424d-acd4-320a56691154.mount.20326+/stunnel.pid
[efs]
client = yes
accept = 127.0.0.1:20326
connect = fs-0e870fb95f8bf7867.efs.eu-west-1.amazonaws.com:2049
sslVersion = TLSv1.2
renegotiation = no
TIMEOUTbusy = 20
TIMEOUTclose = 0
TIMEOUTidle = 70
delay = yes
verify = 2
CAfile = /etc/amazon/efs/efs-utils.crt
cert = /var/run/efs/fs-0e870fb95f8bf7867.var.lib.kubelet.pods.e80278a4-42b5-4a14-b6d7-ae518f92f2f7.volumes.kubernetes.io~csi.pvc-20851503-88f9-424d-acd4-320a56691154.mount.20326+/certificate.pem
key = /etc/amazon/efs/privateKey.pem
checkHost = <my efs id>

Any help will be highly appreciated.
Thanks

k8s-ci-robot added the kind/bug label on Jan 14, 2024

gkalwig commented Jan 15, 2024

After updating the Kubernetes cluster from 1.26, I also noticed this error on one of the nodes.

Environment
node 5.10.205-195.804.amzn2.x86_64
Kubernetes version (use kubectl version): 1.28.5
Driver version: chart version: aws-efs-csi-driver-2.5.3 App version: v1.7.3

What is interesting is that the previous container instance was OOMKilled:
State: Running
Started: Mon, 15 Jan 2024 11:02:56 +0100
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Mon, 15 Jan 2024 10:56:40 +0100
Finished: Mon, 15 Jan 2024 10:57:52 +0100

I see that the logs were not included in the bug report, so I am posting the logs of the current container

gkalwig@l-gkalwig:~/$ kubectl logs -n kube-system efs-csi-node-t2djp
Defaulted container "efs-plugin" out of: efs-plugin, csi-driver-registrar, liveness-probe
I0115 10:02:56.193469       1 config_dir.go:88] Creating symlink from '/etc/amazon/efs' to '/var/amazon/efs'
I0115 10:02:56.193807       1 metadata.go:65] getting MetadataService...
I0115 10:02:56.195579       1 metadata.go:70] retrieving metadata from EC2 metadata service
I0115 10:02:56.198052       1 driver.go:150] Did not find any input tags.
I0115 10:02:56.198200       1 driver.go:116] Registering Node Server
I0115 10:02:56.198210       1 driver.go:118] Registering Controller Server
I0115 10:02:56.198220       1 driver.go:121] Starting efs-utils watchdog
I0115 10:02:56.198272       1 efs_watch_dog.go:221] Skip copying /etc/amazon/efs/efs-utils.conf since it exists already
I0115 10:02:56.198281       1 efs_watch_dog.go:221] Skip copying /etc/amazon/efs/efs-utils.crt since it exists already
I0115 10:02:56.198475       1 driver.go:127] Starting reaper
I0115 10:02:56.210977       1 driver.go:137] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0115 10:03:58.192602       1 mount_linux.go:244] Detected OS without systemd
I0115 10:03:58.192606       1 mount_linux.go:244] Detected OS without systemd
I0115 10:03:58.192606       1 mount_linux.go:244] Detected OS without systemd
I0115 10:03:58.192762       1 mount_linux.go:244] Detected OS without systemd
I0115 10:03:58.290375       1 mount_linux.go:244] Detected OS without systemd


gkalwig commented Jan 28, 2024

As mentioned in my previous comment, we've observed that within our cluster the EFS driver consumes more memory when attempting to mount EFS storage into pods on newer Kubernetes versions. This increase in memory usage leads to Out Of Memory (OOM) kills. After these OOM kills, the newly created EFS driver container fails to work correctly, logging the error "Detected OS without systemd".
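To check whether the same thing is happening on your cluster, one way (a sketch; the label and container names are taken from the describe output above) is to look at the last terminated state of the efs-plugin containers:

# pods that were OOM-killed will show "Reason: OOMKilled" under Last State
kubectl describe pod -n kube-system -l app=efs-csi-node | grep -A 3 'Last State'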

Our current workaround is to set a higher memory limit for the EFS driver container. This allows it to allocate more memory during node startup and successfully mount EFS storage for all pods. It seems to work.

@mohammadasim
Author

In our case we increased the memory limit from 128Mi to 256Mi and it stabilised the pods.
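For anyone else hitting this, a minimal sketch of raising the limit through the Helm chart (this assumes the chart exposes a node.resources value for the node DaemonSet and that the release/repo names match your setup; adjust as needed):

helm upgrade aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  -n kube-system \
  --reuse-values \
  --set node.resources.requests.memory=128Mi \
  --set node.resources.limits.memory=256Mi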

@mohammadasim
Author

I am closing this issue, as our problem is resolved. Please reopen it if your problem persists.
