Upgrade from Chart 2.4.4 #1372

m-parrella · 2024-06-10T19:09:06Z

/kind bug

What happened?

We recently upgraded our EKS cluster to 1.29. We are using Managed Nodes with the amazon-eks-node-1.29-v20240227 AMI and the EFS CSI Driver 1.5.6, deployed with Helm Chart 2.4.4.

After upgrading the driver from Chart 2.4.4 to Chart 2.4.5 (or higher), deployments using the EFS Storage Class stopped working correctly. The 'df' command hung on both the Pods and the Nodes. In /var/log/messages on the node, we found the following error message:

Jun 10 15:07:44 ip-XXX-XXX-XXX-XXX kernel: nfs: server 127.0.0.1 not responding, still trying
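To check other nodes for the same symptom, the log can be scanned for this message. The following is a minimal sketch, assuming the syslog line format quoted above; the function name `find_nfs_timeouts` is hypothetical and not part of the driver or of efs-utils.

```python
import re

# Matches the kernel message quoted above and captures the server address,
# so the scan can be limited to the local endpoint (127.0.0.1).
NFS_TIMEOUT = re.compile(r"kernel: nfs: server (\S+) not responding, still trying")


def find_nfs_timeouts(lines, server="127.0.0.1"):
    """Return the log lines that report an unresponsive NFS server at `server`."""
    hits = []
    for line in lines:
        match = NFS_TIMEOUT.search(line)
        if match and match.group(1) == server:
            hits.append(line.rstrip("\n"))
    return hits
```

On a node, it could be called as `find_nfs_timeouts(open("/var/log/messages", errors="replace"))`, which usually requires root.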

If we move the Pods mounting EFS volumes to a new node, the Pods run as expected.

Comparing both charts, the significant change is the EFS State Directory, as outlined in the CHANGELOG. This leads us to suspect that stunnel cannot resume its connections after the upgrade.

{
  "hostPath": {
    "path": "/var/run/efs",
    "type": "DirectoryOrCreate"
  },
  "name": "efs-state-dir"
}

To avoid refreshing the nodes, we have identified two workarounds. The first approach involves patching the DaemonSet to utilize the original path. This can be achieved by executing the following command:

kubectl patch daemonsets -n kube-system efs-csi-node --type json -p='[{"op": "replace", "path": "/spec/template/spec/volumes/3/hostPath/path", "value": "/var/run/efs-csi-driver"}]'
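The patch above addresses the volume by position (index 3), which may not hold if the volume list differs between chart versions or deployments. A minimal sketch that looks the volume up by name instead is shown below. It assumes the input is the DaemonSet as returned by `kubectl get daemonset efs-csi-node -n kube-system -o json`; the function name `build_state_dir_patch` and the trimmed example DaemonSet (including the `kubelet-dir` volume) are made up for illustration.

```python
import json


def build_state_dir_patch(daemonset, new_path, volume_name="efs-state-dir"):
    """Build a JSON Patch that repoints the named hostPath volume to `new_path`."""
    volumes = daemonset["spec"]["template"]["spec"]["volumes"]
    for index, volume in enumerate(volumes):
        if volume.get("name") == volume_name and "hostPath" in volume:
            return [{
                "op": "replace",
                "path": f"/spec/template/spec/volumes/{index}/hostPath/path",
                "value": new_path,
            }]
    raise LookupError(f"no hostPath volume named {volume_name!r} in the DaemonSet")


# Trimmed, made-up DaemonSet used only to show the call; real input comes
# from the kubectl command named in the lead-in.
example = {"spec": {"template": {"spec": {"volumes": [
    {"name": "kubelet-dir", "hostPath": {"path": "/var/lib/kubelet"}},
    {"name": "efs-state-dir", "hostPath": {"path": "/var/run/efs"}},
]}}}}
print(json.dumps(build_state_dir_patch(example, "/var/run/efs-csi-driver")))
```

The printed JSON can be passed to `kubectl patch daemonsets -n kube-system efs-csi-node --type json -p='...'` in place of the hand-written patch.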

The second approach is to create a symbolic link prior to the upgrade:

[root@ip-XXX-XXX-XXX-XXX /]# ln -s /var/run/efs-csi-driver /var/run/efs
[root@ip-XXX-XXX-XXX-XXX /]# ls -ld /var/run/efs /var/run/efs-csi-driver
lrwxrwxrwx 1 root root  23 Jun 10 18:15 /var/run/efs -> /var/run/efs-csi-driver
drwxr-xr-x 4 root root 160 Jun 10 18:21 /var/run/efs-csi-driver
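When the symbolic link has to be created on many nodes, it helps if the step is safe to repeat. The following is a minimal sketch, assuming the two paths from the commands above and that it runs as root on the node before the upgrade; the function name `ensure_state_dir_link` is hypothetical.

```python
import os


def ensure_state_dir_link(target="/var/run/efs-csi-driver", link="/var/run/efs"):
    """Create `link` -> `target` unless it is already there; return True if created."""
    if os.path.islink(link):
        if os.readlink(link) == target:
            return False  # already in place, nothing to do
        raise FileExistsError(f"{link} points to {os.readlink(link)}, not {target}")
    if os.path.exists(link):
        # Refuse to replace an existing file or directory.
        raise FileExistsError(f"{link} exists and is not a symlink")
    if not os.path.isdir(target):
        raise FileNotFoundError(f"{target} does not exist; nothing to link to")
    os.symlink(target, link)
    return True
```

Raising instead of overwriting is a deliberate choice here: if `/var/run/efs` already exists as a real directory, the situation needs a manual look rather than an automatic fix.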

Is this the expected behavior? Thanks in advance!

What you expected to happen?

Container volumes should remain operational after the upgrade.

How to reproduce it (as minimally and precisely as possible)?

Upgrade from Chart 2.4.4 to Chart 2.4.5 or higher using Helmfile.

k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 10, 2024

m-parrella changed the title ~~Upgrade from Chart 2.4.4 Hungs.~~ Upgrade from Chart 2.4.4 Jun 10, 2024
