Upgrade from Chart 2.4.4 #1372

Open
m-parrella opened this issue Jun 10, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.


m-parrella commented Jun 10, 2024

/kind bug

What happened?

We recently upgraded our EKS cluster to 1.29. We use Managed Nodes with the amazon-eks-node-1.29-v20240227 AMI and the EFS CSI Driver 1.5.6, deployed by Helm from Chart 2.4.4.

After upgrading the driver from Chart 2.4.4 to Chart 2.4.5 (or higher), deployments using the EFS StorageClass stopped working correctly: the 'df' command did not return, both inside the Pods and on the Nodes. In /var/log/messages on the node we found the following error:

Jun 10 15:07:44 ip-XXX-XXX-XXX-XXX kernel: nfs: server 127.0.0.1 not responding, still trying

If we move the Pods that mount EFS volumes to a new node, they run as expected.
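To see which mounts on a node are affected without blocking the shell the way 'df' does, a minimal sketch like the following could help. It assumes efs-utils TLS mounts show up in /proc/mounts as nfs4 mounts whose source starts with 127.0.0.1: (the local stunnel endpoint named in the kernel message above). The helper names list_tls_efs_mounts and probe_mount are hypothetical, and the 5 s / 2 s timeouts are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list EFS mounts that go through the local stunnel
# (source 127.0.0.1:) and probe each one with a bounded stat, so a hung mount
# is reported instead of blocking the shell.

# Read /proc/mounts-style lines on stdin and print the mount point of every
# nfs4 mount whose source starts with 127.0.0.1: (assumed efs-utils TLS mounts).
list_tls_efs_mounts() {
  awk '$3 == "nfs4" && $1 ~ /^127\.0\.0\.1:/ { print $2 }'
}

# Probe one mount point with GNU timeout: SIGTERM after 5 s, then SIGKILL 2 s
# later if stat is still stuck. Any failure or timeout is reported as "HUNG?".
probe_mount() {
  if timeout -k 2 5 stat -f "$1" >/dev/null 2>&1; then
    echo "ok    $1"
  else
    echo "HUNG? $1"
    return 1
  fi
}

# Example (on a node, as root):
#   list_tls_efs_mounts < /proc/mounts | while IFS= read -r mp; do probe_mount "$mp"; done
```

Reading /proc/mounts rather than running 'df' or 'mount' avoids calling statfs on every filesystem, which is what hangs when the NFS server stops responding.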

Comparing the two charts, the significant change is the EFS state directory, as noted in the CHANGELOG: the efs-state-dir hostPath moved from /var/run/efs-csi-driver to /var/run/efs. This leads us to suspect that stunnel cannot resume existing connections after the upgrade. The new volume definition is:

{
  "hostPath": {
    "path": "/var/run/efs",
    "type": "DirectoryOrCreate"
  },
  "name": "efs-state-dir"
}

To avoid replacing the nodes, we found two workarounds. The first is to patch the DaemonSet so that efs-state-dir points back to the original path (the index 3 in the JSON Patch path assumes efs-state-dir is the fourth entry in the DaemonSet's volumes list):

kubectl patch daemonsets -n kube-system efs-csi-node --type json -p='[{"op": "replace", "path": "/spec/template/spec/volumes/3/hostPath/path", "value": "/var/run/efs-csi-driver"}]'
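Because the patch above addresses the volume by position, it could silently rewrite the wrong volume if the order differs in another chart version. A minimal sketch of a guarded variant follows: it looks up the index by name and prepends an RFC 6902 "test" operation, so the whole patch is rejected if that index is not efs-state-dir. The helper names volume_index, state_dir_patch and state_dir_index are hypothetical; the DaemonSet name and namespace are taken from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a guarded version of the DaemonSet patch.

# Read one volume name per line on stdin and print the zero-based index of $1.
# Exits non-zero if the name is not found.
volume_index() {
  awk -v want="$1" '$0 == want { print NR - 1; found = 1; exit } END { if (!found) exit 1 }'
}

# Print a JSON Patch for the volume at index $1: a "test" op that checks the
# name is efs-state-dir, then a "replace" of its hostPath.
# $2 defaults to the pre-2.4.5 state directory.
state_dir_patch() {
  local idx="$1" path="${2:-/var/run/efs-csi-driver}"
  printf '[{"op":"test","path":"/spec/template/spec/volumes/%s/name","value":"efs-state-dir"},' "$idx"
  printf '{"op":"replace","path":"/spec/template/spec/volumes/%s/hostPath/path","value":"%s"}]\n' "$idx" "$path"
}

# Look up efs-state-dir in the live DaemonSet (requires cluster access).
state_dir_index() {
  kubectl get daemonset efs-csi-node -n kube-system \
    -o jsonpath='{range .spec.template.spec.volumes[*]}{.name}{"\n"}{end}' \
    | volume_index efs-state-dir
}

# Example:
#   idx=$(state_dir_index) && kubectl patch daemonset efs-csi-node -n kube-system \
#     --type json -p "$(state_dir_patch "$idx")"
```

The lookup and the "test" operation are redundant by design. The lookup picks the right index, and the "test" still guards against the DaemonSet changing between the two kubectl calls.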

The second is to create a symbolic link on the node before the upgrade:

[root@ip-XXX-XXX-XXX-XXX /]# ln -s /var/run/efs-csi-driver /var/run/efs
[root@ip-XXX-XXX-XXX-XXX /]# ls -ld /var/run/efs /var/run/efs-csi-driver
lrwxrwxrwx 1 root root  23 Jun 10 18:15 /var/run/efs -> /var/run/efs-csi-driver
drwxr-xr-x 4 root root 160 Jun 10 18:21 /var/run/efs-csi-driver
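If the symlink workaround has to be applied to many nodes, an idempotent version is safer to rerun. The following is a minimal sketch, assuming it runs as root on each node that already has EFS mounts, before the chart upgrade. The function name link_state_dir is hypothetical, and the default paths are the old and new state directories from this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper: idempotent version of the symlink workaround.
# $1 = existing state directory, $2 = path the new chart mounts.
link_state_dir() {
  local old="${1:-/var/run/efs-csi-driver}" new="${2:-/var/run/efs}"
  if [ -L "$new" ]; then
    # Already linked (e.g. by an earlier run): report the target and succeed.
    echo "already a symlink: $new -> $(readlink "$new")"
    return 0
  fi
  if [ -e "$new" ]; then
    # Never replace a real directory; it may already hold state files.
    echo "refusing: $new exists and is not a symlink" >&2
    return 1
  fi
  if [ ! -d "$old" ]; then
    # Nothing to preserve on this node.
    echo "skipping: $old does not exist on this node" >&2
    return 0
  fi
  ln -s "$old" "$new"
  echo "linked: $new -> $old"
}
```

Refusing to touch an existing real directory keeps the helper from hiding state the new driver may already have written there. That case needs a manual look rather than an automatic fix.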

Is this the expected behavior? Thanks in advance!

What you expected to happen?

Container volumes should remain operational after the upgrade.

How to reproduce it (as minimally and precisely as possible)?

Upgrade from Chart 2.4.4 to Chart 2.4.5 or higher using Helmfile.

k8s-ci-robot added the kind/bug label (Categorizes issue or PR as related to a bug.) Jun 10, 2024
m-parrella changed the title from "Upgrade from Chart 2.4.4 Hungs." to "Upgrade from Chart 2.4.4" Jun 10, 2024