Driver fails to release ports on unmount #281
We are experiencing the same issue in an environment with a lot of pod autoscaling. In our experience it happens every 7 to 10 days; the quick fix is to replace all nodes, but in a production environment this is not behavior we want. Right after hitting this issue we opened a support case, but they asked us to update this issue first. This is the part of the error we are seeing:
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-contributor-experience at kubernetes/community.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-contributor-experience at kubernetes/community.
Facing a similar issue. The pattern I've identified so far is that when the node reaches high CPU usage, the efs-csi-driver crashes. If this happens 3+ times, the node can no longer mount any EFS PVs.
/remove-lifecycle rotten
It is fixed by this PR
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The issue appeared again: "Failed to locate an available port in the range [20049, 20449], try specifying a different port range in /etc/amazon/efs/efs-utils.conf". Kubernetes: 1.23
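The error message itself points at `/etc/amazon/efs/efs-utils.conf` as the place to change the port range. A hedged sketch of what widening it might look like (the option names below match recent efs-utils releases, but verify them against the version installed on your nodes):

```ini
# /etc/amazon/efs/efs-utils.conf (sketch; confirm option names against
# your installed efs-utils version)
[mount]
# The default range [20049, 20449] allows roughly 400 concurrent TLS
# mounts per node; widening the upper bound raises that ceiling. This
# mitigates port exhaustion but does not fix leaked stunnel processes.
port_range_lower_bound = 20049
port_range_upper_bound = 21049
```

Note this is a per-node file, so in EKS it typically has to be set via the node AMI, user data, or the driver's configuration rather than by hand.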
I saw the issue today: it should happen any time we have more than 401 pods running on the same node, all trying to mount an EFS volume.
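The 401 figure follows from the range being inclusive. A small sketch, on the assumption (consistent with the netstat findings below) that each concurrent TLS mount holds one localhost stunnel port:

```python
# Assumption: one stunnel localhost port per concurrent EFS TLS mount.
LOWER, UPPER = 20049, 20449
pool = set(range(LOWER, UPPER + 1))
print(len(pool))  # 401 ports in the inclusive range

for _ in range(401):  # 401 concurrent mounts each take one port
    pool.pop()
print(len(pool))  # 0 -- the next mount finds no free port and fails with
                  # "Failed to locate an available port in the range ..."
```

If unmounts leak ports instead of releasing them (the bug reported here), the pool drains even without 401 simultaneous pods.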
This is happening for us too. Is there a workaround for this in AWS EKS?
Encountered on v2.0.1 (released ~April 2024), so it might still be a thing. There seem to be related issues (https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues?q=is%3Aissue+ports+is%3Aclosed), so I'll try updating to the latest version, v2.0.7 (as of ~Aug 2024).
Glad it's not just me @neoakris! |
/kind bug
What happened?
We noticed that pods were unable to mount EFS PVCs and got stuck in ContainerCreating. The logs showed:
Output: Failed to locate an available port in the range [20049, 20449], try specifying a different port range in /etc/amazon/efs/efs-utils.conf
We logged into the node and found with netstat that all 400 ports were occupied by stunnel processes.
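A quick way to quantify this on a suspect node is to count listeners in the efs-utils range from `ss -tln` or `netstat -tln` output. A sketch (the sample lines below are hypothetical; on a real node, feed in actual netstat/ss lines instead):

```python
def count_efs_ports(lines, lower=20049, upper=20449):
    """Count listener lines whose local port falls in the efs-utils range."""
    n = 0
    for line in lines:
        addr = line.split()[3]              # local address, e.g. "127.0.0.1:20049"
        port = int(addr.rsplit(":", 1)[1])  # take the port after the last ":"
        if lower <= port <= upper:
            n += 1
    return n

# Hypothetical `ss -tln`-style sample; two of the three ports are in range.
sample = [
    "LISTEN 0 128 127.0.0.1:20049",
    "LISTEN 0 128 127.0.0.1:20100",
    "LISTEN 0 128 127.0.0.1:22",
]
print(count_efs_ports(sample))  # 2
```

A count at or near 401 on a node with far fewer mounted EFS volumes is the signature of the leak described in this issue.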
The watchdog logs show that it fails to kill processes on unmount. These log lines repeat for several PIDs:
2020-11-12 16:12:16,530 - INFO - Unmount grace period expired for fs-6a3285fb.var.lib.kubelet.pods.c3015949-346d-42cf-9594-3be561ca30c8.volumes.kubernetes.io~csi.pvc-7ef93798-9182-469f-b35a-72cd13ecfcac.mount.20402
2020-11-12 16:12:16,530 - INFO - Terminating running TLS tunnel - PID: 2773, group ID: 2773
2020-11-12 16:12:16,530 - INFO - TLS tunnel: 2773 is still running, will retry termination
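The log sequence above suggests a terminate-then-retry loop that never succeeds in killing the stunnel process. A hedged sketch of such an escalation path (this is an illustration of the pattern, not the actual efs-utils watchdog code):

```python
import os
import signal
import time

def terminate_tunnel(pid, retries=3, wait=0.2):
    """Sketch of a SIGTERM-then-SIGKILL escalation for a stunnel PID.

    Illustrative only (assumption): the real watchdog's behavior in this
    issue is that it keeps logging "is still running, will retry
    termination" without the tunnel ever exiting, so its port leaks.
    Returns True once the process is gone, False if it survives SIGKILL.
    """
    try:
        os.kill(pid, signal.SIGTERM)
        for _ in range(retries):
            time.sleep(wait)
            os.kill(pid, 0)  # raises ProcessLookupError once the pid is gone
                             # (note: an unreaped zombie child still counts
                             # as present to this check)
        # Still alive after the grace period: escalate.
        os.kill(pid, signal.SIGKILL)
        time.sleep(wait)
        os.kill(pid, 0)
    except ProcessLookupError:
        return True  # process exited; its stunnel port is released
    return False
```

The key point for this bug report: if TERM is ignored and no KILL escalation (or an ineffective one) follows, the stunnel process and its port persist until the node is replaced.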
What you expected to happen?
Ports are freed upon unmount and pods on all nodes are able to mount EFS PVCs.
How to reproduce it (as minimally and precisely as possible)?
Not sure how to reproduce as it seems random.
Anything else we need to know?:
We have seen this issue several times; however, it seems random when a node fails to release the ports. We experience this in our medium-sized cluster maybe once a week. Other nodes work just fine when this happens. A quick fix is to replace the bad node.
Environment
Kubernetes version (use kubectl version): v1.17.13