Pod creation problems due to slow mounting/attaching #79
Analysis so far => The problem seems to be related to k8s; ZFS-LocalPV seems to be fine. K8s services are taking too long to create the volumeattachment object, which attaches the volume to the node. The pod will be stuck in the ContainerCreating state until the volumeattachment object has been created; once it is created, the pod comes into the Running state.
Following kubernetes/kubernetes#84169 (comment), setting attach-detach-reconcile-sync-period to 300s improves the situation. Perhaps LIST_VOLUMES_PUBLISHED_NODES should be implemented?
Discussed here: kubernetes/kubernetes#84169
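For reference, here is a minimal sketch of how a CSI controller plugin could advertise the LIST_VOLUMES_PUBLISHED_NODES capability suggested above, using the CSI Go spec bindings. This is an illustration only; the `controller` type is hypothetical and this is not what ZFS-LocalPV currently implements:

```go
// Sketch: advertising LIST_VOLUMES_PUBLISHED_NODES so the
// external-attacher can reconcile attachment state in bulk via
// ListVolumes instead of probing volumes one by one.
package sketch

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// controller is a hypothetical CSI controller server, not the
// actual ZFS-LocalPV implementation.
type controller struct{}

func (c *controller) ControllerGetCapabilities(
	ctx context.Context,
	req *csi.ControllerGetCapabilitiesRequest,
) (*csi.ControllerGetCapabilitiesResponse, error) {
	newCap := func(t csi.ControllerServiceCapability_RPC_Type) *csi.ControllerServiceCapability {
		return &csi.ControllerServiceCapability{
			Type: &csi.ControllerServiceCapability_Rpc{
				Rpc: &csi.ControllerServiceCapability_RPC{Type: t},
			},
		}
	}
	return &csi.ControllerGetCapabilitiesResponse{
		Capabilities: []*csi.ControllerServiceCapability{
			newCap(csi.ControllerServiceCapability_RPC_CREATE_DELETE_VOLUME),
			// LIST_VOLUMES is a prerequisite: the published-nodes
			// information is carried in ListVolumes response entries.
			newCap(csi.ControllerServiceCapability_RPC_LIST_VOLUMES),
			newCap(csi.ControllerServiceCapability_RPC_LIST_VOLUMES_PUBLISHED_NODES),
		},
	}, nil
}
```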
I reproduced this issue with 200 volumes. I saw that as this number increases, volume attachment gets very slow. I have some stats regarding these timings.
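For anyone wanting to gather similar timing stats, here is a minimal sketch that watches VolumeAttachment objects with client-go and reports how long each one takes to reach Attached=true after creation. It assumes a kubeconfig at the default path and is illustrative only, not the tooling actually used for the numbers above:

```go
// Sketch: measure VolumeAttachment attach latency by watching
// for status.attached to flip to true.
package main

import (
	"context"
	"fmt"
	"time"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	w, err := client.StorageV1().VolumeAttachments().Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	seen := map[string]bool{}
	for ev := range w.ResultChan() {
		va, ok := ev.Object.(*storagev1.VolumeAttachment)
		if !ok || seen[va.Name] || !va.Status.Attached {
			continue
		}
		seen[va.Name] = true
		// Elapsed time from object creation until Attached=true.
		fmt.Printf("%s attached after %v\n", va.Name, time.Since(va.CreationTimestamp.Time))
	}
}
```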
To overcome this issue we are avoiding the creation of the volumeattachment object, as it is not required for ZFSPV as of now. Refer to PR #85.

To validate this PR, I reproduced the scenario with 200 volumes and zfs-driver:v0.4, then upgraded the driver to 0.6.0 with the changes from the above-mentioned PR, where the csi-attacher container is removed. As part of cleanup, the upgrade also deletes the volumeattachments. After the upgrade I tried two scenarios: one where I cloned the 200 volumes for which I had already taken snapshots before upgrading, and a second where I provisioned 200 new volumes. I observed a very significant decrease in the time for pods to come into the Running state. For new provisioning of volumes I have some stats regarding the timings.
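For context, a hedged sketch of the standard Kubernetes mechanism this kind of change relies on, assuming PR #85 uses the usual knobs: a CSIDriver object with attachRequired=false tells kubelet not to wait for a VolumeAttachment, and stale VolumeAttachment objects from the old deployment can be deleted during the upgrade. The storage/v1 API shown is GA in k8s 1.18 (earlier versions use storage.k8s.io/v1beta1); the error handling is illustrative:

```go
// Sketch: skip the attach step for a CSI driver and clean up
// leftover VolumeAttachment objects.
package main

import (
	"context"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// driverName is the ZFS-LocalPV CSI driver name; verify it
// against your own deployment before relying on it.
const driverName = "zfs.csi.openebs.io"

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.TODO()

	// (1) Register the driver with attachRequired=false so kubelet
	// mounts volumes without waiting for a VolumeAttachment object.
	attachRequired := false
	if _, err := client.StorageV1().CSIDrivers().Create(ctx, &storagev1.CSIDriver{
		ObjectMeta: metav1.ObjectMeta{Name: driverName},
		Spec:       storagev1.CSIDriverSpec{AttachRequired: &attachRequired},
	}, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// (2) Delete stale VolumeAttachments left behind by the old
	// deployment that still ran the csi-attacher sidecar.
	vas, err := client.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, va := range vas.Items {
		if va.Spec.Attacher == driverName {
			_ = client.StorageV1().VolumeAttachments().Delete(ctx, va.Name, metav1.DeleteOptions{})
		}
	}
}
```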
So PR #85 has now been tested on three k8s versions (1.16, 1.17, and 1.18), and it resolves this issue (#79).
Fixed the issue by avoiding the volumeattachment object (#85). Now we can see that volumes are getting attached very fast.
We have 28 total nodes (3 masters, 25 workers)
Each node has 48 cores and 252 GB RAM
~228 ZFS volumes (305 including old path volumes)
~212 volumeattachments