PVCs do not get assigned to the correct AZ where the pod needs to be scheduled #49906
/sig aws
PVC binding is currently done independently of the pod scheduler, and is therefore not integrated with pod anti-affinity policies. I am working on making PVC binding more integrated with pod scheduling, but it's a big design change and will take a few releases to be fully functional. /sig storage
/assign
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with a /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta-bot.
/remove-lifecycle stale
Is there a workaround until this is addressed? What I am currently thinking of doing is attaching a label to each node based on its AZ and then adding a node selector to the Pod. But this seems like an awful workaround to me, since Kubernetes is supposed to abstract away the underlying cloud provider.
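Roughly what I have in mind, as a sketch (the zone label is the one the cloud provider sets on nodes; the pod name, image, and zone value are just examples):

```yaml
# Sketch of the node-selector workaround: pin a pod to one AZ using the
# zone label that the cloud provider applies to each node.
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned-pod   # illustrative name
spec:
  nodeSelector:
    failure-domain.beta.kubernetes.io/zone: us-east-1a   # example zone
  containers:
    - name: app
      image: nginx
```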
Actually, looking at the OP's example again, I'm not clear on why it failed. Each pod in the StatefulSet should have followed the zone where its PV was provisioned. The only time it would fail scheduling is if you run out of resources in the zones where the PVs were provisioned. @toidiu are you seeing the same issue as the OP? Can you paste your example here?
Oh nm I see the problem. The OP's nodes came up a-b-a-b-a, but the volumes were provisioned as b-a-b-a-b. The only workarounds I see are to:
@toidiu, I'm not sure I see how a node selector would help, unless you break up your StatefulSet to one per zone.
@msau42 I was seeing problems when working with only a handful of nodes. Once I over-provisioned, this seems to be less of a problem. I believe this is still an issue if not enough memory/CPU is available in the AZ the PVC is attached to. Kube will not automatically move pods to other nodes (nor should it), and therefore you might need to manually delete pods in a particular AZ. So the problem is that one currently still needs to think about what the underlying cloud provider is doing, which Kube should be able to abstract.
Exactly, it's mostly an issue when you are tight on resources. Unfortunately there are not really any good workarounds today. I'm hoping to get PV scheduling to beta in 1.10. But it will still be at least another release before dynamic provisioning support is added. Then, we can migrate existing zonal PVs to use it and solve this problem.
Just following up here, I'm having a similar problem that is not resource based, but has the same root cause, and may be a helpful consideration. I have a StatefulSet deploying across 3 AZs (using node affinity on instance group labels, and an anti-affinity to prevent colocating). The StatefulSet is limited to only those 3 AZs (a/b/c), despite the cluster operating across more (a/b/c/d/e). When deployed, I'm seeing the volume be created in zone e, which does not work with the hard node affinity.
@nrmitchi are you using a custom storage class to restrict your volumes to only those AZs? https://kubernetes.io/docs/concepts/storage/storage-classes/#aws
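For example, something along these lines (a sketch; the class name and zone names are placeholders):

```yaml
# StorageClass that restricts dynamically provisioned EBS volumes to specific
# zones via the aws-ebs provisioner's zones parameter.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-zoned            # illustrative name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zones: us-east-1a, us-east-1b, us-east-1c   # placeholder zones
```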
I was not; I had looked at it but dismissed it because I thought it would only restrict to a single AZ (which appears to have been incorrect). Thanks!
Design proposal for integrating dynamic provisioning into the scheduler is here: kubernetes/community#1857
Topology-aware dynamic provisioning for the gce-pd, aws-ebs and azure disks is available as beta in 1.12. /close
@msau42: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I am facing the same issue again in Kubernetes 1.13.
I am hitting the same issue, both in Kubernetes 1.13 and 1.14.
For the default storage class, volumeBindingMode is set to Immediate, so the PVC is provisioned without any knowledge of the pod. Set volumeBindingMode: WaitForFirstConsumer in your storage class and it will work fine.
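For example (a minimal sketch; the class name is arbitrary):

```yaml
# StorageClass that delays volume binding and provisioning until a pod using the
# PVC is scheduled, so the volume is created in the same zone as the chosen node.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-wffc             # illustrative name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer
```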
@msau42 @vbasavani That seems to have fixed it. Thanks.
/kind bug
What happened: I am trying to deploy a StatefulSet with PVCs to a set of 5 instances spread across 2 AZs. The StatefulSet has 5 pods, and I have set pod anti-affinity so that there is 1 pod per instance. The instances came up in AZs a-b-a-b-a. When I deploy the StatefulSet, their PVCs come up in AZs b-a-b-a-b. Thus the last pod in the set cannot be scheduled due to NoVolumeZoneConflict.
What you expected to happen: The PVCs should be assigned to the same AZ as the instance where the pod that requires it gets scheduled. Is there any way for me to make this happen?
How to reproduce it (as minimally and precisely as possible):
Deploy a StatefulSet of 5 replicas with PVCs and pod anti-affinity to a cluster with 5 nodes across 2 AZs:
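A minimal sketch along these lines (names, image, and sizes are illustrative, not the original manifest):

```yaml
# Illustrative StatefulSet: 5 replicas, one pod per node via required pod
# anti-affinity, each pod getting its own PVC from volumeClaimTemplates.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-set          # illustrative name
spec:
  serviceName: example-set
  replicas: 5
  selector:
    matchLabels:
      app: example-set
  template:
    metadata:
      labels:
        app: example-set
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: example-set
              topologyKey: kubernetes.io/hostname   # one pod per node
      containers:
        - name: app
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi    # illustrative size
```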
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Kernel (e.g. uname -a):