
PVCs do not get assigned to correct AZ as where the pod needs to be scheduled #49906

Closed
d-shi opened this issue Jul 31, 2017 · 21 comments
Assignees
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@d-shi

d-shi commented Jul 31, 2017

/kind bug

What happened: I am trying to deploy a stateful set with PVCs to a set of 5 instances spread across 2 AZs. The stateful set has 5 pods, and I have set anti affinity so that there will be 1 pod per instance. The instances came up in AZs a-b-a-b-a. When I deploy the stateful set, their PVCs come up in AZs b-a-b-a-b. Thus the last pod in the set cannot be scheduled due to NoVolumeZoneConflict.

What you expected to happen: The PVCs should be assigned to the same AZ as the instance where the pod that requires it gets scheduled. Is there any way for me to make this happen?

How to reproduce it (as minimally and precisely as possible):
Deploy a stateful set of 5 replicas with PVCs and anti pod affinity to a cluster with 5 nodes on 2 AZs:

spec:
  replicas: 5
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - foo
            topologyKey: kubernetes.io/hostname
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 20Gi

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T02:00:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T22:55:19Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel (e.g. uname -a):
Linux 4.4.65-k8s x86_64 GNU/Linux
  • Install tools: custom
  • Others:
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 31, 2017
@d-shi
Author

d-shi commented Jul 31, 2017

/sig aws

@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 31, 2017
@msau42
Member

msau42 commented Aug 2, 2017

PVC binding is currently done independently of the pod scheduler, and is therefore not integrated with pod anti-affinity policies. I am working on making PVC binding more integrated with pod scheduling, but it's a big design change and will take a few releases to be fully functional.

/sig storage

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 2, 2017
@msau42
Member

msau42 commented Aug 2, 2017

/assign

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2018
@msau42
Member

msau42 commented Jan 11, 2018

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 11, 2018
@toidiu

toidiu commented Feb 2, 2018

Is there a workaround until this is addressed?

Currently what I am thinking of doing is attaching a label to each node based on its AZ and then putting a node selector policy on the pod. But this seems like an awful workaround to me, since Kubernetes is supposed to abstract away the underlying cloud provider.
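The node-selector workaround described above might be sketched like this (assumed values: `us-east-1a` is an example zone, and `failure-domain.beta.kubernetes.io/zone` was the well-known zone label on clusters of this era):

```yaml
# Sketch of the workaround discussed above: pin pods to one AZ via a node label.
# The zone value "us-east-1a" is an example, not taken from this issue.
spec:
  template:
    spec:
      nodeSelector:
        failure-domain.beta.kubernetes.io/zone: us-east-1a
```

Note this forces pods and their volumes into a single zone, so it trades away zone spreading.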

@msau42
Member

msau42 commented Feb 2, 2018

Actually, looking at the OP's example again, I'm not clear on why it failed. Each pod in the StatefulSet should have followed the zone where its PV was provisioned. The only time it would fail scheduling is if you run out of resources in the zones where the PVs were provisioned.

@toidiu are you seeing the same issue as the OP? Can you paste your example here?

@msau42
Member

msau42 commented Feb 2, 2018

Oh nm I see the problem. The OP's nodes came up a-b-a-b-a, but the volumes were provisioned as b-a-b-a-b.

The only workarounds I see are to:

  • Overprovision your nodes by one
  • Use an even number of replicas
  • Don't require hard spreading across nodes. The scheduler already tries to prefer spreading across nodes even when you don't specify anti-affinity.

@toidiu, I'm not sure I see how a node selector would help, unless you break up your StatefulSet to one per zone.

@toidiu

toidiu commented Feb 2, 2018

@msau42 I was seeing problems when working with only a handful of nodes. Once I over-provisioned this seems to be less of a problem.

I believe this is still an issue if not enough memory/CPU is available in the AZ the PVC is attached to. Kube will not automatically move pods to another node (nor should it), and therefore you might need to manually delete pods in a particular AZ.

So the problem is that one still needs to think about what the underlying cloud provider is doing, which Kube should be able to abstract.

@msau42
Member

msau42 commented Feb 3, 2018

Exactly, it's mostly an issue when you are tight on resources. Unfortunately there are not really any good workarounds today.

I'm hoping to get PV scheduling to beta in 1.10. But it will still be at least another release before dynamic provisioning support will be added. Then, we can migrate existing zonal PVs to use it and solve this problem.

Ref kubernetes/enhancements#490

@nrmitchi

Just following up here, I'm having a similar problem that is not resource based, but still the same root problem, and may be a helpful consideration.

I have a StatefulSet deploying across 3 AZs (using node affinity on instance group labels, and an anti-affinity to prevent colocating). The StatefulSet is limited to only those 3 AZs (a/b/c), despite the cluster operating across more (a/b/c/d/e).

When deployed, I'm seeing the volume created in zone e, which does not work with the hard node affinity.

@toidiu

toidiu commented Mar 13, 2018

@nrmitchi are you using a custom storage class to restrict your volumes to only those AZs? https://kubernetes.io/docs/concepts/storage/storage-classes/#aws
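A storage class restricted to specific AZs (per the doc linked above) might look like the following sketch; the class name and zone list are example values:

```yaml
# Sketch, assuming example zones; restricts where aws-ebs volumes are provisioned.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-restricted   # example name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zones: us-east-1a, us-east-1b, us-east-1c   # only provision volumes in these AZs
```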

@nrmitchi

I was not; I had looked at it but dismissed it because I thought it would only restrict to a single AZ (which appears to have been incorrect).

Thanks!

@msau42
Member

msau42 commented Mar 27, 2018

Design proposal for integrating dynamic provisioning into the scheduler is here: kubernetes/community#1857

@msau42
Member

msau42 commented Sep 21, 2018

Topology-aware dynamic provisioning for the gce-pd, aws-ebs and azure disks is available as beta in 1.12

/close

@k8s-ci-robot
Contributor

@msau42: Closing this issue.

In response to this:

Topology-aware dynamic provisioning for the gce-pd, aws-ebs and azure disks is available as beta in 1.12

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ghost

ghost commented Sep 18, 2019

I am facing the same issue again in Kubernetes version 1.13.

@Sewci0

Sewci0 commented Sep 19, 2019

I am hitting the same issue, both in Kubernetes 1.13 and 1.14.

@msau42
Member

msau42 commented Sep 19, 2019

Please try https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode

@ghost

ghost commented Sep 20, 2019

With the default storage class, volumeBindingMode is set to Immediate, so the PVC is provisioned without any knowledge of the pod. Set volumeBindingMode: WaitForFirstConsumer in your storage class and it will work fine.
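A minimal sketch of such a storage class (the name ebs-wait is an example):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-wait   # example name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer   # delay binding until a pod is scheduled
```

Pods would then reference it via storageClassName: ebs-wait in their volumeClaimTemplates, so the volume is provisioned in whatever zone the scheduler picks for the pod.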

@Sewci0

Sewci0 commented Sep 20, 2019

@msau42 @vbasavani That seems to have fixed it. Thanks.
