
Support inter-Pod affinity to one or more Pods #68701

Closed
bsalamat opened this issue Sep 15, 2018 · 59 comments
Assignees
Labels
- kind/feature: Categorizes issue or PR as related to a new feature.
- lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
- needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
- sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@bsalamat
Member

In the current implementation of inter-Pod affinity, the scheduler looks for a single existing pod that can satisfy all the inter-pod affinity terms of an incoming pod.
With the recent changes to the implementation of inter-Pod affinity, we can now support multiple pods satisfying inter-pod affinity. One of the main reasons we didn't pursue the idea before was that the inter-pod affinity feature was very slow (3 orders of magnitude slower than other scheduler predicates), and we didn't want to add more complexity to an already slow predicate. However, we can now think about adding the feature.
With this feature, a pod can have multiple affinity terms satisfied by a group of pods, as opposed to only a single pod. For example:

- Assume that the cluster has two nodes:
    - nodeA, located in zone1/region1
    - nodeB, located in zone2/region1

- There are two existing pods on these nodes:
    - Pod1:
        - nodeName: "nodeA"
        - label "foo": ""
    - Pod2:
        - nodeName: "nodeB"
        - label "bar": ""

- Pod3 comes in with inter-pod affinity:
    - affinity terms:
        - {label "foo" exists, topologyKey: "region"}
        - {label "bar" exists, topologyKey: "zone"}

With our current (K8s 1.12) implementation, Pod3 is not schedulable, because there is no single pod that satisfies all of its affinity terms. However, if we support multiple pods satisfying the affinity terms, Pod3 can be scheduled on nodeB. Pod1 satisfies the first term of its affinity in region1 and Pod2 satisfies its second term in zone2. So, any node in zone2/region1 will be feasible for Pod3.
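
For reference, here is roughly what Pod3's manifest for the example above might look like. This is only a sketch: the topology keys, label keys, and container image are illustrative and not part of the original example.

# Pod3 from the example (sketch; label keys, topology keys, and image are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: pod3
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      # First term: an existing pod with label "foo" in the same region
      - labelSelector:
          matchExpressions:
          - key: foo
            operator: Exists
        topologyKey: topology.kubernetes.io/region
      # Second term: an existing pod with label "bar" in the same zone
      - labelSelector:
          matchExpressions:
          - key: bar
            operator: Exists
        topologyKey: topology.kubernetes.io/zone
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9

Under the proposal, the first term could be satisfied by Pod1 and the second by Pod2, even though no single existing pod matches both selectors.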

Given the current implementation of inter-pod affinity and the use of "Topology Pair Maps", I believe implementing this feature requires only small changes and won't have a noticeable performance impact.

/kind feature
/sig scheduling

cc/ @Huang-Wei @ahmad-diaa

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Sep 15, 2018
@bsalamat
Member Author

bsalamat commented Sep 15, 2018

It is worth noting that this will apply only to inter-pod affinity, not inter-pod anti-affinity. Inter-pod anti-affinity is considered "violated" if there is a pod that matches ANY of the anti-affinity terms, so matching against a group of Pods does not make sense for anti-affinity.

@Huang-Wei
Member

@bsalamat I have a full picture of the affinity/anti-affinity code after delivering #68173. I'm more than happy to help with this :)

/assign

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 14, 2018
@misterikkit

This question was probably answered elsewhere, but could the change in behavior disrupt existing clusters?

E.g., a canary workload is launched with pod-affinity for labels {app="foo", env="canary"}. That workload could end up in a topology containing {app="foo", env="prod"} & {app="bar", env="canary"} after this change.

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 20, 2019
@Huang-Wei
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 20, 2019
@Huang-Wei
Member

Re @misterikkit:

a canary workload is launched with pod-affinity for labels {app="foo", env="canary"}. That workload could end up in a topology containing {app="foo", env="prod"} & {app="bar", env="canary"} after this change.

If the goal is to gather pods with labels {app="foo", env="canary"}, the workload should define them within the same affinityTerm, as different expressions.

And if app="foo" and env="canary" are defined in two different affinityTerms, then yes, after the change a topology containing {app="foo", env="prod"} & {app="bar", env="canary"} can be a fit.
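
As an illustration of "same affinityTerm, different expressions" (a sketch, not taken from the thread; the topology key is arbitrary):

# One affinityTerm that requires a single target pod to carry both labels
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - foo
        - key: env
          operator: In
          values:
          - canary
      topologyKey: topology.kubernetes.io/zone

With this form, only a pod labeled both app=foo and env=canary can satisfy the term, so the canary workload cannot be pulled toward an {app="foo", env="prod"} pod.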

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 20, 2019
@Huang-Wei
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 21, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2019
@Huang-Wei
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 18, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 17, 2019
@Huang-Wei
Member

I'd suggest putting it on our plate only when there are a couple of viable, practical use cases.

@sanposhiho
Member

Quoting from the request:

Suppose we get an incoming pod with 2 pod affinity requirements, one at region level and the other at zone level. E.g., the first is for affinity to some light-weight RPC service it depends on, while the second is for affinity to some heavy-weight storage access.

It makes sense. To be more general, when people want to make sure dependent services exist in the same domain and the pod has more than one kind of dependent service (like the above example), they cannot use today's PodAffinity to achieve that.
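
Concretely, the quoted use case corresponds to a spec along these lines (a sketch with made-up label values; the topology keys are illustrative). The spec is already expressible today, but the scheduler requires a single existing pod to satisfy both terms, which is what this issue asks to relax:

# Two affinity terms at different topology levels (illustrative names)
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    # Affinity to the lightweight RPC service at region scope
    - labelSelector:
        matchLabels:
          app: light-rpc-service
      topologyKey: topology.kubernetes.io/region
    # Affinity to the heavyweight storage service at zone scope
    - labelSelector:
        matchLabels:
          app: heavy-storage
      topologyKey: topology.kubernetes.io/zone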

@alculquicondor
Member

I know it makes sense, but there doesn't seem to be enough pressure to get this done. This issue has been open since 2018, and there aren't more people asking for it. Maybe you can collect some feedback by sending an email to the mailing list?

@sanposhiho
Member

👍 In the sig-scheduling mailing list, or is there another suitable one?

@Huang-Wei
Member

yes, and discuss.k8s.io

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 10, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) Mar 12, 2023
@sanposhiho
Member

/remove-lifecycle rotten
/reopen

@k8s-ci-robot
Contributor

@sanposhiho: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Apr 18, 2023
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 18, 2023
@fentas

fentas commented Apr 19, 2023

Curious: shouldn't this already be possible, going by the design proposal?
My interpretation was that each entry in requiredDuringSchedulingIgnoredDuringExecution looks up a set of pods, and these pods reduce the set of nodes. I'm kind of confused why this is even an array then, as either matchExpressions or matchLabels can match multiple things. Why would I define a new entry in requiredD.. when I can add an expression in matchExpressions itself to match the same pod?
Also, it is nowhere written that you can't target multiple pods (or I over-interpreted/misread it).

But it seems that is not the case, so here is my use case:

We're running Cilium in chaining with the AKS CSI. But since mid-last year, removing taints is only possible via the Azure API, which breaks Cilium node readiness.
Right now, we're unable to run AKS in BYOCNI mode unless it leaves its preview phase with Cilium.

Following this comment from Microsoft, we took the advice and created a mutating webhook (instead of calling the API) that adds a podAffinity so that pods only get scheduled on nodes where the Cilium agent is already running.
But this now breaks other podAffinities.

@alculquicondor
Member

I can't speak for the original intent of the design, as it predates my time here, but the reality is that it's not implemented like that, and we can't change the behavior now because it would be backwards-incompatible.

So two things:

  • Docs need to be updated to reflect the current implementation. Feel free to open an issue, or I'll do so when I have a chance.
  • Is there still justification for this feature? In your case, the cilium agent is probably a DaemonSet, so it is already designed to run on a set of nodes that have a label. Then the pods can use node affinity instead of pod affinity (see the sketch below). Also, node affinity is faster to calculate, in case that matters to you.
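
For illustration, the node-affinity alternative suggested above might look like the following sketch, assuming the Cilium-capable nodes carry a label such as example.com/cilium-ready: "true" (the label name is hypothetical and not from this thread):

# Node-affinity alternative (sketch; the node label is hypothetical)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: example.com/cilium-ready
          operator: In
          values:
          - "true"

This only helps if something actually applies such a label once the Cilium agent is ready on a node, e.g. the agent itself or an operator.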

@fentas

fentas commented Apr 20, 2023

Thanks for the feedback.
Node affinity won't work in this use case.

The idea is/was to make sure that the Cilium agent (a DaemonSet) is scheduled first, before any other pod is scheduled.
Normally this happens via taints, with the agent removing the taint, but Microsoft broke, or rather disabled, this functionality.

Our mutating webhook simply merges this podAffinity into every pod created, but this now breaks any other podAffinity already set on the pod itself.

# resulting in this
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            example.com/name: myservice
        topologyKey: kubernetes.io/hostname
      - labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - cilium
        namespaces:
        - kube-system
        topologyKey: kubernetes.io/hostname

@kerthcet
Member

Sorry, a bit of confusion here: doesn't this work as expected? Unless the cilium pod launches, won't this newly created pod be stuck in Pending?

@alculquicondor
Member

@kerthcet the problem is that kube-scheduler looks for a single pod that satisfies all of the affinity terms.

Well, even if we add the feature you are requesting, it would only be available in 1.28 at the earliest (potentially 1.29, as the feature has to be disabled by default first). So I would suggest opening an issue with Azure support.

Other than that, you are welcome to work on this feature.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 18, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) Mar 19, 2024
10 participants