
fix: affinity priority #1548

Open
wants to merge 1 commit into main

Conversation

helen-frank
Contributor

Fixes #1418

Description
Prioritize scheduling of pods with anti-affinity or topologySpreadConstraints by sorting them ahead of unconstrained pods in the scheduling queue.
How was this change tested?
I have 10 pending pods:

pod1: 1c1g requests, with anti-affinity; cannot be scheduled on the same node as pod9 or pod10 (a sketch of this constraint follows the list).
pod2 ~ pod8: 1c1g requests; no anti-affinity configured.
pod9: 1c1g requests, with anti-affinity; cannot be scheduled on the same node as pod1 or pod10.
pod10: 1c1g requests, with anti-affinity; cannot be scheduled on the same node as pod1 or pod9.
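
For reference, a minimal sketch, assuming each test deployment labels its pods app=podN (the actual test manifests are not included in this PR), of how pod1's constraint could be expressed with the Kubernetes Go API types:

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// pod1 must not land on a node that already runs pod9 or pod10.
// The "app" label values are assumptions about the test manifests.
var pod1AntiAffinity = &corev1.Affinity{
	PodAntiAffinity: &corev1.PodAntiAffinity{
		RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
			LabelSelector: &metav1.LabelSelector{
				MatchExpressions: []metav1.LabelSelectorRequirement{{
					Key:      "app",
					Operator: metav1.LabelSelectorOpIn,
					Values:   []string{"pod9", "pod10"},
				}},
			},
			TopologyKey: corev1.LabelHostname, // "kubernetes.io/hostname"
		}},
	},
}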

I want the resources of the three nodes to be evenly distributed, for example:

node1: c7a.4xlarge, 8c16g (4 pods)
node2: c7a.xlarge, 4c8g (3 pods)
node3: c7a.xlarge, 4c8g (3 pods)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 11, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: helen-frank
Once this PR has been reviewed and has the lgtm label, please assign mwielgus for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 11, 2024
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 11, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 25, 2024
@helen-frank helen-frank changed the title from "[WIP] fix: affinity priority" to "fix: affinity priority" Aug 30, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2024
@helen-frank
Contributor Author

Current Test Results:

❯ kubectl get nodeclaims
NAME            TYPE               CAPACITY   ZONE          NODE                             READY   AGE
default-8wq87   c-8x-amd64-linux   spot       test-zone-d   blissful-goldwasser-3014441860   True    67s
default-chvld   c-4x-amd64-linux   spot       test-zone-b   exciting-wescoff-4170611030      True    67s
default-kbr7n   c-2x-amd64-linux   spot       test-zone-d   vibrant-aryabhata-969189106      True    67s
❯ kubectl get pod -owide
NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE                             NOMINATED NODE   READINESS GATES
nginx1-67877d4f4d-nbmj7    1/1     Running   0          77s   10.244.1.0   vibrant-aryabhata-969189106      <none>           <none>
nginx10-6685645984-sjftg   1/1     Running   0          76s   10.244.2.2   exciting-wescoff-4170611030      <none>           <none>
nginx2-5f45bfcb5b-flrlw    1/1     Running   0          77s   10.244.2.0   exciting-wescoff-4170611030      <none>           <none>
nginx3-6b5495bfff-xt7d9    1/1     Running   0          77s   10.244.2.1   exciting-wescoff-4170611030      <none>           <none>
nginx4-7bdd687bb6-nzc8f    1/1     Running   0          77s   10.244.3.5   blissful-goldwasser-3014441860   <none>           <none>
nginx5-6b5d886fc7-6m57l    1/1     Running   0          77s   10.244.3.0   blissful-goldwasser-3014441860   <none>           <none>
nginx6-bd5d6b9fb-x6lkq     1/1     Running   0          77s   10.244.3.2   blissful-goldwasser-3014441860   <none>           <none>
nginx7-5559545b9f-xs5sm    1/1     Running   0          77s   10.244.3.4   blissful-goldwasser-3014441860   <none>           <none>
nginx8-66bb679c4-zndwz     1/1     Running   0          76s   10.244.3.1   blissful-goldwasser-3014441860   <none>           <none>
nginx9-6c47b869dd-nfds6    1/1     Running   0          76s   10.244.3.3   blissful-goldwasser-3014441860   <none>           <none>

Signed-off-by: helen <helenfrank@protonmail.com>
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 30, 2024
@coveralls

Pull Request Test Coverage Report for Build 10632308037

Details

  • 21 of 31 (67.74%) changed or added relevant lines in 2 files are covered.
  • 6 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.1%) to 80.656%

Changes Missing Coverage (Covered Lines / Changed or Added Lines / %):
  pkg/controllers/provisioning/scheduling/queue.go: 4 / 6 / 66.67%
  pkg/utils/pod/scheduling.go: 17 / 25 / 68.0%

Files with Coverage Reduction (New Missed Lines / %):
  pkg/scheduling/requirements.go: 2 / 98.01%
  pkg/controllers/disruption/consolidation.go: 4 / 87.25%

Totals:
  Change from base Build 10622339666: -0.1%
  Covered Lines: 8406
  Relevant Lines: 10422

💛 - Coveralls

njtran (Contributor) left a comment


This isn't necessarily as clear-cut a change to me. Is there data you've generated that gives you confidence this doesn't have any adverse effects?

@@ -96,6 +97,15 @@ func byCPUAndMemoryDescending(pods []*v1.Pod) func(i int, j int) bool {
return true
}

// anti-affinity pods should be sorted before normal pods
if affinityCmp := pod.PodAffinityCmp(lhsPod, rhsPod); affinityCmp != 0 {
Contributor


This seems like the right move, but I'm not sure how this breaks down in our bin-packing algorithm. From what I understand, this just sorts pods with anti-affinity + topology spread constraints before others with exactly the same pod requests.
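
As a rough sketch of that ordering, with assumed helper names (the PR's actual code lives in pkg/utils/pod/scheduling.go and pkg/controllers/provisioning/scheduling/queue.go, per the coverage report above), the tie-break could look like:

import corev1 "k8s.io/api/core/v1"

// hasSchedulingConstraints reports whether a pod carries pod anti-affinity
// or topology spread constraints. Assumed helper, not the PR's exact code.
func hasSchedulingConstraints(p *corev1.Pod) bool {
	if len(p.Spec.TopologySpreadConstraints) > 0 {
		return true
	}
	return p.Spec.Affinity != nil && p.Spec.Affinity.PodAntiAffinity != nil
}

// podAffinityCmp returns 1 if only lhs is constrained, -1 if only rhs is,
// and 0 otherwise, so the sort falls through to later criteria on a tie.
func podAffinityCmp(lhs, rhs *corev1.Pod) int {
	l, r := hasSchedulingConstraints(lhs), hasSchedulingConstraints(rhs)
	switch {
	case l && !r:
		return 1
	case r && !l:
		return -1
	default:
		return 0
	}
}

A positive result would sort lhsPod first, so constrained pods get bin-packed before unconstrained pods with identical requests.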

return 0
}

// PodAntiAffinityCmp compares two pods based on their anti-affinity
Contributor


Suggested change
// PodAntiAffinityCmp compares two pods based on their anti-affinity
// PodAntiAffinityCmp compares two pods based on the size of their anti-affinity constraints
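
If the comparison really is by size, a hypothetical reading (assumed names, not necessarily the PR's implementation, using the same imports as the sketch above) would count the required anti-affinity terms on each pod:

// antiAffinityTermCount counts a pod's required anti-affinity terms.
// Hypothetical helper illustrating the suggested doc wording above.
func antiAffinityTermCount(p *corev1.Pod) int {
	if p.Spec.Affinity == nil || p.Spec.Affinity.PodAntiAffinity == nil {
		return 0
	}
	return len(p.Spec.Affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution)
}

PodAntiAffinityCmp could then return antiAffinityTermCount(lhs) - antiAffinityTermCount(rhs).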

Labels
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

node selection: One super large node with many small nodes
4 participants