
coscheduling queue sort plugin starves pods #110

Closed
mateuszlitwin opened this issue Nov 21, 2020 · 49 comments
Labels: kind/bug, lifecycle/rotten

Comments

@mateuszlitwin
Contributor

Currently the coscheduling plugin uses InitialAttemptTimestamp to compare pods of the same priority. If there are enough pods with an early InitialAttemptTimestamp that cannot be scheduled, then pods with a later InitialAttemptTimestamp get starved: the scheduler never attempts to schedule them, because it re-queues the "early" pods before the "later" pods are ever attempted. The normal scheduler uses the time a pod was inserted into the queue, so this situation cannot occur there.
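
To make the ordering concrete, here is a minimal, self-contained Go sketch (a simplification based on the description above, not the plugin's actual source): ties on priority are broken by the group's initial attempt timestamp, which never changes across retries, so an old unschedulable group always sorts ahead of a newer schedulable one.

```go
// Simplified sketch (not the plugin's actual code) of the comparison
// described above: ties on priority are broken by the group's initial
// attempt timestamp, which is fixed at the first scheduling attempt.
package main

import (
	"fmt"
	"sort"
	"time"
)

type queuedPodGroup struct {
	name                    string
	priority                int32
	initialAttemptTimestamp time.Time // never refreshed across retries
}

// less mirrors the ordering: higher priority first, then earlier
// InitialAttemptTimestamp first.
func less(a, b queuedPodGroup) bool {
	if a.priority != b.priority {
		return a.priority > b.priority
	}
	return a.initialAttemptTimestamp.Before(b.initialAttemptTimestamp)
}

func main() {
	now := time.Now()
	groups := []queuedPodGroup{
		{name: "new-schedulable", priority: 0, initialAttemptTimestamp: now},
		{name: "old-unschedulable", priority: 0, initialAttemptTimestamp: now.Add(-time.Hour)},
	}
	sort.Slice(groups, func(i, j int) bool { return less(groups[i], groups[j]) })
	// The old, unschedulable group always sorts first, so after every
	// re-queue it is attempted again before the newer group.
	fmt.Println(groups[0].name) // old-unschedulable
}
```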

@Huang-Wei
Contributor

This sounds like a reasonable optimization. @denkensk @cwdsuzhou thoughts?

@denkensk
Member

denkensk commented Dec 1, 2020

@mateuszlitwin @Huang-Wei
We may have discussed this at the beginning: kubernetes/enhancements#1463 (comment)
If we use the LastFailureTimestamp the way the normal scheduler does, it will lead to undefined behavior in the heap.
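
As a generic illustration of the heap concern (a toy container/heap example, not the scheduler's queue code): if an item's sort key is mutated while the item already sits inside the heap, the heap invariant silently breaks and items pop out in the wrong order.

```go
// Generic illustration (not scheduler code): mutating the sort key of an
// element that is already inside a heap breaks the heap invariant, so
// subsequent Pop calls return items in the wrong order.
package main

import (
	"container/heap"
	"fmt"
)

type item struct{ key int }

type minHeap []*item

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].key < h[j].key }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(*item)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	it := old[n-1]
	*h = old[:n-1]
	return it
}

func main() {
	h := &minHeap{}
	a, b, c := &item{1}, &item{2}, &item{3}
	heap.Push(h, a)
	heap.Push(h, b)
	heap.Push(h, c)

	// Mutate a's key in place, as if its "last failure timestamp" moved,
	// without calling heap.Fix: the heap never learns about the change.
	a.key = 10

	for h.Len() > 0 {
		fmt.Println(heap.Pop(h).(*item).key)
	}
	// Prints 10, 2, 3 instead of the expected 2, 3, 10.
}
```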

@Huang-Wei
Contributor

Ah true, I failed to notice that point.

This is because the scheduler re-queues the "early" pods before the "later" pods are attempted.

@mateuszlitwin The failed PodGroup with an earlier timestamp will go through an internal backoff period, so the later PodGroup is actually able to get scheduled, isn't it? If not, could you compose a simple test case to simulate this starvation?

@cwdsuzhou
Member

@mateuszlitwin @Huang-Wei
We may have discussed this at the beginning: kubernetes/enhancements#1463 (comment)
If we use the LastFailureTimestamp the way the normal scheduler does, it will lead to undefined behavior in the heap.

+1 for this

@mateuszlitwin
Contributor Author

Might be hard to design a simple test.

The issue occurred multiple times in the production environment, where we had hundreds of pods pending and thousands of nodes to check. I observed that newer, recently created pods were not attempted by the scheduler (based on the lack of scheduling events and relevant logs), whereas older pods were attempted on a regular basis (but could not be scheduled because of their scheduling constraints), at least once every sync. The issue went away when I disabled the coscheduling queue sort.

Maybe a test like this would reproduce the issue:

  • create say 500 pods that are unschedulable
  • then create a single pod that could be scheduled (its timestamp in the queue will be later than that of the other 500 pods)
  • generate some fake events in the cluster to move pods from backoff/unschedulable queues back to the active queue

I am not familiar with all the details of how queuing works in the scheduler, but AFAIK certain events can put all pending pods back into the active queue, which could lead to the starvation I described, where old unschedulable pods always go to the front of the active queue and starve pods that have been in the queue for a long time. Isn't the periodic flush/sync such an event, for example?


The coscheduling plugin's queue sort is not compatible with the default sort. That is especially problematic because all scheduler profiles need to use the same queue sort plugin; that is, all profiles (e.g. the default profile) are in fact forced to use the co-scheduling sort if co-scheduling is enabled.

Maybe we could improve it with more customization of the queue sort plugin?

@Huang-Wei
Contributor

whereas older pods were attempted on a regular basis (but could not be scheduled because of their scheduling constraints), at least once every sync.

OK, it sounds like a head-of-line blocking problem. Have you tried increasing the backoff and flush settings to mitigate the symptom? (I know it's just a mitigation :))

The coscheduling plugin's queue sort is not compatible with the default sort. That is especially problematic because all scheduler profiles need to use the same queue sort plugin; that is, all profiles (e.g. the default profile) are in fact forced to use the co-scheduling sort if co-scheduling is enabled.

I totally understand the pain point here.

The queue sort design of co-scheduling is that we want a group of Pods to be treated as a unit to achieve higher efficiency, which is essential in a highly utilized cluster. The vanilla default scheduler, on the other hand, just schedules pod by pod: every time a Pod gets re-queued it doesn't need to consider its "sibling" pods, so its enqueue time can be renewed as if it were a new item. Co-scheduling cannot do that, which is the awkward part.
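
To make that last point concrete, here is a toy sketch (not scheduler code) of what would happen if a single member of a group renewed its own enqueue time after a failure, assuming a plain sort on a per-pod timestamp: the group's members no longer sort next to each other, so the group can no longer be dequeued as a unit.

```go
// Toy illustration (not scheduler code): refreshing a single pod's enqueue
// time after a failure splits its group in the sorted queue, because its
// siblings keep their older timestamps.
package main

import (
	"fmt"
	"sort"
	"time"
)

type queuedPod struct {
	name     string
	group    string
	queuedAt time.Time
}

func main() {
	t0 := time.Now()
	queue := []queuedPod{
		{"group-a-pod-1", "a", t0},
		{"group-a-pod-2", "a", t0},
		{"group-b-pod-1", "b", t0.Add(time.Second)},
	}

	// group-a-pod-2 fails and is re-queued with a fresh timestamp, the way a
	// standalone pod would be in the default scheduler.
	queue[1].queuedAt = t0.Add(2 * time.Second)

	sort.Slice(queue, func(i, j int) bool { return queue[i].queuedAt.Before(queue[j].queuedAt) })
	for _, p := range queue {
		fmt.Println(p.name)
	}
	// group-a-pod-1, group-b-pod-1, group-a-pod-2: group "a" is no longer
	// contiguous, so it cannot be treated as a unit.
}
```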

Maybe we could improve it with more customization of the queue sort plugin?

We have had some discussions upstream as well as in this repo. I'm not quite sure I have the bandwidth to drive this in the near future. It'd be much appreciated if anyone is interested in driving the design & implementation.

@cwdsuzhou
Member

We have had some discussions upstream as well as in this repo. I'm not quite sure I have the bandwidth to drive this in the near future. It'd be much appreciated if anyone is interested in driving the design & implementation.

Actually, we have a similar feature request about exposing more funcs in the framework handle to ensure that pods belonging to the same group are sorted together in the ActiveQueue.

@mateuszlitwin
Contributor Author

@Huang-Wei do you have some links to the previous discussions?

@Huang-Wei
Contributor

@mateuszlitwin Upstream is attempting (very likely I will drive this in 1.21) to provide some efficient queueing mechanics so that developers can control pod enqueuing behavior in a fine-grained manner.

Here are some references:

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Mar 12, 2021
@Huang-Wei
Contributor

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Mar 19, 2021
@Huang-Wei
Contributor

/kind bug
/priority critical-urgent

I think it's still outstanding. I came across this when testing the v0.19.8 image. Here are the steps to reproduce:

  • Prepare a PodGroup with minMember=3
  • Create a deployment with replicas=2
  • Wait for the two pods of the deployment to be pending
  • Scale the deployment up to 3 replicas
  • It's not uncommon for the 3 pods to get into a starving state and never be scheduled over time.

@k8s-ci-robot added the kind/bug and priority/critical-urgent labels on Mar 19, 2021
@denkensk
Member

Thanks @Huang-Wei
I will test and reproduce the problem.

@Huang-Wei
Contributor

/assign @denkensk

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jun 20, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 20, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cwdsuzhou
Member

/reopen

@k8s-ci-robot reopened this on Aug 20, 2021
@k8s-ci-robot
Contributor

@cwdsuzhou: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 14, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 13, 2023
@mateuszlitwin
Contributor Author

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label on Jan 14, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 14, 2023
@Huang-Wei
Contributor

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Apr 14, 2023
@Huang-Wei
Contributor

TL;DR for the latest status of this issue: it's a fairness issue due to missing machinery for sorting PodGroups. Similar to PodInfo, we need to refresh a PodGroupInfo's queuing time so that a previously failed PodGroup's sorting order can be adjusted.

#110 (comment)
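
A rough sketch of that direction (hypothetical, not merged code): give each PodGroupInfo a queue timestamp that is refreshed whenever the whole group is re-enqueued after a failed attempt, and sort on that instead of the immutable initial-attempt timestamp.

```go
// Hypothetical sketch of the direction described above (not merged code):
// the group's queue timestamp is refreshed on re-enqueue, so a previously
// failed group loses its permanent head-of-line position.
package main

import (
	"fmt"
	"time"
)

type podGroupInfo struct {
	name     string
	priority int32
	queuedAt time.Time // refreshed on re-enqueue, unlike InitialAttemptTimestamp
}

// less: higher priority first, then the group that was (re-)enqueued earlier.
func less(a, b *podGroupInfo) bool {
	if a.priority != b.priority {
		return a.priority > b.priority
	}
	return a.queuedAt.Before(b.queuedAt)
}

// requeueAfterFailure is where the refresh would happen after a failed
// scheduling attempt of the whole group.
func requeueAfterFailure(pg *podGroupInfo) {
	pg.queuedAt = time.Now()
}

func main() {
	old := &podGroupInfo{name: "old-unschedulable", queuedAt: time.Now().Add(-time.Hour)}
	fresh := &podGroupInfo{name: "new-schedulable", queuedAt: time.Now()}

	fmt.Println(less(old, fresh)) // true: the old group is tried first...
	requeueAfterFailure(old)      // ...but once it fails and is re-enqueued,
	fmt.Println(less(old, fresh)) // false: the newer group now sorts ahead.
}
```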

@Huang-Wei modified the milestones: v1.22, v1.26 on Apr 14, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jul 13, 2023
@mateuszlitwin
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Jul 15, 2023
@Huang-Wei
Contributor

Huang-Wei commented Jul 25, 2023

#559 can mitigate this, but in theory head-of-line (HOL) blocking can still happen. Moving it to the next release.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jan 25, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 24, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot closed this as not planned on Mar 25, 2024