Cross-namespace preemption keeps repeating but low priority task keeps getting allocated before the high priority task #1855
Comments
/cc @huone1
I think the analysis is reasonable. The current logic in the preempt action is flawed: it simply finds candidate tasks whose priority is lower than the preemptor's and then preempts their resources. Instead, it should preempt the candidate tasks with the lowest priority first; otherwise it can also cause the ping-pong problem. Can you help rework the logic?
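A minimal sketch of the "lowest priority first" victim selection suggested above, for illustration only. It is not Volcano code: the `Task` type, the `pick_victims` function, and the single-CPU resource model are all hypothetical.

```python
from typing import NamedTuple


class Task(NamedTuple):
    # Hypothetical task model: a name, a priority value, and a CPU request.
    name: str
    priority: int
    cpu: int


def pick_victims(preemptor: Task, running: list[Task], free_cpu: int) -> list[Task]:
    """Pick victims lowest priority first until the preemptor fits.

    Returns an empty list if the preemptor already fits, or if evicting
    every lower-priority task would still not free enough CPU.
    """
    # Only tasks with strictly lower priority are eligible, lowest first.
    candidates = sorted(
        (t for t in running if t.priority < preemptor.priority),
        key=lambda t: t.priority,
    )
    victims = []
    for task in candidates:
        if free_cpu >= preemptor.cpu:
            break  # enough room already, stop evicting
        victims.append(task)
        free_cpu += task.cpu
    return victims if free_cpu >= preemptor.cpu else []


running = [Task("mid", 5, 1), Task("low", 1, 1), Task("top", 20, 1)]
print(pick_victims(Task("high", 10, 1), running, free_cpu=0))
```

With the sample data, only the task named `low` is chosen, even though `mid` is also below the preemptor's priority.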
@Thor-wl Hi! Could you clarify whether NamespaceOrder or PriorityClass takes precedence? For example, given an empty cluster that only has sufficient resources for a single job, namespace The current If our conclusion is that PriorityClass takes precedence, and high priority jobs should always run in front of lower priority jobs regardless of namespace, then we need to rework the allocate action and the fairsharing mechanism completely, while the issue with Otherwise, if NamespaceOrder takes precedence, then we should rework
Hello 👋 Looks like there was no activity on this issue for last 90 days.
Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗
What happened:
A lower priority task gets preempted by a higher priority task of a different namespace, but keeps getting reallocated. It then gets preempted again, and the cycle repeats.
What you expected to happen:
Higher priority task should get allocated, and lower priority task should be pending.
How to reproduce it (as minimally and precisely as possible):
I adapted the simplified setup from this other issue.
volcano-scheduler.conf
Create a queue, two priority classes and two namespaces.
Start a low priority job using up all the cpu.
Start a high priority job.
Observe that the low-priority pod gets terminated, and then another pod is started to take its place. The new low-priority pod starts, gets terminated again, and the process repeats.
Anything else we need to know?:
This is my current understanding of the issue. In the preempt phase, we find lower priority tasks to preempt, and if we succeed, we pipeline the higher priority task. However, this pipeline status does not get carried over to the next iteration. When we open a new session, the preempted job has started a new pending pod to take its place. If we encounter the lower priority task first, and the node has sufficient resources, then we'll allocate it, which brings us back to our original state.
Note that this issue occurs only when the high-pri job's namespace is after the low-pri job's namespace. If we swap their namespaces i.e. start the low-pri job in namespace b, then start the high-pri job in namespace a, this issue does not occur. The preemption of the low-pri job and allocation of the high-pri job will succeed.
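To make the analysis above concrete, here is a toy simulation of the cycle. It is a sketch under stated assumptions, not Volcano's implementation: the `Job` type, `run_sessions`, the one-pod node, and visiting namespaces in lexicographic order (as a stand-in for NamespaceOrder) are all hypothetical simplifications.

```python
from dataclasses import dataclass

CAPACITY = 1  # assumption: the node fits exactly one pod


@dataclass(frozen=True)
class Job:
    name: str
    namespace: str
    priority: int


def run_sessions(low: Job, high: Job, sessions: int = 4) -> list[list[str]]:
    """Return the events produced in each scheduling session of the toy model."""
    running = {low}   # the low-priority job initially fills the node
    pending = {high}  # the high-priority job arrives and waits
    history = []
    for _ in range(sessions):
        events = []
        # Allocate action: visit pending jobs in namespace order and bind
        # each one while the node has free capacity. sorted() returns a new
        # list, so mutating `pending` inside the loop is safe.
        for job in sorted(pending, key=lambda j: j.namespace):
            if len(running) < CAPACITY:
                running.add(job)
                pending.discard(job)
                events.append(f"allocate {job.name}")
        # Preempt action: a pending job evicts a running job with lower
        # priority. The preemptor is only "pipelined"; that state is not
        # stored anywhere, so the next session knows nothing about it.
        evicted = set()
        for job in sorted(pending, key=lambda j: -j.priority):
            victims = [r for r in running - evicted if r.priority < job.priority]
            if victims:
                victim = min(victims, key=lambda r: r.priority)
                evicted.add(victim)
                events.append(f"{job.name} preempts {victim.name}")
        # Between sessions: evicted pods terminate, and the owning job
        # creates a replacement pod that shows up as pending again.
        running -= evicted
        pending |= evicted
        history.append(events)
    return history


for low_ns, high_ns in (("a", "b"), ("b", "a")):
    history = run_sessions(Job("low", low_ns, 1), Job("high", high_ns, 10))
    print(f"low in {low_ns}, high in {high_ns}: {history}")
```

In this model, with the low-pri job in namespace a and the high-pri job in namespace b, every session after the first allocates the low-pri job and then preempts it again. With the namespaces swapped, the high-pri job is allocated in the second session and nothing changes afterwards.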
Environment:
kubectl version: Client 1.19, Server 1.21
uname -a: