Consider ResourceQuota when admitting workloads #696
Comments
I like this idea and I don't think we will face the problems you mention.
Ah, I see. Thank you for clarifying.
Uhm. In that case, we cannot create ClusterQueues with multiple ResourceFlavors if batch admins select the optional mode of operation, right?
Unless there are some annotations in the ResourceQuota, but that could be a later extension.
Sounds good.
Could you explain this design in more detail? Sorry to bother you after such a long time. How to balance ClusterQueue quota against namespace ResourceQuota is a bit complicated, especially since ClusterQueue is a cluster-level resource.
Would NS ResourceQuota be another admission check?
Hi, @tenzen-y. One of the important use cases is as below: in a namespace, there may be different types of workloads (I call it a "hybrid workload") consuming hardware accelerators:
So the following situation happens:
The challenges are:
In short, I think it's valuable to address these challenges, so that users can leverage
@kerthcet I think so too. Additional admission checks would resolve this issue.
@panpan0000 I agree with the issues caused by multi-dimensional quota management. However, I think that your use cases might be sufficiently covered once you create a ClusterQueue for each namespace (tenant). Then, you can construct cohorts across multiple ClusterQueues.
Since we support naked pods, you can manage those workloads with kueue. Ideally, I would suggest implementing CustomJob controllers to manage those workloads; then you can implement kueue workload controllers to manage those CustomJobs with kueue.
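As a rough illustration of the per-namespace ClusterQueue plus cohort layout suggested above, here is a minimal sketch assuming kueue's v1beta1 API and a pre-existing ResourceFlavor named `default-flavor`; the tenant names, labels, and quota values are placeholders, not a recommendation.

```yaml
# Hypothetical sketch: one ClusterQueue per tenant namespace, sharing a cohort
# so that unused quota can be borrowed across tenants. A second tenant's
# ClusterQueue would declare the same cohort name.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: tenant-a-cq
spec:
  cohort: shared-accelerators
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: tenant-a
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor        # assumes this ResourceFlavor already exists
      resources:
      - name: "cpu"
        nominalQuota: 20
      - name: "memory"
        nominalQuota: 100Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4
---
# Workloads in the tenant namespace submit through a LocalQueue.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: tenant-a-queue
  namespace: tenant-a
spec:
  clusterQueue: tenant-a-cq
```

With a layout like this, the per-tenant limits live in kueue's flavor quotas rather than in a namespace ResourceQuota, which is the arrangement the comment above argues for.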
I'm not against supporting ResourceQuota by admission checks, and I agree with the ResourceQuota support. I'm sure there are use cases in which we want to manage workloads by kueue + ResourceQuota, although, in an ideal world, all workloads are managed by the kueue's flavorQuotas.
Also, we need to support elastic jobs to support model serving deployment with autoscaling semantics.
What we hope to mitigate here is to make job scheduling smoother, rather than having workloads admitted by kueue but then rejected by the ResourceQuota admission plugin. But the tricky thing is that ResourceQuota is too flexible; it has a bunch of policies that kueue cannot simply sync with. So an admission check might be the simplest thing we can start with.
I agree. An additional admission check would be better. For adapting to existing environments, I think that ResourceQuota support has an advantage over the naked pod integration in terms of using all kueue features, since our naked pod integration doesn't support all of them. However, I'm on the fence about whether we should support ResourceQuota via AdmissionCheck, since using ResourceQuota is a temporary measure: as I mentioned above, ideally all resources are managed by Kueue's flavorQuotas. @alculquicondor What do you think about supporting ResourceQuota via AdmissionCheck for easy adaptation to existing environments?
I don't think we should implement it. As explained, it will be possible for someone to implement it using AdmissionChecks, but I don't think we should support it in core Kueue. I would accept a controller in the cmd/experimental package though.
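To make the AdmissionCheck route discussed above concrete, here is a minimal sketch of how an out-of-tree ResourceQuota-aware controller could be wired in, assuming kueue's v1beta1 AdmissionCheck API. The `controllerName`, queue name, and quotas are hypothetical, and the controller itself would have to be implemented separately (for example under cmd/experimental, as suggested).

```yaml
# Hypothetical wiring for an external ResourceQuota admission check.
# The controllerName is a placeholder; kueue only records the check state
# that such a controller reports on each Workload.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: resource-quota-check
spec:
  controllerName: example.com/resourcequota-admissioncheck
---
# A ClusterQueue opts in by listing the check; kueue then waits for the check
# to become Ready before fully admitting a Workload from this queue.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 10
      - name: "memory"
        nominalQuota: 50Gi
  admissionChecks:
  - resource-quota-check
```

Under this arrangement, the hypothetical controller would compare a Workload's total requests against the ResourceQuota in its namespace and set the check state accordingly, so quota-violating Workloads are held back before their pods are ever created.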
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This would still be very helpful for multi-tenant scenarios, where multiple users run interactive computing on the same cluster and need access to individual queues.
What would you like to be added:
Batch administrators can install ResourceQuota to set constraints for total requests in the namespace.
We should consider ResourceQuotas when admitting Workloads.
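For reference, a typical namespace ResourceQuota of the kind a batch administrator might install looks like the following; the namespace name and limits are illustrative only.

```yaml
# Illustrative namespace ResourceQuota; names and limits are placeholders.
# The kube-apiserver's quota admission plugin rejects pod creation once the
# summed requests in the namespace would exceed these values, regardless of
# whether kueue has already admitted the Workload.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "40"
    requests.memory: 200Gi
    requests.nvidia.com/gpu: "8"
```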
Why is this needed:
For now, we may sometimes face the following problems:
1. If we use Sequential Admission with Ready Pods and ResourceQuotas together, we may face deadlocks (see the configuration sketch at the end of this issue).
2. Many unschedulable pods (in Pending status) could be created.

Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.
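As a rough sketch of the interaction behind problem 1 above, the snippet below shows kueue's Sequential Admission with Ready Pods setting (`waitForPodsReady`) from the manager configuration; field names follow config.kueue.x-k8s.io/v1beta1, exact fields may vary across kueue versions, and the values are illustrative.

```yaml
# Sketch of the kueue manager configuration enabling Sequential Admission with
# Ready Pods. Values are illustrative.
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
waitForPodsReady:
  enable: true
  timeout: 5m          # how long an admitted Workload may stay not-Ready
  blockAdmission: true # no further Workloads are admitted while waiting
```

With this enabled, kueue admits a Workload and then waits for its pods to become Ready; if a namespace ResourceQuota like the one shown earlier rejects some of those pods, the Workload stays admitted but never becomes Ready, blocking admission of subsequent Workloads until the timeout expires. That is the deadlock-like behavior this issue asks kueue to take into account.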