Kueue is a set of APIs and a controller for job queueing. It is a job-level manager that decides when a job should be admitted to start (that is, when its pods can be created) and when it should stop (that is, when its active pods should be deleted).
Read the overview to learn more.
- Job management: Support job queueing based on priorities with different strategies: `StrictFIFO` and `BestEffortFIFO`.
- Resource management: Support resource fair sharing and preemption with a variety of policies between different tenants.
- Dynamic resource reclaim: A mechanism to release quota as the pods of a Job complete.
- Resource flavor fungibility: Quota borrowing or preemption in ClusterQueue and Cohort.
- Integrations: Built-in support for popular jobs, e.g. BatchJob, Kubeflow training jobs, RayJob, RayCluster, JobSet, plain Pod.
- System insight: Built-in Prometheus metrics to help monitor the state of the system, as well as Conditions.
- AdmissionChecks: A mechanism for internal or external components to influence whether a workload can be admitted.
- Advanced autoscaling support: Integration with cluster-autoscaler's ProvisioningRequest via AdmissionChecks.
- All-or-nothing with ready Pods: A timeout-based implementation of All-or-nothing scheduling.
- Partial admission: Allows jobs to run with a smaller parallelism, based on available quota, if the application supports it.
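The difference between the two queueing strategies named above can be illustrated with a small toy model. This is a sketch under simplifying assumptions, not Kueue's implementation: the `Workload` class, the `admit` function and the single `cpu` quota are hypothetical names introduced only for illustration. It assumes workloads are ordered by priority and then by creation time, that under `StrictFIFO` a workload that does not fit blocks everything behind it, and that under `BestEffortFIFO` it is skipped so later workloads that fit can still be admitted.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    priority: int
    created: int  # creation order; lower means older
    cpu: int      # requested CPU units (hypothetical single-resource quota)


def admit(pending, free_cpu, strategy):
    """Return the names of workloads admitted in one pass over the queue."""
    # Higher priority first, then older first.
    ordered = sorted(pending, key=lambda w: (-w.priority, w.created))
    admitted = []
    for w in ordered:
        if w.cpu <= free_cpu:
            free_cpu -= w.cpu
            admitted.append(w.name)
        elif strategy == "StrictFIFO":
            # The workload at the head does not fit and blocks the rest.
            break
        # BestEffortFIFO: skip the workload that does not fit and keep going.
    return admitted


queue = [
    Workload("big", priority=0, created=1, cpu=8),
    Workload("small", priority=0, created=2, cpu=2),
]
strict = admit(queue, free_cpu=4, strategy="StrictFIFO")
best_effort = admit(queue, free_cpu=4, strategy="BestEffortFIFO")
print(strict, best_effort)  # [] ['small']
```

With 4 free CPU units, the older `big` workload cannot start under either strategy, but only `BestEffortFIFO` lets the newer `small` workload go ahead of it.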
- ✔️ API version: v1beta1, respecting Kubernetes Deprecation Policy
- ✔️ Up-to-date documentation.
- ✔️ Test Coverage:
- ✔️ Scalability verification via performance tests.
- ✔️ Monitoring via metrics.
- ✔️ Security: RBAC-based accessibility.
- ✔️ Stable release cycle (2-3 months) for new features, bugfixes, cleanups.
- ✔️ Adopters running in production.
Based on community feedback, we continue to simplify and evolve the API to address new use cases.
Requires Kubernetes 1.25 or newer.
To install the latest release of Kueue in your cluster, run the following command:
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.9.1/manifests.yaml
The controller runs in the `kueue-system` namespace.
Read the installation guide to learn more.
A minimal configuration can be set by running the examples:
kubectl apply -f examples/admin/single-clusterqueue-setup.yaml
Then you can run a job with:
kubectl create -f examples/jobs/sample-job.yaml
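For readers who prefer to generate a manifest rather than use the sample file, the following is a minimal sketch of a Job that targets a Kueue queue. It is not the content of `examples/jobs/sample-job.yaml`. The queue name `user-queue`, the container image, the command and the resource requests are assumptions chosen for illustration, and `make_job` is a hypothetical helper. The parts that matter for Kueue are the `kueue.x-k8s.io/queue-name` label, which names the LocalQueue the job is submitted to, and `suspend: true`, which keeps the job from starting until it is admitted.

```python
import json


def make_job(name, queue, parallelism=3):
    """Build a batch/v1 Job manifest that is submitted to a Kueue LocalQueue."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            # Label that tells Kueue which LocalQueue the job belongs to.
            "labels": {"kueue.x-k8s.io/queue-name": queue},
        },
        "spec": {
            # Created suspended; the job starts once Kueue admits it.
            "suspend": True,
            "parallelism": parallelism,
            "completions": parallelism,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "main",
                            "image": "busybox",          # assumed image
                            "command": ["sleep", "30"],  # assumed workload
                            "resources": {
                                "requests": {"cpu": "1", "memory": "200Mi"}
                            },
                        }
                    ],
                }
            },
        },
    }


manifest = make_job("sample-job", "user-queue")
print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output can be piped into `kubectl create -f -`, assuming a LocalQueue with the chosen name exists in the target namespace.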
Learn more about:
Learn more about the architecture of Kueue with the following design docs:
- bit.ly/kueue-apis discusses the API proposal and a high level description of how Kueue operates. Join the mailing list to get document access.
- bit.ly/kueue-controller-design presents the detailed design of the controller.
This is a high-level overview of the main priorities for 2023, in expected order of release:
- Cooperative preemption support for workloads that implement checkpointing #477
- Flavor assignment strategies, e.g. minimizing cost vs minimizing borrowing #312
- Integration with cluster-autoscaler for guaranteed resource provisioning
- Integration with common custom workloads #74:
  - Kubeflow (TFJob, MPIJob, etc.)
  - Spark
  - Ray
  - Workflows (Tekton, Argo, etc.)
These are features that we aim to have in the long-term, in no particular order:
- Budget support #28
- Dashboard for management and monitoring for administrators
- Multi-cluster support
Learn how to engage with the Kubernetes community on the community page and the contributor's guide.
You can reach the maintainers of this project at:
Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.