Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms commonly required by many classes of batch and elastic workloads, including machine learning/deep learning, bioinformatics/genomics, and other "big data" applications. These applications typically run on generalized domain frameworks such as TensorFlow, Spark, PyTorch, and MPI, which Volcano integrates with.
Volcano builds upon a decade and a half of experience running a wide variety of high performance workloads at scale using several systems and platforms, combined with best-of-breed ideas and practices from the open source community.
As of June 2021, Volcano was widely used around the world in a variety of industries, such as Internet, Cloud, Finance, Manufacturing, and Medical. More than 20 companies and institutions are not only end users but also active contributors. Hundreds of contributors take an active part in code commits, PR reviews, issue discussions, documentation updates, and design proposals. We look forward to your participation.
NOTE: the scheduler is built based on kube-batch; refer to #241 and #288 for more detail.
JobFlow is a workflow engine based on Volcano Job. It introduces two concepts for automating the execution of multiple batch jobs, JobTemplate and JobFlow, so that end users can declare their jobs and run them with complex control primitives such as sequential or parallel execution, if-then-else statements, switch-case statements, and loops.
Volcano is an incubating project of the Cloud Native Computing Foundation (CNCF). Please consider joining the CNCF if you are an organization that wants to take an active role in supporting the growth and evolution of the cloud native ecosystem.
- Intro: Kubernetes Batch Scheduling @ KubeCon 2019 EU
- Volcano: Practices for Running High-Performance Jobs on Kubernetes @ ArchSummit 2019
- Volcano: A Cloud-Native High-Density Computing Solution @ Huawei Connection 2019
- Improving Performance of Deep Learning Workloads With Volcano @ KubeCon 2019 NA
- Batch Capability of Kubernetes Intro @ KubeCon 2019 NA
- Kubernetes 1.12+ with CRD support
You can try Volcano in one of the following two ways.
Note:
- For Kubernetes v1.17+ use CRDs under config/crd/bases (recommended)
- For Kubernetes versions < v1.16 use CRDs under config/crd/v1beta1 (deprecated)
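If you script the CRD choice, the note above can be sketched as a small shell function. This is only an illustration: the function name `volcano_crd_dir` is hypothetical, and because the note does not say which directory applies to v1.16, the sketch reports that case as unspecified rather than guessing.

```shell
# Map a Kubernetes minor version (e.g. 17 for v1.17) to the CRD directory
# named in the note above. volcano_crd_dir is a hypothetical helper name.
volcano_crd_dir() {
  minor="$1"
  # Reject empty or non-numeric input before comparing.
  case "$minor" in
    ''|*[!0-9]*) echo "expected a numeric minor version, got: '$minor'" >&2; return 2 ;;
  esac
  if [ "$minor" -ge 17 ]; then
    echo "config/crd/bases"        # recommended
  elif [ "$minor" -lt 16 ]; then
    echo "config/crd/v1beta1"      # deprecated
  else
    # The note covers v1.17+ and < v1.16 only.
    echo "v1.16 is not covered by the note" >&2
    return 1
  fi
}
```

The caller supplies the minor version number of the target cluster, for example `volcano_crd_dir 20`.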
Install Volcano on an existing Kubernetes cluster. This method is available for both x86_64 and arm64 architectures.
For x86_64:
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml
For arm64:
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development-arm64.yaml
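For scripted installs, the choice between the two manifests can be sketched as below. The helper name `volcano_manifest_url` is hypothetical, and accepting `amd64` and `aarch64` as aliases is an assumption of this sketch. It uses the local machine's `uname -m`, which may differ from the architecture of your cluster nodes, so it only prints the command for review instead of running it.

```shell
# Pick the installer manifest for a given CPU architecture.
# volcano_manifest_url is a hypothetical helper; the URLs are the two above.
volcano_manifest_url() {
  base="https://raw.githubusercontent.com/volcano-sh/volcano/master/installer"
  case "$1" in
    x86_64|amd64)  echo "$base/volcano-development.yaml" ;;
    arm64|aarch64) echo "$base/volcano-development-arm64.yaml" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) the install command for the local architecture.
if url="$(volcano_manifest_url "$(uname -m)")"; then
  echo "kubectl apply -f $url"
fi
```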
Enjoy! Volcano will create the following resources in the volcano-system namespace.
NAME READY STATUS RESTARTS AGE
pod/volcano-admission-5bd5756f79-dnr4l 1/1 Running 0 96s
pod/volcano-admission-init-4hjpx 0/1 Completed 0 96s
pod/volcano-controllers-687948d9c8-nw4b4 1/1 Running 0 96s
pod/volcano-scheduler-94998fc64-4z8kh 1/1 Running 0 96s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/volcano-admission-service ClusterIP 10.98.152.108 <none> 443/TCP 96s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/volcano-admission 1/1 1 1 96s
deployment.apps/volcano-controllers 1/1 1 1 96s
deployment.apps/volcano-scheduler 1/1 1 1 96s
NAME DESIRED CURRENT READY AGE
replicaset.apps/volcano-admission-5bd5756f79 1 1 1 96s
replicaset.apps/volcano-controllers-687948d9c8 1 1 1 96s
replicaset.apps/volcano-scheduler-94998fc64 1 1 1 96s
NAME COMPLETIONS DURATION AGE
job.batch/volcano-admission-init 1/1 48s 96s
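To check a listing like the one above from a script, one option is to filter the pod rows and fail if any pod is neither Completed nor fully ready. The sketch below is an assumption-laden example, not part of Volcano: `pods_healthy` is a hypothetical helper that expects `kubectl get pods --no-headers` output (columns NAME, READY, STATUS, ...) on standard input.

```shell
# Reads `kubectl get pods --no-headers` rows on stdin.
# Exits 0 only if at least one pod is listed and every pod is either
# Completed, or Running with all containers ready (READY column "n/n").
# pods_healthy is a hypothetical helper name.
pods_healthy() {
  awk '
    {
      split($2, r, "/")
      ok = ($3 == "Completed") || ($3 == "Running" && r[1] == r[2])
      if (!ok) { print "not ready: " $1 " (" $3 ", " $2 ")"; bad = 1 }
    }
    END {
      if (NR == 0) { print "no pods found"; exit 1 }
      exit (bad ? 1 : 0)
    }
  '
}

# Example usage against a live cluster (not executed here):
#   kubectl get pods -n volcano-system --no-headers | pods_healthy
```

The one-shot init pod in the listing shows 0/1 and Completed, which is why Completed is treated as healthy regardless of the READY column.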
To disable the JobFlow-related components, add --forbid-jobflow=true to the startup parameters of both the controller and the admission component. The default is false, which leaves JobFlow enabled.
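One possible way to add the flag on a running cluster is a JSON patch against the two deployments shown in the listing above. This is a sketch under stated assumptions: it assumes each deployment's first container already has an `args` array to append to, which you should confirm against your own manifests. The helper name `forbid_jobflow_patch` is hypothetical, and the script prints the commands rather than running them.

```shell
# Build a JSON patch that appends the flag to the first container's args.
# Assumption: containers[0] has an existing `args` list in your deployment.
# forbid_jobflow_patch is a hypothetical helper name.
forbid_jobflow_patch() {
  printf '%s' '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--forbid-jobflow=true"}]'
}

# Print (do not run) one patch command per deployment for review.
for deploy in volcano-controllers volcano-admission; do
  echo "kubectl -n volcano-system patch deployment $deploy --type=json -p '$(forbid_jobflow_patch)'"
done
```

Editing the startup parameters directly in the installer YAML before applying it achieves the same result and avoids depending on container order.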
If you don't have a Kubernetes cluster, try the one-click install from the code base:
./hack/local-up-volcano.sh
This method is currently available only for x86_64.
To set up the Prometheus and Grafana dashboards for Volcano after Volcano is installed, try the following commands:
make TAG=latest generate-yaml
kubectl create -f _output/release/volcano-monitoring-latest.yaml
Community weekly meeting for Asia: 15:00 - 16:00 (UTC+8) Friday. (Convert to your timezone.)
Community biweekly meeting for America: 08:30 - 09:30 (UTC-8) Thursday. (Convert to your timezone.)
The community meeting for Europe is currently held on demand. If you have ideas or topics to discuss, please leave a message in Slack. Maintainers will contact you and schedule an open meeting.
Resources:
If you have any questions, feel free to reach out to us in the following ways: