Up to the v1.2 release, tensorflow-operator supported only TFJob on Kubernetes. Starting from v1.3, Training Operator provides Kubernetes custom resources that make it easy to run distributed or non-distributed TensorFlow/PyTorch/MXNet/XGBoost jobs on Kubernetes.
- For a complete reference of the custom resource definitions, please refer to the API Definition.
- For details on API design, please refer to the v1alpha2 design doc.
- For details on the all-in-one operator design, please refer to the All-in-one Kubeflow Training Operator.
- For details on its observability, please refer to the monitoring design doc.
- Version >= 1.16 of Kubernetes
- Version >= 3.x of Kustomize
- Version >= 1.21.x of kubectl
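
The minimums above can be checked before installing. The following is a minimal sketch, not part of the project: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# Hypothetical helper (not part of the operator): succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
# Pass versions with the same number of components (e.g. 1.21.3 and 1.21.0).
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Example: compare a found version against a documented minimum.
if version_ge "1.21.3" "1.21.0"; then
  echo "kubectl version is new enough"
else
  echo "kubectl version is too old"
fi
```

The installed versions themselves can be read with `kubectl version --client` and `kustomize version`; their output format varies between releases, so parsing is left out of this sketch.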
kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=master"
kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.3.0"
Users who prefer the original TensorFlow controllers can check out the v1.2-branch; bug fixes will continue to be maintained on that branch.
kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.2.0"
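
The install commands above differ only in the `ref` query parameter, which selects the Git branch or tag that Kustomize fetches. A small sketch makes that explicit; `operator_manifest_url` is a hypothetical helper name, not something the project ships.

```shell
#!/bin/sh
# Hypothetical helper: build the standalone overlay URL for a given Git ref
# (branch or tag), following the pattern used by the install commands.
operator_manifest_url() {
  printf 'github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=%s\n' "$1"
}

# Print the URL for the v1.3.0 tag; pass it to `kubectl apply -k` to install.
operator_manifest_url "v1.3.0"
```

For example, `kubectl apply -k "$(operator_manifest_url v1.3.0)"` should be equivalent to the `ref=v1.3.0` command above.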
Please refer to the quick-start-v1.md and Kubeflow Training User Guide for more information.
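
To give a feel for what a job looks like, here is a minimal sketch of a TFJob manifest written to a file from the shell. The job name, image, and replica count are placeholders chosen for illustration, not values from this project; the quick start and API documentation remain the authoritative source for the available fields.

```shell
#!/bin/sh
# Write an illustrative TFJob manifest. The name `tfjob-example` and the image
# `example.com/my-training-image:latest` are hypothetical placeholders.
cat > tfjob-example.yaml <<'EOF'
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: tfjob-example
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow
              image: example.com/my-training-image:latest
EOF

echo "wrote tfjob-example.yaml"
```

Once the operator is installed, the file can be submitted with `kubectl apply -f tfjob-example.yaml` and inspected with `kubectl get tfjobs`.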
Please refer to the following API documentation:
- TensorFlow API Documentation
- PyTorch API Documentation
- MXNet API Documentation
- XGBoost API Documentation
You can:
- Join our Slack channel.
- Check out who is using this operator.
This is a part of Kubeflow, so please see the README in kubeflow/kubeflow to get in touch with the community.
Please refer to the DEVELOPMENT guide.
Please refer to the CHANGELOG.
The following table lists the most recent operator versions, with the API version and minimum Kubernetes version for each.
Operator Version | API Version | Kubernetes Version |
---|---|---|
v1.0.x | v1 | 1.16+ |
v1.1.x | v1 | 1.16+ |
v1.2.x | v1 | 1.16+ |
v1.3.x | v1 | 1.18+ |
latest (master HEAD) | v1 | 1.18+ |
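
The table can also be expressed as a small lookup for use in install scripts. This is only a sketch: `min_kubernetes_version` is a hypothetical helper name, and the mapping simply mirrors the rows above (treating `master` as the latest version).

```shell
#!/bin/sh
# Hypothetical helper: map an operator version to the minimum Kubernetes
# version listed in the compatibility table.
min_kubernetes_version() {
  case "$1" in
    v1.0.*|v1.1.*|v1.2.*) echo "1.16" ;;
    v1.3.*|latest|master) echo "1.18" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Look up the minimum Kubernetes version for operator v1.3.0.
min_kubernetes_version "v1.3.0"
```

Versions not listed in the table return `unknown` with a non-zero exit status, so a calling script can stop rather than guess.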