diff --git a/README.md b/README.md
index a1494d001c7..14b609e489b 100644
--- a/README.md
+++ b/README.md
@@ -17,310 +17,214 @@ Katib supports
 Katib is the project which is agnostic to machine learning (ML) frameworks.
 It can tune hyperparameters of applications written in any language of the
-users’ choice and natively supports many ML frameworks, such as TensorFlow,
-MXNet, PyTorch, XGBoost, and others.
+users’ choice and natively supports many ML frameworks, such as
+[TensorFlow](https://www.tensorflow.org/), [Apache MXNet](https://mxnet.apache.org/),
+[PyTorch](https://pytorch.org/), [XGBoost](https://xgboost.readthedocs.io/en/latest/), and others.
-## Getting Started
-
-Follow the
-[getting-started guide](https://www.kubeflow.org/docs/components/katib/hyperparameter/)
-on the Kubeflow website.
-
-## Name
+Katib can perform training jobs using any Kubernetes
+[Custom Resources](https://www.kubeflow.org/docs/components/katib/trial-template/)
+with out of the box support for [Kubeflow Training Operators](https://github.com/kubeflow/tf-operator),
+[Argo Workflows](https://github.com/argoproj/argo-workflows), [Tekton Pipelines](https://github.com/tektoncd/pipeline)
+and many more.
 Katib stands for `secretary` in Arabic.
-## Concepts in Katib
-
-For a detailed description of the concepts in Katib and AutoML, check the
-[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/overview/).
-
-Katib has the concepts of `Experiment`, `Suggestion`, `Trial` and `Worker Job`.
-
-### Experiment
-
-An `Experiment` represents a single optimization run over a feasible space.
-Each `Experiment` contains a configuration:
-
-1. **Objective**: What you want to optimize.
-2. **Search Space**: Constraints for configurations describing the feasible space.
-3. **Search Algorithm**: How to find the optimal configurations.
-
-Katib `Experiment` is defined as a CRD. Check the detailed guide to
-[configuring and running a Katib `Experiment`](https://kubeflow.org/docs/components/katib/experiment/)
-in the Kubeflow docs.
-
-### Suggestion
-
-A `Suggestion` is a set of hyperparameter values that the hyperparameter tuning
-process has proposed. Katib creates a `Trial` to evaluate
-the suggested set of values.
-
-Katib `Suggestion` is defined as a CRD.
-
-### Trial
-
-A `Trial` is one iteration of the hyperparameter tuning process.
-A `Trial` corresponds to one worker job instance with a list of parameter
-assignments. The list of parameter assignments corresponds to a `Suggestion`.
-
-Each `Experiment` runs several `Trials`. The `Experiment` runs the `Trials` until
-it reaches either the objective or the configured maximum number of `Trials`.
-
-Katib `Trial` is defined as a CRD.
-
-### Worker Job
-
-The `Worker Job` is the process that runs to evaluate a `Trial` and calculate
-its objective value.
-
-The `Worker Job` can be any type of Kubernetes resource or
-[Kubernetes CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
-Follow the [`Trial` template guide](https://www.kubeflow.org/docs/components/katib/trial-template/#custom-resource)
-to support your own Kubernetes resource in Katib.
-
-Katib has these CRD examples in upstream:
-
-- [Kubernetes `Job`](https://kubernetes.io/docs/concepts/workloads/controllers/job/)
-
-- [Kubeflow `TFJob`](https://www.kubeflow.org/docs/components/training/tftraining/)
-
-- [Kubeflow `PyTorchJob`](https://www.kubeflow.org/docs/components/training/pytorch/)
-
-- [Kubeflow `MPIJob`](https://www.kubeflow.org/docs/components/training/mpi/)
-
-- [Kubeflow `XGBoostJob`](https://github.com/kubeflow/xgboost-operator)
-
-- [Tekton `Pipelines`](./examples/v1beta1/tekton)
-
-- [Argo `Workflows`](./examples/v1beta1/argo)
+# Search Algorithms
-Thus, Katib supports multiple frameworks with the help of different job kinds.
-
-### Search Algorithms
-
-Katib currently supports several search algorithms. Follow the
+Katib supports several search algorithms. Follow the
 [Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/experiment/#search-algorithms-in-detail)
-to know more about each algorithm.
-
-#### Hyperparameter Tuning
-
-- [Random Search](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Random_search)
-- [Tree of Parzen Estimators (TPE)](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf)
-- [Multivariate TPE](https://tech.preferred.jp/en/blog/multivariate-tpe-makes-optuna-even-more-powerful/)
-- [Grid Search](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search)
-- [Hyperband](https://arxiv.org/pdf/1603.06560.pdf)
-- [Bayesian Optimization](https://arxiv.org/pdf/1012.2599.pdf)
-- [Covariance Matrix Adaptation Evolution Strategy (CMA-ES)](https://arxiv.org/abs/1604.00772)
-- [Sobol's Quasirandom Sequence](https://dl.acm.org/doi/10.1145/641876.641879)
-
-#### Neural Architecture Search
-
-- [Efficient Neural Architecture Search (ENAS)](https://github.com/kubeflow/katib/tree/master/pkg/suggestion/v1beta1/nas/enas)
-- [Differentiable Architecture Search (DARTS)](https://github.com/kubeflow/katib/tree/master/pkg/suggestion/v1beta1/nas/darts)
-
-## Components in Katib
-
-Katib consists of several components as shown below. Each component is running
-on Kubernetes as a deployment. Each component communicates with others via GRPC
-and the API is defined at `pkg/apis/manager/v1beta1/api.proto`.
-
-- Katib main components:
-  - `katib-db-manager` - the GRPC API server of Katib which is the DB Interface.
-  - `katib-mysql` - the data storage backend of Katib using mysql.
-  - `katib-ui` - the user interface of Katib.
-  - `katib-controller` - the controller for the Katib CRDs in Kubernetes.
-
-## Web UI
-
-Katib provides a Web UI.
-During 1.3 we've worked on a new iteration of the UI, which is rewritten in
-Angular and is utilizing the common code of the other Kubeflow [dashboards](https://github.com/kubeflow/kubeflow/tree/master/components/crud-web-apps).
-
-The users are currently able to list, delete and create Experiments in their
-cluster via this new UI as well as inspect the owned Trials. One important
-missing functionalities are the ability to edit the Trial templates ConfigMaps
-and view Neural Architecture Search models. Check [this Project](https://github.com/kubeflow/katib/projects/1)
-to monitor the current progress.
-
-![katibui](./docs/images/katib-ui.png)
-
-To use the old Katib UI you can update the Katib image `newName` with the previous
-image tag `docker.io/kubeflowkatib/katib-ui:v0.11.1` in the [Kustomize](./manifests/v1beta1/installs/katib-standalone/kustomization.yaml#L29)
-manifests.
-
-## GRPC API documentation
-
-Check the [Katib v1beta1 API reference docs](https://www.kubeflow.org/docs/reference/katib/v1beta1/katib/).
-
-## Installation
-
-For standard installation of Katib with support for all job operators,
-install Kubeflow.
-Follow the documentation:
-
-- [Kubeflow installation guide](https://www.kubeflow.org/docs/started/getting-started/)
-- [Kubeflow Katib guides](https://www.kubeflow.org/docs/components/katib/).
-
-If you install Katib with other Kubeflow components,
-you can't submit Katib jobs in Kubeflow namespace. Check the
-[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/hyperparameter/#example-using-random-algorithm)
-to know more about it.
-
-Alternatively, if you want to install Katib manually with TF and PyTorch
-operators support, follow these steps:
-
-Create Kubeflow namespace:
-
-```
-kubectl create namespace kubeflow
+to know more about each algorithm and check the
+[Suggestion service guide](/docs/new-algorithm-service.md) to implement your
+custom algorithm.
+
+| Hyperparameter Tuning | Neural Architecture Search | Early Stopping |
+| --- | --- | --- |
+| Random Search | ENAS | Median Stop |
+| Grid Search | DARTS | |
+| Bayesian Optimization | | |
+| TPE | | |
+| Multivariate TPE | | |
+| CMA-ES | | |
+| Sobol's Quasirandom Sequence | | |
+| HyperBand | | |
+
+To perform above algorithms Katib supports the following frameworks:
+
+- [Chocolate](https://github.com/AIworx-Labs/chocolate)
+- [Goptuna](https://github.com/c-bata/goptuna)
+- [Hyperopt](https://github.com/hyperopt/hyperopt)
+- [Optuna](https://github.com/optuna/optuna)
+- [Scikit Optimize](https://github.com/scikit-optimize/scikit-optimize)
+
+# Installation
+
+For the various Katib installs check the
+[Kubeflow guide](https://www.kubeflow.org/docs/components/katib/hyperparameter/#katib-setup).
+Follow the next steps to install Katib standalone.
+
+## Prerequisites
+
+This is the minimal requirements to install Katib:
+
+- Kubernetes >= 1.17
+- `kubectl` >= 1.21
+
+## Latest Version
+
+For the latest Katib version run this command:
+
+```
+kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"
+```
+
+## Release Version
+
+For the specific Katib release (for example `v0.11.1`) run this command:
+
+```
+kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.11.1"
+```
+
+Make sure that all Katib components are running:
+
+```
+$ kubectl get pods -n kubeflow
+
+NAME                                READY   STATUS      RESTARTS   AGE
+katib-cert-generator-rw95w          0/1     Completed   0          35s
+katib-controller-566595bdd8-hbxgf   1/1     Running     0          36s
+katib-db-manager-57cd769cdb-4g99m   1/1     Running     0          36s
+katib-mysql-7894994f88-5d4s5        1/1     Running     0          36s
+katib-ui-5767cfccdc-pwg2x           1/1     Running     0          36s
 ```
-Clone Kubeflow manifest repository:
+For the Katib Experiments check the [complete examples list](examples).
-```
-git clone -b v1.2-branch git@github.com:kubeflow/manifests.git
-Set `MANIFESTS_DIR` to the cloned folder.
-export MANIFESTS_DIR=
-```
-
-### TF operator
-
-For installing TF operator, run the following:
-
-```
-cd "${MANIFESTS_DIR}/tf-training/tf-job-crds/base"
-kustomize build . | kubectl apply -f -
-cd "${MANIFESTS_DIR}/tf-training/tf-job-operator/base"
-kustomize build . | kubectl apply -f -
-```
-
-### PyTorch operator
-
-For installing PyTorch operator, run the following:
-
-```
-cd "${MANIFESTS_DIR}/pytorch-job/pytorch-job-crds/base"
-kustomize build . | kubectl apply -f -
-cd "${MANIFESTS_DIR}/pytorch-job/pytorch-operator/base/"
-kustomize build . | kubectl apply -f -
-```
-
-### Katib
-
-Note that your [kustomize](https://kustomize.io/) version should be >= 3.2.
-To install Katib run:
-
-```
-git clone git@github.com:kubeflow/katib.git
-make deploy
-```
-
-Check if all components are running successfully:
-
-```
-kubectl get pods -n kubeflow
-```
-
-Expected output:
-
-```
-NAME                                READY   STATUS    RESTARTS   AGE
-katib-controller-858d6cc48c-df9jc   1/1     Running   1          20m
-katib-db-manager-7966fbdf9b-w2tn8   1/1     Running   0          20m
-katib-mysql-7f8bc6956f-898f9        1/1     Running   0          20m
-katib-ui-7cf9f967bf-nm72p           1/1     Running   0          20m
-pytorch-operator-55f966b548-9gq9v   1/1     Running   0          20m
-tf-job-operator-796b4747d8-4fh82    1/1     Running   0          21m
-```
-
-### Running examples
+# Documentation
-After deploy everything, you can run examples to verify the installation.
+- Run your first Katib Experiment in the
+  [getting started guide](https://www.kubeflow.org/docs/components/katib/hyperparameter/#example-using-random-algorithm).
-This is an example for TF operator:
+- Learn about Katib **Concepts** in this
+  [guide](https://www.kubeflow.org/docs/components/katib/overview/#katib-concepts).
-```
-kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1beta1/tfjob-example.yaml
-```
-
-This is an example for PyTorch operator:
-
-```
-kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1beta1/pytorchjob-example.yaml
-```
-
-Check the
-[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/hyperparameter/#example-using-random-algorithm)
-how to monitor your `Experiment` status.
-
-You can view your results in Katib UI.
-If you used standard installation, access the Katib UI via Kubeflow dashboard.
-Otherwise, port-forward the `katib-ui`:
-
-```
-kubectl -n kubeflow port-forward svc/katib-ui 8080:80
-```
-
-You can access the Katib UI using this URL: `http://localhost:8080/katib/`.
-
-### Katib SDK
+- Learn about Katib **Interfaces** in this
+  [guide](https://www.kubeflow.org/docs/components/katib/overview/#katib-interfaces).
-Katib supports Python SDK:
+- Learn about Katib **Components** in this
+  [guide](https://www.kubeflow.org/docs/components/katib/hyperparameter/#katib-components).
-- Check the [Katib v1beta1 SDK documentation](https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1).
+- Know more about Katib in the [presentations and demos list](./docs/presentations.md).
-Run `make generate` to update Katib SDK.
-
-### Cleanups
-
-To delete installed TF and PyTorch operator run `kubectl delete -f`
-on the respective folders.
-
-To delete Katib run `make undeploy`.
-
-## Quick Start
-
-Please follow the
-[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/hyperparameter/#examples)
-to submit your first Katib experiment.
-
-## Community
+# Community
 We are always growing our community and invite new users and AutoML enthusiasts
 to contribute to the Katib project. The following links provide information
 about getting involved in the community:
-- If you use Katib, please update [the adopters list](ADOPTERS.md).
+- Subscribe to the
+  [AutoML calendar](https://calendar.google.com/calendar/u/0/r?cid=ZDQ5bnNpZWZzbmZna2Y5MW8wdThoMmpoazRAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ)
+  to attend Working Group bi-weekly community meetings.
-- Subscribe
-  [to the calendar](https://calendar.google.com/calendar/u/0/r?cid=ZDQ5bnNpZWZzbmZna2Y5MW8wdThoMmpoazRAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ)
-  to attend the AutoML WG community meeting.
+- Check the
+  [AutoML and Training Working Group meeting notes](https://docs.google.com/document/d/1MChKfzrKAeFRtYqypFbMXL6ZIc_OgijjkvbqmwRV-64/edit).
-- Check
-  [the AutoML WG meeting notes](https://docs.google.com/document/d/1MChKfzrKAeFRtYqypFbMXL6ZIc_OgijjkvbqmwRV-64/edit).
+- If you use Katib, please update [the adopters list](ADOPTERS.md).
-- Join
-  [the AutoML WG Slack channel](https://kubeflow.slack.com/archives/C018PMV53NW).
+## Contributing
-- Learn more about Katib in
-  [the presentations and demos list](./docs/presentations.md).
+Please feel free to test the system! [Developer guide](./docs/developer-guide.md)
+is a good starting point for our developers.
-### Blog posts
+## Blog posts
 - [Kubeflow Katib: Scalable, Portable and Cloud Native System for AutoML](https://blog.kubeflow.org/katib/) (by Andrey Velichkevich)
-### Events
+## Events
 - [AutoML and Training WG Summit. 16th of July 2021](https://docs.google.com/document/d/1vGluSPHmAqEr8k9Dmm82RcQ-MVnqbYYSfnjMGB-aPuo/edit?usp=sharing)
-## Contributing
-
-Please feel free to test the system!
-[developer-guide.md](./docs/developer-guide.md) is a good starting point
-for developers.
-
 ## Citation
 If you use Katib in a scientific publication, we would appreciate
diff --git a/docs/images/katib-ui.png b/docs/images/katib-ui.png
deleted file mode 100644
index eff91b6283e..00000000000
Binary files a/docs/images/katib-ui.png and /dev/null differ