Modify README
andreyvelich committed Nov 24, 2020
1 parent acfe7aa commit 18aadf6
Showing 2 changed files with 24 additions and 22 deletions.
29 changes: 15 additions & 14 deletions README.md
@@ -69,7 +69,7 @@ Katib stands for `secretary` in Arabic.
 For a detailed description of the concepts in Katib and AutoML, check the
 [Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/overview/).

-Katib has the concepts of `Experiment`, `Trial`, `Worker Job` and `Suggestion`.
+Katib has the concepts of `Experiment`, `Suggestion`, `Trial` and `Worker Job`.

 ### Experiment

@@ -111,17 +111,17 @@ its objective value.
 The `Worker Job` can be any type of Kubernetes resource or
 [Kubernetes CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
 Follow the [`Trial` template guide](https://www.kubeflow.org/docs/components/katib/trial-template/#custom-resource)
-to check how to support your own Kubernetes resource in Katib.
+to support your own Kubernetes resource in Katib.

 Katib has these CRD examples in upstream:

 - [Kubernetes `Job`](https://kubernetes.io/docs/concepts/workloads/controllers/job/)

-- [Kubeflow `TFJob`](/docs/components/training/tftraining/)
+- [Kubeflow `TFJob`](https://www.kubeflow.org/docs/components/training/tftraining/)

-- [Kubeflow `PyTorchJob`](/docs/components/training/pytorch/)
+- [Kubeflow `PyTorchJob`](https://www.kubeflow.org/docs/components/training/pytorch/)

-- [Kubeflow `MPIJob`](/docs/components/training/mpi)
+- [Kubeflow `MPIJob`](https://www.kubeflow.org/docs/components/training/mpi/)

 - [Tekton `Pipeline`](https://github.com/tektoncd/pipeline)

@@ -154,10 +154,10 @@ on Kubernetes as a deployment. Each component communicates with others via GRPC
 and the API is defined at `pkg/apis/manager/v1beta1/api.proto`.

 - Katib main components:
-- katib-db-manager: GRPC API server of Katib which is the DB Interface.
-- katib-mysql: Data storage backend of Katib using mysql.
-- katib-ui: User interface of Katib.
-- katib-controller: Controller for Katib CRDs in Kubernetes.
+- `katib-db-manager` - the GRPC API server of Katib which is the DB Interface.
+- `katib-mysql` - the data storage backend of Katib using mysql.
+- `katib-ui` - the user interface of Katib.
+- `katib-controller` - the controller for the Katib CRDs in Kubernetes.

 ## Web UI

@@ -277,8 +277,8 @@ Check the
 how to monitor your `Experiment` status.

 You can view your results in Katib UI.
-Access Katib UI via Kubeflow dashboard if you have used standard installation
-or port-forward the `katib-ui` service if you have installed manually.
+If you used standard installation, access the Katib UI via Kubeflow dashboard.
+Otherwise, port-forward the `katib-ui`:

 ```
 kubectl -n kubeflow port-forward svc/katib-ui 8080:80
@@ -288,7 +288,7 @@ You can access the Katib UI using this URL: `http://localhost:8080/katib/`.

 ### Katib SDK

-Katib supports Python SDK for v1beta1 version.
+Katib supports Python SDK:

 - Check the [Katib v1beta1 SDK documentation](https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1).

@@ -299,12 +299,13 @@ Run `make generate` to update Katib SDK.
 To delete installed TF and PyTorch operator run `kubectl delete -f`
 on the respective folders.

-To delete Katib for v1beta1 version run `make undeploy`.
+To delete Katib run `make undeploy`.

 ## Quick Start

 Please follow the
-[Getting Started guide](https://www.kubeflow.org/docs/components/katib/hyperparameter/#katib-setup)
+[Kubeflow documentation](https://www.kubeflow.org/docs/components/katib/hyperparameter/#examples)
+to submit your first Katib experiment.

 ## Who are using Katib?

17 changes: 9 additions & 8 deletions docs/workflow-design.md
@@ -313,7 +313,7 @@ spec:
 kind: Job
 metadata:
 name: random-example-2fpnqfv8
-namespace: anonymous
+namespace: kubeflow
 spec:
 template:
 spec:
@@ -365,9 +365,10 @@ status:

 ## What happens after an `Experiment` CR is created

-When user creates an `Experiment` CR, Katib controllers using `Experiment`
-controller, `Suggestion` controller and `Trial` controller is working together
-to achieve hyperparameters tuning for user's Machine learning model.
+When user creates an `Experiment` CR, Katib `Experiment` controller,
+`Suggestion` controller and `Trial` controller is working together to achieve
+hyperparameters tuning for user's Machine learning model. The Experiment
+workflow looks as follows:

 <center>
 <img width="100%" alt="image" src="images/katib-workflow.png">
@@ -398,11 +399,11 @@ to achieve hyperparameters tuning for user's Machine learning model.
 Kubernetes Pods.

 1. Katib Pod mutating webhook is called to inject the metrics collector sidecar
-container to the candidate Pod.
+container to the candidate Pods.

-1. During the ML model container runs, the metrics collector container in
-the same Pod tries to collect metrics from it and persists them
-to the Katib DB backend.
+1. During the ML model container runs, the metrics collector container
+collects metrics from the injected pod and persists metrics to the Katib
+DB backend.

 1. When the ML model training ends, the `Trial` controller updates status
 of the corresponding `Trial` CR.
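
The sidecar-injection step described in the last hunk can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not Katib's webhook code: the function `inject_sidecar`, the container name `metrics-collector`, the image string, and the `training-container` name are all hypothetical, and the Pod is modelled as a plain dictionary rather than a Kubernetes admission request.

```python
import copy

# Hypothetical sidecar definition; the name and image are illustrative
# placeholders and are not taken from the Katib source.
METRICS_COLLECTOR = {
    "name": "metrics-collector",
    "image": "example.invalid/metrics-collector:latest",
}


def inject_sidecar(pod, sidecar=METRICS_COLLECTOR):
    """Return a copy of `pod` with `sidecar` appended to its containers.

    The input is left untouched, and a Pod that already has a container
    with the sidecar's name is returned unchanged (idempotent).
    """
    mutated = copy.deepcopy(pod)
    containers = mutated.setdefault("spec", {}).setdefault("containers", [])
    if all(c.get("name") != sidecar["name"] for c in containers):
        containers.append(dict(sidecar))
    return mutated


# A candidate Pod, reusing the Job name and namespace shown in the diff;
# the training container name is an assumption.
pod = {
    "metadata": {"name": "random-example-2fpnqfv8", "namespace": "kubeflow"},
    "spec": {"containers": [{"name": "training-container"}]},
}

mutated = inject_sidecar(pod)
print([c["name"] for c in mutated["spec"]["containers"]])
```

Returning a mutated copy instead of editing the Pod in place mirrors how a mutating step is usually reasoned about: the original object and the patched object can be compared, and applying the step twice does not add a second sidecar.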