Modify guides
andreyvelich committed Nov 11, 2020
1 parent dac91bd commit 66842ed
Showing 6 changed files with 160 additions and 65 deletions.
94 changes: 50 additions & 44 deletions content/en/docs/components/katib/early-stopping.md
@@ -1,74 +1,75 @@
+++
title = "Using Early Stopping"
description = "How to use early stopping in Katib experiments"
description = "How to use early stopping in your Katib experiments"
weight = 60

+++

This page shows how you can use
This guide shows how you can use
[early stopping](https://en.wikipedia.org/wiki/Early_stopping) to improve your
Katib experiments.
Early stopping allows you to avoid overfitting when you train your model
during Katib experiments.
It helps you to save computing resources and experiment execution time by
stopping the experiment's trials before the training process is complete.
Katib experiments. Early stopping allows you to avoid overfitting when you
train your model during Katib experiments. It helps you to save computing
resources and experiment execution time by stopping the experiment's trials
before the training process is complete.

The major advantage of using early stopping in Katib is that you don't
need to modify your
[training container package](/docs/components/hyperparameter-tuning/experiment/#packaging-your-training-code-in-a-container-image).
[training container package](/docs/components/katib/experiment/#packaging-your-training-code-in-a-container-image).
All you have to do is to change your experiment YAML file.

Early stopping works in the same way as Katib's
[metrics collector](http://localhost:1313/docs/components/hyperparameter-tuning/experiment/#metrics-collector).
It analyses required metrics from `stdout` or from the arbitrary output file and
an early stopping algorithm makes the decision if the trial needs to be stopped.
Currently, early stopping works only with `StdOut` or `File` metrics collectors.
[metrics collector](/docs/components/katib/experiment/#metrics-collector).
It analyzes the required metrics from `stdout` or from an arbitrary output file,
and an early stopping algorithm decides whether the trial needs to be
stopped. Currently, early stopping works only with the
`StdOut` or `File` metrics collectors.

**Note**: Your training container must print training logs with the timestamp,
because early stopping algorithms need to know the sequence of reported metrics.
See the
[example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/mxnet-mnist/mnist.py#L36)
Check the
[`MXNet` example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/mxnet-mnist/mnist.py#L36)
to see how to add a date format to your logs.
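
The timestamp requirement can be illustrated with a minimal Python sketch. It is not taken from the linked example: the logger name, the `accuracy` metric name, and the exact date format are assumptions chosen for illustration. It only shows the general idea of prefixing every reported metric line with the time at which it was logged.

```python
import logging
import sys
import time


def make_timestamped_logger(stream=sys.stdout):
    """Return a logger that prefixes each record with a UTC timestamp."""
    formatter = logging.Formatter(
        fmt="%(asctime)s %(levelname)-8s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",
    )
    # The trailing "Z" denotes UTC, so render the time in UTC as well.
    formatter.converter = time.gmtime

    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)

    logger = logging.getLogger("training-sketch")  # hypothetical logger name
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = make_timestamped_logger()
for accuracy in [0.71, 0.78, 0.83]:
    # "accuracy" is a hypothetical metric name; use the one from your experiment.
    logger.info("accuracy=%.4f", accuracy)
```

Each emitted line starts with a timestamp, followed by the log level and the `name=value` metric, so the order of the reported metrics can be reconstructed from the log.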

## Configure the experiment with early stopping

As a reference, you can use the YAML file of the
[early stopping example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/early-stopping/median-stop.yaml).

First of all, follow the [guide](/docs/components/hyperparameter-tuning/experiment/#configuring-the-experiment)
First of all, follow the
[guide](/docs/components/katib/experiment/#configuring-the-experiment)
to configure your Katib experiment.
To apply early stopping on your experiment, specify `.spec.earlyStopping`
parameter, similar to `.spec.algorithm`. See the
To apply early stopping to your experiment, specify the `.spec.earlyStopping`
parameter, similar to `.spec.algorithm`. Refer to the
[`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)
for the available fields:

- `.earlyStopping.algorithmName` - is the name of the early stopping algorithm.
- `.earlyStopping.algorithmName` - the name of the early stopping algorithm.

- `.earlyStopping.algorithmSettings`- is the settings for the early stopping algorithm.
- `.earlyStopping.algorithmSettings` - the settings for the early stopping algorithm.
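
As a rough illustration of the two fields above, the following Python sketch builds the `earlyStopping` fragment of an experiment spec as a plain dictionary and prints it as JSON. The helper name `early_stopping_spec` is hypothetical, and the name/value layout of each setting is an assumption based on the linked `EarlyStoppingSpec` type; the setting names and values are the ones listed for the median stopping rule on this page.

```python
import json


def early_stopping_spec(algorithm_name, settings):
    """Build the `.spec.earlyStopping` fragment (hypothetical helper).

    Assumes each setting is expressed as a name/value pair of strings.
    """
    return {
        "algorithmName": algorithm_name,
        "algorithmSettings": [
            {"name": name, "value": str(value)} for name, value in settings.items()
        ],
    }


spec_fragment = {
    "earlyStopping": early_stopping_spec(
        "medianstop",
        {"min_trials_required": 3, "start_step": 4},
    )
}

# The fragment would sit under `spec` next to `algorithm` in the experiment.
print(json.dumps(spec_fragment, indent=2))
```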

Experiment's suggestion produces new trials. After that, the early stopping
algorithm generates early stopping rules for the created trials.
Once the trial reaches all the rules, it is stopped and the trial status is
transferred to `EarlyStopped`.
After that, Katib calls the suggestion again to ask for the new trials.
changed to `EarlyStopped`. Then, Katib calls the suggestion again to
ask for new trials.
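
To make the idea of an early stopping rule more concrete, here is a minimal Python sketch of a median stopping check for a maximized objective. It is not Katib's implementation. It assumes the commonly described form of the rule: a trial is stopped at a given step if its best value so far is worse than the median of the running averages of the completed trials up to that step. The function name and the data layout are hypothetical; `min_trials_required` and `start_step` mirror the settings documented on this page.

```python
from statistics import mean, median


def should_stop_early(trial_values, completed_trials,
                      min_trials_required=3, start_step=4):
    """Return True if a running trial should be stopped (maximization).

    trial_values:     metric values reported so far by the running trial.
    completed_trials: list of metric histories of successfully completed trials.
    """
    step = len(trial_values)
    if step < start_step:
        return False  # too few intermediate results to judge the trial

    # Running average of every completed trial over the same number of steps.
    averages = [mean(history[:step]) for history in completed_trials if history]
    if len(averages) < min_trials_required:
        return False  # not enough completed trials to compute a median

    return max(trial_values) < median(averages)


completed = [
    [0.5, 0.6, 0.7, 0.8],
    [0.4, 0.5, 0.6, 0.7],
    [0.6, 0.7, 0.8, 0.9],
]
print(should_stop_early([0.1, 0.2, 0.3, 0.4], completed))
print(should_stop_early([0.5, 0.6, 0.7, 0.9], completed))
```

For a minimized objective the comparison would be reversed: the trial's lowest value would be compared against the median and the trial stopped when it is higher.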

Read more about Katib concepts in the
[overview guide](/docs/components/hyperparameter-tuning/overview/#katib-concepts).
Learn more about Katib concepts
in the [overview guide](/docs/components/katib/overview/#katib-concepts).

Follow the
[Katib configuration guide](/docs/components/hyperparameter-tuning/katib-config/#early-stopping-settings)
to see how you can specify your own image for the early stopping algorithm.
[Katib configuration guide](/docs/components/katib/katib-config/#early-stopping-settings)
to specify your own image for the early stopping algorithm.

### Early stopping algorithms in detail

Katib currently supports one early stopping algorithm.
Here’s a list of the early stopping algorithms available in Katib.
The links lead to descriptions on this page:

- [Median Stopping Rule](#median-stopping-rule)

More algorithms are under development. You can add an early stopping algorithm
to Katib yourself. See the
[developer guide](https://github.com/kubeflow/katib/blob/master/docs/developer-guide.md) to contribute.
to Katib yourself. Check the
[developer guide](https://github.com/kubeflow/katib/blob/master/docs/developer-guide.md)
to contribute.

<a id="median-stopping-rule"></a>

@@ -96,12 +97,12 @@ Katib supports the following early stopping settings:
<tbody>
<tr>
<td>min_trials_required</td>
<td>Minimal number of complete trials to compute median value</td>
<td>Minimal number of successful trials to compute median value</td>
<td>3</td>
</tr>
<tr>
<td>start_step</td>
<td>Number of reported intermediate results before stopping the trials</td>
<td>Number of reported intermediate results before stopping the trial</td>
<td>4</td>
</tr>
</tbody>
@@ -110,12 +111,11 @@ Katib supports the following early stopping settings:

### Submit an early stopping experiment from the UI

You can use Katib UI to submit an early stopping experiment.
Follow
[these steps](/docs/components/hyperparameter-tuning/experiment/#running-the-experiment-from-the-katib-ui)
to create the experiment from the UI.
You can use Katib UI to submit an early stopping experiment. Follow
[these steps](/docs/components/katib/experiment/#running-the-experiment-from-the-katib-ui)
to create an experiment from the UI.

Once you reach early stopping section, select the appropriate values:
Once you reach the early stopping section, select the appropriate values:

<img src="/docs/images/katib/katib-early-stopping-parameter.png"
alt="UI form to deploy an early stopping Katib experiment"
@@ -126,7 +126,7 @@ Once you reach early stopping section, select the appropriate values:
You have to install [jq](https://stedolan.github.io/jq/download/)
to run the commands below.

Check early stopped trials in your experiment:
Check the early stopped trials in your experiment:

```shell
kubectl get experiment <experiment-name> -n <experiment-namespace> -o json | jq -r ".status"
```

@@ -168,31 +168,37 @@ If you check status for the early stopped trial:

```shell
kubectl get trial median-stop-2ml8h96d -n <experiment-namespace>
```

You see the `EarlyStopped` status for the trial:
You should be able to view the `EarlyStopped` status for the trial:

```shell
NAME TYPE STATUS AGE
median-stop-2ml8h96d EarlyStopped True 15m
```

As well, you can see results on the Katib UI.
Check trial statuses on the experiment monitor page:
You can also check the results in the Katib UI.
The trial statuses on the experiment monitor page look as follows:

<img src="/docs/images/katib/katib-early-stopping-trials.png"
alt="UI form to view trials"
class="mt-3 mb-3 border border-info rounded">

If you click on the early stopped trial name, you see reported metrics before trial
is early stopped:
You can click on the early stopped trial name to see the metrics reported
before the trial was stopped early:

<img src="/docs/images/katib/katib-early-stopping-trial-info.png"
alt="UI form to view trial info"
class="mt-3 mb-3 border border-info rounded">

## Next steps

- TODO: Add link to resume Experiment
- Learn how to
[configure and run your Katib experiments](/docs/components/katib/experiment/).

- Read about [Katib Configuration (Katib config)](/docs/components/katib/katib-config/).
- How to
[restart your experiment and use the resume policies](/docs/components/katib/resume-experiment/).

- How to [set up environment variables](/docs/components/katib/env-variables/) for each Katib component.
- Check the
[Katib Configuration (Katib config)](/docs/components/katib/katib-config/).

- How to [set up environment variables](/docs/components/katib/env-variables/)
for each Katib component.
13 changes: 4 additions & 9 deletions content/en/docs/components/katib/experiment.md
@@ -1,5 +1,5 @@
+++
title = "Running an experiment"
title = "Running an Experiment"
description = "How to configure and run a hyperparameter tuning or neural architecture search experiment in Katib"
weight = 30

@@ -815,16 +815,11 @@ View the results of the experiment in the Katib UI:
neural architecture search, check the
[introduction to Katib](/docs/components/katib/overview/).

<<<<<<< HEAD:content/en/docs/components/katib/experiment.md
- Boost your hyperparameter tuning experiment with
the [early stopping guide](/docs/components/katib/early-stopping/)

- Check the
[Katib Configuration (Katib config)](/docs/components/katib/katib-config/).
=======
* Follow the [early stopping guide](/docs/components/hyperparameter-tuning/early-stopping/)
to see how you can boost your hyperparameter tunning experiments.

* For a detailed instruction of the Katib Configuration file,
read the [Katib config page](/docs/components/hyperparameter-tuning/katib-config/).
>>>>>>> Add early stopping doc:content/en/docs/components/hyperparameter-tuning/experiment.md

- How to [set up environment variables](/docs/components/katib/env-variables/)
for each Katib component.
2 changes: 1 addition & 1 deletion content/en/docs/components/katib/hyperparameter.md
@@ -1,5 +1,5 @@
+++
title = "Getting started with Katib"
title = "Getting Started with Katib"
description = "How to set up Katib and perform hyperparameter tuning"
weight = 20

97 changes: 91 additions & 6 deletions content/en/docs/components/katib/katib-config.md
@@ -1,7 +1,7 @@
+++
title = "Katib Configuration Overview"
description = "How to make changes in Katib configuration"
weight = 90
weight = 70

+++

@@ -10,8 +10,17 @@ This guide describes
the Kubernetes
[Config Map](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/) that contains information about:

1. Current [metrics collectors](/docs/components/katib/experiment/#metrics-collector) (`key = metrics-collector-sidecar`).
1. Current [algorithms](/docs/components/katib/experiment/#search-algorithms-in-detail) (suggestions) (`key = suggestion`).
1. Current
[metrics collectors](/docs/components/katib/experiment/#metrics-collector)
(`key = metrics-collector-sidecar`).

1. Current
[algorithms](/docs/components/katib/experiment/#search-algorithms-in-detail)
(suggestions) (`key = suggestion`).

1. Current
[early stopping algorithms](/docs/components/katib/early-stopping/#early-stopping-algorithms-in-detail)
(`key = early-stopping`).

The Katib Config Map must be deployed in the
[`KATIB_CORE_NAMESPACE`](/docs/components/katib/env-variables/#katib-controller)
@@ -119,16 +128,16 @@ suggestion: |-
}
```

All of these settings except **`image`** can be omitted. If you don't specify any other settings,
a default value is set automatically.
All of these settings except **`image`** can be omitted. If you don't specify
any other settings, a default value is set automatically.

1. `image` - a Docker image for the suggestion's container with a `random`
algorithm (**must be specified**).

Image example: `docker.io/kubeflowkatib/<suggestion-name>`

For each algorithm (suggestion) you can specify one of the following
suggestion names in Docker image:
suggestion names in the Docker image:

<div class="table-responsive">
<table class="table table-bordered">
@@ -216,3 +225,79 @@ a default value is set automatically.
in which case, the pod uses the
[default](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server)
service account.

**Note:** If you want to run your experiments with
[early stopping](/docs/components/katib/early-stopping/),
the suggestion's deployment must have permission to update the experiment's
trial status. If you don't specify a service account in the Katib config,
the Katib controller creates the required
[Kubernetes role-based access control (RBAC)](https://kubernetes.io/docs/reference/access-authn-authz/rbac)
resources for the suggestion.

If you need your own service account for the experiment's
suggestion with early stopping, follow these rules:

- The service account name can't be equal to
`<experiment-name>-<experiment-algorithm>`

- The service account must have sufficient permissions to update
the experiment's trial status.
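
The naming rule above can be checked before you submit an experiment. The sketch below is only an illustration with hypothetical function names. It covers the name collision described in the first rule and does not verify the permissions required by the second rule.

```python
def reserved_service_account_name(experiment_name, algorithm_name):
    """Name that a custom service account must not use."""
    return f"{experiment_name}-{algorithm_name}"


def is_allowed_service_account(service_account, experiment_name, algorithm_name):
    """Return True if the custom service account name avoids the reserved name."""
    reserved = reserved_service_account_name(experiment_name, algorithm_name)
    return service_account != reserved


# "median-stop" and "random" are example experiment and algorithm names.
print(is_allowed_service_account("my-suggestion-sa", "median-stop", "random"))
print(is_allowed_service_account("median-stop-random", "median-stop", "random"))
```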

## Early stopping settings

These settings are related to Katib early stopping, where:

- key: `early-stopping`
- value: corresponding JSON settings for each early stopping algorithm name

If you want to use a new early stopping algorithm, you need to update the
Katib config. For example, using a `medianstop` early stopping algorithm with
all settings looks as follows:

```json
early-stopping: |-
{
"medianstop": {
"image": "docker.io/kubeflowkatib/earlystopping-medianstop",
"imagePullPolicy": "Always"
},
...
}
```

All of these settings except **`image`** can be omitted. If you don't specify
any other settings, a default value is set automatically.

1. `image` - a Docker image for the early stopping's container with a
`medianstop` algorithm (**must be specified**).

Image example: `docker.io/kubeflowkatib/<early-stopping-name>`

For each early stopping algorithm you can specify one of the following
early stopping names in the Docker image:

<div class="table-responsive">
<table class="table table-bordered">
<thead class="thead-light">
<tr>
<th>Early stopping name</th>
<th>Early stopping algorithm</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>earlystopping-medianstop</code></td>
<td><code>medianstop</code></td>
<td><a href="https://github.com/kubeflow/katib/tree/master/pkg/earlystopping/v1beta1/medianstop">Katib
Median Stopping</a> implementation</td>
</tr>
</tbody>
</table>
</div>

1. `imagePullPolicy` - an
[image pull policy](https://kubernetes.io/docs/concepts/configuration/overview/#container-images)
for the early stopping's container with a `medianstop` algorithm.

The default value is `IfNotPresent`.
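
The defaulting behavior described in this section can be sketched in Python. This mimics the documented behavior only and is not Katib's code: the function name is hypothetical, `image` is treated as mandatory, and `imagePullPolicy` falls back to `IfNotPresent` when omitted.

```python
import json

DEFAULT_IMAGE_PULL_POLICY = "IfNotPresent"


def resolve_early_stopping_config(raw_json, algorithm_name):
    """Return the settings for one early stopping algorithm with defaults applied."""
    config = json.loads(raw_json)
    if algorithm_name not in config:
        raise KeyError(f"no early stopping settings for {algorithm_name!r}")

    settings = dict(config[algorithm_name])  # copy, keep the parsed config intact
    if not settings.get("image"):
        raise ValueError(f"'image' must be specified for {algorithm_name!r}")

    settings.setdefault("imagePullPolicy", DEFAULT_IMAGE_PULL_POLICY)
    return settings


raw = """
{
  "medianstop": {
    "image": "docker.io/kubeflowkatib/earlystopping-medianstop"
  }
}
"""
print(resolve_early_stopping_config(raw, "medianstop"))
```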
17 changes: 13 additions & 4 deletions content/en/docs/components/katib/overview.md
@@ -11,15 +11,19 @@ weight = 10
This guide introduces the concepts of hyperparameter tuning, neural
architecture search, and the Katib system as a component of Kubeflow.

Katib is a Kubernetes-native project for automated machine learning (AutoML) —
it's a system for hyperparameter tuning and neural architecture search (NAS).
Katib supports a number of machine learning frameworks, including
TensorFlow, MXNet, PyTorch, XGBoost, and others.
Katib is a Kubernetes-native project for automated machine learning (AutoML).
Katib supports hyperparameter tuning, early stopping and
neural architecture search (NAS).
Learn more about AutoML at [fast.ai](https://www.fast.ai/2018/07/16/auto-ml2/),
[Google Cloud](https://cloud.google.com/automl),
[Microsoft Azure](https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml#automl-in-azure-machine-learning) or
[Amazon SageMaker](https://aws.amazon.com/blogs/aws/amazon-sagemaker-autopilot-fully-managed-automatic-machine-learning/).

Katib is agnostic to machine learning (ML) frameworks.
It can tune hyperparameters of applications written in any language
of the user's choice, and it natively supports many ML frameworks,
such as TensorFlow, MXNet, PyTorch, XGBoost, and others.

Katib supports many AutoML algorithms, such as
[Bayesian optimization](https://arxiv.org/pdf/1012.2599.pdf),
[Tree of Parzen Estimators](https://papers.nips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf),
@@ -75,6 +79,11 @@ hyperparameter tuning job (_experiment_). Each trial tests a different set of
hyperparameter configurations. At the end of the experiment, Katib outputs
the optimized values for the hyperparameters.

You can improve your hyperparameter tuning experiments by using
[early stopping](https://en.wikipedia.org/wiki/Early_stopping) techniques.
Follow the [early stopping guide](/docs/components/katib/early-stopping/)
for the details.

## Neural architecture search

{{% alert title="Alpha version" color="warning" %}}
2 changes: 1 addition & 1 deletion content/en/docs/components/katib/trial-template.md
@@ -1,5 +1,5 @@
+++
title = "Overview of trial templates"
title = "Overview of Trial Templates"
description = "How to specify trial template parameters and support a custom resource (CRD) in Katib"
weight = 40
