kubeflow · k8s-ci-robot · Nov 13, 2020 · Nov 5, 2020
diff --git a/content/en/docs/components/katib/early-stopping.md b/content/en/docs/components/katib/early-stopping.md
@@ -0,0 +1,208 @@
++++
+title = "Using Early Stopping"
+description = "How to use early stopping in Katib experiments"
+weight = 60
+
++++
+
+This guide shows how you can use
+[early stopping](https://en.wikipedia.org/wiki/Early_stopping) to improve your
+Katib experiments. Early stopping helps you avoid overfitting when you
+train your model during Katib experiments. It also saves computing
+resources and reduces experiment execution time by stopping the experiment's
+trials when the target metric(s) stop improving before the training process
+is complete.
+
+The major advantage of using early stopping in Katib is that you don't
+need to modify your
+[training container package](/docs/components/katib/experiment/#packaging-your-training-code-in-a-container-image).
+All you have to do is make the necessary changes to your experiment's YAML file.
+
+Early stopping works in the same way as Katib's
+[metrics collector](/docs/components/katib/experiment/#metrics-collector).
+It analyzes the required metrics from `stdout` or from an arbitrary output file,
+and an early stopping algorithm decides whether the trial needs to be
+stopped. Currently, early stopping works only with the
+`StdOut` or `File` metrics collectors.
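+
+As a sketch of what this means in the experiment spec, the fragment below
+selects the `StdOut` collector explicitly. The field names follow the
+metrics collector section of the experiment guide linked above; treat it
+as an illustration rather than a complete experiment:
+
+```yaml
+# Illustrative fragment of an experiment spec, not a complete manifest.
+spec:
+  metricsCollectorSpec:
+    collector:
+      kind: StdOut # early stopping currently also supports `File`
+```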
+
+**Note**: Your training container must print training logs with a timestamp,
+because early stopping algorithms need to know the sequence of reported metrics.
+Check the
+[`MXNet` example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/mxnet-mnist/mnist.py#L36)
+to learn how to add a date format to your logs.
+
+## Configure the experiment with early stopping
+
+As a reference, you can use the YAML file of the
+[early stopping example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/early-stopping/median-stop.yaml).
+
+1. Follow the
+   [guide](/docs/components/katib/experiment/#configuring-the-experiment)
+   to configure your Katib experiment.
+
+2. Next, to apply early stopping for your experiment, specify the `.spec.earlyStopping`
+   parameter, similar to the `.spec.algorithm`. Refer to the
+   [`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)
+   for more information.
+
+   - `.earlyStopping.algorithmName` - the name of the early stopping algorithm.
+
+   - `.earlyStopping.algorithmSettings` - the settings for the early stopping algorithm.
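+
+   As an illustration, an `earlyStopping` section that uses the `medianstop`
+   algorithm might look like the following. The setting names come from the
+   table in the Median Stopping Rule section; the values are placeholders, and
+   writing them as quoted strings is an assumption based on how
+   `.spec.algorithm` settings are written:
+
+   ```yaml
+   # Illustrative fragment of an experiment spec, not a complete manifest.
+   spec:
+     earlyStopping:
+       algorithmName: medianstop
+       algorithmSettings:
+         - name: min_trials_required
+           value: "3"
+         - name: start_step
+           value: "4"
+   ```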
+
+When the experiment runs, its suggestion produces new trials. After that,
+the early stopping algorithm generates early stopping rules for the created
+trials. Once a trial satisfies all of the rules, it is stopped and its status
+changes to `EarlyStopped`. Katib then calls the suggestion again to
+ask for new trials.
+
+Learn more about Katib concepts
+in the [overview guide](/docs/components/katib/overview/#katib-concepts).
+
+Follow the
+[Katib configuration guide](/docs/components/katib/katib-config/#early-stopping-settings)
+to specify your own image for the early stopping algorithm.
+
+### Early stopping algorithms in detail
+
+Here’s a list of the early stopping algorithms available in Katib:
+
+- [Median Stopping Rule](#median-stopping-rule)
+
+More algorithms are under development.
+
+You can add an early stopping algorithm to Katib yourself. Check the
+[developer guide](https://github.com/kubeflow/katib/blob/master/docs/developer-guide.md)
+to contribute.
+
+<a id="median-stopping-rule"></a>
+
+### Median Stopping Rule
+
+The early stopping algorithm name in Katib is `medianstop`.
+
+The median stopping rule stops a pending trial `X` at step `S` if the trial's
+best objective value by step `S` is worse than the median value of the running
+averages of all completed trials' objectives reported up to step `S`.
+
+To learn more about it, check
+[Google Vizier: A Service for Black-Box Optimization](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf).
+
+Katib supports the following early stopping settings:
+
+<div class="table-responsive">
+  <table class="table table-bordered">
+    <thead class="thead-light">
+      <tr>
+        <th>Setting Name</th>
+        <th>Description</th>
+        <th>Default Value</th>
+      </tr>
+    </thead>
+    <tbody>
+      <tr>
+        <td>min_trials_required</td>
+        <td>Minimum number of succeeded trials required to compute the median value</td>
+        <td>3</td>
+      </tr>
+      <tr>
+        <td>start_step</td>
+        <td>Number of intermediate results a trial must report before it can be stopped</td>
+        <td>4</td>
+      </tr>
+    </tbody>
+  </table>
+</div>
+
+### Submit an early stopping experiment from the UI
+
+You can use the Katib UI to submit an early stopping experiment. Follow
+[these steps](/docs/components/katib/experiment/#running-the-experiment-from-the-katib-ui)
+to create an experiment from the UI.
+
+Once you reach the early stopping section, select the appropriate values:
+
+<img src="/docs/images/katib/katib-early-stopping-parameter.png"
+  alt="UI form to deploy an early stopping Katib experiment"
+  class="mt-3 mb-3 border border-info rounded">
+
+## View the early stopping experiment results
+
+First, make sure you have [jq](https://stedolan.github.io/jq/download/)
+installed.
+
+Check the early stopped trials in your experiment:
+
+```shell
+kubectl get experiment <experiment-name> -n <experiment-namespace> -o json | jq -r ".status"
+```
+
+The last part of the above command output looks similar to this:
+
+```json
+  . . .
+  "earlyStoppedTrialList": [
+    "median-stop-2ml8h96d",
+    "median-stop-cgjkq8zn",
+    "median-stop-pvn5p54p",
+    "median-stop-sjc9tcgc"
+  ],
+  "startTime": "2020-11-05T03:03:43Z",
+  "succeededTrialList": [
+    "median-stop-2kmh57qf",
+    "median-stop-7ccstz4z",
+    "median-stop-7sqt7556",
+    "median-stop-lgvhfch2",
+    "median-stop-mkfjtwbj",
+    "median-stop-nfmgqd7w",
+    "median-stop-nsbxw5m9",
+    "median-stop-nsmhg4p2",
+    "median-stop-rp88xflk",
+    "median-stop-xl7dlf5n",
+    "median-stop-ztc58kwq"
+  ],
+  "trials": 15,
+  "trialsEarlyStopped": 4,
+  "trialsSucceeded": 11
+}
+```
+
+Check the status of the early stopped trial by running this command:
+
+```shell
+kubectl get trial median-stop-2ml8h96d -n <experiment-namespace>
+```
+
+You should see the `EarlyStopped` status for the trial:
+
+```shell
+NAME                   TYPE           STATUS   AGE
+median-stop-2ml8h96d   EarlyStopped   True     15m
+```
+
+In addition, you can check your results on the Katib UI.
+The trial statuses on the experiment monitor page should look as follows:
+
+<img src="/docs/images/katib/katib-early-stopping-trials.png"
+  alt="UI form to view trials"
+  class="mt-3 mb-3 border border-info rounded">
+
+You can click the name of an early stopped trial to view the metrics reported
+before the trial was early stopped:
+
+<img src="/docs/images/katib/katib-early-stopping-trial-info.png"
+  alt="UI form to view trial info"
+  class="mt-3 mb-3 border border-info rounded">
+
+## Next steps
+
+- Learn how to
+  [configure and run your Katib experiments](/docs/components/katib/experiment/).
+
+- Learn how to
+  [restart your experiment and use the resume policies](/docs/components/katib/resume-experiment/).
+
+- Check the
+  [Katib Configuration (Katib config)](/docs/components/katib/katib-config/).
+
+- Learn how to [set up environment variables](/docs/components/katib/env-variables/)
+  for each Katib component.
diff --git a/content/en/docs/components/katib/env-variables.md b/content/en/docs/components/katib/env-variables.md
@@ -1,7 +1,7 @@
 +++
 title = "Environment Variables for Katib Components"
 description = "How to set up environment variables for each Katib component"
-weight = 60
+weight = 80
 
 +++
 

diff --git a/content/en/docs/components/katib/experiment.md b/content/en/docs/components/katib/experiment.md
@@ -1,5 +1,5 @@
 +++
-title = "Running an experiment"
+title = "Running an Experiment"
 description = "How to configure and run a hyperparameter tuning or neural architecture search experiment in Katib"
 weight = 30
 
@@ -177,8 +177,7 @@ Katib currently supports several search algorithms.
 Refer to the
 [`AlgorithmSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L22-L39).
 
-Here's a list of the search algorithms available in Katib. The links lead to
-descriptions on this page:
+Here's a list of the search algorithms available in Katib:
 
 - [Grid search](#grid-search)
 - [Random search](#random-search)
@@ -189,8 +188,9 @@ descriptions on this page:
 - [Neural Architecture Search based on ENAS](#enas)
 - [Differentiable Architecture Search (DARTS)](#darts)
 
-More algorithms are under development. You can add an algorithm to Katib
-yourself. Check the guide to
+More algorithms are under development.
+
+You can add an algorithm to Katib yourself. Check the guide to
 [adding a new algorithm](https://github.com/kubeflow/katib/blob/master/docs/new-algorithm-service.md)
 and the
 [developer guide](https://github.com/kubeflow/katib/blob/master/docs/developer-guide.md).
@@ -815,6 +815,9 @@ View the results of the experiment in the Katib UI:
   neural architecture search, check the
   [introduction to Katib](/docs/components/katib/overview/).
 
+- Boost your hyperparameter tuning experiment by following
+  the [early stopping guide](/docs/components/katib/early-stopping/).
+
 - Check the
   [Katib Configuration (Katib config)](/docs/components/katib/katib-config/).
 

diff --git a/content/en/docs/components/katib/hyperparameter.md b/content/en/docs/components/katib/hyperparameter.md
@@ -1,5 +1,5 @@
 +++
-title = "Getting started with Katib"
+title = "Getting Started with Katib"
 description = "How to set up Katib and perform hyperparameter tuning"
 weight = 20
 

diff --git a/content/en/docs/components/katib/katib-config.md b/content/en/docs/components/katib/katib-config.md
@@ -1,7 +1,7 @@
 +++
 title = "Katib Configuration Overview"
 description = "How to make changes in Katib configuration"
-weight = 50
+weight = 70
 
 +++
 
@@ -10,8 +10,17 @@ This guide describes
 the Kubernetes
 [Config Map](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/) that contains information about:
 
-1. Current [metrics collectors](/docs/components/katib/experiment/#metrics-collector) (`key = metrics-collector-sidecar`).
-1. Current [algorithms](/docs/components/katib/experiment/#search-algorithms-in-detail) (suggestions) (`key = suggestion`).
+1. Current
+   [metrics collectors](/docs/components/katib/experiment/#metrics-collector)
+   (`key = metrics-collector-sidecar`).
+
+1. Current
+   [algorithms](/docs/components/katib/experiment/#search-algorithms-in-detail)
+   (suggestions) (`key = suggestion`).
+
+1. Current
+   [early stopping algorithms](/docs/components/katib/early-stopping/#early-stopping-algorithms-in-detail)
+   (`key = early-stopping`).
 
 The Katib Config Map must be deployed in the
 [`KATIB_CORE_NAMESPACE`](/docs/components/katib/env-variables/#katib-controller)
@@ -119,16 +128,16 @@ suggestion: |-
 }
 ```
 
-All of these settings except **`image`** can be omitted. If you don't specify any other settings,
-a default value is set automatically.
+All of these settings except **`image`** can be omitted. If you don't specify
+any other settings, a default value is set automatically.
 
 1. `image` - a Docker image for the suggestion's container with a `random`
    algorithm (**must be specified**).
 
    Image example: `docker.io/kubeflowkatib/<suggestion-name>`
 
    For each algorithm (suggestion) you can specify one of the following
-   suggestion names in Docker image:
+   suggestion names in the Docker image:
 
    <div class="table-responsive">
      <table class="table table-bordered">
@@ -216,3 +225,79 @@ a default value is set automatically.
    in which case, the pod uses the
    [default](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server)
    service account.
+
+   **Note:** If you want to run your experiments with
+   [early stopping](/docs/components/katib/early-stopping/),
+   the suggestion's deployment must have permission to update the experiment's
+   trial status. If you don't specify a service account in the Katib config,
+   the Katib controller creates the required
+   [Kubernetes role-based access control](https://kubernetes.io/docs/reference/access-authn-authz/rbac)
+   resources for the suggestion.
+
+   If you need your own service account for the experiment's
+   suggestion with early stopping, follow these rules:
+
+   - The service account name can't be equal to
+     `<experiment-name>-<experiment-algorithm>`.
+
+   - The service account must have sufficient permissions to update
+     the experiment's trial status.
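+
+   A minimal sketch of such permissions is shown below. The object names are
+   hypothetical, and the resources and verbs are assumptions about what
+   updating a trial's status requires; compare them with the role that the
+   Katib controller creates in your cluster before relying on them:
+
+   ```yaml
+   # Hypothetical Role and RoleBinding for a custom suggestion service account.
+   apiVersion: rbac.authorization.k8s.io/v1
+   kind: Role
+   metadata:
+     name: suggestion-early-stopping # hypothetical name
+     namespace: <experiment-namespace>
+   rules:
+     - apiGroups: ["kubeflow.org"]
+       resources: ["trials", "trials/status"] # assumed resource names
+       verbs: ["get", "list", "watch", "update", "patch"] # assumed verbs
+   ---
+   apiVersion: rbac.authorization.k8s.io/v1
+   kind: RoleBinding
+   metadata:
+     name: suggestion-early-stopping # hypothetical name
+     namespace: <experiment-namespace>
+   subjects:
+     - kind: ServiceAccount
+       name: <your-service-account> # hypothetical placeholder
+       namespace: <experiment-namespace>
+   roleRef:
+     apiGroup: rbac.authorization.k8s.io
+     kind: Role
+     name: suggestion-early-stopping
+   ```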
+
+## Early stopping settings
+
+These settings are related to Katib early stopping, where:
+
+- key: `early-stopping`
+- value: corresponding JSON settings for each early stopping algorithm name
+
+If you want to use a new early stopping algorithm, you need to update the
+Katib config. For example, using a `medianstop` early stopping algorithm with
+all settings looks as follows:
+
+```json
+early-stopping: |-
+{
+  "medianstop": {
+    "image": "docker.io/kubeflowkatib/earlystopping-medianstop",
+    "imagePullPolicy": "Always"
+  },
+  ...
+}
+```
+
+All of these settings except **`image`** can be omitted. If you don't specify
+any other settings, a default value is set automatically.
+
+1. `image` - a Docker image for the early stopping's container with a
+   `medianstop` algorithm (**must be specified**).
+
+   Image example: `docker.io/kubeflowkatib/<early-stopping-name>`
+
+   For each early stopping algorithm you can specify one of the following
+   early stopping names in the Docker image:
+
+   <div class="table-responsive">
+     <table class="table table-bordered">
+       <thead class="thead-light">
+         <tr>
+           <th>Early stopping name</th>
+           <th>Early stopping algorithm</th>
+           <th>Description</th>
+         </tr>
+       </thead>
+       <tbody>
+         <tr>
+           <td><code>earlystopping-medianstop</code></td>
+           <td><code>medianstop</code></td>
+           <td><a href="https://github.com/kubeflow/katib/tree/master/pkg/earlystopping/v1beta1/medianstop">Katib
+             Median Stopping</a> implementation</td>
+         </tr>
+       </tbody>
+     </table>
+   </div>
+
+1. `imagePullPolicy` - an
+   [image pull policy](https://kubernetes.io/docs/concepts/configuration/overview/#container-images)
+   for the early stopping's container with a `medianstop` algorithm.
+
+   The default value is `IfNotPresent`.