From ac3ed14c0669bdacc7224783b1c4249cd98ccfb2 Mon Sep 17 00:00:00 2001
From: avelichk
Date: Wed, 28 Oct 2020 23:07:58 +0000
Subject: [PATCH 1/5] Default values for parallel and max Trial count

---
 .../en/docs/components/katib/experiment.md | 36 +++++++++++++++++--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/components/katib/experiment.md b/content/en/docs/components/katib/experiment.md
index 35a9210a75..5973051f7c 100644
--- a/content/en/docs/components/katib/experiment.md
+++ b/content/en/docs/components/katib/experiment.md
@@ -75,12 +75,42 @@ These are the fields in the experiment configuration spec:
   Refer to the [`ObjectiveSpec`
   type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L93).
 
-- **parallelTrialCount**: The maximum number of hyperparameter sets that Katib
-  should train in parallel.
+* **algorithm**: The search algorithm that you want Katib to use to find the
+  best hyperparameters or neural architecture configuration. Examples include
+  random search, grid search, Bayesian optimization, and more.
+  See the [search algorithm details](#search-algorithms) below.
+
+* **trialTemplate**: The template that defines the trial.
+  You must package your ML training code into a Docker image, as described
+  [above](#docker-image). You must configure the model's
+  hyperparameters either as command-line arguments or as environment variables,
+  so that Katib can automatically set the values in each trial.
+
+  You can use one of the following job types to train your model:
+
+  - [Kubernetes Job](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/)
+    (does not support distributed execution).
+  - [Kubeflow TFJob](/docs/guides/components/tftraining/) (supports
+    distributed execution).
+  - [Kubeflow PyTorchJob](/docs/guides/components/pytorch/) (supports
+    distributed execution).
+
+  See the [`TrialTemplate`
+  type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/experiments/v1alpha3/experiment_types.go#L189-L203).
+  The template
+  uses the [Go template format](https://golang.org/pkg/text/template/).
+
+  You can define the job in raw string format or you can use a
+  [ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/).
+  [Here](https://github.com/kubeflow/katib/blob/master/manifests/v1alpha3/katib-controller/trialTemplateConfigmapLabeled.yaml) is an example how to create ConfigMap with trial templates.
+
+* **parallelTrialCount**: The maximum number of hyperparameter sets that Katib
+  should train in parallel. Default value is 3.
 
 - **maxTrialCount**: The maximum number of trials to run.
   This is equivalent to the number of hyperparameter sets that Katib should
-  generate to test the model.
+  generate to test the model. If value is omitted, experiment is running until
+  objective goal is reached or experiment reaches maximum number of failed trials.
 
 - **maxFailedTrialCount**: The maximum number of failed trials before Katib
   should stop the experiment.

From 225367036c17c46c42c4311f8849ea387e564eca Mon Sep 17 00:00:00 2001
From: avelichk
Date: Thu, 5 Nov 2020 22:51:12 +0000
Subject: [PATCH 2/5] Add article

---
 content/en/docs/components/katib/experiment.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/components/katib/experiment.md b/content/en/docs/components/katib/experiment.md
index 5973051f7c..14eaa531e9 100644
--- a/content/en/docs/components/katib/experiment.md
+++ b/content/en/docs/components/katib/experiment.md
@@ -105,12 +105,13 @@ These are the fields in the experiment configuration spec:
   [Here](https://github.com/kubeflow/katib/blob/master/manifests/v1alpha3/katib-controller/trialTemplateConfigmapLabeled.yaml) is an example how to create ConfigMap with trial templates.
 
 * **parallelTrialCount**: The maximum number of hyperparameter sets that Katib
-  should train in parallel. Default value is 3.
+  should train in parallel. The default value is 3.
 
 - **maxTrialCount**: The maximum number of trials to run.
   This is equivalent to the number of hyperparameter sets that Katib should
-  generate to test the model. If value is omitted, experiment is running until
-  objective goal is reached or experiment reaches maximum number of failed trials.
+  generate to test the model. If value is omitted, your experiment is running
+  until the objective goal is reached or the experiment reaches
+  a maximum number of failed trials.
 
 - **maxFailedTrialCount**: The maximum number of failed trials before Katib
   should stop the experiment.

From 83c6068a5b26b1481bcc80611c20f41befe95781 Mon Sep 17 00:00:00 2001
From: avelichk
Date: Thu, 5 Nov 2020 22:53:06 +0000
Subject: [PATCH 3/5] Fix

---
 content/en/docs/components/katib/experiment.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/components/katib/experiment.md b/content/en/docs/components/katib/experiment.md
index 14eaa531e9..e464dd1060 100644
--- a/content/en/docs/components/katib/experiment.md
+++ b/content/en/docs/components/katib/experiment.md
@@ -109,9 +109,9 @@ These are the fields in the experiment configuration spec:
 
 - **maxTrialCount**: The maximum number of trials to run.
   This is equivalent to the number of hyperparameter sets that Katib should
-  generate to test the model. If value is omitted, your experiment is running
-  until the objective goal is reached or the experiment reaches
-  a maximum number of failed trials.
+  generate to test the model. If the `maxTrialCount` value is omitted, your
+  experiment is running until the objective goal is reached or the experiment
+  reaches a maximum number of failed trials.
 
 - **maxFailedTrialCount**: The maximum number of failed trials before Katib
   should stop the experiment.

From a04de33b0c72aa10df4a4e5ad8cc46c6b52dc89a Mon Sep 17 00:00:00 2001
From: avelichk
Date: Wed, 11 Nov 2020 13:55:25 +0000
Subject: [PATCH 4/5] Point omitted

---
 .../en/docs/components/katib/experiment.md | 38 ++-----------------
 1 file changed, 4 insertions(+), 34 deletions(-)

diff --git a/content/en/docs/components/katib/experiment.md b/content/en/docs/components/katib/experiment.md
index e464dd1060..3d84b0609b 100644
--- a/content/en/docs/components/katib/experiment.md
+++ b/content/en/docs/components/katib/experiment.md
@@ -75,48 +75,18 @@ These are the fields in the experiment configuration spec:
   Refer to the [`ObjectiveSpec`
   type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L93).
 
-* **algorithm**: The search algorithm that you want Katib to use to find the
-  best hyperparameters or neural architecture configuration. Examples include
-  random search, grid search, Bayesian optimization, and more.
-  See the [search algorithm details](#search-algorithms) below.
-
-* **trialTemplate**: The template that defines the trial.
-  You must package your ML training code into a Docker image, as described
-  [above](#docker-image). You must configure the model's
-  hyperparameters either as command-line arguments or as environment variables,
-  so that Katib can automatically set the values in each trial.
-
-  You can use one of the following job types to train your model:
-
-  - [Kubernetes Job](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/)
-    (does not support distributed execution).
-  - [Kubeflow TFJob](/docs/guides/components/tftraining/) (supports
-    distributed execution).
-  - [Kubeflow PyTorchJob](/docs/guides/components/pytorch/) (supports
-    distributed execution).
-
-  See the [`TrialTemplate`
-  type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/experiments/v1alpha3/experiment_types.go#L189-L203).
-  The template
-  uses the [Go template format](https://golang.org/pkg/text/template/).
-
-  You can define the job in raw string format or you can use a
-  [ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/).
-  [Here](https://github.com/kubeflow/katib/blob/master/manifests/v1alpha3/katib-controller/trialTemplateConfigmapLabeled.yaml) is an example how to create ConfigMap with trial templates.
-
-* **parallelTrialCount**: The maximum number of hyperparameter sets that Katib
+- **parallelTrialCount**: The maximum number of hyperparameter sets that Katib
   should train in parallel. The default value is 3.
 
 - **maxTrialCount**: The maximum number of trials to run.
   This is equivalent to the number of hyperparameter sets that Katib should
-  generate to test the model. If the `maxTrialCount` value is omitted, your
+  generate to test the model. If the `maxTrialCount` value is **omitted**, your
   experiment is running until the objective goal is reached or the experiment
   reaches a maximum number of failed trials.
 
 - **maxFailedTrialCount**: The maximum number of failed trials before Katib
-  should stop the experiment.
-  This is equivalent to the number of failed hyperparameter sets that Katib
-  should test.
+  should stop the experiment. This is equivalent to the number of failed
+  hyperparameter sets that Katib should test.
 
   If the number of failed trials exceeds `maxFailedTrialCount`, Katib stops the
   experiment with a status of `Failed`.

From 9c560b3f3d4ee665459c27d33743a6c786e6517e Mon Sep 17 00:00:00 2001
From: Andrey Velichkevich
Date: Wed, 11 Nov 2020 15:55:35 +0000
Subject: [PATCH 5/5] Update content/en/docs/components/katib/experiment.md

Co-authored-by: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com>
---
 content/en/docs/components/katib/experiment.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/components/katib/experiment.md b/content/en/docs/components/katib/experiment.md
index 3d84b0609b..2570473a89 100644
--- a/content/en/docs/components/katib/experiment.md
+++ b/content/en/docs/components/katib/experiment.md
@@ -81,7 +81,7 @@ These are the fields in the experiment configuration spec:
 - **maxTrialCount**: The maximum number of trials to run.
   This is equivalent to the number of hyperparameter sets that Katib should
   generate to test the model. If the `maxTrialCount` value is **omitted**, your
-  experiment is running until the objective goal is reached or the experiment
+  experiment will be running until the objective goal is reached or the experiment
   reaches a maximum number of failed trials.
 
 - **maxFailedTrialCount**: The maximum number of failed trials before Katib
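
Reviewer note: the wording this series settles on for `parallelTrialCount`, `maxTrialCount`, and `maxFailedTrialCount` can be read as a small decision rule. The Python sketch below is only a model of the documented text, not Katib's controller code. The names `ExperimentLimits`, `experiment_status`, `completed`, `failed`, and `goal_reached` are hypothetical, as are the `"Running"` and `"Succeeded"` labels; only the default of 3 and the `Failed` status come from the patched documentation. Treating an omitted `maxFailedTrialCount` as "no limit" is an assumption, since the patches do not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional

# Default stated in the patched docs: "The default value is 3."
DEFAULT_PARALLEL_TRIAL_COUNT = 3


@dataclass
class ExperimentLimits:
    """Hypothetical container for the three fields discussed in the patches."""
    parallel_trial_count: int = DEFAULT_PARALLEL_TRIAL_COUNT
    max_trial_count: Optional[int] = None         # None models "omitted"
    max_failed_trial_count: Optional[int] = None  # None: assumed to mean no limit


def experiment_status(limits: ExperimentLimits, completed: int, failed: int,
                      goal_reached: bool) -> str:
    """Return a status label for the documented stopping rules.

    "Failed" is the status named in the docs; "Succeeded" and "Running"
    are placeholder labels chosen for this sketch.
    """
    # Docs: if failed trials exceed maxFailedTrialCount, the experiment is Failed.
    if (limits.max_failed_trial_count is not None
            and failed > limits.max_failed_trial_count):
        return "Failed"
    # Docs: the experiment runs until the objective goal is reached ...
    if goal_reached:
        return "Succeeded"
    # ... or, when maxTrialCount is set, until that many trials have run.
    if limits.max_trial_count is not None and completed >= limits.max_trial_count:
        return "Succeeded"
    # With maxTrialCount omitted, nothing else stops the experiment.
    return "Running"
```

For example, `experiment_status(ExperimentLimits(), completed=100, failed=0, goal_reached=False)` stays `"Running"` because `maxTrialCount` is omitted, which is the case patches 2 through 5 are trying to describe.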