Add Katib early stopping documentation #2336
This is very cool, @andreyvelich!
@RFMVasconcelos Thank you!
3757562 to 66842ed
This PR is ready. @8bitmp3 I capitalise all titles to be consistent with other guides. For example: Notebooks or KFP.
@@ -0,0 +1,204 @@
+++
title = "Using Early Stopping"
description = "How to use an early stopping in Katib experiments"
- description = "How to use an early stopping in Katib experiments"
+ description = "How to use early stopping in Katib experiments"
Katib experiments. Early stopping allows you to avoid overfitting when you
train your model during Katib experiments. It helps you to save computing
resources and experiment execution time by stopping the experiment's trials
before the training process is complete.
AFAIK, early stopping saves resources and execution time by halting training once the (validation) loss or some other target metric no longer improves. Let's add that here to accommodate users who are new to ML or less proficient in it.
For example:
- Katib experiments. Early stopping allows you to avoid overfitting when you
- train your model during Katib experiments. It helps you to save computing
- resources and experiment execution time by stopping the experiment's trials
- before the training process is complete.
+ Katib experiments. Early stopping allows you to avoid overfitting when you
+ train your model during Katib experiments. It also helps by saving computing
+ resources and reducing experiment execution time by stopping the experiment's trials
+ when the target metric(s) no longer improves before the training process is complete.
Notice the use of "the target metric(s)"
Agree, nice explanation!
resources and experiment execution time by stopping the experiment's trials
before the training process is complete.

The major advantage of using early stopping in Katib, is that you don't
- The major advantage of using early stopping in Katib, is that you don't
+ The major advantage of using early stopping in Katib is that you don't
The major advantage of using early stopping in Katib, is that you don't
need to modify your
[training container package](/docs/components/katib/experiment/#packaging-your-training-code-in-a-container-image).
All you have to do is to change your experiment YAML file.
- All you have to do is to change your experiment YAML file.
+ All you have to do is make necessary changes in your experiment's YAML file.
because early stopping algorithms need to know the sequence of reported metrics.
Check the
[`MXNet` example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/mxnet-mnist/mnist.py#L36)
how to add date format to your logs.
- how to add date format to your logs.
+ to learn how to add a date format to your logs.
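
For readers of this thread, here is a minimal sketch of what "adding a date format to your logs" could look like with Python's standard `logging` module. The timestamp layout, the logger name, and the `epoch`/`accuracy` fields are assumptions for illustration; they are not copied from the linked MXNet example, so check that example and your metrics collector for the format they actually expect.

```python
import io
import logging
import time

# Assumed timestamp layout (UTC, RFC 3339 style); verify against your collector.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_logger(stream):
    """Return a logger whose lines start with a UTC timestamp, then the message."""
    logger = logging.getLogger("trial-metrics-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False

    formatter = logging.Formatter("%(asctime)s %(message)s", datefmt=DATE_FORMAT)
    formatter.converter = time.gmtime  # use UTC so the trailing "Z" is accurate

    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger


# Capture output in memory so the resulting line can be inspected.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("epoch=1 accuracy=0.91")  # hypothetical metric line
line = buffer.getvalue().strip()
print(line)
```

Because each line carries its own timestamp, the order of reported metrics can be recovered from the log itself.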
As a reference, you can use the YAML file of the
[early stopping example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/early-stopping/median-stop.yaml).

First of all, follow the
- First of all, follow the
+ 1. Follow the
First of all, follow the
[guide](/docs/components/katib/experiment/#configuring-the-experiment)
to configure your Katib experiment.
To apply early stopping for your experiment, specify the `.spec.earlyStopping`
- To apply early stopping for your experiment, specify the `.spec.earlyStopping`
+ 2. Next, to apply early stopping for your experiment, specify the `.spec.earlyStopping`
to configure your Katib experiment.
To apply early stopping for your experiment, specify the `.spec.earlyStopping`
parameter, similar to the `.spec.algorithm`. Refer to the
[`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)
- [`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)
+ [`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)
+ for more information.
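
To make the shape of `.spec.earlyStopping` concrete, the sketch below builds the fragment as a plain Python dict and prints it as JSON. It assumes the spec holds an `algorithmName` plus a list of name/value `algorithmSettings`, with values as strings. The helper name and the `medianstop`, `min_trials_required`, and `start_step` values are illustrative assumptions, not taken from this diff; confirm them against the `EarlyStoppingSpec` type and the early stopping example.

```python
import json


def early_stopping_fragment(algorithm_name, settings):
    """Build the dict that would sit under `.spec.earlyStopping` (hypothetical helper)."""
    return {
        "algorithmName": algorithm_name,
        "algorithmSettings": [
            # Setting values are written as strings in this sketch.
            {"name": name, "value": str(value)}
            for name, value in settings.items()
        ],
    }


# Assumed algorithm name and settings, for illustration only.
fragment = early_stopping_fragment(
    "medianstop",
    {"min_trials_required": 3, "start_step": 5},
)
print(json.dumps({"spec": {"earlyStopping": fragment}}, indent=2))
```

The same structure maps one-to-one onto YAML, which is how it would appear in the experiment file.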

- `.earlyStopping.algorithmSettings`- the settings for the early stopping algorithm.

Experiment's suggestion produces new trials. After that, the early stopping
- Experiment's suggestion produces new trials. After that, the early stopping
+ What happens is your experiment's suggestion produces new trials. After that, the early stopping
or "will produce... will generate..."
### Early stopping algorithms in detail

Here’s a list of the early stopping algorithms available in Katib.
The links lead to descriptions on this page:
- The links lead to descriptions on this page:
This may be redundant if self-evident, I think

- [Median Stopping Rule](#median-stopping-rule)

More algorithms are under development. You can add an early stopping algorithm
- More algorithms are under development. You can add an early stopping algorithm
+ More algorithms are under development.
+ You can add an early stopping algorithm
best objective value by step `S` is worse than the median value of the running
averages of all completed trials' objectives reported up to step `S`.

To learn more about it, check [this paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf).
- To learn more about it, check [this paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf).
+ To learn more about it, check [Google Vizier: A Service for Black-Box Optimization](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf).
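
The median stopping rule quoted in the hunk above can be sketched in a few lines of Python. This is a simplified reading of the sentence in the diff, not Katib's implementation: the function names, the 1-based `step`, and the `maximize` flag are assumptions for illustration, and settings such as a minimum number of completed trials are left out.

```python
from statistics import median


def running_average(values, step):
    """Mean of the objective values reported in the first `step` steps."""
    window = values[:step]
    return sum(window) / len(window)


def should_stop(trial_values, completed_trials, step, maximize=True):
    """Return True if the running trial should be stopped at `step` (1-based).

    trial_values: objective values reported so far by the running trial.
    completed_trials: one list of reported objective values per completed trial.
    """
    observed = trial_values[:step]
    averages = [running_average(v, step) for v in completed_trials if v]
    if not observed or not averages:
        return False  # nothing to compare against yet

    # Best value the running trial has reached by this step.
    best = max(observed) if maximize else min(observed)
    threshold = median(averages)

    # "Worse" means lower when maximizing and higher when minimizing.
    return best < threshold if maximize else best > threshold


# Hypothetical accuracy curves of three completed trials.
completed = [[0.5, 0.6, 0.7], [0.6, 0.7, 0.8], [0.4, 0.5, 0.6]]
weak_trial = [0.30, 0.35, 0.40]
strong_trial = [0.5, 0.7, 0.9]
print(should_stop(weak_trial, completed, step=3))
print(should_stop(strong_trial, completed, step=3))
```

In this example the weak trial's best accuracy stays below the median of the completed trials' running averages, so it is stopped, while the strong trial continues.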
You have to install [jq](https://stedolan.github.io/jq/download/),
to run below commands.
- You have to install [jq](https://stedolan.github.io/jq/download/),
- to run below commands.
+ First, make sure you have [jq](https://stedolan.github.io/jq/download/) installed.
} | ||
``` | ||
|
||
If you check status for the early stopped trial:
- If you check status for the early stopped trial:
+ Check the status of the early stopped trial by running this command:
kubectl get trial median-stop-2ml8h96d -n <experiment-namespace>
``` | ||
|
||
You should be able to view `EarlyStopped` status for the trial:
- You should be able to view `EarlyStopped` status for the trial:
+ and you should be able to view `EarlyStopped` status for the trial:
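
The same status check can be done programmatically on the trial's JSON, for example the output of `kubectl get trial <name> -n <namespace> -o json`. The sketch below assumes the trial exposes `status.conditions` entries with `type` and `status` fields, as Kubernetes-style resources commonly do. The sample object is hypothetical and heavily trimmed, so compare it with the output from your own cluster.

```python
import json


def is_early_stopped(trial):
    """True if the trial has an EarlyStopped condition whose status is "True"."""
    conditions = trial.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "EarlyStopped" and c.get("status") == "True"
        for c in conditions
    )


# Hypothetical, trimmed trial object for illustration only.
sample = json.loads("""
{
  "metadata": {"name": "median-stop-2ml8h96d"},
  "status": {
    "conditions": [
      {"type": "Created", "status": "True"},
      {"type": "Running", "status": "False"},
      {"type": "EarlyStopped", "status": "True"}
    ]
  }
}
""")
print(sample["metadata"]["name"], is_early_stopped(sample))
```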
As well, you can check the results on the Katib UI.
The trial statuses on the experiment monitor page looks as follows:
- As well, you can check the results on the Katib UI.
- The trial statuses on the experiment monitor page looks as follows:
+ In addition, you can check your results on the Katib UI.
+ The trial statuses on the experiment monitor page should look as follows:
Thanks @andreyvelich 💯 !
LMKWYT
Cheers
Thank you for the review @8bitmp3.
I've made changes.
/lgtm /assign @animeshsingh @Bobgy PTAL and |
Thanks @8bitmp3!
This PR has changes in |
@andreyvelich can you make a subfolder for Katib in the images folder and add the Katib owners there? We can merge this first.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: andreyvelich, Bobgy
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
docs/components/katib/images would be better if that's feasible, but I feel like the doc website doesn't support it. Can you give it a try?
I'll try.
Blocked by: #2312.
Related: kubeflow/katib#1360.
I added documentation on using early stopping in Katib.
/assign @gaocegege @johnugeorge
/cc @8bitmp3 @RFMVasconcelos