Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Tune API in the Katib SDK #1951

Merged
merged 6 commits into from
Oct 4, 2022

Conversation

andreyvelich
Copy link
Member

@andreyvelich andreyvelich commented Sep 13, 2022

Fixes: #1585.
Inspired by KFP create_component_from_func.

This API will allow user to create Katib HP Tuning Experiment without building the Experiment YAML.
This is the first small step to simplify our Kubeflow SDKs and to avoid Kubernetes complexity.

Later we can extend this functionality (support more Katib features in a simple way), give more spec options via APIs.

Also, we can provide API (e.g. katib.list_base_images) to show list of supported images (e.g. BASE_IMAGE_TENSORFLOW, BASE_IMAGE_MXNET) with their description.

cc @kubeflow/wg-training-leads @tenzen-y @anencore94 Please give your feedback on the API design.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@andreyvelich
Copy link
Member Author

/hold Hold until kubeflow/training-operator#1659 is merged.

@coveralls
Copy link

coveralls commented Sep 13, 2022

Coverage Status

Coverage remained the same at 73.431% when pulling bce47b9 on andreyvelich:sdk-create-from-func into b1e912c on kubeflow:master.

@tenzen-y
Copy link
Member

Thanks for implementing the new feature!
First, I'll review the PR submitted in training-operator repo.

@andreyvelich andreyvelich requested a review from a team September 14, 2022 11:33
Copy link
Member

@anencore94 anencore94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for awesome feature !!! users could generate katib experiment much easier now :)
I left some personal opinions.

Comment on lines +204 to +205
# Import required packages.
import time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why this kind of logic is necesary, but we might find out more friendly way to specify packages by users.
However, as a minimal feature, this is ok if we could instruct users that they couldn't make a mistake to miss required packages in the objective function.

Copy link
Member Author

@andreyvelich andreyvelich Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I think Kubeflow Pipelines follows the same way: https://www.kubeflow.org/docs/components/pipelines/sdk/python-function-components/#building-python-function-based-components
@zijianjoy Do you know if Pipelines SDK verifies that all function imports are defined properly in Python Lightweight component?

@anencore94 Any ideas how to verify this?
cc @tenzen-y @gaocegege @johnugeorge

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to automatically validating this after this PR. How about making an issue for tracking this feature now ?
If we could make an idea for this validation in general, we may apply this for other kf components like pipelines and training-operators and so on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that sound good @anencore94.


return result


def double(min: float, max: float, step: float = None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about gather these hyperparameter helper function to another inner module?
For example, the users could call these functions by

import kubeflow.katib as katib

hp_search_space = {
    "a": katib.hp_search_space.int(min=10, max=20),
    "b": katib.hp_search_space.double(min=0.1, max=0.2)
}

katib_client.tune(
    name=name,
    objective=objective,
    hp_search_space=hp_search_space,
    objective_metric_name="result",
    max_trial_count=12
)

IMO, importing by katib.int makes me it's too board that we couldn't see it is for hyperparameter search space type immediately.

Copy link
Member Author

@andreyvelich andreyvelich Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was even thinking to change this API in the future to support various distributions: uniform, loguniform, categorical, etc.

FYI, hyperopt uses hp module, skop uses space module, optuna uses distributions module, Ray uses tune module. I think we can learn something from it.

search module (e.g. katib.search.int(1,5)) looks good to me, but I am not sure what is the best UX here.

What others think @terrytangyuan @gaocegege @johnugeorge @c-bata @a9p @g-votte @tenzen-y ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref #1951 (comment).

IMO it feels more natural if we could support built-in lists or common numpy arrays, etc. Is there a particular motivation for bringing a separate module in the Katib SDK just to do the same thing?

Copy link
Member Author

@andreyvelich andreyvelich Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@terrytangyuan List and numpy arrays might not be enough.
How we can identify search space for uniform or loguniform in the future?

e.g. Hyperopt utilises some numpy APIs, but it still has it's own functions for the search space: https://github.com/hyperopt/hyperopt/blob/master/hyperopt/pyll/stochastic.py#L41-L43

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions in that module seem to be very thin wrappers. If we could completely avoid creating additional wrappers/layers, it would be great.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @andreyvelich.
In this PR, we can start in the minimal feature. So, we can support other distributions in other PRs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@terrytangyuan Do you have any ideas how we can improve UX on this ?

Copy link
Member

@terrytangyuan terrytangyuan Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference. I am ok as is if other projects are following the same convention.

sdk/python/v1beta1/kubeflow/katib/constants/constants.py Outdated Show resolved Hide resolved
sdk/python/v1beta1/kubeflow/katib/api/katib_client.py Outdated Show resolved Hide resolved
sdk/python/v1beta1/kubeflow/katib/api/katib_client.py Outdated Show resolved Hide resolved
algorithm_name: str = "random",
objective_metric_name: str = None,
additional_metric_names: List[str] = [],
objective_type: str = "maximize",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could use "maximize" and "minimize" as an Enum class or Literal

Copy link
Member Author

@andreyvelich andreyvelich Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking about it.
@anencore94 Is there a way to automatically generate enums for our Python Models ?
In our APIs type param is enum: https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L96

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "automatically" means that automatically generate python Enum class from the golang file or openapi.json ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "automatically" means that automatically generate python Enum class from the golang file or openapi.json ?

Exactly. I am not sure if kube-openapi supports enum generation. Check this: kubernetes/enhancements#2887.
Maybe we should update kube-openapi version for this feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no many experience with it sadly. How about make an issue with this feature, and merge this PR with hardcoded for now ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good @anencore94 !

sdk/python/v1beta1/kubeflow/katib/api/katib_client.py Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Modify packages_to_install doc
Create validate objective function
Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich Thanks for your contributions!
I left a few comments.

sdk/python/v1beta1/setup.py Outdated Show resolved Hide resolved
@andreyvelich
Copy link
Member Author

Thanks everyone for review!
I addressed the remaining comments and create separate search module for HPs.
Please take a look @anencore94 @tenzen-y @johnugeorge @terrytangyuan

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich Thanks for updating this!

sdk/python/v1beta1/kubeflow/katib/api/search.py Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
andreyvelich and others added 2 commits September 30, 2022 17:29
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
@tenzen-y
Copy link
Member

Thanks for addressing my comments!
/lgtm

@johnugeorge
Copy link
Member

/approve

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [andreyvelich,johnugeorge]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@johnugeorge
Copy link
Member

Thanks @andreyvelich for this great feature

@andreyvelich
Copy link
Member Author

Thanks everyone for your feedback and review!
I am looking forward for the new Katib usage.
/hold cancel

@google-oss-prow google-oss-prow bot merged commit 96ab64b into kubeflow:master Oct 4, 2022
Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

/lgtm

@andreyvelich andreyvelich deleted the sdk-create-from-func branch October 4, 2022 21:24
@andreyvelich andreyvelich mentioned this pull request Nov 8, 2022
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automate Container Build for Trial training containers
6 participants