Add Optuna based suggestion service #1613

g-votte · 2021-08-11T00:48:53Z

What this PR does / why we need it:
This PR proposes to add the suggestion service based on Optuna. Optuna provides several sampling algorithms that have not been implemented in other suggestion services in Katib, such as multi-variate TPE and constant liar.

In addition, as discussed in #1549, Optuna can offer the extension of multi-objective optimization when Katib supports that interface.

I've written an example of invoking multi-variate TPE in a separate repository, so that we can test and discuss the new algorithm based on the Optuna service. Multi-variate TPE captures the dependencies among multiple inputs and performs better than normal TPE on many benchmark tasks. If the example looks fine, I'd also like to add it in another PR.

Example: https://github.com/g-votte/katib-optuna-example

Checklist:

Docs included if any changes are user facing

google-cla · 2021-08-11T00:48:58Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

ℹ️ Googlers: Go here for more info.

aws-kf-ci-bot · 2021-08-11T00:49:06Z

Hi @g-votte. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

g-votte · 2021-08-11T01:05:01Z

@googlebot I signed it!

c-bata · 2021-08-11T04:12:53Z

/ok-to-test

c-bata · 2021-08-11T04:13:23Z

/assign @c-bata

johnugeorge · 2021-08-11T14:35:44Z

/ok-to-test

andreyvelich

Thank you for this awesome contribution @g-votte!
I left a few comments.
Could you also add an example, for instance one that uses multivariate TPE?

@gaocegege @johnugeorge Since we are planning to cut the release next week, do we want to include this feature in Katib 0.12?

cmd/suggestion/optuna/v1beta1/Dockerfile

pkg/suggestion/v1beta1/optuna/service.py

andreyvelich · 2021-08-11T15:06:31Z

pkg/suggestion/v1beta1/optuna/service.py

+                else:
+                    # The trial has not been suggested by the Optuna study.
+                    # A new trial object is created and reported using study.add_trial() with the assignments and the search space.
+                    optuna_trial = optuna.create_trial(


Please can you tell me the use-case when "the Trial has not been suggested by the Optuna study"?

Thanks for the question. This block handles a change of suggestion logic during an experiment, e.g. switching to Optuna-based search after a certain number of bayesianoptimization trials. Does that happen in Katib?

If it is guaranteed that the suggestion service is unchangeable during an experiment, I will change the logic so that an error is raised instead of creating and adding Optuna trials.

If it is guaranteed that the suggestion service is unchangeable during an experiment

Yeah, currently we don't provide functionality to change the Suggestion logic during an Experiment run, for example changing the Suggestion algorithm.
Users can modify only the Experiment budget. Check this: https://www.kubeflow.org/docs/components/katib/resume-experiment/#modify-running-experiment.
cc @gaocegege @johnugeorge

In that case, should we remove this part for now?
In the future if we decide to add this feature in the controller, we can extend the Suggestion logic.
What do you think @g-votte @c-bata ?

In that case, should we remove this part for now?

Agreed. The Goptuna suggestion service doesn't support such a use case at the moment either.

@andreyvelich @c-bata
Thanks for your comments. I changed the logic so that it raises an error for unknown assignments. (I also changed the test case, because the previous test passed externally created assignments as the initial trials.)
Commit: 1c0e15f
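
For readers following the thread, the behavior agreed on above might look roughly like the sketch below. It is only an illustration, not the code from the commit: the `TrialRegistry` class, its method names, the tuple-based key, and the choice of `ValueError` are all hypothetical. The only part taken from the discussion is the idea of looking up a previously suggested trial by its assignments and raising an error when none matches.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for Katib's internal Assignment class.
Assignment = namedtuple("Assignment", ["name", "value"])


class TrialRegistry:
    """Remembers which trial numbers were suggested for which assignments."""

    def __init__(self):
        # One list per key, because identical assignments may be suggested twice.
        self._numbers_by_key = defaultdict(list)

    @staticmethod
    def _key(assignments):
        # Simplified, order-independent key (an assumption of this sketch).
        return tuple(sorted((a.name, a.value) for a in assignments))

    def register(self, assignments, trial_number):
        self._numbers_by_key[self._key(assignments)].append(trial_number)

    def pop_trial_number(self, assignments):
        key = self._key(assignments)
        # Use .get() so that a failed lookup does not create an empty entry.
        numbers = self._numbers_by_key.get(key)
        if not numbers:
            # The assignments were never suggested by this service.
            raise ValueError(f"Unknown assignments: {dict(key)}")
        return numbers.pop(0)
```

With this shape, reporting a result for assignments that the service never produced fails loudly instead of silently adding a new trial to the study.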

gaocegege · 2021-08-12T02:00:30Z

Thanks for your contribution! 🎉 👍

https://github.com/g-votte/katib-optuna-example is missing

gaocegege · 2021-08-12T02:11:17Z

Since we are planning to cut the release next week, do we want to include this feature in Katib 0.12?

This PR will not break existing features, so I think we can include it. Maybe mark it as alpha or beta. WDYT @andreyvelich?

g-votte · 2021-08-12T02:14:30Z

Thanks for your contribution! 🎉 👍

https://github.com/g-votte/katib-optuna-example is missing

@gaocegege
Oops. I've just made this public.

gaocegege

LGTM.

In this PR, additionalMetricNames is used to provide multiple metrics to Katib. I think we should have a new CRD design to support it.

WDYT @andreyvelich @johnugeorge

c-bata

@g-votte Thank you for your pull request. Overall it looks good 💯 I left some review comments.

c-bata · 2021-08-12T03:09:09Z

pkg/suggestion/v1beta1/optuna/service.py

+        name = algorithm_spec.algorithm_name
+        settings = {s.name:s.value for s in algorithm_spec.algorithm_settings}
+
+        if name == "tpe" or name == "multivariate-tpe":


You need to update the following files to use multivariate-tpe in the New Katib UI.

https://github.com/kubeflow/katib/blob/master/pkg/new-ui/v1beta1/frontend/src/app/constants/algorithms-types.const.ts

https://github.com/kubeflow/katib/blob/master/pkg/new-ui/v1beta1/frontend/src/app/constants/algorithms-settings.const.ts

https://github.com/kubeflow/katib/blob/master/pkg/new-ui/v1beta1/frontend/src/app/enumerations/algorithms.enum.ts

But it might be enough to work on this in a separate pull request.

Thanks for the information. Since changing those components is user-facing, I'd like to work on that in a separate PR.

c-bata · 2021-08-12T03:10:05Z

cmd/suggestion/optuna/v1beta1/Dockerfile

@@ -0,0 +1,31 @@
+FROM python:3.6


[nits] How about using Python 3.9?

Sounds good. I updated the Python version. Some dependencies in requirements.txt are also updated for that purpose.

Commit: a4ae6e2

c-bata · 2021-08-12T06:23:08Z

pkg/suggestion/v1beta1/optuna/service.py

+    def _get_assignments_key(self, assignments):
+        assignments = sorted(assignments, key=lambda a: a.name)
+        assignments_str = [str(a) for a in assignments]
+        return ",".join(assignments_str)


[Question] I have two questions.

Does the string representation of Assignment object hold both parameter names and parameter values?

You defined assignments_to_optuna_number as defaultdict(list). I guess the reason why you use list is for duplicated hyperparameters, right?

Yes; this should contain both the assignment's name and value, using the string representation of the Assignment class.

katib/pkg/suggestion/v1beta1/internal/trial.py

Lines 88 to 89 in abbc9c9

    def __str__(self):
        return "Assignment(name={}, value={})".format(self.name, self.value)

As you mention, this is to handle duplications of assignments.

Great! Sounds like it works 👍

This is a nitpick, but I'd say the following code makes it clearer that the key contains both the parameter name and the value. It is also a bit more memory efficient.

assignments_str = [f"{a.name}:{a.value}" for a in assignments]

Agree. I changed the line.
b2085a6
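
As a rough illustration of the key construction discussed above, here is a self-contained sketch. The `Assignment` namedtuple is a hypothetical stand-in for Katib's internal class, and the function is written as a free function rather than the service method; the sorting and the `name:value` format follow the diff and the suggestion in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for Katib's internal Assignment class.
Assignment = namedtuple("Assignment", ["name", "value"])


def get_assignments_key(assignments):
    """Build an order-independent string key from parameter assignments."""
    # Sort by parameter name so that the input order does not matter.
    ordered = sorted(assignments, key=lambda a: a.name)
    return ",".join(f"{a.name}:{a.value}" for a in ordered)


first = [Assignment("lr", "0.01"), Assignment("num_layers", "3")]
second = [Assignment("num_layers", "3"), Assignment("lr", "0.01")]

# Both orderings map to the same key.
assert get_assignments_key(first) == get_assignments_key(second)
print(get_assignments_key(first))  # lr:0.01,num_layers:3
```

Because two trials with identical assignments produce the same key, a mapping from key to a list of trial numbers (the defaultdict(list) mentioned above) is needed to keep duplicates apart.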

johnugeorge · 2021-08-12T13:00:45Z

@gaocegege our definition of additionalMetricName is different. This seems to be a new CRD design.

Since this PR doesn't affect anything else, this can go in this release also.

andreyvelich · 2021-08-12T14:19:12Z

Can we copy this example: https://github.com/g-votte/katib-optuna-example/blob/main/experiment.yaml under https://github.com/kubeflow/katib/tree/master/examples/v1beta1 with the name multivariate-tpe-example.yaml?

I think in future PRs we can update our katib-config as you did here: https://github.com/g-votte/katib-optuna-example/blob/main/katib-config.yaml#L47-L50.

WDYT @g-votte @c-bata @gaocegege ?

gaocegege · 2021-08-13T02:00:08Z

SGTM

g-votte · 2021-08-13T02:00:26Z

@andreyvelich
Thanks for your comment. I copied the example to examples/v1beta1/.
Commit: a119fb2

Please let me know if we should not include this until katib-config.yaml is updated. I can separate the PR in that case.

CC: @c-bata @gaocegege

c-bata · 2021-08-13T05:11:28Z

pkg/suggestion/v1beta1/optuna/service.py

+
+            kwargs["multivariate"] = name == "multivariate-tpe"
+
+            sampler = optuna.samplers.TPESampler(**kwargs)


@g-votte I think it's reasonable to set kwargs["constant_liar"] = True by default, because Katib makes it easy to run distributed optimization. What do you think?

Or we can provide an algorithm setting to set constant_liar=True.

Good catch. There is little reason to disable constant liar, especially with parallel optimization.
I added a line to turn on the option by default.
1d81885
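
The sampler configuration discussed here could be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `build_tpe_kwargs` is a hypothetical helper, and the handling of the `n_startup_trials`, `n_ei_candidates`, and `seed` settings as integers parsed from strings is an assumption. Only the `multivariate` line and the default `constant_liar = True` come from the diff and the comments above.

```python
def build_tpe_kwargs(name, settings):
    """Translate an algorithm name and string settings into TPESampler kwargs."""
    if name not in ("tpe", "multivariate-tpe"):
        raise ValueError(f"Unsupported algorithm: {name}")

    kwargs = {}
    # Assumed setting names; the values are assumed to arrive as strings.
    for key in ("n_startup_trials", "n_ei_candidates", "seed"):
        if key in settings:
            kwargs[key] = int(settings[key])

    # Same expression as in the diff: only "multivariate-tpe" enables it.
    kwargs["multivariate"] = name == "multivariate-tpe"
    # Enabled by default, since trials typically run in parallel.
    kwargs["constant_liar"] = True
    return kwargs


print(build_tpe_kwargs("multivariate-tpe", {"seed": "42"}))
```

The resulting dictionary would then be passed on as in the diff, i.e. optuna.samplers.TPESampler(**kwargs), which requires an Optuna version whose TPESampler accepts the constant_liar argument.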

andreyvelich · 2021-08-13T13:38:47Z

Please let me know if we should not include this until katib-config.yaml is updated. I can separate the PR in that case.

It's fine to have this example. Once we push docker.io/kubeflowkatib/suggestion-optuna to Docker Hub, we can update the Katib config.

g-votte · 2021-08-16T00:17:49Z

@andreyvelich @c-bata @gaocegege @johnugeorge
Thanks for your swift and detailed reviews!
I think all comments have been addressed. PTAL.

andreyvelich · 2021-08-16T00:58:00Z

Thank you for implementing this @g-votte!
/lgtm
Others please take a look @c-bata @gaocegege @johnugeorge.

c-bata · 2021-08-16T01:43:36Z

LGTM 🎉

johnugeorge · 2021-08-16T14:08:40Z

/lgtm

johnugeorge · 2021-08-16T14:09:37Z

/approve

google-oss-robot · 2021-08-16T14:09:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: g-votte, johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [johnugeorge]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Implement Optuna service and cmd

85863d7

aws-kf-ci-bot added the needs-ok-to-test label Aug 11, 2021

google-oss-robot added the size/L label Aug 11, 2021

google-oss-robot requested review from hougangliu, johnugeorge and sperlingxx August 11, 2021 00:49

g-votte mentioned this pull request Aug 11, 2021

Support multi-objective optimization #1549

Open

aws-kf-ci-bot added ok-to-test and removed needs-ok-to-test labels Aug 11, 2021

google-oss-robot assigned c-bata Aug 11, 2021

andreyvelich reviewed Aug 11, 2021

g-votte and others added 5 commits August 12, 2021 09:04

Update pkg/suggestion/v1beta1/optuna/service.py

2890121

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Update pkg/suggestion/v1beta1/optuna/service.py

425c4d5

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Update pkg/suggestion/v1beta1/optuna/service.py

8485f3b

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Update pkg/suggestion/v1beta1/optuna/service.py

9f651d8

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

Merge the blocks of self.lock in OptunaService

6486bf3

Remove Cython installation

5dd65c8

gaocegege reviewed Aug 12, 2021

c-bata reviewed Aug 12, 2021

andreyvelich mentioned this pull request Aug 12, 2021

AutoML WG and Kubeflow 1.4 release kubeflow/manifests#1958

Closed

g-votte added 2 commits August 13, 2021 10:47

Update Python version for the Optuna suggestion service

a4ae6e2

Add the example yaml of multivarite-tpe

a119fb2

google-oss-robot added size/XL and removed size/L labels Aug 13, 2021

c-bata reviewed Aug 13, 2021

andreyvelich mentioned this pull request Aug 13, 2021

Add Optuna Suggestion to Katib ECR List kubeflow/testing#958

Merged

1 task

g-votte added 3 commits August 15, 2021 07:07

Fix the logic of handling unknown trials

1c0e15f

Use name and value instead of the string representation of assignment

b2085a6

Turn on constant liar by default

1d81885

google-oss-robot assigned andreyvelich Aug 16, 2021

google-oss-robot added the lgtm label Aug 16, 2021

google-oss-robot assigned johnugeorge Aug 16, 2021

google-oss-robot added the approved label Aug 16, 2021

google-oss-robot merged commit 7439a37 into kubeflow:master Aug 16, 2021

This was referenced Aug 16, 2021

[Release] Katib 0.12 release #1597

Closed

Add Support for Optuna in Katib kubeflow/website#2880

Merged

g-votte deleted the optuna-service branch August 17, 2021 00:15

g-votte mentioned this pull request Aug 19, 2021

Sync Algorithm Settings names for various Optimisation Frameworks #1627

Open
