
Require uninitialized optimizers for our learners #119

Merged
timokau merged 2 commits into kiudee:master from timokau:no-direct-optimizer on Jul 1, 2020

Conversation

@timokau (Collaborator) commented May 22, 2020

Description

An initialized optimizer is a tensorflow object, which (at least in
graph mode in tf1) is not deepcopy-able. Even if we were able to
deepcopy it, we probably wouldn't want to, since it contains state.
Scikit-learn needs to be able to deepcopy an estimator's arguments so
that it can create copies and derivatives of it.

Instead we require the uninitialized optimizer and its parameters to be
passed to our learners separately. The learner can then initialize the
optimizer as needed.

I have only done this for CmpNet so far. We'd have to do it for all estimators that take an optimizer (i.e. pretty much all of them). Before I do that, I would like some feedback on the general approach. I went with a function without arguments, since that retains the most flexibility. It may be a bit weird to users who want to override the optimizer though. An alternative would be to pass the name of a keras optimizer and a configuration dict. That would be less "weird", but also less flexible, since users could no longer pass their own custom optimizers. @kiudee @prithagupta any thoughts?
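For concreteness, the function-based variant currently in this PR looks roughly like this (a sketch only; the keyword name `make_optimizer` and the defaults are illustrative, not copied from the diff):

```python
from keras.optimizers import SGD

from csrank import CmpNet  # assuming CmpNet is exported at the package top level


def make_default_optimizer():
    # A zero-argument callable: the learner calls it inside fit(), so no
    # initialized (and therefore non-deepcopy-able) optimizer is ever stored
    # on the estimator itself.
    return SGD(lr=1e-4, momentum=0.9, nesterov=True)


learner = CmpNet(make_optimizer=make_default_optimizer)  # hypothetical keyword name
```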

Motivation and Context

Continuation of #116.

How Has This Been Tested?

Ran the test suite and the pre-commit hooks.

Does this close/impact existing issues?

#94, #116

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@timokau (Collaborator, Author) commented May 26, 2020

Rebased.

@codecov bot commented May 26, 2020

Codecov Report

Merging #119 into master will increase coverage by 0.12%.
The diff coverage is 84.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #119      +/-   ##
==========================================
+ Coverage   59.96%   60.08%   +0.12%     
==========================================
  Files         116      116              
  Lines        7658     7662       +4     
==========================================
+ Hits         4592     4604      +12     
+ Misses       3066     3058       -8     
Impacted Files Coverage Δ
csrank/choicefunction/cmpnet_choice.py 84.78% <ø> (ø)
csrank/choicefunction/fate_choice.py 91.30% <ø> (ø)
csrank/choicefunction/feta_choice.py 66.00% <ø> (ø)
csrank/choicefunction/ranknet_choice.py 81.63% <ø> (ø)
csrank/discretechoice/cmpnet_discrete_choice.py 86.84% <ø> (ø)
csrank/discretechoice/fate_discrete_choice.py 94.44% <ø> (ø)
csrank/discretechoice/feta_discrete_choice.py 55.00% <ø> (ø)
csrank/discretechoice/ranknet_discrete_choice.py 84.61% <ø> (ø)
csrank/objectranking/cmp_net.py 87.17% <ø> (ø)
csrank/objectranking/fate_object_ranker.py 93.33% <ø> (ø)
... and 16 more

@kiudee (Owner) commented May 30, 2020

I took a look at what skorch (a scikit-learn wrapper for PyTorch) is doing. Their solution can be found here:
https://github.com/skorch-dev/skorch/blob/a621d0fa31c87adbdbb3b297328caefb09d42b8c/skorch/net.py#L504-L535
Basically, they allow users to pass in an uninitialized class, which will be initialized on fit. The user can specify parameters using the nested __ notation (which turns out to be the official way to do it in scikit-learn).
An alternative would be, as you say, to pass in a configuration dictionary.
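Roughly, the routing works like this (not skorch's actual code, just a minimal sketch of the mechanism):

```python
def collect_prefixed(params, prefix="optimizer__"):
    """Turn keys like 'optimizer__lr' into {'lr': ...}."""
    return {key[len(prefix):]: value
            for key, value in params.items()
            if key.startswith(prefix)}


class LearnerSketch:
    def __init__(self, optimizer, **optimizer_params):
        # `optimizer` is an *uninitialized* class; only plain values are
        # stored on self, so scikit-learn can clone/deepcopy the estimator.
        self.optimizer = optimizer
        self.optimizer_params = optimizer_params

    def fit(self, X, Y):
        # The optimizer is instantiated only here, with the routed parameters.
        kwargs = collect_prefixed(self.optimizer_params)
        self.optimizer_ = self.optimizer(**kwargs)
        # ... build and compile the model with self.optimizer_ ...
        return self
```

In a real scikit-learn estimator the `optimizer__*` parameters would need to be explicit keyword arguments (or go through `get_params`/`set_params`) rather than `**kwargs`, but the idea is the same.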

@kiudee (Owner) commented May 30, 2020

One big consideration here is hyperparameter optimization. The parameters of an optimizer (e.g. learning rate) are often optimized jointly with other parameters of the learner.

So an important use case is, for instance, that RandomizedSearchCV is run on the model.
The way this is organized in skorch and scikit-learn, a user would be able to pass something like:

{
    'module__n_hidden_layers': ...,
    'module__n_hidden_units': ...,
    'criterion__param1': ...,
    'optimizer__learning_rate': ...,
}

And the hierarchy there can be nested.
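With that convention, joint tuning would look roughly like this (the learner and its parameters are illustrative of the interface after this PR; RandomizedSearchCV and loguniform are the actual scikit-learn/scipy APIs):

```python
from keras.optimizers import SGD
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV

from csrank import CmpNet  # assuming the top-level export

search = RandomizedSearchCV(
    estimator=CmpNet(optimizer=SGD),  # uninitialized optimizer class
    param_distributions={
        "optimizer__lr": loguniform(1e-5, 1e-1),
        "optimizer__momentum": [0.0, 0.5, 0.9],
    },
    n_iter=20,
)
# search.fit(X, Y) clones the learner and sets the nested parameters via
# set_params, which is why the constructor arguments must be deepcopy-able.
```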

@timokau (Collaborator, Author) commented May 30, 2020

> I took a look at what skorch (a scikit-learn wrapper for PyTorch) is doing. Their solution can be found here:
> https://github.com/skorch-dev/skorch/blob/a621d0fa31c87adbdbb3b297328caefb09d42b8c/skorch/net.py#L504-L535
> Basically, they allow users to pass in an uninitialized class, which will be initialized on fit. The user can specify parameters using the nested __ notation (which turns out to be the official way to do it in scikit-learn).
> An alternative would be, as you say, to pass in a configuration dictionary.

The __ approach seems hacky to me. There is a natural way to pass nested configuration in Python: dicts.

> One big consideration here is hyperparameter optimization. The parameters of an optimizer (e.g. learning rate) are often optimized jointly with other parameters of the learner.

Of course! I don't know why I didn't think of that. That's a very good reason not to use the initializer-function approach I demonstrated in this PR. Good thing I waited for feedback first.

> So an important use case is, for instance, that RandomizedSearchCV is run on the model.
> The way this is organized in skorch and scikit-learn, a user would be able to pass something like:
>
> {
>     'module__n_hidden_layers': ...,
>     'module__n_hidden_units': ...,
>     'criterion__param1': ...,
>     'optimizer__learning_rate': ...,
> }
>
> And the hierarchy there can be nested.

It looks to me like the __ thing is a hack around the limitations of scikit-learn's hyperparameter optimization. From a brief skim over their documentation, it seems like there is no easy way to define distributions over dictionary-nested parameters. You would have to define a custom distribution function.
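To illustrate, such a custom distribution over a dict-valued parameter would have to look something like this (purely hypothetical; `optimizer_config` is an imagined dict-valued parameter, not anything in csrank):

```python
from scipy.stats import loguniform


class OptimizerConfigSampler:
    """Samples a whole nested config dict as a single 'distribution'.

    scikit-learn's ParameterSampler accepts any object with an rvs() method,
    so this works, but it has to be hand-written for every nesting.
    """

    def rvs(self, random_state=None):
        return {"lr": loguniform(1e-5, 1e-1).rvs(random_state=random_state)}


param_distributions = {
    "optimizer_config": OptimizerConfigSampler(),  # hypothetical dict-valued parameter
}
```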

If that was more easily possible, you could also define different sets of hyperparameters for different optimizers and then hierarchically pick an optimizer first, then pick the hyperparameters for the optimizer.

Still, scikit-learn's design, combined with the fact that it seems to be the standard, is a strong argument in favor of __. So even though I think dict-based configuration would be more elegant, we should probably go ahead with the skorch approach. Do you agree?

@timokau timokau mentioned this pull request May 30, 2020
@kiudee (Owner) commented May 30, 2020

Indeed, that felt very hacky to me too.
But I also agree that it is something people are used to and that it is not that difficult to adapt to as a user. So I think we should follow that.

@kiudee kiudee marked this pull request as draft June 1, 2020 13:09
@kiudee kiudee added this to the 1.3 milestone Jun 5, 2020
@timokau (Collaborator, Author) commented Jun 5, 2020

I have pushed a proof of concept. It doesn't quite pass the tests yet, but it shows two issues with this approach when specifying default optimizer arguments:

  • it adds a lot of boilerplate for passing the defaults around
  • how should we handle the case where the user picks a different optimizer that doesn't take the nesterov argument?

The first one could probably be worked around by centralizing our defaults (which we should do anyway, I think), but the second one is a bit ugly (see the sketch below).
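To make the second point concrete (a sketch only; Adam is just an example of an optimizer without a nesterov argument):

```python
from keras.optimizers import SGD, Adam

our_defaults = {"lr": 1e-4, "momentum": 0.9, "nesterov": True}

SGD(**our_defaults)   # fine
Adam(**our_defaults)  # TypeError: Adam accepts neither `momentum` nor `nesterov`,
                      # so blindly forwarding SGD-specific defaults breaks as soon
                      # as the user swaps the optimizer
```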

@@ -85,6 +88,9 @@ def __init__(
kernel_initializer=kernel_initializer,
activation=activation,
optimizer=optimizer,
optimizer__lr=optimizer__lr,
optimizer__momentum=optimizer__momentum,
optimizer__nesterov=optimizer__nesterov,
timokau (Collaborator, Author) commented on this diff:

This is what I am talking about.

@timokau (Collaborator, Author) commented Jun 5, 2020

I see some possible solutions:

  1. Simply drop the nesterov and momentum defaults. That is what skorch does: They only provide a default for lr and rely on the optimizer's defaults for other parameters.
  2. Provide a parallel mechanism of passing optimizer parameters, e.g. an optimizer_config: dict attribute. We could supply a default for that which is then overridden by any manual changes. It would likely be confusing to merge & use the two configuration mechanisms though.
  3. Set the default optimizer to None and specify in the docs that None refers to SGD with our default options.
  4. Try to figure out which parameters apply to which optimizer, using introspection. This is hard to get right and brittle.

All in all, I think the most elegant approach would be to fall back to either the lambda-parameter or a tuple (optimizer, param_dict). Given that we want to use the __ approach, solution (1) seems best to me. Is there any particular reason that we're overriding the optimizer's defaults here?

Are there other options I'm not seeing?

@timokau (Collaborator, Author) commented Jun 5, 2020

If we want to keep the parameters, (3) is probably best (we could also choose a string instead of None to represent that default). Or we could define our own optimizer, which is just SGD with those changed defaults.

@kiudee (Owner) commented Jun 8, 2020

I agree that the cleanest solution would be (1) here. The only reason I see for providing different defaults is that we think they work better out of the box.
But that is hard to guarantee in general.

@timokau (Collaborator, Author) commented Jun 9, 2020

So should I go ahead then and remove the nesterov and momentum defaults? What about the learning rate? It's always set to 1e-4. The optimizer's default is 1e-2, so maybe that is too high for our purposes.

@timokau (Collaborator, Author) commented Jun 9, 2020

If we want to keep lr, that is probably okay, since I expect all optimizers will accept an lr argument. Our default may not make sense for every optimizer though, and it may be somewhat cleaner to be "consistent" in not overriding the optimizer's defaults.

@kiudee (Owner) commented Jun 9, 2020

I would say we can leave all of those defaults to the corresponding optimizers. Usually, you need to adapt the parameters to your specific problem anyways.

timokau added a commit to timokau/cs-ranking that referenced this pull request Jun 10, 2020
As discussed in kiudee#119. The
reasoning is that we don't have a good reason to override those; the
library user will likely have to tune them anyway (or use a different
optimizer altogether). At the same time, they make the design proposed
in kiudee#119 (passing uninitialized optimizers and their parameters
separately) more difficult.
@timokau (Collaborator, Author) commented Jun 10, 2020

I moved that step to #142.

@timokau timokau force-pushed the no-direct-optimizer branch 4 times, most recently from ec39655 to e90699d on June 11, 2020 16:06
@timokau (Collaborator, Author) commented Jun 11, 2020

I started with the transition, but there is still quite a bit of work to be done. Still WIP.

@timokau (Collaborator, Author) commented Jun 14, 2020

The test suite passes now. I had to disable one test (FATELINEAR_DC), though, where the reason for the failure was not obvious. I'll have to look into that in more detail. Some cleanup is needed as well.

@timokau (Collaborator, Author) commented Jun 14, 2020

The FATELINEAR_DC issue was straightforward after all. Only documentation & a self-review remaining.

@timokau timokau changed the title from "[WIP] Do not directly pass an optimizer to our estimators" to "Require uninitialized optimizers for our learners" on Jun 17, 2020
@timokau timokau marked this pull request as ready for review June 17, 2020 16:01
@timokau (Collaborator, Author) commented Jun 17, 2020

Ready for review. This is a breaking change.
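For reviewers, the migration looks roughly like this (a sketch; exact class names and parameters should be checked against the diff):

```python
from keras.optimizers import SGD

from csrank import CmpNet  # assuming the top-level export

# Before this PR: an already-initialized optimizer was passed in.
learner = CmpNet(optimizer=SGD(lr=1e-3))

# After this PR: pass the uninitialized class plus optimizer__* parameters;
# the learner instantiates SGD(lr=1e-3) itself when fit() is called.
learner = CmpNet(optimizer=SGD, optimizer__lr=1e-3)
```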

@timokau timokau requested a review from kiudee June 17, 2020 16:02
@timokau timokau modified the milestones: 1.3, 2.0 Jun 17, 2020
@timokau (Collaborator, Author) commented Jun 17, 2020

Switched the milestone following semver.

kiudee previously approved these changes Jun 24, 2020

@kiudee (Owner) left a comment

This looks good to me.

Since we now break backward compatibility, but still might want to backport patches for 1.x.y, we have to think about establishing legacy branches, which travis-ci also deploys from.

@timokau (Collaborator, Author) commented Jun 24, 2020

Thanks for the review!

> Since we now break backward compatibility, but still might want to backport patches for 1.x.y, we have to think about establishing legacy branches, which travis-ci also deploys from.

Yes, that'd be something to explore. Branches could always be created retroactively though once we actually have to ship some fix.

@timokau (Collaborator, Author) commented Jun 24, 2020

Speaking of Travis, there seems to be something wrong with the GitHub integration. The check in GitHub is still marked as in progress, but when clicking on "Details", Travis says that it finished long ago. Is it possible to restart the job?

@kiudee (Owner) commented Jun 25, 2020

After restarting the job, it fails now with:

'Keras requires TensorFlow 2.2 or higher. '

Newer keras versions delegate to tf.keras and therefore need tf2. See
https://github.com/keras-team/keras/releases/tag/2.4.0.
@timokau (Collaborator, Author) commented Jun 26, 2020

I have pinned keras and travis is happy. Please have another look.
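The pin is along these lines (the exact bound and file are an assumption on my part, not copied from the diff):

```python
# setup.py (sketch): Keras 2.4 delegates to tf.keras and needs TensorFlow >= 2.2,
# so staying below 2.4 keeps the current tf1/graph-mode setup working on CI.
install_requires = [
    "keras<2.4",
]
```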

@timokau (Collaborator, Author) commented Jun 27, 2020

It turns out that the optimizer=uninitialized_optimizer pattern still doesn't pass scikit-learn's estimator check. There seems to be no good reason for this, though; I did some digging and opened an upstream issue: scikit-learn/scikit-learn#17756

I think this is still good to merge. The test is likely just overly restrictive.
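For context, the check in question is scikit-learn's estimator test suite, invoked roughly like this (the learner shown is illustrative):

```python
from sklearn.utils.estimator_checks import check_estimator

from csrank import CmpNet  # assuming the top-level export

# The failing check appears to be overly restrictive about the kinds of
# constructor arguments it allows; see scikit-learn/scikit-learn#17756.
check_estimator(CmpNet())
```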

@timokau timokau merged commit 8c85937 into kiudee:master Jul 1, 2020
@timokau timokau deleted the no-direct-optimizer branch July 1, 2020 13:48