Require uninitialized optimizers for our learners #119
Conversation
Force-pushed from a4e49c7 to 536d2e2.
Rebased.
Codecov Report
@@ Coverage Diff @@
## master #119 +/- ##
==========================================
+ Coverage 59.96% 60.08% +0.12%
==========================================
Files 116 116
Lines 7658 7662 +4
==========================================
+ Hits 4592 4604 +12
+ Misses 3066 3058 -8
I took a look at what skorch (a scikit-learn wrapper for PyTorch) is doing. Their solution can be found here:
One big consideration here is hyperparameter optimization. The parameters of an optimizer (e.g. learning rate) are often optimized jointly with other parameters of the learner. So an important use case is, for instance, that RandomizedSearchCV is run on the model with a parameter space like:
{
'module__n_hidden_layers': ...,
'module__n_hidden_units': ...,
'criterion__param1': ...,
'optimizer__learning_rate': ...,
}
And the hierarchy there can be nested.
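The double-underscore routing itself is standard scikit-learn behaviour. The following is a small, self-contained sketch of that mechanism using a plain Pipeline (not the csrank or skorch API), just to make the use case concrete:

```python
# Plain scikit-learn example of "__" parameter routing; the skorch-style
# "optimizer__learning_rate" names above build on the same mechanism.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# "clf__C" is routed to the LogisticRegression step, exactly like
# "optimizer__learning_rate" would be routed to the optimizer.
search = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```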
Of course, I don't know why I didn't think of that. That's a very good reason not to use the initializer-function approach I demonstrated in this PR. Good thing I waited for feedback first.
It looks to me like the … If that was more easily possible, you could also define different sets of hyperparameters for different optimizers and then hierarchically pick an optimizer first, then pick the hyperparameters for that optimizer. Still, the design of scikit-learn, combined with the fact that it seems to be the standard, is a strong argument in favor of …
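As a hedged illustration of that hierarchical selection (the optimizer parameter names are placeholders, not the final csrank interface), scikit-learn already supports a list of independent grids:

```python
# Each dict in the list is expanded separately, so optimizer-specific
# parameters are never mixed across optimizers. Names are illustrative.
from sklearn.model_selection import ParameterGrid

param_grid = [
    {  # candidate 1: SGD and its hyperparameters
        "optimizer": ["SGD"],
        "optimizer__lr": [1e-3, 1e-2],
        "optimizer__momentum": [0.0, 0.9],
    },
    {  # candidate 2: Adam and its hyperparameters
        "optimizer": ["Adam"],
        "optimizer__lr": [1e-4, 1e-3],
    },
]

for params in ParameterGrid(param_grid):
    print(params)
```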
Indeed, that felt very hacky to me, too.
Force-pushed from 536d2e2 to 556b75d.
I have pushed a proof of concept. It doesn't quite pass the tests yet, but it shows two issues with this approach when specifying default optimizer arguments:
The first one could probably be worked around by centralizing our defaults (which we should do anyway, I think), but the second one is a bit ugly.
@@ -85,6 +88,9 @@ def __init__(
     kernel_initializer=kernel_initializer,
     activation=activation,
     optimizer=optimizer,
+    optimizer__lr=optimizer__lr,
+    optimizer__momentum=optimizer__momentum,
+    optimizer__nesterov=optimizer__nesterov,
This is what I am talking about.
I see some possible solutions:
All in all, I think the most elegant approach would be to fall back to either the lambda parameter or a tuple. Are there other options I'm not seeing?
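To make those two fallbacks concrete, here is a minimal sketch assuming a keras SGD optimizer; the learner interface itself is not shown, and the exact representation is precisely what is being discussed here:

```python
# Two possible ways to describe an optimizer without initializing it.
# (Assumes a keras version where SGD still accepts `lr`.)
from keras.optimizers import SGD


# Option A: a zero-argument callable ("lambda parameter"); any defaults
# live inside the callable and the learner just calls it when needed.
def make_optimizer():
    return SGD(lr=0.01, momentum=0.9)


# Option B: an (uninitialized class, kwargs) tuple; the learner unpacks it
# and instantiates the class itself.
optimizer_spec = (SGD, {"lr": 0.01, "momentum": 0.9})

# Either way, the stateful object is only created inside the learner:
optimizer_a = make_optimizer()
cls, kwargs = optimizer_spec
optimizer_b = cls(**kwargs)
```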
If we want to keep the parameters, (3) is probably best (we could also choose a string instead of
I agree that the cleanest solution would be (1) here. The only reason I see for providing different defaults is that we think these work better out of the box.
So should I go ahead then and remove the
If we want to keep
I would say we can leave all of those defaults to the corresponding optimizers. Usually, you need to adapt the parameters to your specific problem anyway.
As discussed in kiudee#119. The reasoning is that we don't have a good reason to override those; the library user will likely have to tune them anyway (or use a different optimizer altogether). At the same time, they make the design proposed in kiudee#119 (passing uninitialized optimizers and their parameters separately) more difficult.
I moved that step to #142.
Force-pushed from ec39655 to e90699d.
I started with the transition, but there is still quite a bit of work to be done. Still WIP.
The test suite passes now. I had to disable one test, though, where the reason for failure was not obvious (
Force-pushed from 4dc2376 to 96518a4.
Force-pushed from d1a4c36 to 57a3490.
An initialized optimizer is a tensorflow object, which (at least in graph mode in tf1) is not deepcopy-able. Even if we were able to deepcopy it, we probably wouldn't want to, since it contains state. Scikit-learn needs to be able to deepcopy an estimator's arguments so that it can create copies and derivatives of it. Instead, we require the uninitialized optimizer and its parameters to be passed to our learners separately. The learner can then initialize the optimizer as needed.
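A minimal sketch of this design, assuming a keras optimizer and scikit-learn's BaseEstimator (class and parameter names are illustrative, not the actual csrank code):

```python
# Illustrative only: the learner stores the uninitialized optimizer class and
# plain numbers, so deepcopy/clone work, and builds the optimizer in fit().
from keras.optimizers import SGD
from sklearn.base import BaseEstimator


class ToyRanker(BaseEstimator):
    def __init__(self, optimizer=SGD, optimizer__lr=None, optimizer__momentum=None):
        self.optimizer = optimizer
        self.optimizer__lr = optimizer__lr
        self.optimizer__momentum = optimizer__momentum

    def fit(self, X, y):
        # Forward only the optimizer parameters that were actually supplied,
        # leaving everything else to the optimizer's own defaults.
        kwargs = {
            name.split("__", 1)[1]: value
            for name, value in self.get_params().items()
            if name.startswith("optimizer__") and value is not None
        }
        # The stateful tensorflow object is created here, never in __init__,
        # so the estimator's constructor arguments stay deepcopy-able.
        self.optimizer_ = self.optimizer(**kwargs)
        # ... build and compile the underlying keras model with self.optimizer_ ...
        return self
```

Note that plain BaseEstimator.set_params would try to route optimizer__lr to a set_params method on the optimizer itself, so a real implementation (as skorch does) likely needs extra parameter handling for hyperparameter search; the sketch above only covers cloning and fitting.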
Force-pushed from 57a3490 to 95a0ad2.
Ready for review. This is a breaking change.
Switched the milestone following semver.
This looks good to me.
Since we now break backward compatibility but might still want to backport patches for 1.x.y, we have to think about establishing legacy branches, from which travis-ci also deploys.
Thanks for the review!
Yes, that'd be something to explore. Branches could always be created retroactively, though, once we actually have to ship some fix.
Speaking of travis, there seems to be something wrong with the GitHub integration. The check in GitHub is still marked as in progress, but when clicking on "Details", travis tells you that it finished long ago. Is it possible to restart the job?
After restarting the job, it now fails with:
Newer keras versions delegate to tf.keras and therefore need tf2. See https://github.com/keras-team/keras/releases/tag/2.4.0.
Force-pushed from 14d05bf to 702bbcc.
I have pinned keras and travis is happy. Please have another look.
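The exact pin isn't shown in this thread; assuming it follows the keras 2.4.0 release note linked above, it would look roughly like this in setup.py (the file location, package name, and surrounding requirements are assumptions):

```python
# Hypothetical excerpt of a setup.py pin; the real repository may pin keras
# elsewhere (e.g. in a requirements file) or with a different specifier.
from setuptools import setup

setup(
    name="csrank",
    install_requires=[
        # keras >= 2.4 just delegates to tf.keras and therefore needs
        # TensorFlow 2, while this code still targets tf1 graph mode.
        "keras<2.4",
    ],
)
```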
It turns out that the … I think this is still good to merge. The test is likely just overly restrictive.
Description
An initialized optimizer is a tensorflow object, which (at least in graph mode in tf1) is not deepcopy-able. Even if we were able to deepcopy it, we probably wouldn't want to, since it contains state. Scikit-learn needs to be able to deepcopy an estimator's arguments so that it can create copies and derivatives of it.
Instead, we require the uninitialized optimizer and its parameters to be passed to our learners separately. The learner can then initialize the optimizer as needed.
I have only done this for CmpNet so far. We'd have to do it for all estimators that take an optimizer (i.e. pretty much all of them). Before I do that, I would like some feedback on the general approach. I went with a function without arguments since that retains the most flexibility. It may be a bit weird for users that want to override the optimizer, though. An alternative would be to pass the name of a keras optimizer and a configuration dict. That would be less "weird", but also less flexible, since users could no longer supply their own custom optimizers. @kiudee @prithagupta any thoughts?
Motivation and Context
Continuation of #116.
How Has This Been Tested?
Ran the test suite and the pre-commit hooks.
Does this close/impact existing issues?
#94, #116
Types of changes
Checklist: