make eps a parameter of optimisers #1819
Only other comment is to call the field epsilon, since that's the name we chose in Optimisers.jl (to avoid confusion with the function of the same name).
9a21a7a to bb0e1e7
@darsnack I didn't want to change the value of eps Flux is currently using, so as not to change behavior. So I prefer not to use […] Using […] Also note that I renamed eps -> epsilon, following your other suggestion.
Fair enough, I didn't realize the type was hard-coded to Float64.
Merge?
bors r+
Build failed: […]
Failure seems related to #1808 (@ToucheSir)?
Yes, #1804 specifically. I still haven't been able to repro the issue locally (will try replicating the Buildkite env when I have time in a couple of days). Feel free to […]
Merge this? The Buildkite failure looks unrelated to this PR, and all other tests pass.
@CarloLucibello @DhairyaLGandhi I think an org owner needs to merge.
This was a bit too wide a change for the API and can break a bunch of code...
How so? It adds an optional positional argument at the end that has the same default value as before.
It adds a field to the optimisers, so any code using the default "struct" constructors would break.
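A minimal sketch of the breakage described above. The two types are hypothetical stand-ins written for illustration, not Flux's actual definitions, and the default values (0.001, 0.9, 1e-8) are assumptions. It shows that the convenience constructor with defaults keeps working, while a call to the three-argument default struct constructor no longer matches any method once a field is added:

```julia
# Hypothetical "before" type: three fields, storage last.
mutable struct RMSPropOld
    eta::Float64
    rho::Float64
    acc::IdDict
end
RMSPropOld(η::Real = 0.001, ρ::Real = 0.9) = RMSPropOld(η, ρ, IdDict())

# Hypothetical "after" type: epsilon added as a field before the storage.
mutable struct RMSPropNew
    eta::Float64
    rho::Float64
    epsilon::Float64
    acc::IdDict
end
# Optional positional argument at the end, with an assumed default of 1e-8.
RMSPropNew(η::Real = 0.001, ρ::Real = 0.9, ϵ::Real = 1e-8) =
    RMSPropNew(η, ρ, ϵ, IdDict())

# Code that only used the convenience constructor is unaffected.
opt = RMSPropNew(0.001, 0.9)

# Code that called the default struct constructor with explicit storage
# worked before the change...
old = RMSPropOld(0.001, 0.9, IdDict())

# ...but the same call shape now fails, because the third positional
# argument is expected to be a Real (epsilon), not an IdDict.
broke = try
    RMSPropNew(0.001, 0.9, IdDict())
    false
catch err
    err isa MethodError
end
```

Under these assumptions, `broke` is `true`: callers that pass the storage explicitly have to be updated, which is the compatibility concern raised here.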
Sorry, I should've caught that after #1778. Shall we quarantine both off for 0.13?
#1778 is fine still because the disruption is limited to one type with a clear way of making a case for a non breaking type.
Sorry, I didn't realize that. To be clear, what you mean is that a call like:
RMSProp(η, ρ, IdDict())
no longer works, because it now needs to be replaced by:
RMSProp(η, ρ, ϵ, IdDict())
right? A possible solution is to make another PR (overwriting this one), which makes ϵ an optional keyword argument. Would that work?
RMSProp(η, ρ, IdDict(); ϵ = ϵ)
Right
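One way the keyword-argument proposal could be sketched is shown below. `RMSPropKw` is a hypothetical type used only for illustration, the defaults are assumptions, and the outer constructors are one possible arrangement, not the code of any actual PR:

```julia
# Hypothetical type: epsilon is still a field, but callers set it by keyword.
mutable struct RMSPropKw
    eta::Float64
    rho::Float64
    epsilon::Float64
    acc::IdDict
end

# Convenience constructor: hyperparameters positional, epsilon by keyword.
RMSPropKw(η::Real = 0.001, ρ::Real = 0.9; ϵ::Real = 1e-8) =
    RMSPropKw(η, ρ, ϵ, IdDict())

# Same positional shape as the old three-argument struct constructor,
# so existing calls that pass the storage explicitly keep working.
RMSPropKw(η::Real, ρ::Real, acc::IdDict; ϵ::Real = 1e-8) =
    RMSPropKw(η, ρ, ϵ, acc)

a = RMSPropKw(0.001, 0.9, IdDict())            # old call shape, default epsilon
b = RMSPropKw(0.001, 0.9, IdDict(); ϵ = 1e-6)  # epsilon overridden by keyword
c = RMSPropKw(ϵ = 1e-7)                        # all other values defaulted
```

In this sketch the old three-argument call shape is preserved by an explicit outer method. The struct still gains a field, so the raw four-argument constructor and anything that depends on the field layout changes regardless.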
Well, I'm still not sure if adding struct fields constitutes a SemVer break, as (de)serialization (among other things) is affected. If anyone is in the know about this, please do leave a comment.
@DhairyaLGandhi thanks for the explanation, sorry I missed that. How frequently is the […]
It's got less to do with frequency and more to do with necessity and API compactness. In cases where we need literal copies of gradients (for synchronisation, for example) over different models, it's helpful to store states and gradients as the same type, as literal Dicts, for simplicity, and making users add in an epsilon field (especially when not needed in the optimisation step) is unnecessary.
I get that, but I'm saying if the frequency is low enough (i.e. a handful of users), then it's fair to ask that handful to add the new field into their code on a breaking release. Anyone manually passing in the […]
I don't see why making epsilon a keyword would complicate the API? It would be an optional keyword that most users need not even notice.
Passing in a Dict explicitly is more than passing a bunch of arguments though; it is arguably more related to the use case of the optimisers than the value of an epsilon. It's already very rare to need to change epsilon.
I meant that if you are manually passing in the dict, then you are already writing […]
But even a long list of arguments to the constructor maintains a rational order (going by what is likely to be used and to need changing in the majority of use cases), which epsilon in the position chosen in the PR does not, since it comes before the storage. It should come after.
Sure, a PR to fix the order is okay for me.
I would favour grouping the hparams together. Perhaps if the IdDict was the first parameter in the default constructor?
I'd prefer
[…]
over
[…]
In fact, having all hyper-params as keyword arguments wouldn't be bad I think, but that's another matter.
For […]
Any decision here?
I prefer the first proposal in #1819 (comment). Since […]
I agree. I can make a PR.
Makes epsilon a parameter of each optimiser. Closes #1818.