load flat parameters without mutation or restructure
#2026
Comments
You can try, the simplest way to write this with
(At 1st order, the gradient of getindex makes a zero vector the same size as the whole model, and avoiding that allocation is one of the reasons that there are gradient rules here at all. The mutation of this array of zeros is why getindex at 2nd order is a problem.) In FluxML/Optimisers.jl#54, I do think the logic was broadly correct for 2nd order derivatives: each order reverses the arrows between flat and structured. What goes wrong is something about exactly how Tangent types are constructed, or perhaps how they are converted to & from Zygote's types. It ought to be possible to straighten that out.
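To make the allocation point concrete, here is a minimal sketch (not taken from the thread) of what a first-order gradient through a slice looks like. The vector `v` and the sliced range are made up for illustration; the only point is that the result is as long as the whole input even though only two entries are read.

```julia
using Zygote

# Stand-in for a flat parameter vector (hypothetical values).
v = collect(1.0:5.0)

# The function only reads entries 2 and 3 of its argument.
g = Zygote.gradient(p -> sum(abs2, p[2:3]), v)[1]

# The gradient spans all of v: zeros everywhere except the sliced entries.
@show length(g) == length(v)
@show g
```

For a model with many parameter arrays, each slice taken from the flat vector would contribute one such full-length gradient at first order.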
ps. Zygote has many difficulties with 2nd derivatives, see https://github.com/FluxML/Zygote.jl/labels/second%20order . Some of these are about its handling of Tangent types / translation to & from ChainRules, which may be related. The most reliable option tends to be ForwardDiff over Zygote (as e.g. in |
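As a rough illustration of "ForwardDiff over Zygote", the sketch below takes the inner gradient with Zygote (reverse mode) and differentiates that gradient with ForwardDiff (forward mode). The loss `f` and the input `x` are placeholders, not anything from this issue. `Zygote.hessian` is documented as using the same forward-over-reverse strategy, so it is included for comparison.

```julia
using Zygote, ForwardDiff

# Hypothetical scalar-valued function of a flat vector.
f(x) = sum(x .^ 3)
x = [1.0, 2.0, 3.0]

# Inner: reverse-mode gradient. Outer: forward-mode Jacobian of that gradient.
H = ForwardDiff.jacobian(z -> Zygote.gradient(f, z)[1], x)

# Built-in helper that also runs ForwardDiff over Zygote.
H2 = Zygote.hessian(f, x)

@show H
@show H2
```

The outer forward pass costs one dual component per input entry, so for a very large parameter vector a Hessian-vector product (seeding a single direction) is likely the more practical variant of the same idea.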
thank you for your help!
will take a look at them — the second derivative is of a scalar wrt a large parameter vector, which would make reverse mode a good candidate here, right? two other lingering questions:
In practice ForwardDiff is very simple, robust & low-overhead. But it only knows about arrays, and it does not work with BLAS for matrix multiplication.
I think that mostly you shouldn't worry about this cost, at least to start. Taking the gradient of a model with Zygote will typically allocate as much as a few complete copies. It would not be hard to write a
I expected this to work, as |
haven't checked correctness yet, but happy to report that a second-order using Zygote for both yields error |
It should be possible to use views. It incurs the same |
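One way the view-based idea might look, as a sketch only: the forward pass is written directly against slices of the flat vector, so no model struct is mutated and `restructure` is never called. The function `dense_from_flat`, the layer sizes and the `tanh` activation are all hypothetical names and choices for illustration, and only the first-order gradient is shown here.

```julia
using Zygote

# Hypothetical single dense layer whose weights and bias live in one flat vector v.
function dense_from_flat(v, x, nin, nout)
    W = reshape(view(v, 1:nin*nout), nout, nin)     # weight matrix as a view, no copy
    b = view(v, nin*nout+1:nin*nout+nout)           # bias as a view, no copy
    return tanh.(W * x .+ b)
end

nin, nout = 3, 2                 # illustrative sizes
v = randn(nin * nout + nout)     # flat parameter vector
x = randn(nin)                   # one input sample

# Scalar loss as a function of the flat parameters only.
loss(p) = sum(abs2, dense_from_flat(p, x, nin, nout))

g = Zygote.gradient(loss, v)[1]
@show size(g) == size(v)
```

Because `loss` takes a plain vector, it can also be handed to a forward-over-reverse second derivative, though whether that works for a given model still needs to be checked case by case.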
is there a way to create a copy of an existing model with a new, flat parameters vector, without mutating, and without using restructure? the reason for the latter two is because i need the higher order derivatives wrt the flat parameter vector. i've been fumbling about different threads (#1979 was particularly relevant) to find a solution without success, and would appreciate any pointers!