gradients with aliased variables #991
I guess the most disturbing is 5.; shouldn't it return

```julia
[1] => (parent = [1],)  # this is xt
:(Main.x) => [1]
[1] => [1]              # this is x instead?
```

Putting aliased memory in `Params` feels like it's not going to be OK.

(Never mind; the thing I was missing was that I was scribbling the wrong variables on my napkin.)
For a user-defined struct we have

```julia
julia> struct A; x; end

julia> x = rand(2); a = A(x);

julia> Base.sum(a::A) = sum(a.x)   # restricted to ::A to avoid overwriting sum for all types

julia> gradient(() -> sum(a), Params([x])).grads
IdDict{Any, Any} with 1 entry:
  [0.573261, 0.457937] => 2-element Fill{Float64}: entries equal to 1.0
```

while for `Adjoint` something is wrong:

```julia
julia> xt = Adjoint(x)
1×2 adjoint(::Vector{Float64}) with eltype Float64:
 0.573261  0.457937

julia> gradient(() -> sum(xt), Params([x])).grads
IdDict{Any, Any} with 2 entries:
  [0.573261, 0.457937] => nothing
  :(Main.xt)           => 1×2 Fill{Float64}: entries equal to 1.0
```
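Since the surprise here hinges on aliasing, a minimal plain-Julia sketch (no Zygote; variable names are illustrative) of what `Adjoint` actually does may help: the wrapper shares the parent's memory but is a distinct object, which is exactly why an `IdDict` keyed on `x` cannot match `xt`.

```julia
# Plain-Julia sketch of the aliasing: Adjoint wraps the original
# vector instead of copying it.
x  = [1.0, 2.0]
xt = x'                   # equivalent to Adjoint(x): a lazy wrapper

println(parent(xt) === x) # true  -> xt aliases x's memory
println(xt === x)         # false -> distinct objects, so an IdDict
                          #          keyed on x will not match xt

xt[1, 1] = 5.0            # writing through the wrapper...
println(x[1])             # 5.0: ...mutates x as well
```

So `Params([x])` and the global `xt` used in the loss really are two different keys over the same memory.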
This seems expected... The grads actually also track global params as a `GlobalRef`, in order to capture tied variables.
They make sense, but that doesn't make them right or useful. I tried creating similar problems with explicit params yesterday, and I just could not find an example that didn't work. So rather than spend time fixing this issue, we could transition to explicit params across the ecosystem.
It seems hard not to consider the last example in #991 (comment) a bug. I don't even know precisely why it happens; probably when we hit an …
I'm not totally sure the explicit gradient is a convenient fit for every situation; I'd like to see a diverse set of use cases where it replaces the implicit one. With explicit arguments, the examples above give:

```julia
julia> gradient(x -> sum(a), x)
(nothing,)

julia> gradient(x -> sum(xt), x)
(nothing,)
```
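The `(nothing,)` results can be checked without any AD at all. Here is a hedged finite-difference sketch (plain Julia, illustrative names) showing that the loss simply never depends on its argument:

```julia
# The closure captures the *global* a; its argument x is unused,
# so the derivative w.r.t. x is identically zero, which Zygote
# reports as `nothing`.
a = [1.0, 2.0]
loss(x) = sum(a)          # ignores x entirely

x  = [3.0, 4.0]
h  = 1e-6
fd = (loss(x .+ h) - loss(x)) / h
println(fd)               # 0.0: perturbing x does not change the loss
```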
I think this illustrates why I consider explicit params better: it's obvious why the last example returned `nothing`. One option is to add some kind of post-processing step where …
For example, something like FluxML/Flux.jl#1592 works out nicely. Similar to the examples above, if we have

```julia
m1 = Dense(5, 2)
m2 = Dense(transpose(m1.weight))
m = Chain(m1, m2)
dm = gradient(m -> sum(m(ones(Float32, 5))), m)[1]
```

Zygote will see the weight of …
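For context on what the "right" answer would be here: with `m2`'s weight tied to `transpose(m1.weight)`, the total gradient with respect to the single shared array is the sum of the direct contribution and the transposed one. A hedged sketch with made-up numbers (`dW1` and `dW2` are illustrative stand-ins for what the AD would report per layer):

```julia
# Accumulating gradients for a tied (transposed) weight.
dW1 = [0.1 0.2; 0.3 0.4]   # grad reported for m1.weight
dW2 = [0.5 0.6; 0.7 0.8]   # grad reported for the tied transpose(m1.weight)

# Total gradient w.r.t. the one shared array: direct term
# plus the transposed contribution from the second layer.
dW = dW1 .+ transpose(dW2)
println(dW)                # [0.6 0.9; 0.9 1.2]
```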
The last equation is automatically done by simple optimizers like gradient descent, provided you use lazy wrappers like … I guess it isn't automatic for complex optimizers that track momentum, etc., but then it seems like we should be handling it on the optimizer side, not in the AD.

This is where I think explicit params are nicer. What I wrote above is true for implicit params as well (e.g. Example 3 in the main issue) when …
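To illustrate the "automatic for gradient descent" point, a minimal plain-Julia sketch (illustrative names, no Flux): an in-place update of the parent array is immediately visible through the lazy `transpose` view, so the tied weight stays in sync with no extra bookkeeping.

```julia
W  = [1.0 2.0; 3.0 4.0]
Wt = transpose(W)         # lazy wrapper sharing W's memory (the tied weight)
dW = [0.1 0.1; 0.1 0.1]
η  = 0.5

W .-= η .* dW             # in-place SGD step on the parent array
println(Wt)               # the transposed view reflects the update for free
```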
I was trying to figure out how to properly handle and update Flux's layers with tied weights (FluxML/Flux.jl#1592), so first of all I wanted to check how Zygote handles aliased objects. Here are 6 examples. Maybe it's all expected and intended, but I find the last 3 in particular a bit surprising.
@oxinabox is this what we want?