Gradient of sum #900
This is really an issue with Flux, since the gradient is fine. |
I think it is an issue with Zygote: it returns a FillArray as the gradient of sum. That is very unfortunate; a lot of people around me run into this bug. https://github.com/FluxML/Zygote.jl/pull/191/files |
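To illustrate why a lazily filled array is a mathematically valid gradient of sum, here is a minimal sketch. `LazyOnes` is a hypothetical stand-in written for this example; it is not the FillArrays type and not what Zygote actually constructs.

```julia
# Hypothetical stand-in for a lazily filled array: it stores only its size
# and reports one(T) for every valid index.
struct LazyOnes{T,N} <: AbstractArray{T,N}
    dims::NTuple{N,Int}
end

LazyOnes(x::AbstractArray{T,N}) where {T,N} = LazyOnes{T,N}(size(x))

Base.size(a::LazyOnes) = a.dims

function Base.getindex(a::LazyOnes{T,N}, I::Vararg{Int,N}) where {T,N}
    @boundscheck checkbounds(a, I...)
    return one(T)
end

x = [1.0 2.0; 3.0 4.0]
g = LazyOnes(x)          # d(sum(x))/dx[i, j] == 1 for every entry

dense = collect(g)       # materialise into a Matrix{Float64} when needed
```

The values are correct, but `g isa Array` is false, so downstream code that assumes a dense `Array` gradient can break. That is the mismatch this thread is about.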
But I think it's fine to say that the gradient of |
To my mind, this is perfectly acceptable behaviour -- it's similar in flavour to returning |
When users write backward rules, they always assume that the adjoint of the output has the same type as the output, simply because it is impossible to handle every possible adjoint type. Here if you have a function @cossio's issue might be related to Flux rather than the chain rule. I just want to point out that an unpredictable output type is a major source of uncertainty in a Zygote program. |
Right -- I think this is the crux of the problem. Users should not be making this assumption because Zygote has never promised to provide a gradient of the same type as the argument, nor could it feasibly do so. Probably the documentation should be changed to reflect this. For example, general Perhaps we should consider some additional functionality to "canonicalise" representations. We've actually been discussing implementing that kind of functionality for |
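As a rough sketch of what "canonicalising" a representation could look like: the function `canonicalise` below is hypothetical, written only for illustration, and is not an existing Zygote or ChainRules API. `Diagonal` stands in for any structured gradient type.

```julia
using LinearAlgebra

# Hypothetical helper: convert a gradient `dx` into a dense array with the
# same dimensionality as the primal `x`.
function canonicalise(x::Array{T,N}, dx::AbstractArray{S,N}) where {T,S,N}
    return convert(Array{float(T),N}, dx)
end

# Fallback: leave every other combination untouched.
canonicalise(x, dx) = dx

x = [1.0 2.0; 3.0 4.0]
dx = Diagonal([1.0, 1.0])   # structured gradient, e.g. of tr(x)

dense = canonicalise(x, dx)
```

A rule-writer who needs a plain `Matrix` could call such a function at the top of a pullback, at the cost of one allocation.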
It is good to know someone is taking it seriously. I can see it is hard to unify the adjoint, like the I am also wondering how much performance would be lost if we forced the adjoint to have the same type as the output. Julia used to return a normal array if you called zero on another array type (see issue), but that behavior has now changed:
julia> zero([1,2,3]')
1×3 adjoint(::Vector{Int64}) with eltype Int64:
 0  0  0
For composite types, I can imagine it requires some effort from users to wrap the result in the right type. Is there a fundamental reason for not doing so? I would also like to know whether any benchmark shows that returning the same type sacrifices much performance. I am just curious about the pros and cons, without a strong bias toward one over the other. |
Are you specifically referring to
I'm not aware of any particular benchmarks -- would be interesting to see some. |
I mean more general types. |
Okay. For more general types this isn't a matter of performance, it's about achieving the correct / desirable behaviour. There are a couple of reasons why you can't use a type to represent its own tangent in general. Consider an arbitrary Firstly, as an AD package author, you're not allowed to define Secondly, it's not always the case that the set of points that a given type can represent is the same as its tangent space. For example, suppose that
struct Foo
    x::Float64
    lb::Float64
    ub::Float64
    # Inner constructor: reject any x outside [lb, ub].
    function Foo(x, lb, ub)
        if x < lb || x > ub
            error("x must be between lb and ub")  # error() already throws
        end
        return new(x, lb, ub)
    end
end
I've used an inner constructor to constrain the value of Does this make sense? Of course, there are "special cases" where neither of the above applies, such as |
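The second point can be made concrete with a small sketch. `FooTangent` is a hypothetical type introduced only for this illustration; it is not how any AD package represents the tangent of `Foo`.

```julia
struct Foo
    x::Float64
    lb::Float64
    ub::Float64
    function Foo(x, lb, ub)
        lb <= x <= ub || error("x must be between lb and ub")
        return new(x, lb, ub)
    end
end

# Hypothetical tangent type: same fields as Foo, but no constraint,
# because a tangent component can be any real number.
struct FooTangent
    x::Float64
    lb::Float64
    ub::Float64
end

# Accumulation on the reverse pass only needs tangents to be addable.
Base.:+(a::FooTangent, b::FooTangent) =
    FooTangent(a.x + b.x, a.lb + b.lb, a.ub + b.ub)

t = FooTangent(5.0, 0.0, 0.0) + FooTangent(-7.0, 0.0, 0.0)

# The same field values cannot be stored in a Foo: the constructor rejects them.
rejected = try
    Foo(t.x, t.lb, t.ub)
    false
catch
    true
end
```

Here `t` is a perfectly reasonable accumulated tangent, yet `Foo(-2.0, 0.0, 0.0)` violates the invariant, so `Foo` cannot represent its own tangent without bypassing the constructor.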
Thanks for your reply, I benefited a lot from your answer! The example in the second point is excellent. I have the same issue in NiLang, because the constructor performs checks beyond the type information. I think the ideal solution is to add something like |
Well, strictly speaking, you just need to be able to add tangents to make AD work i.e. accumulation on the reverse-pass in reverse-mode. Scaling is an added bonus really. Granted, you could take
I think you actually can do this with some existing hack that people disapprove of. @oxinabox you mentioned this to me a while ago I think. I really don't favour that solution though, since it doesn't make sense to me semantically. For example, I really don't want to have objects floating around that claim to be constrained, but aren't actually constrained. This is why I prefer the
edit: in short, while I agree that it's probably technically possible (as in, you could make Julia do it) to represent the tangents / cotangents of any given type by another object of the same type, it feels like a hack. I would rather embrace the fact that the (co)tangents of a given type cannot in general be represented by elements of that type, and design AD systems with that in mind.
edit2: there's a separate question as to whether we ought to rename |
Yeah, you can do it, it is something like
Yeah, sometimes inner constructors are enforcing invariants.
the Attempting to update |
This name is so much more intuitive! I suddenly understand what you meant previously. Now I tend to agree that the solution of using another type might be better. But I have a new question, why not stick to the rule that type
I didn't know this. I learnt a new way to hack Julia, cheers! Thank you both for the detailed explanations. Note: |
This is a good point. Have opened an issue :)
Good point. To my mind, the distinction here is performance.
Firstly, correctness: I think it's clear that it is, on some level, not incorrect to return a
This brings us to performance: my feeling is that there are likely notable performance benefits to allowing the (co)tangent of an
Code Complexity: We already allow things like
Honestly, I'm torn regarding the usefulness of
All this being said, something like this functionality could allow us to have the best of both worlds. To be honest, I think that we probably need it anyway to ensure that rule-writers can get the type that they need to write their rule without having to think too much about all of the types that a particular tangent could possibly take, and once you've got something like this, it matters less if you've got a few different types that could represent a tangent. |
Nice, thanks. If forcing the gradient type to match the input type is not a proper solution, there are other ways to help users find potential type-mismatch bugs. I am thinking about adding a debug option that helps users identify type mismatches.
function rrule(::typeof(inv), x::AbstractArray)
    Ω = inv(x)
    function inv_pullback(ΔΩ)
        return NO_FIELDS, -Ω' * ΔΩ * Ω'
    end
    return Ω, inv_pullback
end
We can insert some code for debugging:
function rrule(::typeof(inv), x::AbstractArray)
    Ω = inv(x)
    function inv_pullback(ΔΩ)
        fs, g = NO_FIELDS, -Ω' * ΔΩ * Ω'
        # The message block is only evaluated when debug logging is enabled.
        @debug begin
            if typeof(g) <: Union{Zero, ...} # special types
                nothing
            elseif ((typeof(x) <: Array || isprimitivetype(typeof(x))) && typeof(x) != typeof(g))
                "warn: array/scalar type mismatch: function = inv, input type = $(typeof(x)), gradient type = $(typeof(g))"
            elseif (Tangent{typeof(x)} != typeof(g) || field_mismatch(x, g))
                "warn: tangent type mismatch: function = inv, input type = $(typeof(x)), gradient type = $(typeof(g)), expecting a tangent type"
            end
        end
        # Return outside the @debug block so the pullback always returns.
        return fs, g
    end
    return Ω, inv_pullback
end
When a user runs the code in debug mode, they will see useful information to help with debugging. Does this make sense? |
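A self-contained variant of the same idea, using only Base and LinearAlgebra so it runs without any AD package. The names `gradient_type_warning` and `checked_inv` are hypothetical, and the check covers only the array/scalar case from the proposal above; it is a sketch, not a proposed API.

```julia
using LinearAlgebra

# Hypothetical helper: return a warning string when a plain array or number
# input receives a gradient of a different type, otherwise `nothing`.
function gradient_type_warning(f, x, g)
    if x isa Union{Array,Number} && typeof(g) != typeof(x)
        return "type mismatch: function = $f, input type = $(typeof(x)), gradient type = $(typeof(g))"
    end
    return nothing
end

# inv together with a pullback that logs suspicious gradient types.
function checked_inv(x::AbstractMatrix)
    Ω = inv(x)
    function inv_pullback(ΔΩ)
        g = -Ω' * ΔΩ * Ω'
        msg = gradient_type_warning(inv, x, g)
        # Only shown when debug logging is enabled, e.g. JULIA_DEBUG=Main.
        msg === nothing || @debug msg
        return g
    end
    return Ω, inv_pullback
end

x = [2.0 0.0; 0.0 4.0]
Ω, back = checked_inv(x)
g = back(ones(2, 2))
```

Keeping the check in a separate function means the pullback always returns its gradient, whether or not debug logging is switched on.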
Throws the following error: