Implicit gradient failing with matrices #137

Closed
theogf opened this issue Jul 17, 2020 · 7 comments · Fixed by #141

Comments

@theogf
Member

theogf commented Jul 17, 2020

Here is a MWE:

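using KernelFunctions, Flux, LinearAlgebra
k = transform(SqExponentialKernel(), 2.0)
ps = Flux.params(k)
X = rand(10, 1); x = vec(X)  # the same data as a 10×1 matrix and as a vector
A = rand(10, 10)

# Implicit gradient (through params) with a matrix input: the gradient is missing
g = gradient(ps) do
  tr(kernelmatrix(k, X, obsdim = 1) * A)
end
g[ps[1]] === nothing

# Explicit gradient w.r.t. the kernel with the same matrix input: works
g2 = gradient(k) do k
  tr(kernelmatrix(k, X, obsdim = 1) * A)
end
g2[1].transform.s !== nothing

# Implicit gradient with a vector input: works
g3 = gradient(ps) do
  tr(kernelmatrix(k, x) * A)
end
g3[ps[1]] !== nothing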

I think this is related to FluxML/Zygote.jl#692.
Any idea how to solve this, @willtebbutt? It is probably connected to the ColVecs structure.
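
For readers less familiar with the two calling styles compared in the MWE, here is a minimal sketch on a plain array instead of a kernel. The names `w` and `x` are illustrative and do not come from the issue, and the sketch assumes only that Zygote is installed. When nothing goes wrong, the implicit (`Params`) and explicit styles should agree, which is the behavior the matrix case above fails to show.

```julia
using Zygote

w = [2.0, 3.0]   # illustrative parameter array (not from the issue)
x = [1.0, 1.0]   # illustrative input

# Implicit style: differentiate a zero-argument closure w.r.t. tracked parameters
ps = Zygote.Params([w])
g_implicit = gradient(() -> sum(w .* x), ps)

# Explicit style: differentiate w.r.t. the function argument
g_explicit = gradient(w -> sum(w .* x), w)

# Both styles should return the same gradient for w
g_implicit[w] == g_explicit[1]
```

In the implicit style the result is indexed by the parameter object itself, which is why the MWE looks up `g[ps[1]]`.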

@willtebbutt
Member

This is outside my area of expertise, I'm afraid.

@theogf
Member Author

theogf commented Jul 17, 2020

I think there is a general issue with the adjoint of ColVecs/RowVecs. Do you know who could help with it?

@willtebbutt
Member

willtebbutt commented Jul 17, 2020

Have you tried wrapping everything in a let block? Globals are hard, so it's possible that Zygote is buggy w.r.t. them.

edit: I'm not sure exactly how the ColVecs etc. pullbacks would affect this. If they work under usual circumstances, I would expect them to work here 🤷‍♂️

@theogf
Member Author

theogf commented Jul 17, 2020

You mean this?

let kernel = k
  g = gradient(ps) do
    tr(kernelmatrix(kernel, X, obsdim = 1) * A)
  end
end

@willtebbutt
Member

Nah, just

using KernelFunctions, Flux, LinearAlgebra

let

k = transform(SqExponentialKernel(), 2.0)
ps = Flux.params(k)
X = rand(10, 1); x = vec(X)
A = rand(10, 10)
g = gradient(ps) do
  tr(kernelmatrix(k, X, obsdim = 1) * A)
end
g[ps[1]] == nothing

g2 = gradient(k) do k
  tr(kernelmatrix(k, X, obsdim = 1) * A)
end
g2[1].transform.s != nothing

g3 = gradient(ps) do
  tr(kernelmatrix(k, x) * A)
end
g3[ps[1]] != nothing

end

@theogf
Member Author

theogf commented Jul 17, 2020

Nope, same behavior.

@theogf
Member Author

theogf commented Jul 26, 2020

I found a fix \o/! I think we should avoid relying on Base.map: removing it and calling _map directly solves the problem.
I think this is connected to FluxML/Zygote.jl#522, which you and Mike have apparently already looked at.
Also, it looks like the adjoints for ColVecs and RowVecs are not necessary. I will make a PR with a fix.

2 participants