Improvements to cholesky rrules #630

sethaxen · 2022-06-11T15:38:44Z

This PR makes a number of improvements to the Cholesky-related rrules:

* The rrule for getproperty(::Cholesky, ::Symbol) now returns a Tangent with a factors entry instead of :U or :L (fixes part of #611, "Wrong call on cholesky rrule")
* Use rdiv! instead of BLAS.trsm! (fixes #629, "Need a GPU compatible rrule for Cholesky")
* Support complex numbers and complex PD matrices (also cholesky(::Quaternion) and cholesky(::Diagonal{<:Quaternion}), though this is untested)
* Fix cholesky(::Number) and cholesky(::Diagonal) for failed factorization
* Remove rules for 1-arg methods (the 1-arg methods fall back to the 2-arg methods, so they should be hit anyways)
* Remove specialization for Thunk cotangents (just unthunk received cotangents, which is a no-op for Tangent)
* Add missing tests
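For readers unfamiliar with how `Cholesky` stores its data, here is a minimal sketch using only the LinearAlgebra standard library (not ChainRules). The matrices are arbitrary illustrations, not taken from the PR's tests. It shows that `U` and `L` are computed properties derived from the stored `factors` field, which is presumably why a structural tangent keyed on `factors` is the natural choice, and what a failed factorization looks like with `check=false`.

```julia
using LinearAlgebra

# A small symmetric positive definite matrix (illustrative values only).
A = [4.0 2.0; 2.0 3.0]
C = cholesky(Hermitian(A))

# The struct has fields factors, uplo and info; U and L are not fields.
names = fieldnames(typeof(C))
has_factors_field = :factors in names
has_U_field = :U in names

# With uplo == 'U', C.U is just the upper triangle of C.factors,
# and C.L is its adjoint.
U_matches = C.U == UpperTriangular(C.factors)
L_matches = C.L == C.U'

# A matrix that is not positive definite: with check=false no error is
# thrown, and the failure has to be detected via issuccess.
F = cholesky(Hermitian([1.0 2.0; 2.0 1.0]); check=false)
failed = !issuccess(F)
```

Because `C.U` and `C.L` are views of (or derived from) the same storage, a cotangent for either can be expressed as a cotangent for `factors`.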

sethaxen · 2022-06-11T15:43:07Z

src/rulesets/LinearAlgebra/factorization.jl

+    else  # C.uplo === 'L'
+        L = C.L
+        L̄ = eltype(L) <: Real ? real(tril(Δfactors)) : tril(Δfactors)
+        mul!(Ā, L', L̄)
+        LinearAlgebra.copytri!(Ā, 'L', true)
+        eltype(Ā) <: Complex && _realifydiag!(Ā)
+        rdiv!(Ā, L)
+        ldiv!(L', Ā)
+    end


Since cholesky doesn't have an uplo argument, we have no way of making Cholesky have an uplo of L, so this branch is unreachable and untestable. Maybe it's better to put a warning here to tell a user to open an issue, because they've found a magical way to reach this branch.
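One way to read the branch above, written as a formula. This is a sketch under the assumption that the primal is $A = L L^H$ and that the result is later scaled by $\tfrac12$ (as the `rmul!` in the Hermitian/Symmetric pullback does); the symbols $X$ and $S$ are introduced here for illustration and do not appear in the source.

$$
\begin{aligned}
X &= L^H \,\operatorname{tril}(\bar{L}), \\
S_{ij} &= \begin{cases} X_{ij}, & i > j \\ \operatorname{Re} X_{ii}, & i = j \\ \overline{X_{ji}}, & i < j \end{cases} \\
\bar{A} &= \tfrac12\, L^{-H}\, S\, L^{-1}.
\end{aligned}
$$

Here `mul!` forms $X$, `copytri!` with the conjugate flag plus `_realifydiag!` forms the Hermitian matrix $S$, `rdiv!(Ā, L)` applies $L^{-1}$ on the right, and `ldiv!(L', Ā)` applies $L^{-H}$ on the left.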

sethaxen · 2022-06-11T15:57:31Z

* [x]  The rrule for `getproperty(::Cholesky, ::Symbol)` now returns a `Tangent` with a `factors` entry instead of `:U` or `:L` (fixes part of [Wrong call on `cholesky` `rrule` #611](https://github.com/JuliaDiff/ChainRules.jl/issues/611))

This could potentially be breaking; e.g., it would break this code in DistributionsAD:
https://github.com/TuringLang/DistributionsAD.jl/blob/44a57e974e386ab576a0251967cf7e57e42c63f7/src/common.jl#L3-L29.

devmotion · 2022-06-11T17:21:34Z

Support [...] PD matrices

I.e., cholesky(::AbstractPDMat) will hit an rrule here now? We don't want these to be handled by a generic method; instead, AD should just follow and differentiate the optimized implementations in PDMats. We could add opt-outs (there's an open issue for det as well), but these still seem a bit unsatisfying to me: if the upstream rules were less generic, everything would just work without having to think about AD in PDMats (and having to know about these definitions in CR).

sethaxen · 2022-06-11T17:45:54Z

Support [...] PD matrices

Ie cholesky(::AbstractPDMat) will hit an rrule here now?

Sorry, language was unclear and has been fixed. Previously only real positive definite (not PDMats) matrices were supported. Now we support also complex positive definite matrices. We still have the constraints of strided and diagonal matrices or Symmetric/Hermitian wrappers of them. No types defined in PDMats should hit these rules.
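To make "complex positive definite matrices" concrete, here is a minimal sketch using only the LinearAlgebra standard library. The matrix `X` is an arbitrary illustration, and this only exercises the primal factorization, not the rrule itself.

```julia
using LinearAlgebra

# Build a complex Hermitian positive definite matrix: X * X' is Hermitian
# positive semidefinite, and adding the identity makes it definite.
X = ComplexF64[1+2im 0.5-1im; -0.3+0.1im 2+0im]
A = X * X' + I

# Wrap in Hermitian so cholesky does not have to verify the structure.
C = cholesky(Hermitian(A))

# The factor reconstructs A, and its diagonal is real (up to rounding).
reconstructs = C.U' * C.U ≈ A
max_imag_on_diag = maximum(abs.(imag.(diag(C.U))))
```

The real diagonal of the factor is the reason a complex cotangent needs extra care on the diagonal in the pullback.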

sethaxen · 2022-06-11T20:26:33Z

@Red-Portal can you check that with these rules your issue would be resolved?

Red-Portal · 2022-06-12T06:14:58Z

Hi, just checked, and everything seems good except for the use of copytri! with the conjugate option set to true. This triggers scalar indexing for CuArrays. Zygote's adjoint seems to be good without conjugation. Could we get around it?

sethaxen · 2022-06-12T09:35:01Z

Zygote's adjoint seems to be good without conjugation. Could we get around it?

The adjoint is necessary for the rule to work for complex arrays. Ideally we would have a solution that works for both. We could do this:

copy!(LowerTriangular(Ā), UpperTriangular(Ā)')

but this is quite a bit slower than copytri! for large arrays.

GPUArrays has its own copytri! implementation that is missing two of the options in LinearAlgebra.copytri!, which is why as soon as we use one of those options, we fall back to the one in Base. https://github.com/JuliaGPU/GPUArrays.jl/blob/fc0d327ecc2fd0b3b73427cf6f491591aa096b75/src/host/linalg.jl#L35-L59 This seems like something that should be fixed in GPUArrays.
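A minimal CPU-only sketch of what the conjugate option does, using the unexported (internal) `LinearAlgebra.copytri!`. The matrix values are arbitrary, and the explicit diagonal loop is a hypothetical stand-in for the PR's `_realifydiag!`, whose actual implementation is not shown in this thread.

```julia
using LinearAlgebra

# An arbitrary complex matrix; only its upper triangle is meaningful.
A = ComplexF64[1+2im 2-1im; 7+7im 3+1im]

# Copy the upper triangle into the lower triangle, conjugating each entry.
# copytri! is internal to LinearAlgebra, so it must be qualified.
B = LinearAlgebra.copytri!(copy(A), 'U', true)

lower_is_conj = B[2, 1] == conj(A[1, 2])   # mirrored entry is conjugated
upper_kept = B[1, 2] == A[1, 2]            # source triangle is unchanged
diag_kept = B[1, 1] == A[1, 1]             # diagonal is left as-is

# The diagonal may still carry imaginary parts, so B is Hermitian only
# after dropping them (hypothetical stand-in for _realifydiag!).
for i in axes(B, 1)
    B[i, i] = real(B[i, i])
end
hermitian_after = ishermitian(B)
```

Without the conjugate flag the mirrored entries would be plain copies, which is sufficient for real arrays but gives the wrong result for complex ones.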

Red-Portal · 2022-06-12T12:00:34Z

I have opened a PR on GPUArrays addressing the issue.

devmotion · 2022-06-14T20:41:20Z

src/rulesets/LinearAlgebra/factorization.jl

-        Ā = BLAS.trsm!('R', 'U', 'C', 'N', one(eltype(Ā)) / 2, U.data, Ā)
+    function cholesky_HermOrSym_pullback(ΔC)
+        Ā = _cholesky_pullback_shared_code(C, unthunk(ΔC))
+        rmul!(Ā, one(eltype(Ā)) / 2)


What's the reason for not using

Suggested change

-        rmul!(Ā, one(eltype(Ā)) / 2)
+        rdiv!(Ā, 2)

or

Suggested change

-        rmul!(Ā, one(eltype(Ā)) / 2)
+        ldiv!(2, Ā)

? That seems more direct and simpler.

rdiv! performs elementwise divisions, so $n^2$ division operations, whereas rmul! by the reciprocal performs $n^2$ elementwise multiplications and a single division. Division is generally more expensive than multiplication, so this is cheaper, e.g.:

julia> using BenchmarkTools

julia> foo(x, a) = rmul!(copy(x), inv(a));

julia> bar(x, a) = rdiv!(copy(x), a);

julia> x = randn(100, 100);

julia> @btime foo($x, 2.0);
  4.718 μs (2 allocations: 78.17 KiB)

julia> @btime bar($x, 2.0);
  9.003 μs (2 allocations: 78.17 KiB)

But isn't that something that should be optimized in base? We want to divide by 2, so the natural thing to do would be to use a division operator instead of manually working with eltype.

BTW interestingly the difference seems to be smaller on my computer:

julia> @btime foo($x, 2.0);
  4.690 μs (2 allocations: 78.17 KiB)

julia> @btime bar($x, 2.0);
  7.098 μs (2 allocations: 78.17 KiB)

julia> VERSION
v"1.7.3"

FWIW, rmul! used this way (to perform a division) is quite common throughout this codebase. e.g. all throughout https://github.com/JuliaDiff/ChainRules.jl/blob/main/src/rulesets/LinearAlgebra/dense.jl and https://github.com/JuliaDiff/ChainRules.jl/blob/main/src/rulesets/LinearAlgebra/lapack.jl. Base Julia itself uses this strategy: https://github.com/JuliaLang/julia/blob/b4eb88a71f8c2d8343b21d8fdd1ec403073a222c/stdlib/LinearAlgebra/src/dense.jl#L1595

So I don't think it's unreasonable to use it here for the extra performance. That being said, this is not the computational bottleneck.
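As a small sanity check on the two spellings discussed in this thread, here is a sketch using only the LinearAlgebra standard library. It compares results, not timings, and the array and the divisor `3.0` are arbitrary choices for illustration.

```julia
using LinearAlgebra

x = randn(100, 100)

# Multiply by the reciprocal: one division, then n^2 multiplications.
y_mul = rmul!(copy(x), inv(2.0))

# Divide every element: n^2 divisions.
y_div = rdiv!(copy(x), 2.0)

# For a power-of-two divisor the reciprocal is exactly representable,
# so both routes give bitwise identical results.
same_for_two = y_mul == y_div

# For other divisors the reciprocal is rounded, so only approximate
# equality should be expected.
a = 3.0
close_for_three = rmul!(copy(x), inv(a)) ≈ rdiv!(copy(x), a)
```

For the factor of 2 used in the pullback, the choice between the two is therefore purely about speed and readability, not accuracy.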

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

Red-Portal · 2022-06-14T21:33:35Z

GPUArrays.jl #413 has been merged and I just checked that this PR works fine on the GPU as is.

oxinabox

LGTM, a few minor things.
Merge when happy

I wonder if we should be overloading getproperty(::Tangent{<:Cholesky}, sym) so that dX.U etc. still works.
Given that Zygote loses primal types, and that it is our biggest user, it seems non-pressing.
We might want an issue where we think about this.

Co-authored-by: Frames Catherine White <oxinabox@ucc.asn.au>

sethaxen · 2022-06-16T18:28:29Z

Thanks for the review @oxinabox. What are your thoughts on whether this is breaking?

* [x]  The rrule for `getproperty(::Cholesky, ::Symbol)` now returns a `Tangent` with a `factors` entry instead of `:U` or `:L` (fixes part of [Wrong call on `cholesky` `rrule` #611](https://github.com/JuliaDiff/ChainRules.jl/issues/611))
This could potentially be breaking; e.g., it would break this code in DistributionsAD: https://github.com/TuringLang/DistributionsAD.jl/blob/44a57e974e386ab576a0251967cf7e57e42c63f7/src/common.jl#L3-L29.

devmotion · 2022-06-17T06:08:03Z

IMO, like large parts of DistributionsAD, this is a workaround, so I'd be happy if it were removed. AFAICT it is mainly/only used to define rules for cholesky for Tracker and ReverseDiff. It's a bit horrifying that loading DistributionsAD changes cholesky for these packages, so the sooner it's gone the better, I would say.

sethaxen · 2022-06-17T09:46:27Z

So @devmotion would you then say that this PR should be considered non-breaking because DistributionsAD is doing something risky anyways?

devmotion · 2022-06-17T09:54:50Z

Yes, I think DistributionsAD shouldn't hold back this PR. To me it seems the only fix required there will be replacing U in https://github.com/TuringLang/DistributionsAD.jl/blob/48c43f8e8062ba95542330735593b5275117e592/src/common.jl#L10 and https://github.com/TuringLang/DistributionsAD.jl/blob/48c43f8e8062ba95542330735593b5275117e592/src/common.jl#L24 with factors. That is the right thing anyway (for the time being until this stuff is removed completely) since the primal function returns the factors as first element of a tuple. I guess it is just not done currently since otherwise the CR would have returned ZeroTangent (due to the bug fixed in the PR here).

Edit: I opened a PR: TuringLang/DistributionsAD.jl#226

rofinn · 2022-06-20T22:00:24Z

NOTE: This release also breaks Nabla.jl.

https://github.com/invenia/Nabla.jl/runs/6959631070?check_suite_focus=true#step:6:390

Since I'm not familiar enough with either package to clearly identify what broke... I'm gonna temporarily restore the original branch so I can bisect the commits.

rofinn · 2022-06-22T00:34:15Z

Currently relying on piracy, but I've narrowed the breakage down to two specific changes that would probably be easy to re-add to make it non-breaking? invenia/Nabla.jl#217

sethaxen added 16 commits June 10, 2022 23:30

Rewrite getproperty rule to store factors

b543b0e

Work with factors directly

cce75a0

Create tangent with factors

ebe03e0

Simplify and generalize cholesky number rule

352f878

Use default tangent

a3c4aab

Generalize diagonal cholesky to Hermitian

08b0d31

Simplify cholesky(::Diagonal) tests

6fd7d4a

Generalize and simplify cholesky(::StridedMatrix)

1f57176

Fixes for Hermitian matrices

c8237d2

Generalize to complex Hermitian matrices

b563422

Remove unnecessary single-arg rule

a3a0bb9

Reformat

59b2a04

Check that check kwarg correctly passed

c52f7bc

Support failed factorizations

48bc296

Remove specializations for Thunks

648c5a2

Release unnecessary constraints on factors

b4b8f9f

github-actions bot added the "needs version bump" label (Version needs to be incremented or set to -DEV in Project.toml) Jun 11, 2022

sethaxen commented Jun 11, 2022

sethaxen marked this pull request as ready for review June 11, 2022 16:17

sethaxen added 6 commits June 11, 2022 21:00

Decrease step size

636edac

Check complex cotangent for real primal works

43dd3f5

Fix diagonal rule for failed factorization

183e1d8

Release type constraint of Diagonal

86a2fc1

Refer to real instead off complex

c6497aa

Increment patch number

74a6de0

github-actions bot removed the "needs version bump" label (Version needs to be incremented or set to -DEV in Project.toml) Jun 11, 2022

sethaxen requested a review from devmotion June 11, 2022 20:24

sethaxen requested a review from mzgubic June 11, 2022 20:25

Avoid unnecessary copies

1edba1a

Red-Portal mentioned this pull request Jun 12, 2022

add conjugate option to copytri! JuliaGPU/GPUArrays.jl#413

Merged

devmotion reviewed Jun 14, 2022

Update src/rulesets/LinearAlgebra/factorization.jl

f33812c

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>