Add Enzyme extension #377

wsmoses · 2023-09-22T01:46:29Z

requires current Enzyme main for a custom rules fix

wsmoses · 2023-09-22T01:49:37Z

Sample call:

using Enzyme

using LinearSolve, LinearAlgebra

n = 4
A = rand(n, n);
dA = zeros(n, n);
b1 = rand(n);
db1 = zeros(n);
b2 = rand(n);
db2 = zeros(n);

function f(A, b1, b2; alg = LUFactorization())
    prob = LinearProblem(A, b1)

    sol1 = solve(prob, alg)

    s1 = sol1.u
    norm(s1)
end

f(A, b1, b2) # Uses BLAS

Enzyme.autodiff(Reverse, f, Duplicated(A, dA), Duplicated(b1, db1), Duplicated(b2, db2))

@show dA, db1, db2

codecov · 2023-09-22T02:56:38Z

Codecov Report

Merging #377 (89e10df) into main (5a25b7d) will increase coverage by 48.24%.
Report is 6 commits behind head on main.
The diff coverage is 1.11%.

@@             Coverage Diff             @@
##             main     #377       +/-   ##
===========================================
+ Coverage   20.01%   68.25%   +48.24%     
===========================================
  Files          14       24       +10     
  Lines        1444     1884      +440     
===========================================
+ Hits          289     1286      +997     
+ Misses       1155      598      -557

Files Changed	Coverage Δ
ext/LinearSolveEnzymeExt.jl	`0.00% <0.00%> (ø)`
src/init.jl	`57.14% <50.00%> (-17.86%)`	⬇️

... and 22 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

ChrisRackauckas · 2023-09-22T15:13:53Z

It looks like this only handles the case of solve, but not solve!. So I presume this case would still not work:

using LinearSolve, LinearAlgebra
# using MKL_jll

n = 100
A = rand(n, n)
b1 = rand(n);
b2 = rand(n);

function f(A, b1, b2; alg = LUFactorization())
    prob = LinearProblem(A, b1)

    linsolve = init(prob, alg)
    sol1 = solve!(linsolve)

    s1 = copy(sol1.u)

    linsolve.b = b2
    sol2 = solve!(linsolve)

    s2 = copy(sol2.u)
    norm(s1 + s2)
end

f(A, b1, b2) # Uses BLAS
f(A, b1, b2; alg=RFLUFactorization()) # Uses loops
f(A, b1, b2; alg=MKLLUFactorization()) # Requires `using MKL_jll`

using Enzyme

dA = zero(A)
db1 = zero(b1)
db2 = zero(b2)
Enzyme.autodiff(Reverse, f, Duplicated(A,dA), 
                Duplicated(b1, db1), Duplicated(b2, db2))

which is EnzymeAD/Enzyme.jl#1065.

I at least added a test for the solve case, but the most common case is on solve! so it would be good to figure out how to do that. It's the same thing except solve!(cache) has cache.A and cache.b1, where cache.isfresh == true means A is already factorized. Is there a way to define the derivative w.r.t. fields of the mutable cache? Or should this be done with a solve!_up type thing?

wsmoses · 2023-09-22T21:35:38Z

Pushed extension for solve! and init now.

While was at it, also added batch mode support.

ChrisRackauckas · 2023-09-22T22:23:52Z

ext/LinearSolveEnzymeExt.jl

+function EnzymeCore.EnzymeRules.augmented_primal(config, func::Const{typeof(LinearSolve.init)}, ::Type{RT}, prob::EnzymeCore.Annotation{LP}, alg::Const; kwargs...) where {RT, LP <: LinearSolve.LinearProblem}
+    res = func.val(prob.val, alg.val; kwargs...)
+    dres = if EnzymeRules.width(config) == 1
+        func.val(prob.dval, alg.val; kwargs...)
+    else
+        (func.val(dval, alg.val; kwargs...) for dval in prob.dval)
+    end
+    return EnzymeCore.EnzymeRules.AugmentedReturn(res, dres, nothing)
+end
+
+function EnzymeCore.EnzymeRules.reverse(config, func::Const{typeof(LinearSolve.init)}, ::Type{RT}, cache, prob::EnzymeCore.Annotation{LP}, alg::Const; kwargs...) where {RT, LP <: LinearSolve.LinearProblem}
+    return (nothing, nothing)
+end


Why is this one required? It seems like it doesn't do much?

Init hits that global variable stuff, so we need a rule for corresponding shadow initialization.

ChrisRackauckas · 2023-09-22T22:27:24Z

While was at it, also added batch mode support.

What in here was required for batch mode support?

ChrisRackauckas · 2023-09-22T22:27:55Z

ext/LinearSolveEnzymeExt.jl

+        (dr.u for dr in dres)
+    end
+
+    cache = (copy(linsolve.val.A), res, resvals)


Is this copy necessary?

wsmoses · 2023-09-22T22:28:18Z

Not specializing to just duplicated but also supporting batchduplicated, which has dval as a tuple of shadows

ChrisRackauckas · 2023-09-22T22:29:46Z

As a tuple, does that have an issue scaling to say batch of a 100 or 1000 things?

wsmoses · 2023-09-22T22:30:34Z

For conservative correctness yes. A may be modified between the forward and reverse pass. The overwritten set of bools says if the outermost struct pointer is overwritten and has no information about internal members being overwritten. As Julia and other Alias analysis is improved (or we have an ImmutableArray type or something), this can be elided in the future.

…

On Fri, Sep 22, 2023 at 5:28 PM Christopher Rackauckas < ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In ext/LinearSolveEnzymeExt.jl <#377 (comment)>: > + + if EnzymeRules.width(config) == 1 + dres.u .= 0 + else + for dr in dres + dr.u .= 0 + end + end + + resvals = if EnzymeRules.width(config) == 1 + dres.u + else + (dr.u for dr in dres) + end + + cache = (copy(linsolve.val.A), res, resvals) Is this copy necessary? — Reply to this email directly, view it on GitHub <#377 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJTUXH63K3U4YYGH6FJEALX3YGHPANCNFSM6AAAAAA5CMNQWY> . You are receiving this because you authored the thread.Message ID: ***@***.***>

ChrisRackauckas · 2023-09-22T22:31:21Z

ext/LinearSolveEnzymeExt.jl

+    end
+
+    for (dA, db, dy) in zip(dAs, dbs, dys)
+        invprob = LinearSolve.LinearProblem(transpose(A), dy)


In the forward pass the matrix A is factorized, so in theory we don't need to factorize it again, just transpose A from the forward pass. Is there a way to grab that?

wsmoses · 2023-09-22T22:32:25Z

It supports being used in arbitrary sizes. In practice of course some sizes could be better than others. Eg for vectorization sake a power of two. Likewise, if a computation can be reused for all batch elements that could improve perf. Eg if transpose(A) generated a new matrix and not a view we could do that once for all batches.

…

On Fri, Sep 22, 2023 at 5:29 PM Christopher Rackauckas < ***@***.***> wrote: As a tuple, does that have an issue scaling to say batch of a 100 or 1000 things? — Reply to this email directly, view it on GitHub <#377 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJTUXAF3EZFCL2WXMKTA4TX3YGOJANCNFSM6AAAAAA5CMNQWY> . You are receiving this because you authored the thread.Message ID: ***@***.***>

wsmoses · 2023-09-22T22:33:37Z

Sure, if you know a better set of things to cache, we can choose those instead. I don’t know much about the internals of solve so I went for this form.

…

On Fri, Sep 22, 2023 at 5:31 PM Christopher Rackauckas < ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In ext/LinearSolveEnzymeExt.jl <#377 (comment)>: > + end + + dAs = if EnzymeRules.width(config) == 1 + (linsolve.dval.A,) + else + (dval.A for dval in linsolve.dval) + end + + dbs = if EnzymeRules.width(config) == 1 + (linsolve.dval.b,) + else + (dval.b for dval in linsolve.dval) + end + + for (dA, db, dy) in zip(dAs, dbs, dys) + invprob = LinearSolve.LinearProblem(transpose(A), dy) In the forward pass the matrix A is factorized, so in theory we don't need to factorize it again, just transpose A from the forward pass. Is there a way to grab that? — Reply to this email directly, view it on GitHub <#377 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJTUXBE5PMQ2DDPQRP7SSTX3YGUJANCNFSM6AAAAAA5CMNQWY> . You are receiving this because you authored the thread.Message ID: ***@***.***>

ChrisRackauckas · 2023-09-22T22:38:20Z

The key that I'm pointing out here is similar to the top of https://docs.sciml.ai/LinearSolve/stable/tutorials/caching_interface/. But here, what solve! is doing is solving:

_A = lu!(A)
_A \ b1

and then the backpass is:

_At = lu!(A')
_At \ db1

but we also have that (essentially) _At = _A', or at least it can be computed in O(n) time, whereas a factorization is O(n^3) and thus lu! is one of the most expensive operations.

So what I'm wondering is if it's safe to assume that linsolve is the same linsolve object from the forward pass, or if it may have been further mutated.

wsmoses · 2023-09-22T23:43:46Z

The key that I'm pointing out here is similar to the top of https://docs.sciml.ai/LinearSolve/stable/tutorials/caching_interface/. But here, what solve! is doing is solving:
_A = lu!(A)

_A \ b1
and then the backpass is:
_At = lu!(A')

_At \ db1
but we also have that (essentially) _At = _A', or at least it can be computed in O(n) time, whereas a factorization is O(n^3) and thus lu! is one of the most expensive operations.

So what I'm wondering is if it's safe to assume that linsolve is the same linsolve object from the forward pass, or if it may have been further mutated.

It's the same Julia object, but it's possible it's fields may have been modified. If it's immutable, then it's the same.

wsmoses · 2023-09-22T23:44:58Z

Even if it's overwritten, however, you can still add whatever is relevant from he LU into the cache and use that as a starting point

ChrisRackauckas · 2023-09-23T03:33:32Z

Awesome, I'll leave that as a follow-up, no need to handle it in this PR. But the tests do need to get fixed.

ChrisRackauckas · 2023-09-24T02:10:56Z

The transpose of the factorization is the factorization of the transpose:

using LinearAlgebra
A = rand(4,4)
luA = lu(A)

At = transpose(A)
luAt = lu(At)

b = rand(4)

x  = A \ b
x2 = A' \ b
x3 = luA \ b
x4 = luAt \ b
x5 = luA' \ b

x ≈ x3
x2 ≈ x4 ≈ x5

Confirmed from https://web.mit.edu/18.06/www/Spring17/Transposes.pdf. We can use this to generalize and optimize a bit.

wsmoses · 2023-09-24T02:34:17Z

ext/LinearSolveEnzymeExt.jl

+
+function EnzymeCore.EnzymeRules.reverse(config, func::Const{typeof(LinearSolve.solve!)}, ::Type{RT}, cache, linsolve::EnzymeCore.Annotation{LP}; kwargs...) where {RT, LP <: LinearSolve.LinearCache}
+    y, dys = cache
+    _linsolve = linsolve.val


This is still wrong, because linsolve still couldve been overwritten from forward to reverse. You need to cache it.

okay was just about to ask that, thanks. I think with that this may be completed. Though check the batch syntax in the test: the test still errors with BatchDuplicated and I'm not sure what to do there.

what is the error log from?

ERROR: TypeError: in ccall argument 6, expected Tuple{Float64, Float64}, got a value of type Float64 Stacktrace: [1] macro expansion @ C:\Users\accou\.julia\packages\Enzyme\VS5jo\src\compiler.jl:9774 [inlined] [2] enzyme_call @ C:\Users\accou\.julia\packages\Enzyme\VS5jo\src\compiler.jl:9452 [inlined] [3] CombinedAdjointThunk @ C:\Users\accou\.julia\packages\Enzyme\VS5jo\src\compiler.jl:9415 [inlined] [4] autodiff @ C:\Users\accou\.julia\packages\Enzyme\VS5jo\src\Enzyme.jl:213 [inlined] [5] autodiff @ C:\Users\accou\.julia\packages\Enzyme\VS5jo\src\Enzyme.jl:236 [inlined] [6] autodiff(::ReverseMode{false, FFIABI}, ::typeof(f), ::BatchDuplicated{Matrix{Float64}, 2}, ::BatchDuplicated{Vector{Float64}, 2}) @ Enzyme C:\Users\accou\.julia\packages\Enzyme\VS5jo\src\Enzyme.jl:222 [7] top-level scope @ c:\Users\accou\.julia\dev\LinearSolve\test\enzyme.jl:36

Oh thats an easy one [which we sohuld fix]. You can't use an active return right now in batch mode (which also makes little sense here since you'd back propagate the same value to each). Just wrap that func in a closure that stores it to a vector or something

Makes sense, yeah the test was a bit dumb but just a quick sanity check 😓. Fixing that gives:

ERROR: Enzyme execution failed. Enzyme: Augmented forward pass custom rule Tuple{EnzymeCore.EnzymeRules.ConfigWidth{2, true, true, (false, false, false)}, Const{typeof(init)}, Type{BatchDuplicated{LinearSolve.LinearCache{Matrix{Float64}, Vector{Float64}, Vector{Float64}, SciMLBase.NullParameters, LUFactorization{RowMaximum}, LU{Float64, Matrix{Float64}, Vector{Int64}}, IdentityOperator, IdentityOperator, Float64, Bool}, 2}}, BatchDuplicated{LinearProblem{Nothing, true, Matrix{Float64}, Vector{Float64}, SciMLBase.NullParameters, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, 2}, Const{LUFactorization{RowMaximum}}} return type mismatch, expected EnzymeCore.EnzymeRules.AugmentedReturn{LinearSolve.LinearCache{Matrix{Float64}, Vector{Float64}, Vector{Float64}, SciMLBase.NullParameters, LUFactorization{RowMaximum}, LU{Float64, Matrix{Float64}, Vector{Int64}}, IdentityOperator, IdentityOperator, Float64, Bool}, Tuple{LinearSolve.LinearCache{Matrix{Float64}, Vector{Float64}, Vector{Float64}, SciMLBase.NullParameters, LUFactorization{RowMaximum}, LU{Float64, Matrix{Float64}, Vector{Int64}}, IdentityOperator, IdentityOperator, Float64, Bool}, LinearSolve.LinearCache{Matrix{Float64}, Vector{Float64}, Vector{Float64}, SciMLBase.NullParameters, LUFactorization{RowMaximum}, LU{Float64, Matrix{Float64}, Vector{Int64}}, IdentityOperator, IdentityOperator, Float64, Bool}}, Any} found EnzymeCore.EnzymeRules.AugmentedReturn{LinearSolve.LinearCache{Matrix{Float64}, Vector{Float64}, Vector{Float64}, SciMLBase.NullParameters, LUFactorization{RowMaximum}, LU{Float64, Matrix{Float64}, Vector{Int64}}, IdentityOperator, IdentityOperator, Float64, Bool}, Base.Generator{Tuple{LinearProblem{Nothing, true, Matrix{Float64}, Vector{Float64}, SciMLBase.NullParameters, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, LinearProblem{Nothing, true, Matrix{Float64}, Vector{Float64}, SciMLBase.NullParameters, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}}, LinearSolveEnzymeExt.var"#2#5"{Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Const{typeof(init)}, Const{LUFactorization{RowMaximum}}}}, Tuple{Base.Generator{Base.Generator{Tuple{LinearProblem{Nothing, true, Matrix{Float64}, Vector{Float64}, SciMLBase.NullParameters, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, LinearProblem{Nothing, true, Matrix{Float64}, Vector{Float64}, SciMLBase.NullParameters, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}}, LinearSolveEnzymeExt.var"#2#5"{Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Const{typeof(init)}, Const{LUFactorization{RowMaximum}}}}, LinearSolveEnzymeExt.var"#3#6"}, Base.Generator{Base.Generator{Tuple{LinearProblem{Nothing, true, Matrix{Float64}, Vector{Float64}, SciMLBase.NullParameters, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, LinearProblem{Nothing, true, Matrix{Float64}, Vector{Float64}, SciMLBase.NullParameters, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}}, LinearSolveEnzymeExt.var"#2#5"{Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Const{typeof(init)}, Const{LUFactorization{RowMaximum}}}}, LinearSolveEnzymeExt.var"#4#7"}}} Stacktrace: [1] #solve#5 @ C:\Users\accou\.julia\dev\LinearSolve\src\common.jl:193 [2] solve @ C:\Users\accou\.julia\dev\LinearSolve\src\common.jl:190 [3] #fbatch#207 @ c:\Users\accou\.julia\dev\LinearSolve\test\enzyme.jl:39 [4] fbatch @ c:\Users\accou\.julia\dev\LinearSolve\test\enzyme.jl:36 [5] fbatch @ c:\Users\accou\.julia\dev\LinearSolve\test\enzyme.jl:0 Stacktrace: [1] throwerr(cstr::Cstring) @ Enzyme.Compiler C:\Users\accou\.julia\packages\Enzyme\VS5jo\src\compiler.jl:3066

ChrisRackauckas · 2023-09-24T03:08:56Z

The solving twice tests are a bit odd:

julia> db1
4-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0

julia> db2
4-element Vector{Float64}:
  2.1215949279204196
 -3.7095838683317943
 -1.2286715744423384
  5.967859589815037

It doubles db2 and has db1 = 0. I think it's because the solve!(linsolve).u aliases between the two. The forward pass is fine because of the copy, but the Enzyme rule likely needs to copy something as well?

ChrisRackauckas · 2023-09-24T12:39:56Z

We can skip over that last test to merge, but do you know why that one algorithm would be treated so differently by Enzyme? I would've thought it didn't care if we're capturing stuff in rules, but it treats this algorithm particularly differently:

https://github.com/SciML/LinearSolve.jl/actions/runs/6290016689/job/17077077461?pr=377#step:6:807

ChrisRackauckas force-pushed the master branch 2 times, most recently from 004eb9d to 70e8599 Compare September 22, 2023 13:16

wsmoses added 2 commits September 22, 2023 09:26

Add Enzyme extension

bb6d623

Add actual file

9f8d18f

ChrisRackauckas force-pushed the master branch from 70e8599 to 9f8d18f Compare September 22, 2023 13:26

ChrisRackauckas added 3 commits September 22, 2023 10:18

fix typo

391b602

more v1.9

ce7ffc0

add a test for Enzyme rule correctness

a08386d

Extend

9273a20

ChrisRackauckas reviewed Sep 22, 2023

View reviewed changes

add some batch tests

84c5196

wsmoses added 2 commits September 22, 2023 23:43

Fix

bb93d68

Cache before LU in place

9d19db2

simplify test

f9b0784

wsmoses force-pushed the master branch from 04b683a to f9b0784 Compare September 24, 2023 02:22

ChrisRackauckas added 3 commits September 23, 2023 22:30

fix multiple solve handling

3b39753

fix multiple solve handling

cbb5f1d

fix other algorithms

9630121

wsmoses commented Sep 24, 2023

View reviewed changes

ChrisRackauckas added 2 commits September 23, 2023 23:00

getting very close

b0d228d

push batch test updates

c2ad2db

type stable

54f0722

ChrisRackauckas mentioned this pull request Sep 24, 2023

LLVM failed verification in custom batch rule EnzymeAD/Enzyme.jl#1075

Closed

wsmoses added 2 commits September 23, 2023 23:26

fix mutated db

e4f0785

More caching

d69af77

wsmoses force-pushed the master branch from 3d25e6f to d69af77 Compare September 24, 2023 08:40

Remove batch test

be91ba2

Remove Krylov test for now

89e10df

ChrisRackauckas merged commit 1e6150e into SciML:main Sep 24, 2023
15 of 17 checks passed

ChrisRackauckas mentioned this pull request Sep 24, 2023

Fix Enzyme incompatibilities with Krylov methods EnzymeAD/Enzyme.jl#1077

Closed

wsmoses deleted the master branch September 24, 2023 16:01

ChrisRackauckas mentioned this pull request Sep 24, 2023

Fix Enzyme incompatibilities with Krylov methods #381

Open

sharanry mentioned this pull request Nov 5, 2023

Add forward enzyme rules for init and solve #416

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Enzyme extension #377

Add Enzyme extension #377

wsmoses commented Sep 22, 2023 •

edited

Loading

wsmoses commented Sep 22, 2023 •

edited

Loading

codecov bot commented Sep 22, 2023 •

edited

Loading

ChrisRackauckas commented Sep 22, 2023

wsmoses commented Sep 22, 2023

ChrisRackauckas Sep 22, 2023

wsmoses Sep 22, 2023

ChrisRackauckas Sep 22, 2023

ChrisRackauckas commented Sep 22, 2023

ChrisRackauckas Sep 22, 2023

wsmoses commented Sep 22, 2023

ChrisRackauckas commented Sep 22, 2023

wsmoses commented Sep 22, 2023 via email

ChrisRackauckas Sep 22, 2023

wsmoses commented Sep 22, 2023 via email

wsmoses commented Sep 22, 2023 via email

ChrisRackauckas commented Sep 22, 2023

wsmoses commented Sep 22, 2023

wsmoses commented Sep 22, 2023

ChrisRackauckas commented Sep 23, 2023

ChrisRackauckas commented Sep 24, 2023

wsmoses Sep 24, 2023

ChrisRackauckas Sep 24, 2023

wsmoses Sep 24, 2023

ChrisRackauckas Sep 24, 2023

wsmoses Sep 24, 2023

ChrisRackauckas Sep 24, 2023

ChrisRackauckas commented Sep 24, 2023

ChrisRackauckas commented Sep 24, 2023

Add Enzyme extension #377

Add Enzyme extension #377

Conversation

wsmoses commented Sep 22, 2023 • edited Loading

wsmoses commented Sep 22, 2023 • edited Loading

codecov bot commented Sep 22, 2023 • edited Loading

Codecov Report

ChrisRackauckas commented Sep 22, 2023

wsmoses commented Sep 22, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ChrisRackauckas commented Sep 22, 2023

Choose a reason for hiding this comment

wsmoses commented Sep 22, 2023

ChrisRackauckas commented Sep 22, 2023

wsmoses commented Sep 22, 2023 via email

Choose a reason for hiding this comment

wsmoses commented Sep 22, 2023 via email

wsmoses commented Sep 22, 2023 via email

ChrisRackauckas commented Sep 22, 2023

wsmoses commented Sep 22, 2023

wsmoses commented Sep 22, 2023

ChrisRackauckas commented Sep 23, 2023

ChrisRackauckas commented Sep 24, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ChrisRackauckas commented Sep 24, 2023

ChrisRackauckas commented Sep 24, 2023

wsmoses commented Sep 22, 2023 •

edited

Loading

wsmoses commented Sep 22, 2023 •

edited

Loading

codecov bot commented Sep 22, 2023 •

edited

Loading