Custom stacking for StaticArrays #564
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

@@           Coverage Diff            @@
##             main     #564    +/-   ##
=========================================
  Coverage   98.00%   98.00%
=========================================
  Files         106      108      +2
  Lines        4808     4812      +4
=========================================
+ Hits         4712     4716      +4
  Misses         96       96
This PR adds an extension in order to have these two paths:

function DI.stack_vec_col(t::NTuple{B,<:SArray}) where {B}
    return hcat(map(vec, t)...)
end
stack_vec_col(t::NTuple) = stack(vec, t; dims=2)

Is this clearly better than just always using stack?

julia> tm = ntuple(i -> fill(i, 2, 3), 10);

julia> @btime stack(vec, $tm; dims=1);
  142.494 ns (12 allocations: 912 bytes)

julia> @btime hcat(map(vec, $tm)...);
  101.484 ns (12 allocations: 912 bytes)
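To illustrate how the two paths above are selected (using the signatures from that snippet), the specialized method only applies when every element of the tuple is an SArray of the same concrete type; anything else hits the generic stack fallback:

julia> using StaticArrays

julia> t_static = ntuple(i -> @SMatrix(ones(2, 3)), 4);

julia> t_static isa NTuple{4,<:SArray}   # matches the specialized hcat method
true

julia> t_mixed = (ones(2, 3), @SMatrix(ones(2, 3)));

julia> t_mixed isa NTuple{2,<:SArray}    # falls back to the generic stack method
false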
It's clearly better in the case of static arrays:

julia> using StaticArrays, BenchmarkTools

julia> ts = ntuple(i -> @SMatrix(ones(2,3)), 10);

julia> @btime stack(vec, $ts; dims=2);
  311.713 ns (1 allocation: 544 bytes)

julia> @btime hcat(map(vec, $ts)...);
  7.246 ns (0 allocations: 0 bytes)
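The allocation counts hint at why: vec of an SMatrix is itself a static vector, and hcat of static vectors produces an SMatrix, so the splatted hcat never leaves the stack, whereas stack materializes a heap-allocated matrix (as the single 544-byte allocation above suggests). A quick sanity check along those lines:

julia> using StaticArrays

julia> s = @SMatrix ones(2, 3);

julia> vec(s) isa SVector                 # vec keeps the static size
true

julia> hcat(vec(s), vec(s)) isa SMatrix   # so the splatted hcat stays fully static
true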
Sorry, I read your comment the wrong way. I did some more thorough benchmarks in this issue and …
Wow, that's quite hard to decode. (Probably I should have used …)

julia> tm = ntuple(i -> rand(100, 100), 10);

julia> res1 = @btime stack(vec, $tm);         # called "bad stack, function" at link
  14.250 μs (13 allocations: 781.64 KiB)

julia> res2 = @btime stack(vec, $tm, dims=2); # version with dims as in PR
  14.250 μs (13 allocations: 781.64 KiB)

julia> res3 = @btime hcat(map(vec, $tm)...);  # called "good stack, function" at link
  64.416 μs (13 allocations: 781.64 KiB)

julia> res1 == res2 == res3
true

Whether that's true in general I don't know; perhaps I'm somewhat surprised. And if it is, whether it's worth the complexity is your call. Note that a lazy, non-copying vec changes the picture:

julia> myvec(x) = Base.ReshapedArray(x, (length(x),), ());  # not sure 3rd argument is optimal!

julia> res4 = @btime hcat(map(myvec, $tm)...);  # faster
  38.792 μs (3 allocations: 781.33 KiB)

julia> @btime stack(myvec, $tm);  # slower
  22.084 μs (3 allocations: 781.33 KiB)

julia> res1 == res4
true
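For what it's worth, even the plain vec used above does not copy the data: for a regular Array it is just a reshape, so the result shares memory with the parent matrix, and the extra allocations in the 13-allocation runs are only the small array wrappers (the ~781 KiB is the output matrix in every case). A quick check:

julia> m = rand(100, 100);

julia> v = vec(m);        # reshape under the hood, shares memory with m

julia> v[1] = 0.0; m[1, 1]
0.0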
In the current state of things, … By the way, any ideas on how to implement …?
Wow, I didn't know that. I naively thought that a Matrix was just a vector in a trench coat, hence this would be free.
Partial answer to #563
Related: …

Versions

DI core:
- new overridable function to stack the tuple t into Jacobian/Hessian blocks (DI.stack_vec_col in the conversation above)

DI extensions:
- StaticArrays (new extension): stack a t::NTuple{_,<:SArray} by using hcat(map(vec, t)...) instead of stack(vec, t). Benchmarks show a 2x speedup on the example from "Add direct Enzyme support" (SciML/NonlinearSolve.jl#476), and now each block is an SMatrix.
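For readers unfamiliar with Julia package extensions, here is a minimal sketch of how such an override can be wired up, assuming the helper is DI.stack_vec_col as quoted in the conversation above (the actual file layout and names in DifferentiationInterface.jl may differ):

# ext/DifferentiationInterfaceStaticArraysExt.jl -- sketch only
module DifferentiationInterfaceStaticArraysExt

import DifferentiationInterface as DI
using StaticArrays

# Specialized path: a splatted hcat of static vecs stays an SMatrix,
# so each Jacobian/Hessian block is stack-allocated (0 heap allocations).
function DI.stack_vec_col(t::NTuple{B,<:SArray}) where {B}
    return hcat(map(vec, t)...)
end

end # module

The generic fallback stack_vec_col(t::NTuple) = stack(vec, t; dims=2) stays in DI core, and the extension is only loaded when StaticArrays is present in the environment (declared under [extensions] in Project.toml).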