
Broadcast views to avoid allocations. #353
Closed · wants to merge 2 commits

Conversation

maleadt (Contributor) commented Aug 31, 2018

KristofferC (Collaborator) commented:
Can the zero allocations be tested?

jrevels (Member) commented Aug 31, 2018

This unfortunately doesn't seem to do the trick.

Here's a simple benchmark (it only tests one of these seed! methods, but should be representative):

using ForwardDiff, BenchmarkTools
x = rand(1000);
cfg = ForwardDiff.GradientConfig(nothing, x);
@btime ForwardDiff.seed!($(cfg.duals), $x, $(cfg.seeds));

Before #351:

julia> @btime ForwardDiff.seed!($(cfg.duals), $x, $(cfg.seeds));
  12.895 ns (0 allocations: 0 bytes)

After #351:

julia> @btime ForwardDiff.seed!($(cfg.duals), $x, $(cfg.seeds));
  1.078 μs (16 allocations: 3.80 KiB)

After this PR:

julia> @btime ForwardDiff.seed!($(cfg.duals), $x, $(cfg.seeds));
  1.032 μs (16 allocations: 3.69 KiB)

Unfortunately, I think the best thing to do at the moment is to revert (EDIT: this part of) #351 so we can tag ForwardDiff v0.9.0 (which is needed for people to get a version of ForwardDiff working on Julia v1.0). Then we can take our time trying to get this working without regressions.

Sorry again, I really should have caught this when reviewing #351 originally...

maleadt (Contributor, author) commented Sep 6, 2018

It turns out the overhead came from the tuple slicing. Avoiding that gets the timings most of the way back down, but there's still some overhead from the broadcast + view allocation:

seed!(duals::AbstractArray{Dual{T,V,N},N1} where N1, x, seed::Partials{N,V}) where {T, V, N} in Main.Loop at /tmp/test.jl:9
  4.115 μs (1 allocation: 96 bytes)
seed!(duals::AbstractArray{Dual{T,V,N},N1} where N1, x, seed::Partials{N,V}) where {T, V, N} in Base.Broadcast at /tmp/test.jl:50
  2.912 μs (1 allocation: 96 bytes)

seed!(duals::AbstractArray{Dual{T,V,N},N1} where N1, x, seeds::Tuple{Vararg{Partials{N,V},N}}) where {T, V, N} in Main.Loop at /tmp/test.jl:17
  81.289 ns (1 allocation: 896 bytes)
seed!(duals::AbstractArray{Dual{T,V,N},N1} where N1, x, seeds::Tuple{Vararg{Partials{N,V},N}}) where {T, V, N} in Base.Broadcast at /tmp/test.jl:56
  126.638 ns (3 allocations: 992 bytes)

seed!(duals::AbstractArray{Dual{T,V,N},N1} where N1, x, index, seed::Partials{N,V}) where {T, V, N} in Main.Loop at /tmp/test.jl:25
  63.850 ns (1 allocation: 96 bytes)
seed!(duals::AbstractArray{Dual{T,V,N},N1} where N1, x, index, seed::Partials{N,V}) where {T, V, N} in Base.Broadcast at /tmp/test.jl:62
  89.895 ns (3 allocations: 192 bytes)

seed!(duals::AbstractArray{Dual{T,V,N},N1} where N1, x, index, seeds::Tuple{Vararg{Partials{N,V},N}}) where {T, V, N} in Main.Loop at /tmp/test.jl:35
  84.639 ns (1 allocation: 896 bytes)
seed!(duals::AbstractArray{Dual{T,V,N},N1} where N1, x, index, seeds::Tuple{Vararg{Partials{N,V},N}}) where {T, V, N} in Base.Broadcast at /tmp/test.jl:70
  121.852 ns (3 allocations: 992 bytes)

seed!(duals::AbstractArray{Dual{T,V,N},N1} where N1, x, index, seeds::Tuple{Vararg{Partials{N,V},N}}, chunksize) where {T, V, N} in Main.Loop at /tmp/test.jl:35
  64.874 ns (1 allocation: 896 bytes)
seed!(duals::AbstractArray{Dual{T,V,N},N1} where N1, x, index, seeds::Tuple{Vararg{Partials{N,V},N}}, chunksize) where {T, V, N} in Base.Broadcast at /tmp/test.jl:70
  135.899 ns (3 allocations: 992 bytes)

https://gist.github.com/maleadt/d794ff3de39c4e5a94e98370288fcf09
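For reference, the two flavors being compared are roughly the following (a simplified sketch with illustrative names like seed_loop! and seed_bcast!, not the exact code from the gist):

using ForwardDiff: Dual, Partials

# Scalar loop: writes each Dual in place, no temporaries.
function seed_loop!(duals::AbstractArray{Dual{T,V,N}}, x,
                    seeds::NTuple{N,Partials{N,V}}) where {T,V,N}
    for i in 1:N
        duals[i] = Dual{T,V,N}(x[i], seeds[i])
    end
    return duals
end

# Broadcast + view: fuses the seeding into one kernel (which is what GPU arrays
# need), but on Julia v1.0 the view wrapper and broadcast machinery are
# heap-allocated, which is where the extra allocations above come from.
function seed_bcast!(duals::AbstractArray{Dual{T,V,N}}, x,
                     seeds::NTuple{N,Partials{N,V}}) where {T,V,N}
    inds = 1:N
    duals[inds] .= Dual{T,V,N}.(view(x, inds), seeds)
    return duals
end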

How performance sensitive is seed!?

jrevels (Member) commented Sep 10, 2018

Nice!

How performance sensitive is seed!?

It gets called in the "inner loop" of the perturbation chunking algorithms behind API functions such as ForwardDiff.gradient!. Timing-wise, the performance impact of seed! depends heavily on the target function and the input/output dimensions. We unfortunately can't let it allocate, since doing so breaks an existing ForwardDiff use case: repeated/rapid computation of small Jacobians, desirable for e.g. real-time control/optimization where memory is limited.
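To make that constraint concrete, a minimal benchmark of the small-gradient use case might look like this (f, x, and the dimensions are illustrative; exact numbers vary by machine):

using ForwardDiff, BenchmarkTools

f(x) = sum(abs2, x)            # stand-in for a small control/optimization objective
x = rand(6)                    # small input, as in the small-Jacobian use case
out = similar(x)
cfg = ForwardDiff.GradientConfig(f, x)

# With a preallocated output and config, the whole call should report
# 0 allocations, but only if seed! itself does not allocate.
@btime ForwardDiff.gradient!($out, $f, $x, $cfg);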

maleadt (Contributor, author) commented Sep 10, 2018

We unfortunately can't let it allocate

Then this can't work until JuliaLang/julia#14955 is resolved.

jrevels (Member) commented Sep 10, 2018

Would it be possible to specialize this implementation on CuArrays' types so they can at least get the desired behavior? The big downside there would obviously be that the overloads would need to live in a separate companion package (unless we'd make ForwardDiff depend on CuArrays, or CuArrays depend on ForwardDiff, neither of which is desirable).

maleadt (Contributor, author) commented Sep 10, 2018

That seems like needlessly duplicating functionality that's actually pretty generic.
What about a HasFastViews trait?
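For concreteness, a Holy-trait style sketch of what that could look like (HasFastViews does not exist anywhere yet; all names here are hypothetical, and seed_loop!/seed_bcast! refer to the sketch in the earlier comment):

struct HasFastViews end
struct HasSlowViews end

# Default assumption: constructing a view of this array type may allocate.
viewspeed(::Type{<:AbstractArray}) = HasSlowViews()
# A GPU array package could then opt in for its own type, e.g.
# viewspeed(::Type{<:CuArray}) = HasFastViews()

# seed! would pick an implementation by dispatching on the trait of the
# destination array, falling back to the scalar loop everywhere else:
seed!(duals, x, seeds) = seed!(viewspeed(typeof(duals)), duals, x, seeds)
seed!(::HasFastViews, duals, x, seeds) = seed_bcast!(duals, x, seeds)  # broadcast + view path
seed!(::HasSlowViews, duals, x, seeds) = seed_loop!(duals, x, seeds)   # allocation-free loop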

jrevels (Member) commented Sep 10, 2018

What about a HasFastViews trait?

Sounds good to me. That still requires a companion package though, right? Or do you mean a trait like this in Base? If the latter, we could avoid having to wait for a 1.1 release by just making a lightweight package that ForwardDiff/CuArrays/etc. could all share as a dependency (e.g. ExtraTraits.jl).

charleskawczynski (Contributor) commented:
Now that JuliaLang/julia#14955 is resolved, can we revive this work? It seems that the solution in #406 yields zero allocations:

julia> using ForwardDiff, BenchmarkTools
[ Info: Precompiling ForwardDiff [f6369f11-7733-5829-9624-2563aa707210]

julia> x = rand(1000);

julia> cfg = ForwardDiff.GradientConfig(nothing, x);

julia> @btime ForwardDiff.seed!($(cfg.duals), $x, $(cfg.seeds));
  55.533 ns (0 allocations: 0 bytes)

charleskawczynski (Contributor) commented:
I think we can close this now that #472 has merged.
