Current method of caching arrays seems to be inefficient in some scenarios. #92
Comments
I experimented a bit, and with some ugly hacking and slashing in the code I now run with

```julia
type JacobianCache{T, Q}
    workvec::Vector{T}
    output::Vector{T}
    partials::Vector{Q}
end
```

as the cache (only vector mode works so far). This (naturally) eliminates the key lookup overhead. The downside is that I need to specify the output size in order to allocate the correct types for the gradient numbers. I also specified the input size upon creating the function that computes the Jacobian, but I believe that in the large majority of cases the user has a known, constant input size and hence a known, constant output size. It might make sense to optimize that case a bit extra if possible, even if it requires the user to explicitly give the output size from the start.
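For illustration, here is a minimal sketch of how such a typed cache could be allocated up front once both sizes are known; `make_jacobian_cache` and the role of `Q` as the partials type are assumptions for the sketch, not part of the actual patch:

```julia
# Hypothetical helper, not from the patch: build the typed cache once,
# given known input and output lengths. Q stands for whatever
# gradient-number/partials type the Jacobian kernel uses internally.
function make_jacobian_cache{T, Q}(::Type{T}, ::Type{Q}, input_len::Int, output_len::Int)
    JacobianCache{T, Q}(Array(T, input_len),   # workvec: one slot per input
                        Array(T, output_len),  # output: one slot per output
                        Array(Q, input_len))   # partials: one per input component
end
```

With a concrete type like this, every field access is statically typed and there is no key lookup at all.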
If we already use generated functions, we could stash the buffer directly in the generated code. Like:

```julia
@generated function foo{N, T}(a::NTuple{N, T}, x)
    println("Creating buffer")
    cache_arr = zeros(T, N)
    return quote
        print($cache_arr)
        fill!($cache_arr, x)
        print($cache_arr)
    end
end
```

```julia
julia> foo((1.0, 2.0, 3.0), 2.0)
Creating buffer
[0.0,0.0,0.0][2.0,2.0,2.0]

julia> foo((1.0, 2.0, 3.0), 1.0)
[2.0,2.0,2.0][1.0,1.0,1.0]

julia> foo((1.0, 2.0, 3.0, 4.0), 1.0)
Creating buffer
[0.0,0.0,0.0,0.0][1.0,1.0,1.0,1.0]
```

The generator body runs once per type specialization, so `cache_arr` is allocated exactly once per `(N, T)` combination and then closed over by the returned expression; note how the second call reuses the buffer (and sees the leftover contents) from the first.
With a diff like https://gist.github.com/KristofferC/5e2169d0057df82494cbdf6fb341bfba and using

```julia
function brown_almost_linear(x::Vector)
    fvec = similar(x)
    n = length(x)
    sum1 = sum(x) - (n + 1)
    for k = 1:(n - 1)
        fvec[k] = x[k] + sum1
    end
    fvec[n] = prod(x) - 1
    return fvec
end

g = ForwardDiff.jacobian(brown_almost_linear)
```

I get (with all tests passing):

```julia
julia> @time for i in 1:10^5 g(rand(5)) end
  0.091972 seconds (600.00 k allocations: 86.975 MB, 5.09% gc time)
```

and on master:

```julia
julia> @time for i in 1:10^5 g(rand(5)) end
  0.293641 seconds (800.00 k allocations: 91.553 MB, 2.07% gc time)
```

There is probably some reason why a compiler person wouldn't want you to do this, but it seems quite powerful...
If I understand you correctly, what you're suggesting is similar to what I started to do in my original refactor last year, but it was decided to abandon that approach since the memory cached in the generated function body can never be freed. There's probably a sane way to combine these approaches, such that we can make cache retrieval a compile-time lookup instead of a runtime lookup, but still allocate the memory such that it can be freed by the user when necessary.
It could be possible to fall back to the current cache lookup once the input length is large enough. For small input lengths, the size of the cached array is probably of the same order of magnitude as the generated code itself, which doesn't get freed anyway.
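As a rough sketch of that hybrid (invented names, not code from the thread or from ForwardDiff): bake the buffer into the generated code below some size threshold, and fall back to a dictionary lookup above it, so that only small, code-sized buffers become permanent:

```julia
# Sketch only: SMALL_N, BIG_CACHE and get_buffer are made-up names.
const SMALL_N = 16
const BIG_CACHE = Dict{Tuple{DataType, Int}, Vector}()

@generated function get_buffer{T, N}(::Type{T}, ::Type{Val{N}})
    if N <= SMALL_N
        buf = zeros(T, N)  # baked into this (T, N) specialization, never freed
        return :($buf)
    else
        # Runtime dict lookup; the user can empty!(BIG_CACHE) to reclaim memory.
        return :(get!(() -> zeros($T, $N), BIG_CACHE, ($T, $N))::Vector{$T})
    end
end
```

`get_buffer(Float64, Val{4})` would then hit the compile-time path, while `get_buffer(Float64, Val{1000})` goes through the dictionary and stays freeable.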
I was profiling my code a bit, and after doing some optimization I noticed that for Jacobian computations, ~60-70% of the time is spent in the cache functions of ForwardDiff.
Here is a profile screenshot with some annotations:
It seems that a lot of the time in the cache method is spent in `ht_keyindex` in `dict.jl`, which I believe is a key lookup. This line and the one after will both look up the key in the dictionary, which is unnecessary. I am not sure there is a good way to solve the double lookup right now; JuliaLang/julia#12157 is relevant. Another option would be to reconsider the whole cache method so that some other data structure is used instead of a dict. The dict method is very flexible, since it is possible to insert and retrieve arrays of arbitrary type and size, but it seems that the lookups may be a bit too slow.
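To make the double lookup concrete, here is a standalone sketch (the cache shape is an assumption, not ForwardDiff's actual code). `haskey` followed by indexing searches the hash table twice, while `get!` with a default thunk searches it once:

```julia
# Assumed shape: a cache keyed by element type, holding work vectors.
cache = Dict{DataType, Vector{Float64}}()

# The pattern described above: haskey and getindex each end up in
# ht_keyindex, so the key is hashed and searched for at least twice.
function fetch_double(cache, T::DataType, n::Int)
    if !haskey(cache, T)     # lookup 1
        cache[T] = zeros(n)  # lookup 2, only on a miss
    end
    return cache[T]          # another lookup either way
end

# Single-lookup alternative for this simple case: get! with a default thunk.
fetch_single(cache, T::DataType, n::Int) = get!(() -> zeros(n), cache, T)
```

The real cache is more involved than this (the stored arrays don't share one concrete type), which is presumably where the richer Dict API discussed in the linked Julia issue would help.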
If you want to repeat the profile data:
Add these packages:
Include this file: https://gist.github.com/KristofferC/71a0797b5b9bf42b8a43
And benchmark with (hopefully I didn't forget anything):