Current method of caching arrays seems to be inefficient in some scenarios. #92
Comments
I experimented a bit, and with some ugly hacking and slashing in the code I now run with

```julia
type JacobianCache{T, Q}
    workvec::Vector{T}
    output::Vector{T}
    partials::Vector{Q}
end
```

as the cache (only vector mode works so far). This (naturally) eliminates the key lookup overhead. The downside is that I need to specify the output size in order to allocate the correct types for the gradient numbers. I also specified the input size upon creating the function that computes the Jacobian, but I believe that in the large majority of cases the user has a known, constant input size and hence a known, constant output size. It might make sense to optimize that case a bit extra if possible, even if it requires the user to explicitly give the output size from the start.
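For illustration, here is a minimal sketch of how such a typed cache could be allocated up front once both sizes are known; `make_jacobian_cache` and the role of `Q` as the partials type are assumptions for the sketch, not part of the actual patch:

```julia
# Hypothetical helper, not from the patch: build the typed cache once,
# given known input and output lengths. Q stands for whatever
# gradient-number/partials type the Jacobian kernel uses internally.
function make_jacobian_cache{T, Q}(::Type{T}, ::Type{Q}, input_len::Int, output_len::Int)
    JacobianCache{T, Q}(Array(T, input_len),   # workvec: one slot per input
                        Array(T, output_len),  # output: one slot per output
                        Array(Q, input_len))   # partials: one per input component
end
```

With a concrete type like this, every field access is statically typed and there is no key lookup at all.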
If we already use generated functions, we could stash the buffer directly in the generated code. Like:

```julia
@generated function foo{N, T}(a::NTuple{N, T}, x)
    println("Creating buffer")
    cache_arr = zeros(T, N)
    return quote
        print($cache_arr)
        fill!($cache_arr, x)
        print($cache_arr)
    end
end
```

```julia
julia> foo((1.0, 2.0, 3.0), 2.0)
Creating buffer
[0.0,0.0,0.0][2.0,2.0,2.0]

julia> foo((1.0, 2.0, 3.0), 1.0)
[2.0,2.0,2.0][1.0,1.0,1.0]

julia> foo((1.0, 2.0, 3.0, 4.0), 1.0)
Creating buffer
[0.0,0.0,0.0,0.0][1.0,1.0,1.0,1.0]
```

The generator body runs once per type specialization, so `cache_arr` is allocated exactly once per `(N, T)` combination and then closed over by the returned expression; note how the second call reuses the buffer (and sees the leftover contents) from the first.
With a diff like https://gist.github.com/KristofferC/5e2169d0057df82494cbdf6fb341bfba and using

```julia
function brown_almost_linear(x::Vector)
    fvec = similar(x)
    n = length(x)
    sum1 = sum(x) - (n + 1)
    for k = 1:(n - 1)
        fvec[k] = x[k] + sum1
    end
    fvec[n] = prod(x) - 1
    return fvec
end

g = ForwardDiff.jacobian(brown_almost_linear)
```

I get (with all tests passing):

```julia
julia> @time for i in 1:10^5 g(rand(5)) end
  0.091972 seconds (600.00 k allocations: 86.975 MB, 5.09% gc time)
```

and on master:

```julia
julia> @time for i in 1:10^5 g(rand(5)) end
  0.293641 seconds (800.00 k allocations: 91.553 MB, 2.07% gc time)
```

There is probably some reason why a compiler person wouldn't want you to do this, but it seems quite powerful...
If I understand you correctly, what you're suggesting is similar to what I started to do in my original refactor last year, but it was decided to abandon that approach since the memory cached in the generated function body can never be freed. There's probably a sane way to combine these approaches, such that we can make cache retrieval a compile-time lookup instead of a runtime lookup, but still allocate the memory such that it can be freed by the user when necessary.
It could be possible to fall back to the current cache lookup once the input length is large enough. For small input lengths, the size of the cached array is probably of the same order of magnitude as the generated code itself, which doesn't get freed anyway.
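As a rough sketch of that hybrid (invented names, not code from the thread or from ForwardDiff): bake the buffer into the generated code below some size threshold, and fall back to a dictionary lookup above it, so that only small, code-sized buffers become permanent:

```julia
# Sketch only: SMALL_N, BIG_CACHE and get_buffer are made-up names.
const SMALL_N = 16
const BIG_CACHE = Dict{Tuple{DataType, Int}, Vector}()

@generated function get_buffer{T, N}(::Type{T}, ::Type{Val{N}})
    if N <= SMALL_N
        buf = zeros(T, N)  # baked into this (T, N) specialization, never freed
        return :($buf)
    else
        # Runtime dict lookup; the user can empty!(BIG_CACHE) to reclaim memory.
        return :(get!(() -> zeros($T, $N), BIG_CACHE, ($T, $N))::Vector{$T})
    end
end
```

`get_buffer(Float64, Val{4})` would then hit the compile-time path, while `get_buffer(Float64, Val{1000})` goes through the dictionary and stays freeable.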
I was profiling my code a bit, and after doing some optimization I noticed that for Jacobian computations, ~60-70% of the time is spent in the cache functions of ForwardDiff.
Here is a profile screenshot with some annotations:
It seems that a lot of the time in the cache method is spent in `ht_keyindex` in `dict.jl`, which I believe is a key lookup. This line and the one after will both look up the key in the dictionary, which is unnecessary. I am not sure there is a good way to solve the double lookup right now; JuliaLang/julia#12157 is relevant. Another option would be to reconsider the whole cache method so that some other data structure is used instead of a dict. The dict method is very flexible, since it is possible to insert and retrieve arrays of arbitrary type and size, but it seems that the lookups may be a bit too slow.
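To make the double lookup concrete, here is a standalone sketch (the cache shape is an assumption, not ForwardDiff's actual code). `haskey` followed by indexing searches the hash table twice, while `get!` with a default thunk searches it once:

```julia
# Assumed shape: a cache keyed by element type, holding work vectors.
cache = Dict{DataType, Vector{Float64}}()

# The pattern described above: haskey and getindex each end up in
# ht_keyindex, so the key is hashed and searched for at least twice.
function fetch_double(cache, T::DataType, n::Int)
    if !haskey(cache, T)     # lookup 1
        cache[T] = zeros(n)  # lookup 2, only on a miss
    end
    return cache[T]          # another lookup either way
end

# Single-lookup alternative for this simple case: get! with a default thunk.
fetch_single(cache, T::DataType, n::Int) = get!(() -> zeros(n), cache, T)
```

The real cache is more involved than this (the stored arrays don't share one concrete type), which is presumably where the richer Dict API discussed in the linked Julia issue would help.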
If you want to repeat the profile data:
Add these packages:
Include this file: https://gist.github.com/KristofferC/71a0797b5b9bf42b8a43
And benchmark with (hopefully I didn't forget anything):