WIP: Allow chunk-sizing options and fix leaky generated functions from #27 #34
Conversation
…d rm dangerously leaky methods. jacobian, hessian, and tensor methods rm'd until they can support the same API structure as ForwardDiff.gradient
…the event that chunk_size equals length(x)
…e API. Also, split fad_api code into multiple files for easier review
@Scidom @mlubin Here's an overview of what things look like in this branch right now: I brought back the chunk-based calculation for gradients, extended it to Jacobians, and came up with a tiling algorithm to implement chunk-based calculation of Hessians as well. The tiling algorithm can be extended to Tensors, but it would require a significant amount of work (generalizing to rank 3 makes my head hurt). I'd rather put off supporting chunk-computing Tensors until after #27 lands, so that we can get everything merged and I can get benchmark results from an "official" version of ForwardDiff.

Following @mlubin's advice, I ended up implementing a cache-passing API:

julia> mycache = HessianCache();
julia> hessian(f, x, cache=mycache)

…where I really like this caching solution for that reason. However, I haven't been able to implement it in a way that matches the performance of the code in the #27 branch:

julia> using ForwardDiff
julia> f(x) = sum(sin, x) + prod(tan, x) * sum(sqrt, x);
julia> g = ForwardDiff.gradient(f)
gradf (generic function with 1 method)
julia> h = ForwardDiff.hessian(f)
hessf (generic function with 1 method)
julia> x = rand(100);
julia> @time g(x); # after warmup
0.000123 seconds (1.71 k allocations: 558.609 KB)
julia> @time h(x); # after warmup
0.014564 seconds (23.52 k allocations: 24.293 MB, 19.97% gc time)
julia> x = rand(1000);
julia> @time g(x); # after warmup
0.019061 seconds (20.00 k allocations: 46.601 MB, 32.08% gc time)
julia> @time h(x); # after warmup
12.934909 seconds (3.02 M allocations: 22.491 GB, 16.72% gc time)

Compare the above to the same benchmark on the #27 branch:
julia> x = rand(100);
julia> @time g(x); # after warmup
0.000108 seconds (1.51 k allocations: 555.359 KB)
julia> @time h(x); # after warmup
0.013671 seconds (3.83 k allocations: 23.992 MB, 20.54% gc time)
julia> x = rand(1000);
julia> @time g(x); # after warmup
0.017159 seconds (17.01 k allocations: 46.555 MB, 33.04% gc time)
julia> @time h(x); # after warmup
12.971819 seconds (40.03 k allocations: 22.446 GB, 15.92% gc time)

The disparity in the reported time/memory usage doesn't bother me a huge amount, but the disparity in allocation count is quite large, and I fear it could be an indicator of potential performance problems. I'd like to do more intensive profiling to figure out the source; the main difference code-wise between these two branches (w.r.t. this benchmark, which doesn't touch the chunk-based code) is how caching is handled, so I suspect it's something there. I'll try some more stuff tomorrow and report back.
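(Not from the PR thread, just a generic diagnostic sketch: one way to track down this kind of allocation-heavy type instability is to inspect the inferred types and per-call allocations of the suspect entry points directly, reusing the benchmark definitions above.)

julia> using ForwardDiff

julia> f(x) = sum(sin, x) + prod(tan, x) * sum(sqrt, x);

julia> g = ForwardDiff.gradient(f);

julia> x = rand(100);

julia> @code_warntype g(x)    # non-concrete (e.g. Any) variable types indicate instabilities

julia> g(x); @allocated g(x)  # allocations of a single call, after warmup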
Could you point to the lines with the caching logic? Sounds like there's type instability somewhere.
All the caching stuff is defined in cache.jl. There's definitely type instability/ambiguity internally, but I was hoping it was close enough to "surface level" that it wouldn't impact performance (e.g. what we were discussing with having type instability at the API level). The main problem, I imagine (but still need to test), comes from the fact that I'm using ambiguously-typed …

One way I've thought of circumventing this would be to make the fields of the cache type generated functions constructed inside the constructor itself:

immutable ForwardDiffCache{F}
    fad_type::Type{F}
    workvec_cache::Function
    partials_cache::Function
    zeros_cache::Function
end

function ForwardDiffCache{F}(::Type{F})
    @generated function workvec_cache(args...)
        ...
    end
    @generated function partials_cache(args...)
        ...
    end
    @generated function zeros_cache(args...)
        ...
    end
    return ForwardDiffCache(F, workvec_cache, partials_cache, zeros_cache)
end

Then, I could make the caching more-or-less type stable, since the input type to the generated closures (e.g. …
If 1) and 2) end up being true (which I'm not at all convinced of until I can test it), then this would actually be a cool pattern for handling type-inferable, type-variadic, and non-leaky memoization in Julia (which I don't believe is covered by Memoize.jl, according to the inference warning at the bottom of its README).
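(For illustration only, in the same Julia 0.4-era syntax as the rest of the thread: a rough, hypothetical sketch of non-leaky, per-cache memoization using an ordinary Dict-backed lookup keyed on argument types rather than the @generated closures proposed above. The names DemoCache and get_workvec! are made up for this sketch, not part of the PR.)

# The memoized storage lives in a value the caller owns, not in
# globally installed generated-function methods, so nothing leaks and
# the memory can be garbage collected along with the cache.
immutable DemoCache
    storage::Dict{Any,Any}
end
DemoCache() = DemoCache(Dict{Any,Any}())

# Fetch (or lazily allocate) a work vector for element type T and
# length xlen. The typeassert keeps the return type concrete for a
# known T, so callers stay type-stable.
function get_workvec!{T}(cache::DemoCache, ::Type{T}, xlen::Int)
    key = (T, xlen)
    if haskey(cache.storage, key)
        return cache.storage[key]::Vector{T}
    else
        workvec = zeros(T, xlen)
        cache.storage[key] = workvec
        return workvec
    end
end

# Usage: repeated calls with the same (T, xlen) reuse one allocation.
mycache = DemoCache()
v1 = get_workvec!(mycache, Float64, 10)
v2 = get_workvec!(mycache, Float64, 10)  # v1 === v2: storage is reused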
_load_gradvec_with_x_zeros!(gradvec, x, gradzeros)
for i in 1:N:xlen
What if xlen isn't divisible by N?
Then an exception is thrown:
function check_chunk_size(x::Vector, chunk_size::Int)
    @assert length(x) % chunk_size == 0 "Length of input vector is indivisible by chunk size (length(x) = $(length(x)), chunk size = $chunk_size)"
end
See here.
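(As a concrete illustration of that check, not taken from the thread: passing a length that isn't a multiple of the chunk size would fail roughly like this.)

julia> check_chunk_size(rand(10), 3)
ERROR: AssertionError: Length of input vector is indivisible by chunk size (length(x) = 10, chunk size = 3)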
… it makes for a cleaner implementation
@mlubin @Scidom The latest commits restructure the code to fix the type instabilities in the caching layer, which were bubbling up to the other methods and making them allocation-heavy. Performance is now comparable to the #27 branch.
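(A hypothetical, simplified illustration of the kind of instability involved, again in the thread's Julia 0.4-era syntax and not the PR's actual code: a field annotated with an abstract type hides the value's concrete type from the compiler, so code touching it is dynamically dispatched and allocation-heavy, whereas a concretely parameterized field keeps that code inferable.)

# Abstractly-typed field: access through `workvec` cannot be inferred.
immutable SlowCache
    workvec::AbstractVector
end

# Concretely parameterized field: the element type is part of the
# cache's type, so access through `workvec` is inferable.
immutable FastCache{T}
    workvec::Vector{T}
end

sumcache(c) = sum(c.workvec)

slow = SlowCache(rand(100))
fast = FastCache(rand(100))

sumcache(slow)  # same result, but dynamic dispatch on the field's runtime type
sumcache(fast)  # fully inferable, no boxing of the result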
…d Python (comparing ForwardDiff.jl with AlgoPy)
Allow chunk-sizing options, and a more robust caching layer to fix leaky generated functions
Basically what the title says. This PR attempts to address some issues with #27, and is being done here since there will be intermittent breakage and a smaller PR will be easier to review.