Merge pull request #34 from JuliaDiff/api-refactor
Allow chunk-sizing options, and a more robust caching layer to fix leaky generated functions
jrevels committed Aug 14, 2015
2 parents 28bab4e + 49474a6 commit 4a4285d
Showing 26 changed files with 1,655 additions and 880 deletions.
154 changes: 135 additions & 19 deletions README.md
@@ -1,22 +1,64 @@
[![Build Status](https://travis-ci.org/JuliaDiff/ForwardDiff.jl.svg?branch=api-refactor)](https://travis-ci.org/JuliaDiff/ForwardDiff.jl) [![Coverage Status](https://coveralls.io/repos/JuliaDiff/ForwardDiff.jl/badge.svg?branch=api-refactor&service=github)](https://coveralls.io/github/JuliaDiff/ForwardDiff.jl?branch=api-refactor)

# ForwardDiff.jl

The `ForwardDiff` package provides a type-based implementation of forward mode automatic differentiation (FAD) in Julia. [The wikipedia page on automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) is a useful resource for learning about the advantages of FAD techniques over other common differentiation methods (such as [finite differencing](https://en.wikipedia.org/wiki/Numerical_differentiation)).

## What can I do with this package?

This package contains methods to take derivatives, gradients, Jacobians, and Hessians of native Julia functions (or any callable object, really). While performance varies depending on the functions you evaluate, this package generally outperforms non-AD methods in memory usage, speed, and accuracy.

A third-order generalization of the Hessian is also implemented (see the `tensor` method).

For now, we only support functions involving `T<:Real`s, but we believe extension to numbers of type `T<:Complex` is possible.

## Usage Example

```julia
julia> using ForwardDiff

julia> f(x::Vector) = sum(sin, x) + prod(tan, x) * sum(sqrt, x);

julia> x = rand(5) # small size for example's sake
5-element Array{Float64,1}:
0.986403
0.140913
0.294963
0.837125
0.650451

julia> g = ForwardDiff.gradient(f); # g = ∇f

julia> g(x)
5-element Array{Float64,1}:
1.01358
2.50014
1.72574
1.10139
1.2445

julia> j = ForwardDiff.jacobian(g); # j = J(∇f)

julia> j(x)
5x5 Array{Float64,2}:
0.585111 3.48083 1.7706 0.994057 1.03257
3.48083 1.06079 5.79299 3.25245 3.37871
1.7706 5.79299 0.423981 1.65416 1.71818
0.994057 3.25245 1.65416 0.251396 0.964566
1.03257 3.37871 1.71818 0.964566 0.140689

julia> ForwardDiff.hessian(f, x) # H(f)(x) == J(∇f)(x), as expected
5x5 Array{Float64,2}:
0.585111 3.48083 1.7706 0.994057 1.03257
3.48083 1.06079 5.79299 3.25245 3.37871
1.7706 5.79299 0.423981 1.65416 1.71818
0.994057 3.25245 1.65416 0.251396 0.964566
1.03257 3.37871 1.71818 0.964566 0.140689
```

## API

#### Derivative of `f(x::Number) → Number` or `f(x::Number) → Array`

- **`derivative!(output::Array, f, x::Number)`**

@@ -30,9 +72,7 @@

Return the function `f'`. If `mutates=false`, then the returned function has the form `derivf(x) -> derivative(f, x)`. If `mutates = true`, then the returned function has the form `derivf!(output, x) -> derivative!(output, f, x)`.

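A minimal sketch of these `derivative` methods, based on the signatures above (`sq` and `dsq` are illustrative names, not part of the package):

```julia
sq(x) = sin(x) + x^2             # illustrative scalar-to-scalar target

ForwardDiff.derivative(sq, 1.0)  # ≈ cos(1.0) + 2.0

dsq = ForwardDiff.derivative(sq) # dsq(x) -> derivative(sq, x)
dsq(1.0)                         # same value as the call above
```
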
#### Gradient of `f(x::Vector) → Number`

- **`gradient!(output::Vector, f, x::Vector)`**

@@ -46,9 +86,7 @@

Return the function `∇f`. If `mutates=false`, then the returned function has the form `gradf(x) -> gradient(f, x)`. If `mutates = true`, then the returned function has the form `gradf!(output, x) -> gradient!(output, f, x)`. By default, `mutates` is set to `false`. `ForwardDiff` must be used as a qualifier when calling `gradient` to avoid conflict with `Base.gradient`.

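For instance (a minimal sketch with an illustrative target `sqsum`; note the `ForwardDiff` qualifier):

```julia
sqsum(x::Vector) = sum(abs2, x)        # illustrative R^n -> R target

x = rand(3)
ForwardDiff.gradient(sqsum, x)         # gradient vector, here ≈ 2x

out = similar(x)
ForwardDiff.gradient!(out, sqsum, x)   # writes the gradient into `out`

∇sqsum = ForwardDiff.gradient(sqsum)   # ∇sqsum(x) -> gradient(sqsum, x)
```
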
#### Jacobian of `f(x::Vector) → Vector`

- **`jacobian!(output::Matrix, f, x::Vector)`**

@@ -62,9 +100,7 @@

Return the function `J(f)`. If `mutates=false`, then the returned function has the form `jacf(x) -> jacobian(f, x)`. If `mutates = true`, then the returned function has the form `jacf!(output, x) -> jacobian!(output, f, x)`. By default, `mutates` is set to `false`.

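For instance (a sketch with an illustrative `R² → R³` target):

```julia
vecf(x::Vector) = [x[1]^2, x[1]*x[2], sin(x[2])]   # illustrative target

x = rand(2)
ForwardDiff.jacobian(vecf, x)        # 3x2 Jacobian matrix

out = zeros(3, 2)
ForwardDiff.jacobian!(out, vecf, x)  # writes the Jacobian into `out`

Jvecf = ForwardDiff.jacobian(vecf)   # Jvecf(x) -> jacobian(vecf, x)
```
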
#### Hessian of `f(x::Vector) → Number`

- **`hessian!(output::Matrix, f, x::Vector)`**

@@ -78,9 +114,7 @@

Return the function `H(f)`. If `mutates=false`, then the returned function has the form `hessf(x) -> hessian(f, x)`. If `mutates = true`, then the returned function has the form `hessf!(output, x) -> hessian!(output, f, x)`. By default, `mutates` is set to `false`.

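For instance (a sketch with an illustrative target `quadprod`):

```julia
quadprod(x::Vector) = dot(x, x) + prod(x)   # illustrative R^n -> R target

x = rand(4)
ForwardDiff.hessian(quadprod, x)            # 4x4 symmetric Hessian

out = zeros(4, 4)
ForwardDiff.hessian!(out, quadprod, x)      # writes the Hessian into `out`

Hquadprod = ForwardDiff.hessian(quadprod)   # Hquadprod(x) -> hessian(quadprod, x)
```
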
#### Third-order Taylor series term of `f(x::Vector) → Number`

[This Math StackExchange post](http://math.stackexchange.com/questions/556951/third-order-term-in-taylor-series) actually has an answer that explains this term fairly clearly.

@@ -95,3 +129,85 @@
- **`tensor(f; mutates=false)`**

Return the function ``∑D³f``. If `mutates=false`, then the returned function has the form `tensf(x) -> tensor(f, x)`. If `mutates = true`, then the returned function has the form `tensf!(output, x) -> tensor!(output, f, x)`. By default, `mutates` is set to `false`.

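As a rough sketch of the call pattern (the target function below is made up, and the shape of the returned array is left to the method itself):

```julia
cubesum(x::Vector) = sum(x.^3)           # illustrative R^n -> R target

x = rand(3)
t = ForwardDiff.tensor(cubesum, x)       # third-order derivative information at x

tcubesum = ForwardDiff.tensor(cubesum)   # tcubesum(x) -> tensor(cubesum, x)
```
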
## Performance Tips

#### Type stability

Make sure that your target function is [type-stable](http://julia.readthedocs.org/en/latest/manual/performance-tips/#write-type-stable-functions). Type instability in the target function can cause slowness, and in some cases, errors. If you get an error that looks like this:

```julia
ERROR: TypeError: typeassert: ...
```

it might be that Julia's type inference cannot determine that your target function returns the output ForwardDiff expects (these expectations are outlined in the API section above).
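
As a hedged illustration of the kind of instability to look for (both functions below are made up for this example):

```julia
# Type-unstable: one branch returns an Int, the other a Float64
unstable(x::Vector) = isempty(x) ? 0 : sum(x)

# Type-stable: both branches return a value of eltype(x)
stable(x::Vector) = isempty(x) ? zero(eltype(x)) : sum(x)
```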

#### Caching Options

If you're going to repeatedly evaluate the gradient/Hessian/etc. of a function, it may be worth generating the corresponding method beforehand rather than calling `hessian(f, x)` many times. For example, this:

```julia
inputs = [rand(1000) for i in 1:100]
for x in inputs
    hess = hessian(f, x)
    ... # do something with hess
end
```

should really be written like this:

```julia
h = hessian(f) # generate H(f) first
inputs = [rand(1000) for i in 1:100]
for x in inputs
    hess = h(x)
    ... # do something with hess
end
```

The reason is that `hessian(f, x)` (and `hessian!(output, f, x)`, as well as the other methods like `gradient`/`jacobian`/etc.) must create various temporary "work arrays" during evaluation. Generating `h = hessian(f)` *first* allows those temporary arrays to be cached in subsequent calls to `h`, saving both time and memory over the course of the loop.
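
For a rough before/after comparison (reusing `f` and `inputs` from the snippets above; actual timings will vary):

```julia
h = hessian(f)                              # pre-generate H(f) once

@time for x in inputs; hessian(f, x); end   # allocates fresh work arrays on every call
@time for x in inputs; h(x); end            # reuses the cached work arrays
```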

This caching can also be handled manually, if one wishes, by utilizing the provided `ForwardDiffCache` type and the `cache` keyword argument:

```julia
my_cache = ForwardDiffCache() # make new cache to pass in to our function
inputs = [rand(1000) for i in 1:100]
for x in inputs
    # just as efficient as pre-generating h because it can reuse my_cache
    hess = hessian(f, x, cache=my_cache)
    ... # do something with hess
end
```

The `cache` keyword argument is supported for all ForwardDiff methods which take in a target function and input.

Manually passing in a `ForwardDiffCache` is useful when `f` might change during the course of the calculation, in which case you wouldn't want to keep re-generating `hessian(f)` over and over.
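
For example (a sketch only; the changing target functions below are hypothetical):

```julia
shared_cache = ForwardDiffCache()
inputs = [rand(1000) for i in 1:100]
for (p, x) in zip(1:100, inputs)
    f_p = y -> sum(y.^p)                        # a different target function on each iteration
    hess = hessian(f_p, x, cache=shared_cache)  # no per-function pre-generation needed
    # ... do something with hess
end
```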

#### Chunk-based calculation modes

If the dimensions you're working in are very large, ForwardDiff supports calculation of gradients/Jacobians/etc. by "chunks" of the input vector rather than performing the entire computation at once. This mode is triggered by use of the `chunk_size` keyword argument, like so:

```julia
x = rand(1000000)
jacobian(f, x, chunk_size=10) # perform calculation only 10 elements at a time to save memory
```

This also applies to pre-generated FAD methods and to mutating methods:

```julia
j! = jacobian(f, mutates=true)
j!(output, x, chunk_size=10)
```

The given `chunk_size` must always evenly divide the length of the input vector:

```julia
julia> hessian(f, rand(100), chunk_size=11)
ERROR: AssertionError: Length of input vector is indivisible by chunk size (length(x) = 100, chunk size = 11)
in check_chunk_size at /Users/jarrettrevels/.julia/ForwardDiff/src/fad_api/fad_api.jl:19
in _calc_hessian! at /Users/jarrettrevels/.julia/ForwardDiff/src/fad_api/hessian.jl:48
```

Thus, chunking of input vectors whose length is a prime number is unsupported. We're currently working on removing this limitation.

Note that it is generally much faster **not** to provide a `chunk_size`. This option is provided for cases in which performing the entire calculation at once would consume too much memory.
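
If you do need chunking for an arbitrary input length, one workaround is to pick a chunk size that evenly divides that length; the helper below is only a sketch, not part of the package:

```julia
# Largest chunk size <= cap that evenly divides len (falls back to 1)
fitting_chunk(len, cap) = maximum(filter(c -> len % c == 0, 1:min(cap, len)))

x = rand(1000)
jacobian(f, x, chunk_size=fitting_chunk(length(x), 10))  # chunk_size == 10 here
```
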
61 changes: 61 additions & 0 deletions benchmarks/benchmarks.jl
@@ -0,0 +1,61 @@
using ForwardDiff

##################
# Test functions #
##################
sqr(i) = i^2
twopicos(i) = cos(2*π*i)

function ackley(x)
    len_recip = 1/length(x)
    sum_sqrs = zero(eltype(x))
    sum_cos = sum_sqrs
    for i in x
        sum_cos += twopicos(i)
        sum_sqrs += sqr(i)
    end
    return -20 * exp(-0.2 * sqrt(len_recip*sum_sqrs)) - exp(len_recip * sum_cos) + 20 + e
end

#############################
# Benchmark utility methods #
#############################
# Usage:
#
# benchmark ackley where length(x) = 10:10:100, taking the minimum of 4 trials:
# bench_fad(ackley, 10:10:100, 4)
#
# benchmark ackley where length(x) = 400, taking the minimum of 8 trials:
# bench_fad(ackley, 400, 8)

function bench_fad(f, range, repeat=3)
    g = ForwardDiff.gradient(f)
    h = ForwardDiff.hessian(f)

    # warm up
    bench_range(f, range, 1)
    bench_range(g, range, 1)
    bench_range(h, range, 1)

    # actual
    return Dict(
        :ftimes => bench_range(f, range, repeat),
        :gtimes => bench_range(g, range, repeat),
        :htimes => bench_range(h, range, repeat)
    )
end

function bench_func(f, xlen, repeat)
    x = rand(xlen)
    min_time = Inf
    for i in 1:repeat
        this_time = (tic(); f(x); toq())
        min_time = min(this_time, min_time)
    end
    return min_time
end

function bench_range(f, range, repeat=3)
    return [bench_func(f, xlen, repeat) for xlen in range]
end

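A hypothetical driver for these utilities might look like the following (sizes and repeat count are arbitrary):

```julia
results = bench_fad(ackley, 100:100:300, 3)
println(results[:ftimes])   # minimum times for ackley itself, one entry per input length
println(results[:gtimes])   # minimum times for its gradient
println(results[:htimes])   # minimum times for its Hessian
```
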
49 changes: 49 additions & 0 deletions benchmarks/benchmarks.py
@@ -0,0 +1,49 @@
import numpy, algopy, math, timeit

################
# AD functions #
################
def gradient(f):
    def gradf(x):
        y = algopy.UTPM.init_jacobian(x)
        return algopy.UTPM.extract_jacobian(f(y))
    return gradf

def hessian(f):
    def hessf(x):
        y = algopy.UTPM.init_hessian(x)
        return algopy.UTPM.extract_hessian(x.size, f(y))
    return hessf

##################
# Test functions #
##################
def ackley(x):
    sum_sqrs = sum(i**2 for i in x)
    sum_cos = sum(algopy.cos(2*math.pi*i) for i in x)
    len_recip = 1.0/len(x)
    return (-20 * algopy.exp(-0.2*algopy.sqrt(len_recip*sum_sqrs)) -
            algopy.exp(len_recip*sum_cos) + 20 + math.e)

#############################
# Benchmark utility methods #
#############################
# Usage:
#
# benchmark ackley where len(x) = range(10,100,10), taking the minimum of 4 trials:
# bench_fad(ackley, range(10,100,10), 4)
#
# benchmark ackley where len(x) = 400, taking the minimum of 8 trials:
# bench_fad(ackley, (400,), 8)

def bench_fad(f, itr, repeat):
    fname = f.__name__
    import_stmt = 'import numpy, algopy, math; from __main__ import ' + fname + ', gradient, hessian;'
    return {'ftimes': bench_range(fname + '(x)', import_stmt, itr, repeat),
            'gtimes': bench_range('g(x)', import_stmt + 'g = gradient(' + fname + ');', itr, repeat),
            'htimes': bench_range('h(x)', import_stmt + 'h = hessian(' + fname + ');', itr, repeat)}

def bench_range(stmt, setup, itr, repeat):
    x_stmt = lambda xlen: 'x = numpy.random.rand(' + str(xlen) + ')'
    return [min(timeit.repeat(stmt, setup=(setup + x_stmt(i)), number=1, repeat=repeat)) for i in itr]

14 changes: 7 additions & 7 deletions src/ForwardDiff.jl
@@ -35,12 +35,11 @@ module ForwardDiff
# @eval import Base.$(fsym);
# end

include("ForwardDiffNumber.jl")
include("GradientNumber.jl")
include("HessianNumber.jl")
include("TensorNumber.jl")
include("fad_api/fad_api.jl")

export derivative!,
derivative,
@@ -50,6 +49,7 @@ module ForwardDiff
hessian!,
hessian,
tensor!,
tensor,
ForwardDiffCache

end # module ForwardDiff