Merge pull request #34 from JuliaDiff/api-refactor
Allow chunk-sizing options, and a more robust caching layer to fix leaky generated functions
jrevels committed Aug 14, 2015
2 parents 28bab4e + 49474a6 commit 4a4285d
Showing 26 changed files with 1,655 additions and 880 deletions.
154 changes: 135 additions & 19 deletions README.md
@@ -1,22 +1,64 @@
[![Build Status](https://travis-ci.org/JuliaDiff/ForwardDiff.jl.svg?branch=api-refactor)](https://travis-ci.org/JuliaDiff/ForwardDiff.jl) [![Coverage Status](https://coveralls.io/repos/JuliaDiff/ForwardDiff.jl/badge.svg?branch=api-refactor&service=github)](https://coveralls.io/github/JuliaDiff/ForwardDiff.jl?branch=api-refactor)

# ForwardDiff.jl

The `ForwardDiff` package provides a type-based implementation of forward mode automatic differentiation (FAD) in Julia. [The wikipedia page on automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) is a useful resource for learning about the advantages of FAD techniques over other common differentiation methods (such as [finite differencing](https://en.wikipedia.org/wiki/Numerical_differentiation)).

## What can I do with this package?

This package contains methods to take derivatives, gradients, Jacobians, and Hessians of native Julia functions (or any callable object, really). While performance varies depending on the functions you evaluate, this package generally outperforms non-AD methods in memory usage, speed, and accuracy.

A third-order generalization of the Hessian is also implemented (see the `tensor` method).

For now, we only support functions involving `T<:Real`s, but we believe extension to numbers of type `T<:Complex` is possible.

## Usage Example

```julia
julia> using ForwardDiff

julia> f(x::Vector) = sum(sin, x) + prod(tan, x) * sum(sqrt, x);

julia> x = rand(5) # small size for example's sake
5-element Array{Float64,1}:
0.986403
0.140913
0.294963
0.837125
0.650451

julia> g = ForwardDiff.gradient(f); # g = ∇f

julia> g(x)
5-element Array{Float64,1}:
1.01358
2.50014
1.72574
1.10139
1.2445

julia> j = ForwardDiff.jacobian(g); # j = J(∇f)

julia> j(x)
5x5 Array{Float64,2}:
0.585111 3.48083 1.7706 0.994057 1.03257
3.48083 1.06079 5.79299 3.25245 3.37871
1.7706 5.79299 0.423981 1.65416 1.71818
0.994057 3.25245 1.65416 0.251396 0.964566
1.03257 3.37871 1.71818 0.964566 0.140689

julia> ForwardDiff.hessian(f, x) # H(f)(x) == J(∇f)(x), as expected
5x5 Array{Float64,2}:
0.585111 3.48083 1.7706 0.994057 1.03257
3.48083 1.06079 5.79299 3.25245 3.37871
1.7706 5.79299 0.423981 1.65416 1.71818
0.994057 3.25245 1.65416 0.251396 0.964566
1.03257 3.37871 1.71818 0.964566 0.140689
```

## API

#### Derivative of `f(x::Number) → Number` or `f(x::Number) → Array`

- **`derivative!(output::Array, f, x::Number)`**

@@ -30,9 +72,7 @@

Return the function `f'`. If `mutates=false`, then the returned function has the form `derivf(x) -> derivative(f, x)`. If `mutates = true`, then the returned function has the form `derivf!(output, x) -> derivative!(output, f, x)`.

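A minimal sketch of these `derivative` methods, based on the signatures above (`sq` and `dsq` are illustrative names, not part of the package):

```julia
sq(x) = sin(x) + x^2             # illustrative scalar-to-scalar target

ForwardDiff.derivative(sq, 1.0)  # ≈ cos(1.0) + 2.0

dsq = ForwardDiff.derivative(sq) # dsq(x) -> derivative(sq, x)
dsq(1.0)                         # same value as the call above
```
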
#### Gradient of `f(x::Vector) → Number`

- **`gradient!(output::Vector, f, x::Vector)`**

@@ -46,9 +86,7 @@

Return the function `∇f`. If `mutates=false`, then the returned function has the form `gradf(x) -> gradient(f, x)`. If `mutates = true`, then the returned function has the form `gradf!(output, x) -> gradient!(output, f, x)`. By default, `mutates` is set to `false`. `ForwardDiff` must be used as a qualifier when calling `gradient` to avoid conflict with `Base.gradient`.

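For instance (a minimal sketch with an illustrative target `sqsum`; note the `ForwardDiff` qualifier):

```julia
sqsum(x::Vector) = sum(abs2, x)        # illustrative R^n -> R target

x = rand(3)
ForwardDiff.gradient(sqsum, x)         # gradient vector, here ≈ 2x

out = similar(x)
ForwardDiff.gradient!(out, sqsum, x)   # writes the gradient into `out`

∇sqsum = ForwardDiff.gradient(sqsum)   # ∇sqsum(x) -> gradient(sqsum, x)
```
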
#### Jacobian of `f(x::Vector) → Vector`

- **`jacobian!(output::Matrix, f, x::Vector)`**

@@ -62,9 +100,7 @@

Return the function `J(f)`. If `mutates=false`, then the returned function has the form `jacf(x) -> jacobian(f, x)`. If `mutates = true`, then the returned function has the form `jacf!(output, x) -> jacobian!(output, f, x)`. By default, `mutates` is set to `false`.

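For instance (a sketch with an illustrative `R² → R³` target):

```julia
vecf(x::Vector) = [x[1]^2, x[1]*x[2], sin(x[2])]   # illustrative target

x = rand(2)
ForwardDiff.jacobian(vecf, x)        # 3x2 Jacobian matrix

out = zeros(3, 2)
ForwardDiff.jacobian!(out, vecf, x)  # writes the Jacobian into `out`

Jvecf = ForwardDiff.jacobian(vecf)   # Jvecf(x) -> jacobian(vecf, x)
```
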
#### Hessian of `f(x::Vector) → Number`

- **`hessian!(output::Matrix, f, x::Vector)`**

@@ -78,9 +114,7 @@

Return the function `H(f)`. If `mutates=false`, then the returned function has the form `hessf(x) -> hessian(f, x)`. If `mutates = true`, then the returned function has the form `hessf!(output, x) -> hessian!(output, f, x)`. By default, `mutates` is set to `false`.

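For instance (a sketch with an illustrative target `quadprod`):

```julia
quadprod(x::Vector) = dot(x, x) + prod(x)   # illustrative R^n -> R target

x = rand(4)
ForwardDiff.hessian(quadprod, x)            # 4x4 symmetric Hessian

out = zeros(4, 4)
ForwardDiff.hessian!(out, quadprod, x)      # writes the Hessian into `out`

Hquadprod = ForwardDiff.hessian(quadprod)   # Hquadprod(x) -> hessian(quadprod, x)
```
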
#### Third-order Taylor series term of `f(x::Vector) → Number`

[This Math StackExchange post](http://math.stackexchange.com/questions/556951/third-order-term-in-taylor-series) actually has an answer that explains this term fairly clearly.

@@ -95,3 +129,85 @@
- **`tensor(f; mutates=false)`**

Return the function ``∑D³f``. If `mutates=false`, then the returned function has the form `tensf(x) -> tensor(f, x)`. If `mutates = true`, then the returned function has the form `tensf!(output, x) -> tensor!(output, f, x)`. By default, `mutates` is set to `false`.

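As a rough sketch of the call pattern (the target function below is made up, and the shape of the returned array is left to the method itself):

```julia
cubesum(x::Vector) = sum(x.^3)           # illustrative R^n -> R target

x = rand(3)
t = ForwardDiff.tensor(cubesum, x)       # third-order derivative information at x

tcubesum = ForwardDiff.tensor(cubesum)   # tcubesum(x) -> tensor(cubesum, x)
```
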
## Performance Tips

#### Type stability

Make sure that your target function is [type-stable](http://julia.readthedocs.org/en/latest/manual/performance-tips/#write-type-stable-functions). Type instability in the target function can cause slowness, and in some cases, errors. If you get an error that looks like this:

```julia
ERROR: TypeError: typeassert: ...
```

it might be that Julia's type inference cannot determine that your target function returns the output ForwardDiff expects (these expectations are outlined in the API section above).
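
As a hedged illustration of the kind of instability to look for (both functions below are made up for this example):

```julia
# Type-unstable: one branch returns an Int, the other a Float64
unstable(x::Vector) = isempty(x) ? 0 : sum(x)

# Type-stable: both branches return a value of eltype(x)
stable(x::Vector) = isempty(x) ? zero(eltype(x)) : sum(x)
```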

#### Caching Options

If you're going to repeatedly evaluate the gradient/Hessian/etc. of a function, it may be worth generating the corresponding method beforehand rather than calling `hessian(f, x)` many times. For example, this:

```julia
inputs = [rand(1000) for i in 1:100]
for x in inputs
    hess = hessian(f, x)
    ... # do something with hess
end
```

should really be written like this:

```julia
h = hessian(f) # generate H(f) first
inputs = [rand(1000) for i in 1:100]
for x in inputs
    hess = h(x)
    ... # do something with hess
end
```

The reason is that `hessian(f, x)` (and `hessian!(output, f, x)`, as well as the other methods like `gradient`/`jacobian`/etc.) must create various temporary "work arrays" during evaluation. Generating `h = hessian(f)` *first* allows those temporary arrays to be cached in subsequent calls to `h`, saving both time and memory over the course of the loop.
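
For a rough before/after comparison (reusing `f` and `inputs` from the snippets above; actual timings will vary):

```julia
h = hessian(f)                              # pre-generate H(f) once

@time for x in inputs; hessian(f, x); end   # allocates fresh work arrays on every call
@time for x in inputs; h(x); end            # reuses the cached work arrays
```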

This caching can also be handled manually, if one wishes, by utilizing the provided `ForwardDiffCache` type and the `cache` keyword argument:

```julia
my_cache = ForwardDiffCache() # make new cache to pass in to our function
inputs = [rand(1000) for i in 1:100]
for x in inputs
    # just as efficient as pre-generating h because it can reuse my_cache
    hess = hessian(f, x, cache=my_cache)
    ... # do something with hess
end
```

The `cache` keyword argument is supported for all ForwardDiff methods which take in a target function and input.

Manually passing in a `ForwardDiffCache` is useful when `f` might change during the course of the calculation, in which case you wouldn't want to keep re-generating `hessian(f)` over and over.
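
For example (a sketch only; the changing target functions below are hypothetical):

```julia
shared_cache = ForwardDiffCache()
inputs = [rand(1000) for i in 1:100]
for (p, x) in zip(1:100, inputs)
    f_p = y -> sum(y.^p)                        # a different target function on each iteration
    hess = hessian(f_p, x, cache=shared_cache)  # no per-function pre-generation needed
    # ... do something with hess
end
```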

#### Chunk-based calculation modes

If the dimensions you're working in are very large, ForwardDiff supports calculation of gradients/Jacobians/etc. by "chunks" of the input vector rather than performing the entire computation at once. This mode is triggered by use of the `chunk_size` keyword argument, like so:

```julia
x = rand(1000000)
jacobian(f, x, chunk_size=10) # perform calculation only 10 elements at a time to save memory
```

This also applies to pre-generated FAD methods and to mutating methods:

```julia
j! = jacobian(f, mutates=true)
j!(output, x, chunk_size=10)
```

The given `chunk_size` must always evenly divide the length of the input vector:

```julia
julia> hessian(f, rand(100), chunk_size=11)
ERROR: AssertionError: Length of input vector is indivisible by chunk size (length(x) = 100, chunk size = 11)
in check_chunk_size at /Users/jarrettrevels/.julia/ForwardDiff/src/fad_api/fad_api.jl:19
in _calc_hessian! at /Users/jarrettrevels/.julia/ForwardDiff/src/fad_api/hessian.jl:48
```

Thus, chunking of input vectors whose length is a prime number is unsupported. We're currently working on removing this limitation.

Note that it is generally much faster **not** to provide a `chunk_size`. This option is provided for cases in which performing the entire calculation at once would consume too much memory.
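
If you do need chunking for an arbitrary input length, one workaround is to pick a chunk size that evenly divides that length; the helper below is only a sketch, not part of the package:

```julia
# Largest chunk size <= cap that evenly divides len (falls back to 1)
fitting_chunk(len, cap) = maximum(filter(c -> len % c == 0, 1:min(cap, len)))

x = rand(1000)
jacobian(f, x, chunk_size=fitting_chunk(length(x), 10))  # chunk_size == 10 here
```
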
61 changes: 61 additions & 0 deletions benchmarks/benchmarks.jl
@@ -0,0 +1,61 @@
using ForwardDiff

##################
# Test functions #
##################
sqr(i) = i^2
twopicos(i) = cos(2*π*i)

function ackley(x)
    len_recip = 1/length(x)
    sum_sqrs = zero(eltype(x))
    sum_cos = sum_sqrs
    for i in x
        sum_cos += twopicos(i)
        sum_sqrs += sqr(i)
    end
    return -20 * exp(-0.2 * sqrt(len_recip*sum_sqrs)) - exp(len_recip * sum_cos) + 20 + e
end

#############################
# Benchmark utility methods #
#############################
# Usage:
#
# benchmark ackley where length(x) = 10:10:100, taking the minimum of 4 trials:
# bench_fad(ackley, 10:10:100, 4)
#
# benchmark ackley where length(x) = 400, taking the minimum of 8 trials:
# bench_fad(ackley, 400, 8)

function bench_fad(f, range, repeat=3)
    g = ForwardDiff.gradient(f)
    h = ForwardDiff.hessian(f)

    # warm up
    bench_range(f, range, 1)
    bench_range(g, range, 1)
    bench_range(h, range, 1)

    # actual
    return Dict(
        :ftimes => bench_range(f, range, repeat),
        :gtimes => bench_range(g, range, repeat),
        :htimes => bench_range(h, range, repeat)
    )
end

function bench_func(f, xlen, repeat)
    x = rand(xlen)
    min_time = Inf
    for i in 1:repeat
        this_time = (tic(); f(x); toq())
        min_time = min(this_time, min_time)
    end
    return min_time
end

function bench_range(f, range, repeat=3)
    return [bench_func(f, xlen, repeat) for xlen in range]
end

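A hypothetical driver for these utilities might look like the following (sizes and repeat count are arbitrary):

```julia
results = bench_fad(ackley, 100:100:300, 3)
println(results[:ftimes])   # minimum times for ackley itself, one entry per input length
println(results[:gtimes])   # minimum times for its gradient
println(results[:htimes])   # minimum times for its Hessian
```
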
49 changes: 49 additions & 0 deletions benchmarks/benchmarks.py
@@ -0,0 +1,49 @@
import numpy, algopy, math, timeit

################
# AD functions #
################
def gradient(f):
    def gradf(x):
        y = algopy.UTPM.init_jacobian(x)
        return algopy.UTPM.extract_jacobian(f(y))
    return gradf

def hessian(f):
    def hessf(x):
        y = algopy.UTPM.init_hessian(x)
        return algopy.UTPM.extract_hessian(x.size, f(y))
    return hessf

##################
# Test functions #
##################
def ackley(x):
    sum_sqrs = sum(i**2 for i in x)
    sum_cos = sum(algopy.cos(2*math.pi*i) for i in x)
    len_recip = 1.0/len(x)
    return (-20 * algopy.exp(-0.2*algopy.sqrt(len_recip*sum_sqrs)) -
            algopy.exp(len_recip*sum_cos) + 20 + math.e)

#############################
# Benchmark utility methods #
#############################
# Usage:
#
# benchmark ackley where len(x) = range(10,100,10), taking the minimum of 4 trials:
# bench_fad(ackley, range(10,100,10), 4)
#
# benchmark ackley where len(x) = 400, taking the minimum of 8 trials:
# bench_fad(ackley, (400,), 8)

def bench_fad(f, itr, repeat):
    fname = f.__name__
    import_stmt = 'import numpy, algopy, math; from __main__ import ' + fname + ', gradient, hessian;'
    return {'ftimes': bench_range(fname + '(x)', import_stmt, itr, repeat),
            'gtimes': bench_range('g(x)', import_stmt + 'g = gradient(' + fname + ');', itr, repeat),
            'htimes': bench_range('h(x)', import_stmt + 'h = hessian(' + fname + ');', itr, repeat)}

def bench_range(stmt, setup, itr, repeat):
    x_stmt = lambda xlen: 'x = numpy.random.rand(' + str(xlen) + ')'
    return [min(timeit.repeat(stmt, setup=(setup + x_stmt(i)), number=1, repeat=repeat)) for i in itr]

14 changes: 7 additions & 7 deletions src/ForwardDiff.jl
@@ -35,12 +35,11 @@ module ForwardDiff
# @eval import Base.$(fsym);
# end

include("ForwardDiffNumber.jl")
include("GradientNumber.jl")
include("HessianNumber.jl")
include("TensorNumber.jl")
include("fad_api/fad_api.jl")

export derivative!,
derivative,
@@ -50,6 +49,7 @@ module ForwardDiff
hessian!,
hessian,
tensor!,
tensor,
ForwardDiffCache

end # module ForwardDiff