This repository currently contains three different implementations of automatic differentiation, all of them runtime based (in contrast to a compile-time symbolic differentiation library like astGrad or other hypothetical compile-time based libraries…).
See the playground.nim file for an example use case that performs gradient descent based on either of the two libraries to find the rotation angle of a set of points.
The first is simply a port of Andrej Karpathy’s micrograd: a library defining a ref type that builds a directed acyclic graph of the computation performed on instances of the type, where each node stores a closure computing its local derivative. From the generated graph we can then differentiate in backward mode by traversing the graph backwards and evaluating the closures. Andrej did not intend his library to be efficient, and neither is this one.
I extended this version to include a wide range of trigonometric and other functions to make it a bit more useful for optimization problems (but likely it will be too slow for real world applications).
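To illustrate the underlying idea, here is a hedged, minimal sketch of such a graph-building ref type with closure-based backward propagation. The names (SketchValue, initSketch) are made up for illustration and are not the API of the micrograd module in this repository:

import math, sets, hashes

# Hedged sketch of the micrograd idea; names are illustrative, not this repo's API.
type
  SketchValue = ref object
    data, grad: float
    prev: seq[SketchValue]            # parents in the computation graph
    backwardFn: proc () {.closure.}   # propagates the gradient to the parents

proc hash(v: SketchValue): Hash = hash(cast[pointer](v))

proc initSketch(x: float): SketchValue = SketchValue(data: x)

proc `*`(a, b: SketchValue): SketchValue =
  let res = SketchValue(data: a.data * b.data, prev: @[a, b])
  res.backwardFn = proc () =
    a.grad += b.data * res.grad   # ∂(a·b)/∂a = b
    b.grad += a.data * res.grad   # ∂(a·b)/∂b = a
  res

proc tanh(a: SketchValue): SketchValue =
  let res = SketchValue(data: tanh(a.data), prev: @[a])
  res.backwardFn = proc () =
    a.grad += (1.0 - res.data * res.data) * res.grad   # ∂tanh(u)/∂u = 1 - tanh²(u)
  res

proc backward(v: SketchValue) =
  # topologically sort the DAG, then evaluate the stored closures in reverse
  var topo: seq[SketchValue]
  var visited = initHashSet[SketchValue]()
  proc build(n: SketchValue) =
    if n notin visited:
      visited.incl n
      for p in n.prev: build(p)
      topo.add n
  build(v)
  v.grad = 1.0
  for i in countdown(topo.high, 0):
    if topo[i].backwardFn != nil:
      topo[i].backwardFn()

let xs = initSketch(0.5)
let ys = tanh(xs) * initSketch(2.0)
ys.backward()
echo xs.grad   # 2 · (1 - tanh²(0.5)) ≈ 1.5729

The actual micrograd module is then used as follows: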
import micrograd, strformat
let x = initValue(0.5)
let tn = tanh(x)
tn.backward()
echo "AD: ∂(tanh(x))/∂x|_{x=0.5} = ", &"{x.grad:.4f}"
echo "∂(tanh(x))/∂x|_{x=0.5} = [1 / cosh²(x)]_{x=0.5} = ", &"{1.0 / (cosh(0.5)**2):.4f}"
# and for a use case where the backward mode shines:
proc bar[T](x1, x2: T): T =
  result = tanh(x1) + sinh(x2)
let x1 = initValue(0.5)
let x2 = initValue(1.5)
let res = bar(x1, x2)
res.backward()
echo "AD: ∂f/∂x1|_{x=0.5} = ", &"{x1.grad:.4f}"
echo "AD: ∂f/∂x2|_{x=1.5} = ", &"{x2.grad:.4f}"
As we can see in the code, after we have computed the result involving our Values, we call the backward procedure. This computes the backward pass with respect to all inputs. As everything is a ref object, the variables x1 and x2 in the second example are automatically updated to contain the correct gradients ∂f/∂xi afterwards.
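For reference, the analytic values in this second example are ∂f/∂x1 = 1/cosh²(0.5) ≈ 0.7864 and ∂f/∂x2 = cosh(1.5) ≈ 2.3524, which is what x1.grad and x2.grad should hold after the call to backward.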
The second, dual_autograd, is a straightforward implementation of automatic differentiation using dual numbers. It is heavily inspired by a Julia implementation by Julius Pfrommer from KIT. We have a type containing the primal (the “value” of the normal variable) and its derivative. Using operator overloading for this type we can then define each operation by its normal mathematical result and its derivative. As such, computing any expression automatically yields the derivative in forward mode.
As long as the computation is reasonably performed in forward mode, this implementation is actually rather efficient and should be usable for practical problems.
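To sketch the idea (again with made-up names, Dual and dual, rather than the actual dual_autograd API), such a type with chain-rule aware operator overloads might look like this:

import math

# Hedged sketch of a dual number type; not the dual_autograd implementation.
type
  Dual = object
    p, d: float   # primal value and derivative

proc dual(p: float, d = 0.0): Dual = Dual(p: p, d: d)

proc `+`(a, b: Dual): Dual = Dual(p: a.p + b.p, d: a.d + b.d)
proc `*`(a, b: Dual): Dual = Dual(p: a.p * b.p, d: a.d * b.p + a.p * b.d)   # product rule
proc tanh(a: Dual): Dual =
  let t = tanh(a.p)
  Dual(p: t, d: (1.0 - t * t) * a.d)   # chain rule: (tanh u)' = (1 - tanh²(u)) · u'

# seed the derivative of the variable we differentiate by with 1
let xd = dual(0.5, 1.0)
echo tanh(xd * xd).d   # ∂tanh(x²)/∂x at x = 0.5: (1 - tanh²(0.25)) · 2·0.5 ≈ 0.9400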
The same list of functions with derivatives supported for micrograd is also supported for dual_autograd.
import dual_autograd, strformat
let arg = D(0.5, 1.0)
let x = tanh(arg)
echo "AD: ∂(tanh(x))/∂x|_{x=0.5} = ", &"{x.d:.4f}"
echo "∂(tanh(x))/∂x|_{x=0.5} = [1 / cosh²(x)]_{x=0.5} = ", &"{1.0 / (cosh(0.5)**2):.4f}"
# and now for a case where forward pass is more useful:
proc foo[T](x: T): (T, T) =
  result = (tanh(x), sinh(x))
let ∂fi_∂x = foo(arg)
echo ∂fi_∂x
which outputs:
AD: ∂(tanh(x))/∂x|_{x=0.5} = 0.7864
∂(tanh(x))/∂x|_{x=0.5} = [1 / cosh²(x)]_{x=0.5} = 0.7864
((p: 0.462117, ∂: 0.786448), (p: 0.521095, ∂: 1.12763))
As we can see with the second function (even if its implementation is of course trivial), a single procedure call gives us the derivatives of both outputs of the function with respect to the argument x at the same time.
The third implementation is by far the simplest one in terms of code size. It makes use of the equivalence between forward-mode automatic differentiation and an improved version of a finite-difference-like approach to the definition of the derivative (namely one based on a Taylor expansion using complex numbers): the complex step derivative approximation. The implementation here, complex_step.nim, is inspired by the 2003 paper by Martins, Sturdza and Alonso titled:
“The Complex-Step Derivative Approximation”
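For reference, the identity the method is based on follows directly from the Taylor expansion of a real (analytic) function f at x with a purely imaginary step ih:

f(x + ih) = f(x) + ih·f′(x) − (h²/2)·f″(x) − i·(h³/6)·f‴(x) + …

Taking the imaginary part and dividing by h gives Im[f(x + ih)]/h = f′(x) + O(h²). Unlike an ordinary finite difference there is no subtraction of nearly equal numbers, so h can be chosen extremely small (e.g. 10⁻²⁰) without losing precision to cancellation.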
It is essentially equivalent to the dual numbers implementation, but lifts most of the implementation details from the definition of Nim’s Complex type. It overrides the behavior of the abs function and adds comparison operators for Complex numbers, which only compare along the real axis.
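A hedged sketch of what such overrides could look like (illustrative only, not the actual complex_step.nim source):

import std/complex except abs

# Comparisons only look at the real (primal) part, so branches written for
# plain floats take the same path for Complex arguments.
proc `<`[T](a, b: Complex[T]): bool = a.re < b.re
proc `<=`[T](a, b: Complex[T]): bool = a.re <= b.re

# abs flips the sign of both parts for a negative real part, so the derivative
# carried in the imaginary part picks up the correct sign d|x|/dx = sign(x).
proc abs[T](z: Complex[T]): Complex[T] =
  if z.re < T(0): complex(-z.re, -z.im)
  else: z

echo abs(complex(-0.5, 1e-20))               # (0.5, -1e-20)
echo complex(-0.5, 0.0) < complex(0.0, 0.0)  # true: compares real parts only

The module itself is then used as follows: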
import complex_step, strformat
let z = cmplx(0.5)
let res = derivative(z, tanh(z))
echo "AD: ∂(tanh(x))/∂x|_{x=0.5} = ", &"{res:.4f}"
echo "∂(tanh(x))/∂x|_{x=0.5} = [1 / cosh²(x)]_{x=0.5} = ", &"{1.0 / (cosh(0.5)**2):.4f}"
# and the equivalent of `foo` using the `dual_autograd` above:
# and now for a case where forward pass is more useful:
proc foo[T](x: T): (T, T) =
  result = (tanh(x), sinh(x))
let arg = cmplx(0.5, h_eps)
## Note: currently we don't have a way to make it nice to work with procs returning
## tuples, so we need to do the "magic" manually:
let der = foo(arg)
let ∂f1_∂x = der[0] / h_eps
let ∂f2_∂x = der[1] / h_eps
echo ∂f1_∂x.im, " and ", ∂f2_∂x.im
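This should print approximately 0.7864 and 1.1276, i.e. the analytic derivatives 1/cosh²(0.5) and cosh(0.5) of the two outputs, matching the dual_autograd result above.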
Given a multivariate function f: ℝ^N → ℝ^M, i.e. f(x1, ..., xN) = (f1, ..., fM), the full first-order derivative information is the Jacobian matrix J with entries J[i][j] = ∂fi/∂xj.
In forward mode we compute a single column of that Jacobian per pass (the derivatives of all outputs with respect to one chosen input), whereas in backward mode we compute a single row per pass (the derivatives of one chosen output with respect to all inputs).
To make it more understandable in code terms:
proc f(x1, x2, x3, ...: float): (f1, f2, f3, ...) =
result = ...
proc forward(f: Function, x1, x2, ...: float, by)
proc backward(f: Function, f1, f2, ...: float, by)
# assuming a hypothetical forward pass autograd, we compute the
# derivative with respect to *one* input variable, e.g. `x1`
let ∂fi_∂x1 = forward(f, x1, x2, ..., by = x1)
# which returns a vector of _all_ derivatives (∂f1/∂x1, ∂f2/∂x1, ∂f3/∂x1, ...)
# i.e. the first column of the Jacobian
# whereas in backward mode it would be:
let ∂f1_∂xi = backward(f, f1, f2, ..., by = f1)
# which would give us _all_ derivatives (∂f1/∂x1, ∂f1/∂x2, ∂f1/∂x3, ...)
# i.e. the first row of the Jacobian
(Note: I need to think about this pseudo notation when I’m less tired again. :) )
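To make the column / row picture concrete, here is a small, hedged example using the dual_autograd module: the D(primal, derivative) constructor and the .d field are the ones shown earlier, and it is assumed that the basic arithmetic operators are overloaded for the dual type as described above. One forward evaluation with the seed on x1 yields the first column of the Jacobian, a second evaluation with the seed on x2 the second column:

import dual_autograd

proc f[T](x1, x2: T): (T, T) =
  # two inputs, two outputs: the Jacobian is a 2×2 matrix
  result = (tanh(x1) + sinh(x2), sinh(x1))

# forward mode, first column: seed x1 with derivative 1, x2 with 0
let col1 = f(D(0.5, 1.0), D(1.5, 0.0))
echo col1[0].d, " ", col1[1].d   # (∂f1/∂x1, ∂f2/∂x1)

# forward mode, second column: seed x2 instead
let col2 = f(D(0.5, 0.0), D(1.5, 1.0))
echo col2[0].d, " ", col2[1].d   # (∂f1/∂x2, ∂f2/∂x2)

The first micrograd example is the mirror image: bar has two inputs and a single output, and one call to backward fills both x1.grad and x2.grad, i.e. the single row of its 1 × 2 Jacobian.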
Because of this, the two different modes are useful for different purposes. For functions that have more inputs than outputs, backward passes are very useful, as each pass yields the derivatives of one output with respect to all inputs; the full set of derivatives therefore only requires very few passes (N = number of outputs). In the opposite case of few inputs but many outputs, the forward pass is more efficient for the same reason.
Backward passes (“backpropagation”) have become so popular because of the typical layout of neural networks in machine learning: such networks have a very large number of inputs, but very few outputs. As such, the efficient thing to do is to compute the backward pass instead of the forward pass!