
AutoDiff #331

Closed
emilemathieu opened this issue Aug 18, 2017 · 29 comments

@emilemathieu
Collaborator

Hi all!

I was wondering: why did you choose to use chunk-wise forward-mode AD, rather than something from http://www.juliadiff.org?

Best,
Emile
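
(For readers unfamiliar with the term, here is a minimal sketch of what chunk-wise forward-mode AD looks like with ForwardDiff.jl; the target function and the chunk size below are made up purely for illustration.)

using ForwardDiff

# Toy target, R^10 -> R.
f(x) = sum(abs2, x)
x = rand(10)

# ForwardDiff propagates a fixed number of partial derivatives per sweep
# ("chunk mode"); the chunk size (4 here) is an arbitrary choice.
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{4}())
ForwardDiff.gradient(f, x, cfg)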

@emilemathieu
Collaborator Author

My bad, it does indeed seem to be using ForwardDiff.

@yebai
Member

yebai commented Aug 18, 2017

My bad, it does indeed seem to be using ForwardDiff

That's right, we are using ForwardDiff.

@emilemathieu
Collaborator Author

In the wiki it is written:
Turing.jl is currently using ForwardDiff.jl for automatic differentiation, which is believed to be the main reason for the slowness of Turing.jl. This AD backend will be replaced by a reverse-mode AD implementation when Julia has a mature one.

What about ReverseDiff.jl or https://github.com/denizyuret/AutoGrad.jl, @yebai?

@emilemathieu emilemathieu reopened this Aug 28, 2017
@yebai
Member

yebai commented Aug 29, 2017

What about ReverseDiff.jl or https://github.com/denizyuret/AutoGrad.jl

Yes, we've looked into AutoGrad.jl. I think @xukai92 also did some benchmarking of its performance. The results do suggest that reverse-mode AD is more efficient than forward-mode AD on big models. Unfortunately, this library is currently not compatible with Distributions.jl and lacks support for many common numerical functions.

@emilemathieu
Collaborator Author

Thanks for your answer. And what about ReverseDiff.jl?

@ChrisRackauckas
Collaborator

ReverseDiff.jl also has a very hard upper bound at ForwardDiff v0.5, which will "break" (or at least hold back) many libraries like Optim.jl, DifferentialEquations.jl, NLsolve.jl, etc. In other words, I wouldn't even suggest having it installed anymore, since it will cause a ton of problems.

AutoGrad has some issues with broadcast-composed functions since it uses dispatch overloading, so I think it's still stuck on v0.5.

The true answer will be Cassette.jl when it comes out. I suggest just sticking to ForwardDiff.jl until it does because the current solutions are somewhat fragile.

@emilemathieu
Collaborator Author

In our case, we want to compute the gradient of the joint probability, which is a function from the Cartesian product of the model's random variables' supports to R, i.e. R^n -> R with n = sum_i n_i (n_i being the dimensionality of the i-th random variable). From a theoretical point of view, reverse-mode differentiation should therefore be more efficient than forward-mode differentiation, shouldn't it?

@ChrisRackauckas Do you know whether Jarrett will rewrite ReverseDiff.jl using Cassette.jl, or whether Cassette.jl itself will provide reverse differentiation? Do you also have an idea of when Cassette.jl is planned to be in beta?
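
(As a concrete illustration of the R^n -> R setting described above, here is a minimal sketch comparing the two modes; the toy logjoint below is made up and is not Turing's actual joint density.)

using ForwardDiff, ReverseDiff

# Toy "log-joint" of n latent variables, R^n -> R.
logjoint(x) = -0.5 * sum(abs2.(x .- 1.0))

x = randn(1_000)

# Forward mode needs on the order of n sweeps for an R^n -> R function,
# while reverse mode needs one forward and one backward sweep, which is
# why reverse mode is expected to win for large n.
g_fwd = ForwardDiff.gradient(logjoint, x)
g_rev = ReverseDiff.gradient(logjoint, x)
g_fwd ≈ g_rev    # true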

@emilemathieu emilemathieu reopened this Nov 9, 2017
@ChrisRackauckas
Collaborator

Cassette.jl is just a graph construction algorithm. ReverseDiff.jl will be rewritten to use Cassette.jl. I don't know of any timelines, other than "soon enough".

@yebai
Member

yebai commented Jan 18, 2018

Hi @ChrisRackauckas, we want to make some plans for reverse-mode autodiff integration. There have also been some updates on both the Cassette and AutoGrad sides. Most notably:

  • it seems that AutoGrad now works on Julia 0.6
  • Cassette won’t directly support autodiff anymore (judging from the repo’s readme file)

What are your current thoughts/suggestions on this issue? Are there any hard reasons why we can’t merge the AutoGrad and ReverseDiff packages, since they overlap so much? Thanks!

@xukai92
Member

xukai92 commented Jan 18, 2018

@yebai I was looking into adapting AutoGrad, with some discussions with their developer here: denizyuret/AutoGrad.jl#42

@xukai92
Member

xukai92 commented Jan 18, 2018

In the end they said:

I understand. Rec <: Real in the general case may break other use cases.
However you could create a fork/branch of AutoGrad.jl with Rec <: Real as a
short term solution. @jrevels is working on a Cassette based solution
where a Rec type will not be necessary and this problem will be solved.

However, I haven't managed to make my proposal of subtyping Rec work yet.

@ChrisRackauckas
Collaborator

Cassette is a contextual dispatch system on top of which automatic differentiation can be built. Capstan.jl is where that implementation will live.

Are there any hard reasons why we can’t merge the AutoGrad and ReverseDiff packages, since they overlap so much? Thanks!

AutoGrad requires registered primitives. ReverseDiff uses the dispatch mechanism to push derivative information back. They are conceptually different.

Given that ReverseDiff.jl won't be going away (since the Cassette work is now going to stay in a separate repo), and that ReverseDiff.jl no longer has any hard upper bounds, I would suggest trying to get ReverseDiff.jl working. It'll go legacy at some point, but it should continue to work.

@xukai92
Member

xukai92 commented Jan 18, 2018

Thanks for the answer!

I'm a bit confused about your points on the difference between AutoGrad.jl and ReverseDiff.jl. Do you mean ReverseDiff.jl doesn't need any primitives at all?

Two other questions on ReverseDiff.jl:

  • Is it compatible with Distributions.jl?
  • Is it GPU supported?

@ChrisRackauckas
Collaborator

Is it GPU supported?

In theory it supports any AbstractArray. So yes, in theory. In practice, it's not well-tested.

Is it compatible with Distributions.jl?

It should be, since it would get the derivatives there in forward mode via ForwardDiff.

I'm a bit confused about your points on the difference between AutoGrad.jl and ReverseDiff.jl. Do you mean ReverseDiff.jl doesn't need any primitives at all?

It has a set of generic primitives via ForwardDiff.jl's dual numbers, and it uses those to compose functions together. AutoGrad is much stricter about the functional form it allows and requires registering a lot of higher-level functions.
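
(To illustrate what "generic primitives" means in practice, here is a minimal sketch; the loss function below is made up, and nothing about it is registered with ReverseDiff ahead of time.)

using ReverseDiff

# An ordinary Julia function with a loop and scalar ops. Tracked numbers flow
# through this generic code by dispatch, with the scalar derivative rules
# supplied by ForwardDiff-style duals.
function toy_loss(w)
    s = log1p(exp(-w[1]))
    for i in 2:length(w)
        s += log1p(exp(-w[i]))
    end
    return s
end

ReverseDiff.gradient(toy_loss, randn(5))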

@yebai
Member

yebai commented Jan 18, 2018

Very helpful; thanks, all, for the answers!

@ChrisRackauckas, what’s the current plan for Capstan? E.g. is there any estimated (even informal) release time? I’ve heard from one of the Julia creators that Julia 1.0 is coming out soon. Could we help speed up the development of Capstan?

@ChrisRackauckas
Collaborator

It already works, but it's slow, as mentioned in JuliaLabs/Cassette.jl#5, due to JuliaLang/julia#5402. That's planned to be fixed in a Julia v1.x release, so if you don't want to wait, just set up ReverseDiff.jl for now.

@jrevels

jrevels commented Jan 18, 2018

Just saw this thread; my comment here might be relevant.

Just FYI, I don't really plan on doing much ReverseDiff maintenance in the future, so that I can focus on Cassette/Capstan stuff. Once Capstan is released (or even just in beta), I'd be happy to help Turing.jl put it through its paces.

@xukai92
Member

xukai92 commented Jan 19, 2018

There is still a difficulty in using ReverseDiff.jl with Distributions.jl: it's not possible to construct a distribution from a ReverseDiff.TrackedArray.

E.g. I cannot AD through the f below:

f(x) = logpdf.(Normal(x, 1), 1)

This is simply because Distributions.jl only supports constructing distributions from Real parameters, in which case ForwardDiff works since Dual <: Real.
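
(A minimal sketch of the working ForwardDiff case just described; the function name g and the input values are arbitrary.)

using Distributions, ForwardDiff

# Works because ForwardDiff.Dual <: Real, so Normal(x, 1) can be constructed
# from a dual number during differentiation.
g(x) = logpdf(Normal(x, 1), 1.0)
ForwardDiff.derivative(g, 0.5)   # (1.0 - 0.5) / 1^2 = 0.5 for a unit-variance Normal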

@yebai
Member

yebai commented Jan 24, 2018

@xukai92 The following seems to work for me. I guess the issue is that ReverseDiff.GradientConfig only accepts array-type inputs.

using ReverseDiff: GradientTape, GradientConfig, gradient, gradient!, compile
using Distributions

f(x) = logpdf.(Normal(x[1], 1), 1) # x is an array
gradient(f, [1.0]) 

@xukai92
Member

xukai92 commented Jan 24, 2018

Emmm, I see the trick. Then it seems that in our Turing.jl programs one should write something like this:

@model test() = begin
   x ~ Normal(0, 1)
   1 ~ Normal(x[1], 1)
end

Need to test it on another computer of mine later - will update when I do that.

@yebai
Member

yebai commented Jan 24, 2018

@ChrisRackauckas @jrevels Are there any APIs for passing a scalar to the target function when using ReverseDiff? (see the comment from Kai: #331 (comment))

@jrevels

jrevels commented Jan 24, 2018

Are there any APIs for passing a scalar to the target function when using ReverseDiff?

No, see JuliaDiff/ReverseDiff.jl#40.

You can, of course, always (un)wrap the scalar input, e.g.

using ReverseDiff

x = rand()
y = rand(4)
f(x::Real, y::AbstractArray) = x * sum(y)  # some implementation returning a scalar
ReverseDiff.gradient((x, y) -> f(x[1], y), (fill(x), y))  # fill(x) wraps the scalar in a 0-dimensional array

@xukai92
Member

xukai92 commented Jan 25, 2018

@yebai The test below works on my local Turing.jl after simply changing the types in VarInfo that were Real to Any where necessary (I pushed my local temporarily-fixed code to the reverse-diff branch).

using ReverseDiff: GradientTape, GradientConfig, gradient, gradient!, compile

using Turing

@model gdemo(x) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s[1]))
  x[1] ~ Normal(m[1], sqrt(s[1]))
  x[2] ~ Normal(m[1], sqrt(s[1]))
  return s, m
end

mf = gdemo([1.5, 2.0])
vi = mf()               # run the model once to obtain a VarInfo

f_turing(x) = begin
    print(typeof(x))
    vi.vals[1] = x      # overwrite the stored value with the AD input
    mf(vi).logp         # rerun the model and return the accumulated log-probability
end

gradient(f_turing, [1.0])

@xukai92
Member

xukai92 commented Feb 22, 2018

@jrevels I managed to use ReverseDiff.jl in Turing.jl now (branch=https://github.com/yebai/Turing.jl/tree/reverse-diff)

When I integrated it with Turing.jl, I saw a huge performance change from using a local variable to accumulate the log-probability (d641f42) instead of accumulating it in a field of a composite-type variable. To be more specific, as I increase the dimensionality, the time taken by AD now grows roughly 100x more slowly than before.

The change I made is something like

using Distributions
using ReverseDiff

type LP
    lp::Union{Float64,ReverseDiff.TrackedReal}
end

acc!(x::Union{Float64,ReverseDiff.TrackedReal}, d::Distribution, lp::LP) = lp.lp += logpdf(d, x)

lp = LP(0.0)

f3(x) = begin
    lp.lp = 0.0
    acc!(x[1],  Normal(0, 1), lp)
    for i = 2:length(x)
        acc!(x[i], Normal(x[i-1], 1), lp)
    end
    lp.lp
end

to

lp = LP(0.0)

f1(x) = begin
    _lp::Union{Float64,ReverseDiff.TrackedReal} = 0.0
    _lp += logpdf(Normal(0, 1), x[1])
    for i = 2:length(x)
        _lp += logpdf(Normal(x[i-1], 1), x[i])
    end
    lp.lp = _lp
    lp.lp
end
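
(A rough way to compare the two versions above, assuming the definitions as given; the dimensionality is arbitrary, the actual numbers depend on the machine and the ReverseDiff version, and this may or may not reproduce the gap seen inside Turing.)

x = randn(1_000)
@time ReverseDiff.gradient(f3, x)
@time ReverseDiff.gradient(f1, x)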

I didn't understand why that happened, and I also didn't manage to construct a minimal case that shows the difference. I wonder how sensitive ReverseDiff.jl is to the following situations:

  • Nested functions
  • Type instability

@ChrisRackauckas
Collaborator

ChrisRackauckas commented Feb 22, 2018

type LP
    lp::Union{Float64,ReverseDiff.TrackedReal}
end

That's not type-stable to access. Use a type parameter:

type LP{T}
    lp::T
end
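
(A sketch of what the parametric version might look like, written with mutable struct, the current spelling of type; the acc! helper here is only illustrative and not Turing's actual code.)

using Distributions

# Parametric container: the field's concrete type is known to the compiler,
# so lp.lp += ... stays type-stable (T could be Float64 or a tracked real type).
mutable struct LP{T<:Real}
    lp::T
end

acc!(lp::LP, d::Distribution, x) = (lp.lp += logpdf(d, x); lp)

lp = LP(0.0)                 # inferred as LP{Float64}
acc!(lp, Normal(0, 1), 0.3)
lp.lp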

@xukai92
Member

xukai92 commented Feb 22, 2018

Thanks Chris. Is it true that if it's a local variable, Julia might be able to infer the type?

@ChrisRackauckas
Collaborator

If the code is type-stable, yes. Those type markings you're putting everywhere, except the one I mentioned, are unnecessary for type inference.

@xukai92
Member

xukai92 commented Feb 22, 2018

I see. A follow-up question: how much is ReverseDiff.jl's performance influenced by type instability? Using the example above, I only get a ~3x speed-up, but what I experienced in Turing is much larger. Is it the case that if part of the code is type-unstable, all other function calls may be affected as well?

@ChrisRackauckas
Collaborator

Yes, inference problems "leak": if you don't know the type of a, then you don't know the type of c = a + b, and so on.
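
(A minimal, self-contained sketch of how such a "leak" looks, independent of Turing's internals; Box and PBox are made-up illustrations.)

# A field with an abstract type makes everything computed from it un-inferable too.
struct Box
    a::Real                              # abstractly-typed field
end
leak(b::Box, x::Float64) = b.a + x       # return type cannot be inferred concretely

# A type parameter keeps inference tight:
struct PBox{T<:Real}
    a::T
end
noleak(b::PBox{Float64}, x::Float64) = b.a + x   # inferred as Float64

# Compare @code_warntype leak(Box(1.0), 2.0) with @code_warntype noleak(PBox(1.0), 2.0).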
