Activations #860

dsweber2 · 2019-09-10T17:51:03Z

Taking derivatives w.r.t. the parameters results in complaints about mutability (MWE below). To get around this, I made the storage array a `Zygote.Buffer` and return a copy after inserting everything. I also tried an `accumulate!`-based version, which worked on commit ecc9ce9 but broke when I caught up.

Simple example:

using Flux
c = Chain(Dense(3,5,relu), Dense(5,1,relu))
X = Float32.([1.0; 1.0; 1.0])
gradient(()->Flux.activations(c, X)[2][1], params(c))
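The `Zygote.Buffer` workaround described above can be sketched on a smaller problem. This is a minimal illustration, not the code from the patch: `buffered_activations` is a hypothetical helper, and the "layers" are plain scalar functions rather than `Dense` layers, so that the gradient is easy to check by hand.

```julia
using Zygote

# Hypothetical helper, not Flux's implementation: apply each function in `fs`
# in turn and record every intermediate value.
function buffered_activations(fs::Tuple, x::Real)
    # Writing into an ordinary Vector inside differentiated code triggers
    # Zygote's mutation error. A Buffer may be written to, then copied out.
    buf = Zygote.Buffer([x], length(fs))
    for i in 1:length(fs)
        x = fs[i](x)
        buf[i] = x
    end
    return copy(buf)  # convert back to an ordinary Vector
end

layers = (x -> x^2, x -> x + 1)
acts = buffered_activations(layers, 3.0)

# Differentiate the second activation, x^2 + 1, with respect to the input.
grad = gradient(x -> buffered_activations(layers, x)[2], 3.0)
```

The write-then-`copy` pattern follows the usage shown in Zygote's own documentation for `Buffer`; whether it is the best fit here is exactly what the rest of this thread discusses.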

MikeInnes · 2019-09-11T12:59:50Z

Thanks a lot for the patch!

I wonder if we could just have activations return a tuple, similar to how applychain works? Then it should be Zygote-compatible by default.
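A rough sketch of that idea, written without Flux so that it runs on its own. The names `apply_all` and `collect_all` are illustrative stand-ins, not Flux functions: the first mirrors the shape of the `applychain` recursion, the second is the same recursion with every intermediate result kept in a tuple. No mutation is involved, which is why this form should not hit the mutability complaint.

```julia
# Stand-alone sketch: `apply_all` and `collect_all` are illustrative names.
# Layers are plain functions held in a tuple.

# Fold the layers over the input and keep only the final output.
apply_all(::Tuple{}, x) = x
apply_all(fs::Tuple, x) = apply_all(Base.tail(fs), first(fs)(x))

# Same recursion, but every intermediate result is kept in a tuple.
collect_all(::Tuple{}, x) = ()
function collect_all(fs::Tuple, x)
    res = first(fs)(x)
    return (res, collect_all(Base.tail(fs), res)...)
end

layers = (x -> x^2, x -> x + 1)
final = apply_all(layers, 3)    # 10
acts = collect_all(layers, 3)   # (9, 10)
```

The last element of the collected tuple is the same value the plain fold returns.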

dsweber2 · 2019-09-12T00:59:44Z

Yeah, that's definitely a nicer way to do this (in the new commit). There are a couple of extra terms in the gradient that I'm sort of mystified by, but they don't seem to be causing problems.

IdDict{Any,Any} with 6 entries:
  Float32[0.687749 -0.78338 0.341401; 0.6577… => Float32[0.0 0.0 0.0; 0.0 0.0 0.0; … ; 0.0 0.0 0.0; 0.0 0.0 0.0]
  Float32[0.0]                                => Float32[0.0]
  IOBuffer(data=UInt8[...], readable=true, w… => RefValue{Any}((data = nothing, readable = nothing, writable = nothing, seekable = not…
  IOBuffer(data=UInt8[...], readable=true, w… => RefValue{Any}((data = nothing, readable = nothing, writable = nothing, seekable = not…
  Float32[0.0, 0.0, 0.0, 0.0, 0.0]            => Float32[0.0, 0.0, 0.0, 0.0, 0.0]
  Float32[-0.745811 0.0290723 … -0.874623 -0… => Float32[0.0 0.0 … 0.0 0.0]

What do you think of adding a Chain method that takes two arguments, the second being a list of indices of the depths at which to return the transform? That would remove the need for an activations function completely, though it would not really decrease the total number of lines of code.

MikeInnes · 2019-10-08T13:59:49Z

This looks good, thanks. Would be great to have a quick test for it so we don't miss it again.

> What do you think of adding a Chain method that takes in two arguments, the second being a list of indices of the depths at which to return the transform?

I don't understand this; would be great to perhaps see an example of what this would look like.

dsweber2 · 2019-10-09T05:40:41Z

Something like (c::Chain)(x, i) = extraChain(c.layers, x)[i] in addition to what I wrote. Might be a more efficient implementation. Small usage example:

C = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
C(randn(10), :) # equivalent to
activations(C, randn(10))
# also allows
C(randn(10), [1,3]) # wouldn't return the second term
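To make the proposal concrete, here is a self-contained sketch that does not touch Flux. `MiniChain` and `collect_acts` are hypothetical stand-ins for `Chain` and the `extraChain` helper, and the layers are scalar functions so the results can be checked by hand.

```julia
# Hypothetical stand-in for Flux's Chain: a tuple of callables.
struct MiniChain{T<:Tuple}
    layers::T
end

# Collect every intermediate output as a tuple (illustrative helper).
collect_acts(::Tuple{}, x) = ()
function collect_acts(fs::Tuple, x)
    res = first(fs)(x)
    return (res, collect_acts(Base.tail(fs), res)...)
end

# Ordinary call: only the final output.
(c::MiniChain)(x) = foldl((y, f) -> f(y), c.layers; init = x)

# Proposed two-argument call: index into the collected activations.
(c::MiniChain)(x, i) = collect_acts(c.layers, x)[i]

C = MiniChain((x -> x^2, x -> x + 1, x -> 2x))

out = C(3)              # 20
all_acts = C(3, :)      # (9, 10, 20)
some_acts = C(3, [1, 3]) # (9, 20), skipping the second activation
```

This sketch still computes every activation before indexing, so it saves no work for the skipped layers; it only changes the calling convention.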

MikeInnes · 2019-11-08T12:39:50Z

src/layers/basic.jl

+    return (res, extraChain(Base.tail(fs), res)...)
+end
+
+extraChain(::Tuple{}, x) = []


This should probably be the empty tuple, so that the compiler can unroll everything.
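A small stand-alone comparison of the two base cases, using illustrative function names that are not part of Flux. For a non-empty chain both variants return the same tuple, because splatting an empty array adds no elements. The difference shows up in the base case itself: the array version returns a `Vector{Any}`, whose length is not part of its type, which is presumably what gets in the way of unrolling.

```julia
# Illustrative names only; neither function is part of Flux.

# Variant A: array base case, as in the reviewed diff.
acts_vec(::Tuple{}, x) = []
function acts_vec(fs::Tuple, x)
    res = first(fs)(x)
    return (res, acts_vec(Base.tail(fs), res)...)
end

# Variant B: empty-tuple base case, as suggested in the review.
acts_tup(::Tuple{}, x) = ()
function acts_tup(fs::Tuple, x)
    res = first(fs)(x)
    return (res, acts_tup(Base.tail(fs), res)...)
end

layers = (x -> x^2, x -> x + 1)

# Non-empty chains give the same values either way.
same_values = acts_vec(layers, 3.0) == acts_tup(layers, 3.0)

# The empty chain returns different container types.
empty_vec = acts_vec((), 3.0)   # a Vector{Any}
empty_tup = acts_tup((), 3.0)   # an empty tuple
```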

MikeInnes · 2019-11-08T12:41:59Z

test/layers/basic.jl

+  @testset "Activations" begin
+    c = Chain(Dense(3,5,relu), Dense(5,1,relu))
+    X = Float32.([1.0; 1.0; 1.0])
+    @test_nowarn gradient(()->Flux.activations(c, X)[2][1], params(c))


Can we add a regular test, i.e. one that makes sure the output is right? Rather than chaining dense layers, it might be useful to chain something simple like Chain(x -> x^2, x -> x+1), so that the outputs and gradients are trivial.

Otherwise really happy with this patch!

Actually, I see that there are some other tests above here; what's the need for the additional @test_nowarn here? If it's redundant, it'd be best to remove.
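One way the trivial-chain test suggested above might look. This is a sketch rather than the test that was merged, and it assumes a Flux version in which `Flux.activations` returns a tuple, as this patch proposes.

```julia
using Flux, Test

# Scalar "layers" make every activation and gradient easy to check by hand.
c = Chain(x -> x^2, x -> x + 1)

@testset "activations on a trivial chain" begin
    # 3 -> 3^2 = 9 -> 9 + 1 = 10
    @test Flux.activations(c, 3) == (9, 10)
    # d/dx (x^2 + 1) = 2x, which is 6 at x = 3
    @test gradient(x -> Flux.activations(c, x)[2], 3.0)[1] ≈ 6.0
end
```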

dsweber2 · 2019-11-14T22:11:37Z

Sorry about the extra commits; it looks like rebasing onto master made this a bit of a mess. I've just made the two changes you suggested.

MikeInnes · 2019-11-15T10:59:29Z

src/layers/basic.jl

@@ -31,6 +31,8 @@ applychain(fs::Tuple, x) = applychain(tail(fs), first(fs)(x))

 (c::Chain)(x) = applychain(c.layers, x)

+(c::Chain)(x) = extraChain(c.layers, x)


This definition doesn't look right to me.

MikeInnes · 2019-11-15T11:00:07Z

No worries, it just needs to target the master branch rather than the old zygote branch. CI appears to have an issue BTW.

dsweber2 · 2019-11-16T20:51:57Z

The CI failure seems to be because I wasn't fully on the master branch, though nightly is also having some issue with initializing Zygote. The extra (c::Chain) definition was from the rewrite I was talking about; it should be gone now.

MikeInnes · 2019-11-19T16:23:12Z

bors r+

860: Activations r=MikeInnes a=dsweber2

Taking derivatives w.r.t. the parameters results in complaints about mutability (mwe below). To get around this, I made the storage array a `Zygote.Buffer`, and then return a copy after inserting everything. I tried an `accumulate!` based version, which worked on commit ecc9ce9, but broke when I caught up.

Simple example:
```
c = Chain(Dense(3,5,relu), Dense(5,1,relu))
X = Float32.([1.0; 1.0; 1.0])
gradient(()->Flux.activations(c, X)[2][1], params(c))
```

Co-authored-by: dsweber2 <david.weber2@gmail.com>
Co-authored-by: Mosè Giordano <m.giordano@ucl.ac.uk>

bors · 2019-11-19T16:38:10Z

Build failed

ci/gitlab/gitlab.com

MikeInnes · 2019-11-19T16:44:21Z

Failure is #923.

MikeInnes · 2019-11-19T16:44:35Z

Thanks @dsweber2!

dsweber2 and others added 6 commits September 10, 2019 00:54

make activations zygote friendly

540b736

Restore purity

38790dd

make activations zygote friendly

82261b5

Merge branch 'activations' of github.com:dsweber2/Flux.jl into activations

bb84aee

adding the extra commits broke the accumulate version

1bb25dc

deal with empty Chain

f412191

recursive way of doing activations

46abfbb

super simple test

3b7b780

MikeInnes reviewed Nov 8, 2019

dsweber2 added 8 commits November 14, 2019 13:40

make activations zygote friendly

cdaaca8

adding the extra commits broke the accumulate version

d0202a2

deal with empty Chain

99679f7

recursive way of doing activations

6475f6a

super simple test

db92b0e

bring activations into function call

0fe3ac4

simpler test

58c7947

Merge branch 'activations' of github.com:dsweber2/Flux.jl into activations

89afa20

MikeInnes changed the base branch from zygote to master November 15, 2019 10:53

MikeInnes reviewed Nov 15, 2019

dsweber2 added 2 commits November 15, 2019 12:03

keeping activations separate

20eb840

Merge branch 'master' into activations

dea2953

MikeInnes merged commit 5839e16 into FluxML:master Nov 19, 2019
