Have destructure return only trainable params #1742
Conversation
This seems like quite a large change to many of Flux's assumptions, compared to the more specific fix needed for #1733.
Branch updated: 8fa7352 → c8a1d1e
Will take a more detailed pass later today. One early suggestion: doesn't it make more sense to move …
Branch updated: c8a1d1e → a649870
Moved … Unfortunately, I didn't manage to use … in … @ChrisRackauckas is there any specific package whose test I can run to see if the changes to …
The DiffEqFlux and NeuralPDE tests should be the only two using this.
Though we're in chaos because Zygote updates seem to have broken a lot, again. DiffEqFlux tests fail because Zygote returns zero for the gradients on FFJORD (SciML/DiffEqFlux.jl#635), and NeuralPDE fails because of a tuple multiplication in an update of ChainRules (SciML/NeuralPDE.jl#412). 😱 😱 😱 😱 😱 (Please help me) 😱 😱 😱 😱 😱
I would go for a more specific fix for the SciML failures. We have seen the …
This comment is not very clear, but maybe you are confusing things. The `RefValue` problem related to #1727 (comment) has already been fixed in Functors and Zygote.
Both DiffEqFlux's and NeuralPDE's tests error for me on Flux#master and on this branch as well.
Branch updated: 0400686 → 4dc70b5
Slightly random comments, mostly on things not new to this PR:
src/functor.jl (outdated)
""" | ||
destructure(m) | ||
Flatten a model's parameters into a single weight vector. | ||
julia> m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax) |
This wants to be a jldoctest, and should include something like `BatchNorm`, to illustrate which parameter count becomes the length of θ. It should also say that `x isa AbstractArray{<:Number}` and unique `objectid(x)` are the criteria for inclusion.
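For illustration, a minimal sketch of such a doctest, assuming the post-PR behaviour of collecting only trainable arrays (so BatchNorm's running statistics would not contribute to θ; the count below is what I'd expect, 55 + 10 + 12):

```julia
julia> using Flux

julia> m = Chain(Dense(10, 5, relu), BatchNorm(5), Dense(5, 2));

julia> θ, re = Flux.destructure(m);

julia> length(θ)  # only β and γ of BatchNorm are trainable; μ and σ² are excluded
77
```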
A related question is: is this the right test? Should these be independent?

```julia
julia> x = [1,2,3];

julia> objectid(x), objectid(x'), objectid(transpose(x))
(0x3aed6805416fa931, 0xab1cb79f2fe03e53, 0x731d5dafc3a51f2b)
```

You could for instance keep calling `parent` until it stops, and that's the parameter. Or maybe such types should be unwrapped by Functors?
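A minimal sketch of that unwrapping idea (the helper name `unwrap` is hypothetical, not part of Flux or Functors):

```julia
# Keep calling parent until it returns the array itself; plain Arrays are their
# own parent, so the recursion stops at the underlying storage.
unwrap(x::AbstractArray) = parent(x) === x ? x : unwrap(parent(x))

x = [1, 2, 3]
unwrap(x') === x                                         # true: Adjoint is peeled back to x
objectid(unwrap(x')) == objectid(unwrap(transpose(x)))   # true
```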
As you know, the situation with wrapped shared parameters is pretty complex; I'm not going to address those issues here.
```julia
function destructure(m)
  xs = Zygote.Buffer([])
  collect_params!(xs, m)
  return vcat(vec.(copy(xs))...), p -> _restructure(m, p)
end
```
Why does this `copy`? (And splat?) And how easy would it be to avoid `Buffer` somehow, to make this ready for not using Zygote?
With

```julia
function destructure(m)
  xs = AbstractArray[]
  collect_params!(xs, m)
  return vcat(vec.(xs)...), p -> _restructure(m, p)
end
```

Flux's tests still pass. I still have to test the interaction with DiffEqFlux and NeuralPDE; let's see.
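`collect_params!` itself is not shown in this thread; purely for context, a hedged sketch of what a trainable-only traversal could look like (an assumption, not the PR's actual code):

```julia
using Flux: trainable

# Hypothetical traversal: recurse through trainable children only, pushing each
# numeric array exactly once (deduplicated by object identity).
function collect_params_sketch!(xs, m, seen = Base.IdSet())
    if m isa AbstractArray{<:Number}
        if !(m in seen)
            push!(seen, m)
            push!(xs, m)
        end
    else
        foreach(c -> collect_params_sketch!(xs, c, seen), trainable(m))
    end
    return xs
end
```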
DiffEqFlux tests pass with both `Buffer()` and `AbstractArray[]`. @ChrisRackauckas do you see any particular reason to keep `Buffer`?
I don't see a reason to use Buffer here.
I guess that's for some potential higher order AD issue?
Or just first-order AD. AFAICT Flux's current test suite never tests the gradient of `destructure` (only restructuring) 🙈...
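For illustration, a hedged sketch of what such a test could look like (this is not an existing Flux test, and it may well not pass, which is the point of the comment):

```julia
using Flux, Zygote

# Differentiate through destructure itself, not only through the re(θ) closure.
m = Dense(2, 3)
g, = Zygote.gradient(m) do model
    θ, _ = Flux.destructure(model)
    sum(abs2, θ)
end
# If this gradient is correct, one would expect g.weight ≈ 2 .* m.weight
# and g.bias ≈ 2 .* m.bias.
```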
🙈 is the Flux motto, really.
# plus 2 non-trainable, 10 parameters, summarysize 836 bytes.
```
Only numerical arrays are collected by `destructe`. Moreover, if the same array is nested multiple times in the same model (e.g. shared by some layers)
Suggested change:
```diff
-Only numerical arrays are collected by `destructe`. Moreover, if the same array is nested multiple times in the same model (e.g. shared by some layers)
+Only numerical arrays are collected by `destructure`. Moreover, if the same array is nested multiple times in the same model (e.g. shared by some layers),
```
```julia
@adjoint function _restructure(m, xs)
  m̄, numel = _restructure(m, xs), length(xs)
  function _restructure_pullback(dm)
    xs′ = destructure(dm)[1]
```
This gradient can easily be wrong: it looks for duplicates in the gradient, which can come from e.g. adding two parameters `x + y`. It is completely unaware of duplicates in the original. Demonstration: #1826 (comment)

So at the very least, we must (1) disable the removal of duplicates from the destructure used here, and (2) throw an error if you try to use this adjoint when the original model had any duplicate (tied) parameters. Or, failing that, we should remove it from v0.13 until someone can write a version which actually works.
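For concreteness, a hedged illustration (not taken from #1826) of the tied-parameter case the comment refers to:

```julia
using Flux

# The same weight matrix W backs two layers, so it appears once in θ, and a
# correct pullback through _restructure would have to accumulate both layers'
# contributions to it; the adjoint above does not attempt this.
W = randn(Float32, 3, 3)
tied = Chain(Dense(W, zeros(Float32, 3), tanh), Dense(W, zeros(Float32, 3)))
θ, re = Flux.destructure(tied)   # W should appear only once in θ here
```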
@mcabbott should I close this? Maybe you have some better solution at this point.
I have written many things, e.g. FluxML/Functors.jl#31, but they aren't going to be ready tomorrow. Maybe v0.13 should change the documented behaviour to trainable, without aspiring to fix all the bugs? How difficult would it be to at least throw an error if there are repeated parameters? Edit: maybe closer now, FluxML/Optimisers.jl#54?
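A hedged sketch of one way such a repeated-parameter check could look (the helper names are hypothetical, not Flux API):

```julia
using Functors: functor

# Walk the model via Functors and count each numeric array by object identity;
# any count above one means the model has repeated (tied) parameters.
function param_counts!(counts, x)
    if x isa AbstractArray{<:Number}
        counts[x] = get(counts, x, 0) + 1
    else
        foreach(c -> param_counts!(counts, c), functor(x)[1])
    end
    return counts
end

has_repeated_params(m) = any(>(1), values(param_counts!(IdDict{Any,Int}(), m)))
# destructure could then call this and error before silently returning a wrong gradient.
```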
This is now the last issue on the v0.13 milestone. I think what we should do is simply delete …
Agreed, we should try that and run the downstream tests. If everything passes, then there is no reason not to.
Agreed. I'll leave this PR open as a reminder for the milestone, but I think it is better to start clean in a new PR and cherry-pick some of the tests from here.
Have `destructure` return only trainable params + functorize `RefValue` (this part should go into Functors.jl).

Fix #1733, fix #1727, fix #1732.
Also adding tests from #1614.