inconsistency between params and destructure #1733
Comments
They're pretty different functions, meant for different purposes.
It seems like exactly the same use case to me: training, which means that both functions should output what is meant to be trained. I think we need some input from SciML people, since they are prominent consumers of `destructure`.
Not just training; some libraries use it for initial guesses as well. I'd prefer not to make `trainable` a "special" requirement for defining layers.
Differing use cases aside, making …

I thought it only grabbed the …

We can switch to using the new …
It currently excludes some values that we would want included in higher-order cases. On our end it's actually easier to ignore the objects that don't have gradients (which can differ depending on layer configuration) by grabbing the gradients from the NamedTuple directly. That is a nice and generic approach.
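For context, a minimal sketch of the NamedTuple approach described above (assuming Flux's `BatchNorm` and Zygote's explicit gradients; this is illustrative, not code from this thread):

```julia
using Flux, Zygote

# Explicit gradients mirror the struct's fields; fields that play no role in the
# differentiable path (here the running statistics) come back as `nothing`, so
# they are easy to skip without consulting `trainable` at all.
m = BatchNorm(3)
x = randn(Float32, 3, 4)
g = Zygote.gradient(m -> sum(m(x)), m)[1]   # a NamedTuple with one entry per field

g.β, g.γ   # gradient arrays for the affine parameters
g.μ, g.σ²  # `nothing`: no gradient flows to the tracked statistics
```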
In my understanding, `destructure` is supposed to behave similarly to `params` and collect the trainable params of a struct into a vector. Instead, it collects all arrays, as one can see in this example using `BatchNorm`:
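The original snippet wasn't preserved here; a minimal sketch of the inconsistency, assuming the default `BatchNorm(3)` and Flux ≈ 0.12 behaviour, would be:

```julia
using Flux

m = BatchNorm(3)

# `params` follows `trainable`, so it only collects the affine parameters β and γ.
length(Flux.params(m))        # 2 arrays (3 elements each)

# `destructure` fmaps over every array field, so the μ and σ² buffers are
# flattened into the vector as well.
θ, re = Flux.destructure(m)
length(θ)                     # 12 = β, γ, μ, σ²
```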
Should we modify `destructure` to act like `params`, and therefore recurse over `trainable(m)` instead of just applying `fmap` as it currently does?
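A rough sketch of what that could look like (a hypothetical helper, not an existing Flux function; it assumes `Flux.trainable` returns the trainable children of a layer):

```julia
using Flux

# Collect only the arrays reachable through `trainable`, mirroring `params`,
# instead of fmap-ing over every array field.
function trainable_arrays(m, out = AbstractArray[])
    for c in Flux.trainable(m)
        c isa AbstractArray{<:Number} ? push!(out, c) : trainable_arrays(c, out)
    end
    return out
end

trainable_arrays(BatchNorm(3))  # β and γ only; the μ and σ² buffers are skipped
```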
Also, maybe we should add to both `params` and `destructure` a keyword argument `which` with possible values `:trainable`, `:buffer`, `:all`, and default value `:trainable`.
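As an illustration only, the proposed keyword might look something like this (`collect_arrays` and its internals are hypothetical, building on the `trainable_arrays` sketch above):

```julia
using Flux

# Hypothetical dispatch on the proposed `which` keyword; not an existing API.
function collect_arrays(m; which::Symbol = :trainable)
    arrays = AbstractArray[]
    Flux.fmap(x -> (x isa AbstractArray{<:Number} && push!(arrays, x); x), m)
    train = trainable_arrays(m)   # helper from the sketch above
    which === :trainable && return train
    which === :all       && return arrays
    which === :buffer    && return filter(a -> all(t -> t !== a, train), arrays)
    throw(ArgumentError("which must be :trainable, :buffer, or :all"))
end

collect_arrays(BatchNorm(3); which = :buffer)  # μ and σ²
```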
Related to #1727