Flux.Optimise.update! updating grads instead of params? #2121

Closed
Vilin97 opened this issue Nov 23, 2022 · 4 comments

@Vilin97

Vilin97 commented Nov 23, 2022

Package Version

v0.13.7

Julia Version

1.8.2

OS / Environment

Windows 11

Describe the bug

Flux.Optimise.update! seems to update the grads instead of the params. I must be doing something wrong, but this is the result I am getting.

Steps to Reproduce

using Flux
actual(x) = -x
x_train = hcat(0:5...)
y_train = actual.(x_train)
# predict = Dense(1 => 1)
predict = Chain(
  Dense(1 => 50, relu),
  Dense(50 => 50, relu),
  Dense(50 => 50, relu),
  Dense(50 => 1))
loss_(x, y) = sum( (predict(x) - y).^2 ) / sum(y_train.^2) ;
opt = Descent(10^-4)
parameters = Flux.params(predict)
grads = gradient(() -> loss_(x_train, y_train), parameters)
gr = maximum.([grads.grads[p] for p in Flux.params(predict)])
loss_(x_train, y_train)
p1=first(Flux.params(predict))
Flux.Optimise.update!(opt, Flux.params(predict), grads)
p2=first(Flux.params(predict)) # the parameter does not change 
loss_(x_train, y_train) # loss does not change
gr = maximum.([grads.grads[p] for p in Flux.params(predict)]) # all gradients go down by a factor of 10^4 -- the learning rate!

Expected Results

I was expecting params(predict) to change and the loss to go down.

Observed Results

Instead, the grads changed, and the loss and parameters of the NN did not change.

Relevant log output

No response

@Vilin97 Vilin97 added the bug label Nov 23, 2022
@Vilin97
Author

Vilin97 commented Nov 23, 2022

I am 99% sure this is not a bug and that I am just doing something weird. But perhaps the fact that I am getting this behavior and cannot figure out what I am doing wrong points to an issue in the documentation.

@mcabbott
Member

Pasting that in, I get an initial & final loss of 1.1265075f0 → 1.1257615f0, slightly changed. With Descent(0.1) instead, 1.6649617f0 → 0.6593005f0, a bigger change.

Flux.Optimise does mutate the gradients. #2098 removed one effect of this (on v0.13.8) but not the one seen here.
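
A minimal sketch of that mutation, assuming the implicit-params API of Flux v0.13 (the variable names here are illustrative, not taken from the snippet above):

using Flux

model = Dense(1 => 1)
ps = Flux.params(model)
opt = Descent(1e-4)

x = Float32.(hcat(0:5...))
y = -x

gs = gradient(() -> sum((model(x) .- y) .^ 2), ps)
w = first(ps)
g_before = copy(gs[w])              # save the raw gradient
Flux.Optimise.update!(opt, ps, gs)  # steps the params and rescales the stored gradient in place
gs[w] ≈ opt.eta .* g_before         # true: the gradient buffer now holds the scaled step

With a learning rate of 10^-4 the parameter step is tiny, so the params can look unchanged while the stored gradients visibly shrink by the learning rate, which matches what the original report saw.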

@Vilin97
Author

Vilin97 commented Nov 23, 2022

Hmm, you are right. I cannot reproduce the behavior I was observing anymore.
I do notice something weird, though: the NN is unable to approximate the x -> -x function! Does this point to a mistake in my code? I would expect approximating such an easy function to be a piece of cake.

using Flux, Random
Random.seed!(123)
actual(x) = -x
x_train = hcat(0:5...)
y_train = actual.(x_train)
loss_(x, y) = sum( (predict(x) - y).^2 ) / sum(y_train.^2) ;
for k in 1:6
    predict = Chain(
        Dense(1 => 50, relu),
        Dense(50 => 50, relu),
        Dense(50 => 50, relu),
        Dense(50 => 1));
    parameters = Flux.params(predict)
    grads = gradient(() -> loss_(x_train, y_train), parameters)
    learning_rate = 10. ^-k
    opt = Descent(learning_rate)
    loss_(x_train, y_train) # 1.241
    for _ in 1:10000 
        Flux.Optimise.update!(opt, Flux.params(predict), grads) 
    end
    @show learning_rate, loss_(x_train, y_train) 
    # (learning_rate, loss_(x_train, y_train)) = (0.1, 0.51563776f0)        
    # (learning_rate, loss_(x_train, y_train)) = (0.010000000000000002, 1.0894512f0)
    # (learning_rate, loss_(x_train, y_train)) = (0.001, 0.92247385f0)      
    # (learning_rate, loss_(x_train, y_train)) = (0.0001, 1.2685742f0)      
    # (learning_rate, loss_(x_train, y_train)) = (1.0e-5, 0.66605574f0)     
    # (learning_rate, loss_(x_train, y_train)) = (1.0e-6, 1.1450504f0) 
end

@Vilin97
Author

Vilin97 commented Nov 27, 2022

The problem was that I was not recomputing grads after each update! step.
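
For completeness, a sketch of the corrected loop (same names as the snippet above, learning rate and seed chosen just for illustration), recomputing the gradient at the current parameters before every update! — the same per-step recomputation that Flux.train! performs internally:

using Flux, Random
Random.seed!(123)

actual(x) = -x
x_train = Float32.(hcat(0:5...))
y_train = actual.(x_train)

predict = Chain(
    Dense(1 => 50, relu),
    Dense(50 => 50, relu),
    Dense(50 => 50, relu),
    Dense(50 => 1))
loss_(x, y) = sum((predict(x) - y).^2) / sum(y_train.^2)

opt = Descent(0.1)
ps = Flux.params(predict)
for _ in 1:10_000
    # recompute the gradient for the current parameters, then apply the update
    gs = gradient(() -> loss_(x_train, y_train), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
loss_(x_train, y_train)  # should now be far below the initial value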

@Vilin97 Vilin97 closed this as completed Nov 27, 2022