Reload weights after plateau #3245

Closed
wants to merge 3 commits into from

DanBmh
Contributor

DanBmh commented Aug 12, 2020

Reload the checkpoint weights after reaching a plateau, so that training continues from the best_dev weights again.

@community-tc-integration

No Taskcluster jobs started for this pull request
The `allowPullRequests` configuration for this repository (in `.taskcluster.yml` on the
default branch) does not allow starting tasks for this pull request.

@lissyx
Collaborator

lissyx commented Aug 13, 2020

@DanBmh Thanks, can you elaborate a little bit? My mind is kind of somewhere else, so I'm unsure I get the point here.

@DanBmh
Contributor Author

DanBmh commented Aug 13, 2020

Currently training looks like this:

epoch 5: val_loss=62
e6: vl=59
e7: vl=60
e8: vl=61
Reached a plateau, LearningRate:=LR*0.1
e9: vl=60    <- training continues from the e8 weights; with the proposed change it would continue from the e6 weights instead
e10: vl=58   <- the loss improves, but the network first has to undo the degradation from e7 and e8

The old approach still works well, but I think we can improve it by reloading the weights from the best_dev checkpoint.
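A minimal sketch of the proposed behaviour, assuming a toy setup in which "weights" are plain dicts and the dev losses are given in advance instead of coming from real training. The class PlateauReloader and its parameters (plateau_epochs, lr_factor) are hypothetical names for illustration, not the DeepSpeech implementation. On a plateau it lowers the learning rate and hands back a copy of the best_dev weights rather than the current ones:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class PlateauReloader:
    """Hypothetical helper: on a dev-loss plateau, lower the LR and reload the best weights."""
    lr: float
    plateau_epochs: int = 2   # epochs without improvement before reacting
    lr_factor: float = 0.1    # LR multiplier applied on a plateau
    best_loss: float = float("inf")
    best_weights: dict = field(default_factory=dict)
    epochs_since_best: int = 0

    def step(self, dev_loss: float, weights: dict) -> dict:
        """Return the weights to continue training from after this epoch."""
        if dev_loss < self.best_loss:
            # New best dev loss: keep a copy, like saving the best_dev checkpoint.
            self.best_loss = dev_loss
            self.best_weights = copy.deepcopy(weights)
            self.epochs_since_best = 0
            return weights
        self.epochs_since_best += 1
        if self.epochs_since_best >= self.plateau_epochs:
            # Plateau: reduce the LR and roll back to the best weights
            # instead of continuing from the current, worse ones.
            self.lr *= self.lr_factor
            self.epochs_since_best = 0
            return copy.deepcopy(self.best_weights)
        return weights


# Dev losses from the example above; "weights" just record the last trained epoch.
reloader = PlateauReloader(lr=0.001)
for epoch, loss in [(5, 62), (6, 59), (7, 60), (8, 61)]:
    weights = reloader.step(loss, {"trained_until_epoch": epoch})
print(weights)  # {'trained_until_epoch': 6}
```

With the losses from the log above, the plateau after e8 rolls the weights back to the e6 state, which is where e9 would start from under the proposed change.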

@DanBmh
Contributor Author

DanBmh commented Aug 14, 2020

Not ready yet!
Found an error when using the --drop_source_layers flag.

DanBmh changed the title from "Reload weights after plateau" to "WIP: Reload weights after plateau" Aug 14, 2020
@DanBmh
Contributor Author

DanBmh commented Aug 14, 2020

Working again :)

DanBmh changed the title from "WIP: Reload weights after plateau" to "Reload weights after plateau" Aug 14, 2020
lissyx requested a review from reuben August 18, 2020 16:07
Collaborator

lissyx left a comment


LGTM, I'd like Reuben's opinion.

Comment on lines 648 to 649
# Reload the checkpoint so that we use the best_dev weights again
load_or_init_graph_for_training(session, allow_drop_layers=False)
Contributor


Please keep this function unchanged and add a new, explicit load_best_checkpoint function; we don't want this call to silently load the last checkpoint, for example.
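To illustrate the reviewer's concern, here is a toy model, assuming checkpoints live in a plain dict keyed by "best" and "last". load_or_init_for_training is a hypothetical stand-in for a fallback-chain loader like load_or_init_graph_for_training, and load_best_checkpoint uses the name the reviewer proposes; neither is DeepSpeech's actual TensorFlow code. The fallback loader quietly moves on to "last" or a fresh initialization when "best" is missing. The explicit loader only accepts "best", never applies layer dropping (presumably why the diff passes allow_drop_layers=False), and raises otherwise:

```python
from typing import Dict, List, Tuple

# Toy checkpoint store: checkpoint name ("best", "last") -> saved weights.
Checkpoints = Dict[str, dict]


def _load_or_init(checkpoints: Checkpoints, method_order: List[str],
                  allow_drop_layers: bool, drop_source_layers: int = 0) -> Tuple[str, dict]:
    """Try each method in order and return (method_used, weights)."""
    for method in method_order:
        if method == "init":
            # No checkpoint needed: start from freshly initialized weights.
            return method, {"initialized": True}
        if method in checkpoints:
            weights = dict(checkpoints[method])
            if allow_drop_layers and drop_source_layers > 0:
                # Mark the last N source layers as re-initialized (transfer learning).
                weights["dropped_layers"] = drop_source_layers
            return method, weights
    raise RuntimeError(f"none of {method_order} could be loaded")


def load_or_init_for_training(checkpoints: Checkpoints, drop_source_layers: int = 0):
    # Fallback chain: quietly uses "last" or "init" when "best" is missing.
    return _load_or_init(checkpoints, ["best", "last", "init"],
                         allow_drop_layers=True, drop_source_layers=drop_source_layers)


def load_best_checkpoint(checkpoints: Checkpoints):
    # Explicit: only the best_dev checkpoint, no layer dropping, error if absent.
    return _load_or_init(checkpoints, ["best"], allow_drop_layers=False)


ckpts = {"last": {"epoch": 8}}  # no best_dev checkpoint saved yet
print(load_or_init_for_training(ckpts))  # ('last', {'epoch': 8})
try:
    load_best_checkpoint(ckpts)
except RuntimeError as err:
    print(err)  # none of ['best'] could be loaded
```

Failing loudly here means a missing best_dev checkpoint shows up as an error instead of training quietly continuing from the last weights.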

@reuben
Contributor

reuben commented Aug 19, 2020

Re-opened as #3261 to run tests.

reuben added a commit that referenced this pull request Aug 20, 2020
@reuben
Contributor

reuben commented Aug 20, 2020

Merged in #3261

reuben closed this Aug 20, 2020