
Doc update (saving.md): removed outdated info; Typo fix. #1762

Merged: 3 commits into FluxML:master on Dec 12, 2021

Conversation

@NightMachinery (Contributor):
No description provided.

@ToucheSir previously approved these changes on Nov 7, 2021.
@DhairyaLGandhi (Member):
bors r+

bors bot added a commit that referenced this pull request Nov 7, 2021
1762: Doc update (saving.md): removed outdated info; Typo fix. r=DhairyaLGandhi a=NightMachinary



Co-authored-by: NightMachinary <rudiwillalwaysloveyou@gmail.com>
@DhairyaLGandhi (Member):

Thanks, this looks like a nice addition! The statement about IdDicts seems to be an issue with BSON rather than with Flux, so I would retain the blurb about saving optimiser state. Would you mind holding on to the code snippet?

@DhairyaLGandhi (Member):
bors r-

bors bot commented on Nov 7, 2021:

Canceled.

@DhairyaLGandhi dismissed their stale review on November 7, 2021, 11:28:

We don't necessarily need the change in this case.

@CarloLucibello (Member):

I think the idea here is not to make false claims such as "You can even store optimizer state alongside the model, to resume training exactly where you left off.", since we don't have the infrastructure to resume training.
We need something like #737 (comment), but in the meantime we should avoid giving false information.

The passage under review in docs/src/saving.md:

```julia
opt = ADAM()
@save "model-$(now()).bson" model opt
```

Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state, which is not automatically handled by `BSON`), and the randomness used to partition the original data into the training and validation sets.
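As a hedged illustration of that last point (names like `seed`, `ndata`, and `ntrain` are placeholders, not from the PR), one way to make the train/validation split restorable is to derive it from an explicit seed that is persisted alongside the checkpoint:

```julia
using Random

# Placeholder sizes; the split is derived from a seed that gets saved
# with the checkpoint (e.g. in the same BSON file).
ndata, ntrain = 1000, 800
seed = 1234
perm = randperm(MersenneTwister(seed), ndata)
train_idx, val_idx = perm[1:ntrain], perm[ntrain+1:end]
# Re-running with the same saved `seed` recreates the identical
# train/validation partition when training is resumed.
```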
A Member commented on this hunk:

Suggested change:

- Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state, which is not automatically handled by `BSON`), and the randomness used to partition the original data into the training and validation sets.
+ Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state), and the randomness used to partition the original data into the training and validation sets.

I don't think BSON is the problem here; it seems more on the Flux side.

@NightMachinery (Contributor, author) commented on Nov 7, 2021:

Do you think it's impossible to serialize IdDicts? I suggested the following strategy (sketched in code below), but @ToucheSir said `objectid` was not guaranteed to return unique IDs:

  • Check that all the keys of the IdDict are present in the objects we are going to save to the BSON.
  • Put some ID tag on these objects (possibly their `objectid`).
  • Store these tags in place of the IdDict's keys.
  • When loading, construct a new IdDict and populate it from the newly loaded objects.
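A minimal sketch of that strategy (hypothetical helpers, not Flux or BSON API; it assumes every IdDict key is among the saved objects and inherits the `objectid`-collision caveat @ToucheSir raised):

```julia
# Tags are assigned at save time and stored alongside the objects
# themselves, so the IdDict can be re-keyed against the freshly
# loaded copies after deserialization.

tag_objects(objs) = [(objectid(o), o) for o in objs]   # persist these pairs

encode_iddict(d::IdDict) = [(objectid(k), v) for (k, v) in d]

function decode_iddict(pairs, tagged_objs)
    bytag = Dict(tag => o for (tag, o) in tagged_objs)
    d = IdDict()
    for (tag, v) in pairs
        d[bytag[tag]] = v   # key is the reloaded object, not the old one
    end
    return d
end
```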

If IdDicts are fundamentally hard to serialize, perhaps Flux should switch to another dictionary type that is designed to be serializable?


Anyhow, the reason I added the BSON blurb is that BSON can handle IdDicts only if a custom hack is employed. I.e., I wanted to emphasize that optimizers can be saved with some manual code, but they are not automagically "handled." Can you perhaps rephrase this better? Perhaps we can put a link to the workaround in the docs?

I don't think most new users will know what an IdDict is, or if BSON supports it or not. As a matter of fact, seeing the issue/PR history of BSON, I thought BSON did support IdDicts.

Another thing that comes to mind is BSON's support for closures. We should mention this, as it is quite confusing and not usually supported by serialization libraries. (E.g., Python's FastAI disavows lambdas in its models for exactly this reason.) BSON itself is very sparsely documented: what are its limitations? E.g., when a closure is serialized, are its captured variables also serialized? Will loading it set captured globals in the global scope?

```julia
a = 0
f = () -> global a += 1
```
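One hedged way to probe those questions empirically (the file name is illustrative; `@save`/`@load` are BSON.jl macros) is to round-trip such a closure into a fresh session:

```julia
using BSON: @save

a = 0
f = () -> global a += 1
@save "closure_test.bson" f   # file name is illustrative

# Then, in a brand-new Julia session:
#   using BSON: @load
#   @load "closure_test.bson" f
#   f()             # does the closure run at all?
#   @isdefined(a)   # was the referenced global recreated?
```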

Perhaps our best approach is to add a limitations section to BSON's readme, then link to it from the Flux docs.

A Member replied, quoting the above:

> If IdDicts are fundamentally hard to serialize, perhaps Flux should switch to another dictionary type that is designed to be serializable?

We are planning to switch away from the dict-of-params approach entirely; see #1481. This can't happen overnight, as a lot of downstream code relies on implicit params. In fact, it would be the biggest breaking change since Tracker was swapped out for Zygote ~2 years ago.

@ToucheSir (Member):

Ref. #737 (comment): I was wrong about the inability to save IdDicts in optimizer state. I think it would be best to revert the changes in this section entirely in favour of adding a note after https://github.com/FluxML/Flux.jl/blob/master/docs/src/saving.md?plain=1#L127. That note could (a) warn that models and optimisers must be saved together for the latter to work when restoring, and (b) explain why (`BSON.@save filename xs...` can handle shared references/values in `xs`).
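A hedged sketch of what that note describes (the toy model and file name are illustrative, not the proposed doc text): saving the model and optimiser in one `@save` call lets BSON preserve the references they share, while saving them separately would not.

```julia
using Flux, BSON, Dates

model = Chain(Dense(10, 5, relu), Dense(5, 2))
opt = ADAM()
# ... training populates opt's IdDict-backed state ...

# Saved together, parameter arrays shared between model and opt
# keep their identity inside the BSON file:
BSON.@save "checkpoint-$(now()).bson" model opt

# Saving them into two separate files would instead yield an optimiser
# whose state no longer aliases the restored model's parameters, so
# training could not resume correctly.
```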

@NightMachinery (Contributor, author):

@ToucheSir The other section of the PR is just a typo fix; I can close this PR, and you can submit a new one with the details about BSON and the typo fixed. BTW, adding this paragraph in that new PR would still be helpful IMO:

> Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state), and the randomness used to partition the original data into the training and validation sets.

Or we can merge this PR, and you can re-add the section on BSON, with more details this time.

@ToucheSir (Member):

The easiest way to go about things would be to restore the deleted code block below. That and this paragraph should be enough to merge.

@NightMachinery (Contributor, author):

@ToucheSir I pushed an update. Does it resolve all the issues?

Review thread on docs/src/saving.md (outdated, resolved).
Co-authored-by: Brian Chen <ToucheSir@users.noreply.github.com>
@ToucheSir (Member):

bors r+

bors bot added a commit that referenced this pull request Dec 12, 2021
1762: Doc update (saving.md): removed outdated info; Typo fix. r=ToucheSir a=NightMachinary



Co-authored-by: NightMachinary <rudiwillalwaysloveyou@gmail.com>
Co-authored-by: NightMachinary <36224762+NightMachinary@users.noreply.github.com>
bors bot commented on Dec 12, 2021:

Build failed:

@ToucheSir merged commit 878b39c into FluxML:master on Dec 12, 2021.