Doc update (saving.md): removed outdated info; Typo fix. #1762
Conversation
bors r+
Thanks, this looks like a nice addition! The statement about IdDicts seems to be an issue with BSON rather than anything else, so I would retain the blurb about saving optimiser state. Would you mind holding on to the code snippet?
bors r-
Canceled.
We don't necessarily need the change in this case
I think the idea here is to avoid making false claims such as "You can even store optimizer state alongside the model, to resume training exactly where you left off.", since we don't have the infrastructure to resume training.
docs/src/saving.md
Outdated
```julia
opt = ADAM()
@save "model-$(now()).bson" model opt
```

Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state, which is not automatically handled by `BSON`), and the randomness used to partition the original data into the training and validation sets.
Suggested change (dropping the `BSON` parenthetical):

> Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state), and the randomness used to partition the original data into the training and validation sets.
I don't think BSON is the problem here; it seems more on the Flux side.
Do you think it's impossible to serialize `IdDict`s? I suggested the following strategy (sketched below), but @ToucheSir said `objectid` was not guaranteed to return unique IDs:

- Check if all the keys of the `IdDict` are present in the objects we are going to save to the BSON
- Put some ID tag on these objects (possibly their `objectid`)
- Store these IDs for the keys of the `IdDict`
- When loading an `IdDict`, construct a new `IdDict` and populate it from the newly loaded objects
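Something like this is what I had in mind; the helper names `save_with_state` / `load_with_state` are only illustrative (not part of BSON or Flux), and it leans on `objectid` tags, which is exactly the part in question:

```julia
using BSON

function save_with_state(path, objs::Vector, state::IdDict)
    # Every key of the state dict must be one of the objects being saved.
    @assert all(k -> any(o -> o === k, objs), keys(state))
    tags = objectid.(objs)                               # one ID tag per saved object
    tagged = Dict(objectid(k) => v for (k, v) in state)  # state keyed by tag instead of object identity
    BSON.@save path objs tags tagged
end

function load_with_state(path)
    BSON.@load path objs tags tagged
    # Rebuild the IdDict against the freshly loaded objects via their saved tags.
    state = IdDict(o => tagged[t] for (o, t) in zip(objs, tags) if haskey(tagged, t))
    return objs, state
end
```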
If `IdDict`s are fundamentally hard to serialize, perhaps Flux should switch to another dictionary type that is designed to be serializable?

Anyhow, the reason I added the `BSON` blurb was because `BSON` can handle `IdDict`s if a custom hack is employed. I.e., I wanted to emphasize that optimizers can be saved with some manual code, but they are not automagically "handled." Can you perhaps rephrase this better? Perhaps we can put a link to the workaround in the docs?
I don't think most new users will know what an `IdDict` is, or whether `BSON` supports it or not. As a matter of fact, seeing the issue/PR history of `BSON`, I thought `BSON` did support `IdDict`s.
Another thing that comes to mind is `BSON`'s support for closures. We should mention this, as it is quite confusing and not usually supported. (E.g., Python's FastAI disavows lambdas in its models exactly for this reason.) `BSON` itself is very sparsely documented; what are the limitations? E.g., when a closure is serialized, are its captured variables also serialized? Will this set captured globals in the global scope?
```julia
a = 0
f = () -> global a += 1
```
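For concreteness, this is the sort of experiment I mean, continuing from the snippet above (the file name is arbitrary, and the comments are the open questions rather than documented behaviour):

```julia
using BSON

BSON.@save "closure.bson" f   # is the global `a` that `f` mutates serialized along with it?

# Then, in a fresh Julia session:
BSON.@load "closure.bson" f
f()                           # does this define/mutate a global `a` here, and starting from what value?
```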
Perhaps our best approach is to add a limitations section to `BSON`'s README, then link to it from the Flux docs.
> If `IdDict`s are fundamentally hard to serialize, perhaps Flux should switch to another dictionary type that is designed to be serializable?
We are planning to switch away from the dict of params approach entirely, see #1481. This can't happen overnight, as a lot of downstream code relies on implicit params. In fact, it would be the biggest breaking change since Tracker was swapped out for Zygote ~2 years ago.
Ref. #737 (comment), I was wrong about the inability to save `IdDict`s in optimizer state. I think it would be best to revert the changes in this section entirely in favour of adding a note after https://github.com/FluxML/Flux.jl/blob/master/docs/src/saving.md?plain=1#L127. That note could a) warn that models and optimizers must be saved together to have the latter work when restoring, and b) explain why (`BSON.@save filename xs...` can handle shared references/values in `xs`).
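To make that last point concrete, the note could include something along these lines (the model, file name, and training step are only placeholders):

```julia
using Flux, BSON

model = Chain(Dense(10, 5, relu), Dense(5, 1))
opt = ADAM()
# ... train for a while; opt now holds per-parameter state in an IdDict
# keyed by the model's parameter arrays ...

# Saving both in the same @save call lets BSON preserve the shared references
# between the optimiser state's keys and the model's parameter arrays.
BSON.@save "checkpoint.bson" model opt

# Loading both from the same file keeps those references intact, so the restored
# optimiser state still lines up with the restored model's parameters.
BSON.@load "checkpoint.bson" model opt
```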
@ToucheSir The other section of the PR is just a typo fix; I can close this PR, and you can submit a new one with the details about `BSON` and the typo fixed. BTW, adding this paragraph in that new PR is still helpful IMO:

> Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state), and the randomness used to partition the original data into the training and validation sets.

Or we can merge this PR, and you can re-add the section on `BSON`, with more details this time.
The easiest way to go about things would be to restore the deleted code block below. That and this paragraph should be enough to merge.
@ToucheSir I pushed an update. Does it resolve all the issues?
Co-authored-by: Brian Chen <ToucheSir@users.noreply.github.com>
bors r+
Build failed: