Doc update (saving.md): removed outdated info; Typo fix. #1762
Conversation
bors r+
Thanks, this looks like a nice addition! The statement about IdDicts seems to be an issue with BSON rather than anything else, so I would retain the blurb about saving optimiser state. Would you mind holding on to the code snippet?
bors r-
Canceled.
We don't necessarily need the change in this case
I think the idea here is to avoid making false claims such as "You can even store optimizer state alongside the model, to resume training exactly where you left off.", since we don't have the infrastructure to resume training.
docs/src/saving.md
Outdated
```julia
opt = ADAM()
@save "model-$(now()).bson" model opt
```

Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state, which is not automatically handled by `BSON`), and the randomness used to partition the original data into the training and validation sets.
Suggested change (dropping the `BSON` parenthetical):

> Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state), and the randomness used to partition the original data into the training and validation sets.
I don't think BSON is the problem here; it seems more on the Flux side.
Do you think it's impossible to serialize `IdDict`s? I suggested the following strategy (sketched below), but @ToucheSir said `objectid` was not guaranteed to return unique IDs:

- Check if all the keys of the `IdDict` are present in the objects we are going to save to the BSON
- Put some ID tag on these objects (possibly their `objectid`)
- Store these IDs for the keys of the `IdDict`
- When loading an `IdDict`, construct a new `IdDict` and populate it from the newly loaded objects
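Something like this is what I had in mind; the helper names `save_with_state` / `load_with_state` are only illustrative (not part of BSON or Flux), and it leans on `objectid` tags, which is exactly the part in question:

```julia
using BSON

function save_with_state(path, objs::Vector, state::IdDict)
    # Every key of the state dict must be one of the objects being saved.
    @assert all(k -> any(o -> o === k, objs), keys(state))
    tags = objectid.(objs)                               # one ID tag per saved object
    tagged = Dict(objectid(k) => v for (k, v) in state)  # state keyed by tag instead of object identity
    BSON.@save path objs tags tagged
end

function load_with_state(path)
    BSON.@load path objs tags tagged
    # Rebuild the IdDict against the freshly loaded objects via their saved tags.
    state = IdDict(o => tagged[t] for (o, t) in zip(objs, tags) if haskey(tagged, t))
    return objs, state
end
```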
If `IdDict`s are fundamentally hard to serialize, perhaps Flux should switch to another dictionary type that is designed to be serializable?

Anyhow, the reason I added the `BSON` blurb was because `BSON` can handle `IdDict`s if a custom hack is employed. I.e., I wanted to emphasize that optimizers can be saved with some manual code, but they are not automagically "handled." Can you perhaps rephrase this better? Perhaps we can put a link to the workaround in the docs?
I don't think most new users will know what an `IdDict` is, or whether `BSON` supports it or not. As a matter of fact, seeing the issue/PR history of `BSON`, I thought `BSON` did support `IdDict`s.
Another thing that comes to mind is `BSON`'s support for closures. We should mention this, as it is quite confusing and not usually supported. (E.g., Python's FastAI disavows lambdas in its models exactly for this reason.) `BSON` itself is very sparsely documented; what are the limitations? E.g., when a closure is serialized, are its captured variables also serialized? Will this set captured globals in the global scope?
```julia
a = 0
f = () -> global a += 1
```
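For concreteness, this is the sort of experiment I mean, continuing from the snippet above (the file name is arbitrary, and the comments are the open questions rather than documented behaviour):

```julia
using BSON

BSON.@save "closure.bson" f   # is the global `a` that `f` mutates serialized along with it?

# Then, in a fresh Julia session:
BSON.@load "closure.bson" f
f()                           # does this define/mutate a global `a` here, and starting from what value?
```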
Perhaps our best approach is to add a limitations section to `BSON`'s README, then link to it from the Flux docs.
> If `IdDict`s are fundamentally hard to serialize, perhaps Flux should switch to another dictionary type that is designed to be serializable?
We are planning to switch away from the dict of params approach entirely, see #1481. This can't happen overnight, as a lot of downstream code relies on implicit params. In fact, it would be the biggest breaking change since Tracker was swapped out for Zygote ~2 years ago.
Ref. #737 (comment), I was wrong about the inability to save `IdDict`s in optimizer state. I think it would be best to revert the changes in this section entirely in favour of adding a note after https://github.com/FluxML/Flux.jl/blob/master/docs/src/saving.md?plain=1#L127. That note could a) warn that models and optimizers must be saved together to have the latter work when restoring, and b) explain why (`BSON.@save filename xs...` can handle shared references/values in `xs`).
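To make that last point concrete, the note could include something along these lines (the model, file name, and training step are only placeholders):

```julia
using Flux, BSON

model = Chain(Dense(10, 5, relu), Dense(5, 1))
opt = ADAM()
# ... train for a while; opt now holds per-parameter state in an IdDict
# keyed by the model's parameter arrays ...

# Saving both in the same @save call lets BSON preserve the shared references
# between the optimiser state's keys and the model's parameter arrays.
BSON.@save "checkpoint.bson" model opt

# Loading both from the same file keeps those references intact, so the restored
# optimiser state still lines up with the restored model's parameters.
BSON.@load "checkpoint.bson" model opt
```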
@ToucheSir The other section of the PR is just a typo fix; I can close this PR, and you can submit a new one with the details about `BSON` and the typo fixed. BTW, adding this paragraph in that new PR is still helpful IMO:

> Note that to resume a model's training, you might need to restore other stateful parts of your training loop. Possible examples are stateful optimizers (which usually utilize an `IdDict` to store their state), and the randomness used to partition the original data into the training and validation sets.

Or we can merge this PR, and you can re-add the section on `BSON`, with more details this time.
The easiest way to go about things would be to restore the deleted code block below. That and this paragraph should be enough to merge.
@ToucheSir I pushed an update. Does it resolve all the issues?
Co-authored-by: Brian Chen <ToucheSir@users.noreply.github.com>
bors r+
Build failed: