Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[doc][tune] Add tune checkpoint user guide. #33145

Merged
merged 14 commits into from
Mar 22, 2023

Conversation

xwjiang2010
Copy link
Contributor

@xwjiang2010 xwjiang2010 commented Mar 8, 2023

Add tune checkpoint user guide. Also updated storage-options and trainable page.

Why are these changes needed?

Related issue number

Closes #32659

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Also updated storage-options and trainable page.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
xwjiang2010 and others added 3 commits March 8, 2023 17:42
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
@ollie-iterators
Copy link

There is a warning for file "/ray/doc/source/ray-air/examples/index.rst" for line 29 that says: "WARNING: unknown document: /ray-air/examples/serving"

@angelinalg
Copy link
Contributor

Nice job, @xwjiang2010! I left some comments, mostly copy editing. Let me know if you have any questions.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Copy link
Contributor

@justinvyu justinvyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A few suggestions:

  • Can we rename the guide to just tune-checkpoints? This is what the old guide URL was, and it'd be nice to get redirected to this guide.
  • The appendix in the Tune storage guide was placed at the end because it's pretty unrelated to that user guide which about configuring storage/syncing. I think we should actually move it the beginning of this Tune checkpoint user guide. We'll first make the distinction of what type of checkpoint we're talking about in this guide.
  • Can we put the new code blocks in doc_code files that get linted+tested?
  • I think PTL does a fairly good job with their basic checkpointing guide. We can maybe follow this as an example: https://pytorch-lightning.readthedocs.io/en/stable/common/checkpointing_basic.html

doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/key-concepts.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-distributed.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-distributed.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-storage.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
@xwjiang2010
Copy link
Contributor Author

talked with Justin offline.

We are moving "three types of experiment data" to appendix under "tune-trial-checkpoint".
"tune-trial-checkpoint" guide will be solely for trial checkpoint.

xwjiang2010 and others added 5 commits March 13, 2023 11:56
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
@justinvyu justinvyu self-requested a review March 16, 2023 06:30
Copy link
Contributor

@justinvyu justinvyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good after this round!

doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/doc_code/trial_checkpoint.py Outdated Show resolved Hide resolved
doc/source/tune/doc_code/trial_checkpoint.py Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-storage.rst Outdated Show resolved Hide resolved
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
@xwjiang2010
Copy link
Contributor Author

@justinvyu Thanks for the great suggestions and tips! I updated the PR and did another round of proof read. Should be ready for a pass now.

Copy link
Contributor

@justinvyu justinvyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think this looks good. Mostly just minor suggestions at this point, ready for @richardliaw to do a final pass!

doc/source/tune/tutorials/tune-storage.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/doc_code/trial_checkpoint.py Outdated Show resolved Hide resolved
doc/source/tune/doc_code/trial_checkpoint.py Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
doc/source/tune/tutorials/tune-trial-checkpoint.rst Outdated Show resolved Hide resolved
@xwjiang2010 xwjiang2010 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Mar 22, 2023
Copy link
Contributor

@richardliaw richardliaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stamping to unblock, please address Justin comments

@xwjiang2010
Copy link
Contributor Author

xwjiang2010 commented Mar 22, 2023

All comments are already addressed!

@xwjiang2010 xwjiang2010 merged commit 979b9db into ray-project:master Mar 22, 2023
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
* [doc][tune] Add tune checkpoint user guide.

Also updated storage-options and trainable page.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* fix doc tests.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* move _tune-persisted-experiment-data

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address comments

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address comments.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* fix test

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address timeout issue

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address comments

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address comments

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

---------

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Signed-off-by: elliottower <elliot@elliottower.com>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
* [doc][tune] Add tune checkpoint user guide.

Also updated storage-options and trainable page.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* fix doc tests.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* move _tune-persisted-experiment-data

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address comments

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address comments.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* fix test

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address timeout issue

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address comments

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address comments

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

---------

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
@xwjiang2010 xwjiang2010 deleted the tune_checkpoint_guide branch July 26, 2023 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Tune] Working with checkpoints user guide
5 participants