[doc][tune] Add tune checkpoint user guide. #33145
Also updated storage-options and trainable page. Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
There is a warning for file "/ray/doc/source/ray-air/examples/index.rst" for line 29 that says: "WARNING: unknown document: /ray-air/examples/serving"
Nice job, @xwjiang2010! I left some comments, mostly copy editing. Let me know if you have any questions.
Thanks! A few suggestions:
- Can we rename the guide to just `tune-checkpoints`? This is what the old guide URL was, and it'd be nice to get redirected to this guide.
- The appendix in the Tune storage guide was placed at the end because it's pretty unrelated to that user guide, which is about configuring storage/syncing. I think we should actually move it to the beginning of this Tune checkpoint user guide. We'll first make the distinction of what type of checkpoint we're talking about in this guide.
- Can we put the new code blocks in `doc_code` files that get linted+tested?
- I think PTL does a fairly good job with their basic checkpointing guide. We can maybe follow this as an example: https://pytorch-lightning.readthedocs.io/en/stable/common/checkpointing_basic.html
Talked with Justin offline. We are moving "three types of experiment data" to an appendix under "tune-trial-checkpoint".
Should be good after this round!
@justinvyu Thanks for the great suggestions and tips! I updated the PR and did another round of proofreading. Should be ready for a pass now.
Nice! I think this looks good. Mostly just minor suggestions at this point, ready for @richardliaw to do a final pass!
Stamping to unblock; please address Justin's comments.
All comments are already addressed!
* [doc][tune] Add tune checkpoint user guide. Also updated storage-options and trainable page.
* fix doc tests.
* move _tune-persisted-experiment-data
* address comments
* address comments.
* fix test
* address timeout issue
* address comments
* address comments
---------
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Signed-off-by: elliottower <elliot@elliottower.com>
* [doc][tune] Add tune checkpoint user guide. Also updated storage-options and trainable page.
* fix doc tests.
* move _tune-persisted-experiment-data
* address comments
* address comments.
* fix test
* address timeout issue
* address comments
* address comments
---------
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
Add tune checkpoint user guide. Also updated storage-options and trainable page.
Why are these changes needed?
Related issue number
Closes #32659
Checks
- Signed off every commit (`git commit -s`) in this PR.
- Ran `scripts/format.sh` to lint the changes in this PR.