[see #10061 instead] Unify checkpoint load paths #9693
Conversation
this is looking really promising @jjenniferdai ! unifying these paths will make the API consistent and help us simplify the trainer internals
-    # restore optimizers, etc.
-    self.checkpoint_connector.restore_training_state()
+    if self.state.fn == TrainerFn.FITTING:
+        self.checkpoint_connector.restore_training_state()
restore training state includes things which can be resumed even if not fitting, such as the loop state.
imo we shouldn't add the check for fitting here, but rather inside the relevant parts of restore_training_state
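For context, a rough sketch of that alternative; the restore_loops / restore_optimizers_and_schedulers split below is assumed for illustration only, not taken from this diff.

```python
from pytorch_lightning.trainer.states import TrainerFn


class CheckpointConnector:
    # Sketch only: the real connector holds a trainer reference, the loaded
    # checkpoint, and more helpers than shown here.
    def restore_training_state(self) -> None:
        # Loop/progress state can be restored for validate/test/predict runs too,
        # so it stays unconditional.
        self.restore_loops()

        # Optimizer and LR-scheduler state is only meaningful when fitting,
        # so the TrainerFn check moves in here instead of the call site.
        if self.trainer.state.fn == TrainerFn.FITTING:
            self.restore_optimizers_and_schedulers()
```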
(above comment?)
yeah, we do restore loops there and now some other attributes as well. I'd suggest waiting for this one to get merged: #9413
def _fit_impl(
    self,
    model: "pl.LightningModule",
    train_dataloaders: Optional[Union[TRAIN_DATALOADERS, LightningDataModule]] = None,
    val_dataloaders: Optional[EVAL_DATALOADERS] = None,
    datamodule: Optional[LightningDataModule] = None,
    ckpt_path: Optional[str] = None,
should this also be typed as _PATH?
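If so, the signature would look roughly like the sketch below; it assumes the _PATH alias from pytorch_lightning.utilities.types (a str-or-pathlib.Path union) and omits the dataloader/datamodule arguments for brevity.

```python
from typing import Optional

from pytorch_lightning.utilities.types import _PATH


def _fit_impl(self, model: "pl.LightningModule", ckpt_path: Optional[_PATH] = None) -> None:
    # Only the annotation changes: ckpt_path accepts str or Path instead of str only.
    ...
```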
if self.state.fn == TrainerFn.FITTING:
    self.checkpoint_connector.restore_training_state()

self.checkpoint_connector.resume_end()
n00b question: why is this bumped up to here vs in _pre_training_routine?
now that this calls resume_start for all trainer functions (not only fitting), resume_end is similarly called for all as well
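In other words, the sequence being discussed looks roughly like this sketch; the _restore_checkpoint name and its exact placement inside the trainer are illustrative only.

```python
from pytorch_lightning.trainer.states import TrainerFn


class Trainer:
    # Sketch of the unified load path; in the actual PR this logic sits inside
    # the trainer's run/dispatch code rather than a dedicated method.
    def _restore_checkpoint(self) -> None:
        # open/load the checkpoint for every trainer function (fit/validate/test/predict)
        self.checkpoint_connector.resume_start()

        # only fit-specific state (optimizers, schedulers, ...) is gated on FITTING
        if self.state.fn == TrainerFn.FITTING:
            self.checkpoint_connector.restore_training_state()

        # release the loaded checkpoint for every trainer function as well
        self.checkpoint_connector.resume_end()
```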
Hey @jjenniferdai, would you mind resolving the conflicts? Best,
Dear @jjenniferdai, any updates? Best,
sorry all for the git issues :( please see #10061 instead
What does this PR do?
Fixes #9405
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃