
Load callback states while testing. #5542

Open
rohitgr7 opened this issue Jan 16, 2021 · 21 comments
Labels
checkpointing, feature, help wanted, priority: 1, trainer: test, trainer: validate
Comments

@rohitgr7
Contributor

rohitgr7 commented Jan 16, 2021

🚀 Feature

Load callback states while testing.

Motivation

#5161 (comment)

Pitch

Two possible API changes:

with an additional argument restore_states:

test(ckpt_path, restore_states=True/False)  # give an option whether to load states or not
test(model, ckpt_path, restore_states=True/False)  # same as above but will just load checkpoint states and not the model

# raise an error
test(ckpt_path=None, restore_states=True)

or without any additional argument:

test(ckpt_path)  # always load states
test(ckpt_path=None)  # don't load any states.
test(model, ckpt_path)  # reload checkpoint states only from ckpt_path
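
For illustration, here is a rough sketch of how the second option could behave inside test (purely hypothetical: the trainer internals and the _load_callback_states helper are assumptions, not the actual API):

import torch

# Hypothetical sketch only -- not the actual Trainer implementation.
def test(self, model=None, ckpt_path=None):
    if ckpt_path is not None:
        checkpoint = torch.load(ckpt_path)
        if model is None:
            # no model passed: restore the weights from the checkpoint
            self.lightning_module.load_state_dict(checkpoint["state_dict"])
        # with or without a model, restore the saved callback states
        self._load_callback_states(checkpoint)  # hypothetical helper
    # ckpt_path=None: no states are loaded; proceed with the in-memory model
    ...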

Alternatives

Alternatively, one can reload the checkpoint manually, call on_load_checkpoint for each callback, and then test.

PS: There may be a better solution. Open to suggestions :)
cc: @ananthsub

cc @Borda @awaelchli @ananthsub @ninginthecloud @rohitgr7 @tchaton @akihironitta

@rohitgr7 rohitgr7 added the feature and help wanted labels Jan 16, 2021
@awaelchli
Contributor

@rohitgr7 what's the difference between your proposal and the existing

trainer = Trainer(resume_from_checkpoint=x)
trainer.test()

?

@rohitgr7
Contributor Author

trainer = Trainer(resume_from_checkpoint=x)
trainer.test()

In the recent PR, reloading from resume_from_checkpoint was disabled completely, since it was reloading the model state too. Also, resume_from_checkpoint is meant to resume training, I guess. If we use it, then when testing with different checkpoints we need to pass the checkpoint path twice: in Trainer(resume_from_checkpoint=ckpt) and in trainer.test(ckpt_path=ckpt). I think resume_from_checkpoint would be easier to handle; it's just that the argument name sounds a bit misleading to me in the case of testing.

@ananthsub
Contributor

ananthsub commented Jan 24, 2021

In the recent PR, reloading from resume_from_checkpoint was disabled completely, since it was reloading the model state too

Isn't this a breaking API change? I commented on #5388 about this too. What happens to callbacks whose states also depend on the model?

@ananthsub
Contributor

@rohitgr7 could we still call the checkpoint connector from setup_training inside run_evaluation, but add logic inside restore to check trainer.testing? If it's testing, we can load the model and callback states only and ignore the trainer states. This way we don't have to expose anything new in the trainer API. What do you think?
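
A rough sketch of what that check inside the checkpoint connector's restore could look like (the class shape and helper names here are illustrative, not the actual private API):

import torch

class CheckpointConnector:
    def __init__(self, trainer):
        self.trainer = trainer

    def restore(self, checkpoint_path):
        checkpoint = torch.load(checkpoint_path, map_location="cpu")
        # always restore the model weights and the saved callback states
        self.trainer.lightning_module.load_state_dict(checkpoint["state_dict"])
        self.trainer.on_load_checkpoint(checkpoint)
        if not self.trainer.testing:
            # optimizer/scheduler/loop state only matters when resuming training
            self._restore_training_state(checkpoint)  # hypothetical helper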

@stale

stale bot commented Feb 27, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix label Feb 27, 2021
@stale stale bot closed this as completed Mar 7, 2021
@awaelchli awaelchli reopened this Mar 7, 2021
@stale stale bot removed the won't fix label Mar 7, 2021
@rohitgr7
Contributor Author

rohitgr7 commented Mar 7, 2021

For this, I'd suggest reloading the callback states as well, along with the model state, using the ckpt_path passed to .test. Open to suggestions!

@stale

stale bot commented Apr 7, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix label Apr 7, 2021
@awaelchli awaelchli removed the won't fix label Apr 8, 2021
@stale

stale bot commented May 8, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix label May 8, 2021
@edenlightning edenlightning added the design label May 9, 2021
@stale stale bot removed the won't fix label May 9, 2021
@edenlightning
Contributor

@PyTorchLightning/core-contributors thoughts?

@edenlightning edenlightning added the discussion label May 9, 2021
@carmocca
Contributor

carmocca commented May 10, 2021

Does trainer.test(ckpt_path) not reload callback states if the ckpt_path checkpoint includes the states for callbacks?

Do we ever want to reload callback states but not the model state?

@rohitgr7
Contributor Author

rohitgr7 commented May 10, 2021

Does trainer.test(ckpt_path) not reload callback states if the ckpt_path checkpoint includes the states for callbacks?

It does not.

I guess one can simply use a hook for this.

import torch

def on_test_start(self):
    # reload the checkpoint ourselves and let the trainer dispatch the callback states
    ckpt = torch.load(self.trainer.tested_ckpt_path)
    self.trainer.on_load_checkpoint(ckpt)

@carmocca
Contributor

@rohitgr7 but I think the Callback on_load_checkpoint also does not get called during fit:

from pytorch_lightning import Callback, Trainer
from tests.helpers import BoringModel  # PL's test-helper model (location varies by version)

class MyCallback(Callback):
    def on_save_checkpoint(self, trainer, pl_module, checkpoint):
        return {"foo": True}

    def on_load_checkpoint(self, trainer, pl_module, callback_state):
        # DOES NOT GET CALLED
        print(callback_state)

def test_bug(tmpdir):
    model = BoringModel()

    trainer = Trainer(max_epochs=1, callbacks=[MyCallback()])
    trainer.fit(model)

    ckpt = str(tmpdir / "test.ckpt")
    trainer.save_checkpoint(ckpt)

    trainer = Trainer(resume_from_checkpoint=ckpt, max_epochs=2)
    trainer.fit(model)

I don't think we need to add a flag for this. If the state was saved, it should be reloaded.

@rohitgr7
Contributor Author

@carmocca to make it load the callback state you need to pass the callbacks in the re-run.

trainer = pl.Trainer(resume_from_checkpoint=ckpt, max_epochs=2, callbacks=[MyCallback()])

on_load_checkpoint is called only if you set resume_from_checkpoint and then call trainer.fit. During testing it is not called, because we don't load any state except the model's state_dict when a ckpt is passed to trainer.test.

To load the callback states, one can manually call this trainer method, although it's just a workaround:

import torch

def on_test_start(self):
    # reload the checkpoint ourselves and let the trainer dispatch the callback states
    ckpt = torch.load(self.trainer.tested_ckpt_path)
    self.trainer.on_load_checkpoint(ckpt)

@carmocca
Contributor

carmocca commented May 12, 2021

to make it load the callback state you need to pass the callbacks in the re-run.

Thanks. This could use an info message
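
Something along these lines, perhaps (a sketch: rank_zero_info is Lightning's logging utility, while the surrounding restore logic and the checkpoint/trainer variables are assumed to be in scope):

from pytorch_lightning.utilities import rank_zero_info

# Sketch: while restoring, point out callback states in the checkpoint that
# have no matching callback configured on the new Trainer.
saved = set(checkpoint.get("callbacks", {}))
configured = {type(cb).__name__ for cb in trainer.callbacks}
missing = saved - configured
if missing:
    rank_zero_info(
        f"The checkpoint contains states for callbacks {missing}, but no matching "
        "callbacks are configured on this Trainer; those states will not be reloaded."
    )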

although it's just a workaround.

I get it, but this should work as in fit

@rohitgr7
Contributor Author

to make it load the callback state you need to pass the callbacks in the re-run.

Thanks. This could use an info message

although it's just a workaround.

I get it, but this should work as in fit

Yes, it should. Although I'm just wondering about this case:

test(model, ckpt_path)

Should we load the callback states here, since ckpt_path is passed, or simply ignore it, since the model is passed explicitly?

@carmocca
Contributor

Should we load the callback states here, since ckpt_path is passed, or simply ignore it, since the model is passed explicitly?

ckpt_path does nothing if the model is passed

https://github.com/PyTorchLightning/pytorch-lightning/blob/20f63377f81f4771d3f128f979b3a0f9b8d219a7/pytorch_lightning/trainer/trainer.py#L569-L570

@stale

stale bot commented Jun 16, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix label Jun 16, 2021
@carmocca carmocca added this to the v1.5 milestone Jun 17, 2021
@stale stale bot removed the won't fix label Jun 17, 2021
@tchaton
Contributor

tchaton commented Oct 6, 2021

Hey @rohitgr7,

quick question: Do we have a use case where we need to reload the callback states for testing?

Best,
T.C

@tchaton tchaton added the priority: 1 label Oct 6, 2021
@rohitgr7
Contributor Author

rohitgr7 commented Oct 6, 2021

@tchaton I don't have one currently, but here's one from @ananthsub
#5161 (comment)

@tchaton
Contributor

tchaton commented Oct 6, 2021

Dear @ananthsub,

Any chance the Exponential Moving Average callback could be contributed, to provide a proper use case for this refactor? Otherwise, I believe it is a bad pattern to force a refactor without any open-source application.

Best,
T.C

@awaelchli awaelchli modified the milestones: v1.5, v1.6 Nov 4, 2021
@hal-314

hal-314 commented Jan 31, 2022

Hi @tchaton

I'm facing the same limitation as @ananthsub, as I'm training with EMA. I expected on_load_checkpoint to be called with trainer.validate or trainer.predict. The Lightning docs imply that it is always called when loading from a checkpoint. I believe the docs should mention that Callback.on_load_checkpoint is only called with trainer.fit.

You can find a working EMA callback here (apply these changes for multi-GPU support).
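
For context, a minimal sketch of such an EMA callback (not the linked implementation) shows what breaks: the averaged weights live only in the callback state, so if on_load_checkpoint is never called during testing, they are silently lost:

import copy

import torch
from pytorch_lightning import Callback

class EMACallback(Callback):
    """Minimal EMA sketch; the linked implementation is more complete."""

    def __init__(self, decay=0.999):
        self.decay = decay
        self.ema_state = None

    def on_train_batch_end(self, trainer, pl_module, *args, **kwargs):
        current = pl_module.state_dict()
        if self.ema_state is None:
            self.ema_state = copy.deepcopy(current)
            return
        with torch.no_grad():
            for name, param in current.items():
                if param.dtype.is_floating_point:
                    # running average of the floating-point weights
                    self.ema_state[name].mul_(self.decay).add_(param, alpha=1 - self.decay)

    def on_save_checkpoint(self, trainer, pl_module, checkpoint):
        return {"ema_state": self.ema_state}

    def on_load_checkpoint(self, trainer, pl_module, callback_state):
        # the hook this issue is about: never reached from trainer.test(ckpt_path=...)
        self.ema_state = callback_state["ema_state"]

    def on_test_start(self, trainer, pl_module):
        # evaluate with the averaged weights; on a fresh Trainer + trainer.test,
        # ema_state is still None and the EMA weights are lost
        if self.ema_state is not None:
            pl_module.load_state_dict(self.ema_state)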

@carmocca carmocca added the checkpointing, trainer: test and trainer: validate labels and removed the discussion and design labels Feb 1, 2022
@carmocca carmocca modified the milestones: 1.6, future Feb 28, 2022