
Split callbacks #849

Merged: 28 commits into master from split_callbacks, Feb 23, 2020
Conversation

@hadim (Contributor) commented Feb 15, 2020

Following #776.

All callbacks are split into individual files. Smaller files are easier to read, and the split also makes it easier to track changes in the history.
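For illustration, the package root can re-export the classes so existing imports keep working — a sketch based on the file names that appear in the review below, not the exact contents of this PR:

```python
# pytorch_lightning/callbacks/__init__.py -- sketch of the re-export pattern.
# One class per file; the package root re-exports them so
# `from pytorch_lightning.callbacks import ModelCheckpoint` still works.
from pytorch_lightning.callbacks.callback import Callback
from pytorch_lightning.callbacks.early_stopping import EarlyStopping
from pytorch_lightning.callbacks.model_checkpoint import ModelCheckpoint

__all__ = ['Callback', 'EarlyStopping', 'ModelCheckpoint']
```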

@Borda changed the title from '[DO NOT MERGE] Split callbacks' to '[WIP] Split callbacks' on Feb 15, 2020
@Borda added the 'feature' label (Is an improvement or enhancement) on Feb 15, 2020
@Borda added this to the 0.6.1 milestone on Feb 15, 2020
@pep8speaks commented Feb 16, 2020

Hello @hadim! Thanks for updating this PR.

Line 86:101: E501 line too long (112 > 100 characters)

Comment last updated at 2020-02-22 13:39:46 UTC

@hadim changed the title from '[WIP] Split callbacks' to 'Split callbacks' on Feb 16, 2020
@hadim requested a review from Borda on February 16, 2020 20:55
@hadim (Contributor, Author) commented Feb 16, 2020

Not sure about the failing CI jobs...

@Borda (Member) commented Feb 17, 2020

@jeremyjordan shall we increase PROFILER_OVERHEAD_MAX_TOLERANCE to 0.001?
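For context, that constant caps how much wall-clock overhead the profiler's own bookkeeping may add in the tests. A purely hypothetical sketch of that kind of check (only the constant's name comes from this thread; the helper and the start/stop usage are assumptions):

```python
import time

# Hypothetical sketch: assert that profiling a no-op action costs almost nothing.
PROFILER_OVERHEAD_MAX_TOLERANCE = 0.001  # max allowed average overhead, in seconds

def average_overhead(profiler, n_iter=1000):
    """Average wall-clock cost of one profiled no-op action (assumed start/stop API)."""
    t0 = time.perf_counter()
    for _ in range(n_iter):
        profiler.start('noop')
        profiler.stop('noop')
    return (time.perf_counter() - t0) / n_iter

# assert average_overhead(my_profiler) < PROFILER_OVERHEAD_MAX_TOLERANCE
```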

@jeremyjordan (Contributor) commented

@Borda sure thing, updated on #867

@Borda (Member) left a comment

I like this separation, just a few recommendations :]

Review threads (outdated, resolved) on:
pytorch_lightning/callbacks/__init__.py
pytorch_lightning/callbacks/callback.py
pytorch_lightning/callbacks/early_stopping.py (×3)
pytorch_lightning/callbacks/model_checkpoint.py (×5)
@hadim force-pushed the split_callbacks branch 3 times, most recently from 4dd0f91 to 9499527, on February 17, 2020 23:23
@hadim (Contributor, Author) commented Feb 18, 2020

CIs are great, but I have never been spammed like that for a simple PR (not by reviewers, but by CI services)!

@hadim mentioned this pull request on Feb 18, 2020
@Borda (Member) commented Feb 18, 2020

> CIs are great, but I have never been spammed like that for a simple PR (not by reviewers, but by CI services)!

@hadim the Typo bot is off (wrong choice), or which CI bothers you? =)
For sure we do not want to discourage any contributor, but we shall keep some quality level...

@Borda (Member) left a comment

@PyTorchLightning/core-contributors does anyone have an idea how to avoid duplicating package versions, now that this PR adds a conda environment config?

Review threads on:
.pep8speaks.yml (resolved)
environment.yml (outdated, resolved)
pytorch_lightning/callbacks/early_stopping.py (outdated, resolved)
pytorch_lightning/callbacks/model_checkpoint.py (outdated, resolved)
checkpoint_callback = ModelCheckpoint(filepath='my_path')
Trainer(checkpoint_callback=checkpoint_callback)

# saves checkpoints to my_path whenever 'val_loss' has a new min
Review comment (Member):

yeah, having this comment above the creation of the saving callback would be better :]
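Applied to the quoted snippet, that would read (imports added here for completeness):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# saves checkpoints to my_path whenever 'val_loss' has a new min
checkpoint_callback = ModelCheckpoint(filepath='my_path')
Trainer(checkpoint_callback=checkpoint_callback)
```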

More review threads on pytorch_lightning/callbacks/model_checkpoint.py (outdated, resolved, ×4)
@Borda (Member) commented Feb 18, 2020

@hadim the refactoring is almost done, great work... My kind suggestion: could you please move adding the conda environment (which is also a good addition) to a separate PR, so we do not slow this one down... #199

@hadim (Contributor, Author) commented Feb 19, 2020

Ready to merge!

@Borda (Member) left a comment

LGTM 🚀 Just, as it is a large change, I would like someone else from @PyTorchLightning/core-contributors to approve it...
@hadim GREAT job! Pls add this change to the CHANGELOG...

@Borda requested review from neggert and a team on February 19, 2020 16:12
@Borda added the 'ready' label (PRs ready to be merged) on Feb 19, 2020
@hadim (Contributor, Author) commented Feb 19, 2020

Careful when you commit. flake8 failed... I'll fix it.

@Borda (Member) commented Feb 19, 2020

> Careful when you commit. flake8 failed... I'll fix it.

yeah, I was about to fix it; this is the drawback of editing in the browser (the only way I know of to touch the PR so far...)

@hadim (Contributor, Author) commented Feb 19, 2020

@Borda I also fixed the formatting issues on CHANGELOG.

@Borda mentioned this pull request on Feb 20, 2020
@williamFalcon (Contributor) commented

Awesome PR!

Mind fixing the GPU tests? :)

___________________________________________________________________________________________ test_amp_single_gpu ____________________________________________________________________________________________

tmpdir = local('/tmp/pytest-of-waf251/pytest-5/test_amp_single_gpu0')

    def test_amp_single_gpu(tmpdir):
        """Make sure DDP + AMP work."""
        tutils.reset_seed()

        if not tutils.can_run_gpu_test():
            return

        hparams = tutils.get_hparams()
        model = LightningTestModel(hparams)

        trainer_options = dict(
            default_save_path=tmpdir,
            show_progress_bar=True,
            max_epochs=1,
            gpus=1,
            distributed_backend='ddp',
            precision=16
        )

>       tutils.run_model_test(trainer_options, model)

tests/test_amp.py:32:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/models/utils.py:63: in run_model_test
    result = trainer.fit(model)
pytorch_lightning/trainer/trainer.py:903: in fit
    mp.spawn(self.ddp_train, nprocs=self.num_gpus, args=(model,))
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/site-packages/torch/multiprocessing/spawn.py:162: in spawn
    process.start()
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/process.py:121: in start
    self._popen = self._Popen(self)
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/context.py:283: in _Popen
    return Popen(process_obj)
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/popen_spawn_posix.py:32: in __init__
    super().__init__(process_obj)
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/popen_fork.py:19: in __init__
    self._launch(process_obj)
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/popen_spawn_posix.py:47: in _launch
    reduction.dump(process_obj, fp)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = <SpawnProcess name='SpawnProcess-1' parent=25414 initial>, file = <_io.BytesIO object at 0x7f57fb2b93b0>, protocol = None

    def dump(obj, file, protocol=None):
        '''Replacement for pickle.dump() using ForkingPickler.'''
>       ForkingPickler(file, protocol).dump(obj)
E       AttributeError: Can't pickle local object 'ModelCheckpoint.__init__.<locals>.<lambda>'

../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/reduction.py:60: AttributeError
------------------------------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------------------------------
INFO:root:GPU available: True, used: True
INFO:root:VISIBLE GPUS: 0
INFO:root:Using 16bit precision.
-------------------------------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------------------------------
INFO     root:distrib_data_parallel.py:216 GPU available: True, used: True
INFO     root:distrib_data_parallel.py:264 VISIBLE GPUS: 0
INFO     root:auto_mix_precision.py:21 Using 16bit precision.
_____________________________________________________________________________________________ test_amp_gpu_ddp _____________________________________________________________________________________________

tmpdir = local('/tmp/pytest-of-waf251/pytest-5/test_amp_gpu_ddp0')

    def test_amp_gpu_ddp(tmpdir):
        """Make sure DDP + AMP work."""
        if not tutils.can_run_gpu_test():
            return

        tutils.reset_seed()
        tutils.set_random_master_port()

        hparams = tutils.get_hparams()
        model = LightningTestModel(hparams)

        trainer_options = dict(
            default_save_path=tmpdir,
            show_progress_bar=True,
            max_epochs=1,
            gpus=2,
            distributed_backend='ddp',
            precision=16
        )

>       tutils.run_model_test(trainer_options, model)

tests/test_amp.py:81:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/models/utils.py:63: in run_model_test
    result = trainer.fit(model)
pytorch_lightning/trainer/trainer.py:903: in fit
    mp.spawn(self.ddp_train, nprocs=self.num_gpus, args=(model,))
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/site-packages/torch/multiprocessing/spawn.py:162: in spawn
    process.start()
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/process.py:121: in start
    self._popen = self._Popen(self)
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/context.py:283: in _Popen
    return Popen(process_obj)
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/popen_spawn_posix.py:32: in __init__
    super().__init__(process_obj)
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/popen_fork.py:19: in __init__
    self._launch(process_obj)
../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/popen_spawn_posix.py:47: in _launch
    reduction.dump(process_obj, fp)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = <SpawnProcess name='SpawnProcess-3' parent=25414 initial>, file = <_io.BytesIO object at 0x7f57fb5ce180>, protocol = None

    def dump(obj, file, protocol=None):
        '''Replacement for pickle.dump() using ForkingPickler.'''
>       ForkingPickler(file, protocol).dump(obj)
E       AttributeError: Can't pickle local object 'ModelCheckpoint.__init__.<locals>.<lambda>'

../../media/falcon_kcgscratch1/software/miniconda3/envs/pl4/lib/python3.8/multiprocessing/reduction.py:60: AttributeError

@williamFalcon added the 'Failed GPU tests' label and removed the 'ready' label on Feb 22, 2020
@hadim (Contributor, Author) commented Feb 22, 2020

I can't easily run the test on multiple GPUs. Looking at the traceback, the error seems related to multiprocessing and pickling of ModelCheckpoint. I don't see where the error could be...

@Borda any idea?

@ethanwharris (Member) commented Feb 22, 2020

> I can't easily run the test on multiple GPUs. Looking at the traceback, the error seems related to multiprocessing and pickling of ModelCheckpoint. I don't see where the error could be...

Lambda functions can't be pickled. Distributed stuff needs to pickle the objects to move them on to the right machines. The solution here would be to remove the lambda function in the init of the ModelCheckpoint class, i.e. this line:

self.save_function = lambda x: None

should be removed. Hope that helps!
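A minimal, self-contained demonstration of the failure mode and the fix — plain classes standing in for ModelCheckpoint here; no GPUs needed, since the error happens at pickling time:

```python
import pickle

class BadCheckpoint:
    """Mimics the problematic pattern: a lambda bound in __init__."""
    def __init__(self):
        self.save_function = lambda x: None  # local lambda: unpicklable

class GoodCheckpoint:
    """The fix: don't bind a lambda; leave the attribute unset so a real
    (picklable) function can be assigned later."""
    def __init__(self):
        self.save_function = None

pickle.dumps(GoodCheckpoint())  # works
try:
    pickle.dumps(BadCheckpoint())
except (pickle.PicklingError, AttributeError) as err:
    # AttributeError: Can't pickle local object 'BadCheckpoint.__init__.<locals>.<lambda>'
    print(err)
```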

@hadim (Contributor, Author) commented Feb 22, 2020

Done. Thanks @ethanwharris.

@Borda (Member) commented Feb 22, 2020

@hadim what was the fix? It seems the same is happening in #833.

@Borda mentioned this pull request on Feb 22, 2020
@hadim (Contributor, Author) commented Feb 22, 2020

Removing the lambda function in model_checkpoint. See 2244f8c.

Note that I haven't tested it since I don't have easy access to a ddp machine.
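For what it's worth, the spawn failure can be reproduced (and the fix verified) without a DDP machine, since it occurs while pickling the callback, before any GPU is touched. A sketch, assuming the post-fix ModelCheckpoint and borrowing the constructor argument from the docstring example above:

```python
import pickle
from pytorch_lightning.callbacks import ModelCheckpoint

# ddp spawn pickles the trainer state, callbacks included, before any GPU work,
# so a plain pickle round-trip exercises the failing code path on any machine.
ckpt = ModelCheckpoint(filepath='my_path')
pickle.dumps(ckpt)  # raised "Can't pickle local object ... <lambda>" before 2244f8c
```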

@williamFalcon merged commit 89d5772 into Lightning-AI:master on Feb 23, 2020
@hadim deleted the split_callbacks branch on February 23, 2020 02:46
tullie pushed a commit to tullie/pytorch-lightning that referenced this pull request on Apr 3, 2020:
* add .vscode in .gitignore

* Split callbacks into individual files + add a property to Callback for easy trainer instance access

* formatting

* Add a conda env file for quick and easy env setup to develop on PL

* Address comments

* add fix to kth_best_model

* add some typing to callbacks

* fix typo

* add autopep8 config to pyproject.toml

* format again

* format

* fix toml

* fix toml again

* consistent max line length in all config files

* remove conda env file

* Update pytorch_lightning/callbacks/early_stopping.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* docstring

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* fix logic error

* format

* simplify if/else

* format

* fix linting issue in changelog

* edit changelog about new callback mechanism

* fix remaining formatting issue on CHANGELOG

* remove lambda function because it's not compatible with pickle (used during ddp)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>