Simplify progress bar args (Lightning-AI#1108)

* show progress bar dependent on refresh_rate

* test progress_bar_refresh control show bar

* remove show_progress_bar from other tests

* borda fixes

* flake8 fix

* changelog update prog bar refresh rate

* move show_progress_bar to deprecated 0.9 api

* rm show_progress_bar references, test deprecated

* Update pytorch_lightning/trainer/__init__.py

* fix test

* changelog

* minor CHANGELOG.md format

* Update pytorch_lightning/trainer/__init__.py

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Gerard Bentley <gbkh2015@mymail.pomona.edu>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
5 people authored and akarnachev committed Apr 4, 2020
1 parent 3cffda3 commit 5ba1728
Showing 15 changed files with 79 additions and 55 deletions.
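In short, a single integer flag now both throttles and disables the bar. A minimal sketch of the resulting `Trainer` API, with flag values taken from the diffs below (illustration, not the canonical docs):

```python
from pytorch_lightning import Trainer

# default: refresh the tqdm bar every step
trainer = Trainer(progress_bar_refresh_rate=1)

# throttle redraws, useful in notebooks where frequent updates are slow
trainer = Trainer(progress_bar_refresh_rate=50)

# 0 now disables the progress bar entirely (previously show_progress_bar=False)
trainer = Trainer(progress_bar_refresh_rate=0)

# deprecated since v0.7.2, slated for removal in v0.9.0; emits a DeprecationWarning
trainer = Trainer(show_progress_bar=True)
```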
21 changes: 10 additions & 11 deletions CHANGELOG.md
@@ -26,6 +26,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Changed

- Changed `progress_bar_refresh_rate` trainer flag to disable progress bar when set to 0. ([#1108](https://github.com/PyTorchLightning/pytorch-lightning/pull/1108))
- Enhanced `load_from_checkpoint` to also forward params to the model ([#1307](https://github.com/PyTorchLightning/pytorch-lightning/pull/1307))
- Updated references to self.forward() to instead use the `__call__` interface. ([#1211](https://github.com/PyTorchLightning/pytorch-lightning/pull/1211))
- Added option to run without an optimizer by returning `None` from `configure_optimizers`. ([#1279](https://github.com/PyTorchLightning/pytorch-lightning/pull/1279))
@@ -44,6 +45,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
### Deprecated

- Deprecated Trainer argument `print_nan_grads` ([#1097](https://github.com/PyTorchLightning/pytorch-lightning/pull/1097))
- Deprecated Trainer argument `show_progress_bar` ([#1108](https://github.com/PyTorchLightning/pytorch-lightning/pull/1108))

### Removed

@@ -72,9 +74,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Added

- Added automatic sampler setup. Depending on DDP or TPU, lightning configures the sampler correctly (user needs to do nothing) ([#926](https://github.com/PyTorchLightning/pytorch-lightning/pull/926))
- Added `reload_dataloaders_every_epoch=False` flag for trainer. Some users require reloading data every epoch ([#926](https://github.com/PyTorchLightning/pytorch-lightning/pull/926))
- Added `progress_bar_refresh_rate=50` flag for trainer. Throttle refresh rate on notebooks ([#926](https://github.com/PyTorchLightning/pytorch-lightning/pull/926))
- Updated governance docs
- Added a check to ensure that the metric used for early stopping exists before training commences ([#542](https://github.com/PyTorchLightning/pytorch-lightning/pull/542))
- Added `optimizer_idx` argument to `backward` hook ([#733](https://github.com/PyTorchLightning/pytorch-lightning/pull/733))
@@ -97,7 +99,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Added TPU gradient clipping ([#963](https://github.com/PyTorchLightning/pytorch-lightning/pull/963))
- Added max/min number of steps in `Trainer` ([#728](https://github.com/PyTorchLightning/pytorch-lightning/pull/728))


### Changed

- Improved `NeptuneLogger` by adding `close_after_fit` argument to allow logging after training ([#908](https://github.com/PyTorchLightning/pytorch-lightning/pull/1084))
@@ -109,17 +110,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Froze the model's `hparams` as a `Namespace` property ([#1029](https://github.com/PyTorchLightning/pytorch-lightning/pull/1029))
- Dropped `logging` config in package init ([#1015](https://github.com/PyTorchLightning/pytorch-lightning/pull/1015))
- Renamed model steps ([#1051](https://github.com/PyTorchLightning/pytorch-lightning/pull/1051))
* `training_end` >> `training_epoch_end`
* `validation_end` >> `validation_epoch_end`
* `test_end` >> `test_epoch_end`
- `training_end` >> `training_epoch_end`
- `validation_end` >> `validation_epoch_end`
- `test_end` >> `test_epoch_end`
- Refactor dataloading, supports infinite dataloader ([#955](https://github.com/PyTorchLightning/pytorch-lightning/pull/955))
- Create single file in `TensorBoardLogger` ([#777](https://github.com/PyTorchLightning/pytorch-lightning/pull/777))

### Deprecated

- Deprecated `pytorch_lightning.logging` ([#767](https://github.com/PyTorchLightning/pytorch-lightning/pull/767))
- Deprecated `LightningModule.load_from_metrics` in favour of `LightningModule.load_from_checkpoint` ([#995](https://github.com/PyTorchLightning/pytorch-lightning/pull/995), [#1079](https://github.com/PyTorchLightning/pytorch-lightning/pull/1079))
- Deprecated `@data_loader` decorator ([#926](https://github.com/PyTorchLightning/pytorch-lightning/pull/926))
- Deprecated model steps `training_end`, `validation_end` and `test_end` ([#1051](https://github.com/PyTorchLightning/pytorch-lightning/pull/1051), [#1056](https://github.com/PyTorchLightning/pytorch-lightning/pull/1056))

### Removed
@@ -309,9 +310,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Added

- Added the flag `log_gpu_memory` to `Trainer` to deactivate logging of GPU
memory utilization
- Added SLURM resubmit functionality (port from test-tube)
- Added the flag `log_gpu_memory` to `Trainer` to deactivate logging of GPU memory utilization
- Added optional weight_save_path to trainer to remove the need for a checkpoint_callback when using cluster training
- Added option to use single gpu per node with `DistributedDataParallel`

9 changes: 4 additions & 5 deletions pytorch_lightning/trainer/__init__.py
@@ -646,6 +646,8 @@ def on_train_end(self):
# default used by the Trainer
trainer = Trainer(progress_bar_refresh_rate=1)
# disable progress bar
trainer = Trainer(progress_bar_refresh_rate=0)
reload_dataloaders_every_epoch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -702,12 +704,9 @@ def on_train_end(self):
show_progress_bar
^^^^^^^^^^^^^^^^^
If true shows tqdm progress bar
.. warning:: .. deprecated:: 0.7.2
Example::
# default used by the Trainer
trainer = Trainer(show_progress_bar=True)
Set `progress_bar_refresh_rate` to 0 instead. Will be removed in v0.9.0.
test_percent_check
^^^^^^^^^^^^^^^^^^
19 changes: 19 additions & 0 deletions pytorch_lightning/trainer/deprecated_api.py
@@ -87,3 +87,22 @@ def nb_sanity_val_steps(self, nb):
"`num_sanity_val_steps` since v0.5.0"
" and this method will be removed in v0.8.0", DeprecationWarning)
self.num_sanity_val_steps = nb


class TrainerDeprecatedAPITillVer0_9(ABC):

def __init__(self):
super().__init__() # mixin calls super too

@property
def show_progress_bar(self):
"""Back compatibility, will be removed in v0.9.0"""
warnings.warn("Argument `show_progress_bar` is now set by `progress_bar_refresh_rate` since v0.7.2"
" and this method will be removed in v0.9.0", DeprecationWarning)
return self.progress_bar_refresh_rate >= 1

@show_progress_bar.setter
def show_progress_bar(self, tf):
"""Back compatibility, will be removed in v0.9.0"""
warnings.warn("Argument `show_progress_bar` is now set by `progress_bar_refresh_rate` since v0.7.2"
" and this method will be removed in v0.9.0", DeprecationWarning)
2 changes: 1 addition & 1 deletion pytorch_lightning/trainer/distrib_data_parallel.py
@@ -281,7 +281,7 @@ def ddp_train(self, gpu_idx, model):
self.node_rank = 0

# show progressbar only on progress_rank 0
self.show_progress_bar = self.show_progress_bar and self.node_rank == 0 and gpu_idx == 0
self.progress_bar_refresh_rate = self.progress_bar_refresh_rate if self.node_rank == 0 and gpu_idx == 0 else 0

# determine which process we are and world size
if self.use_ddp:
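The one-line change above (and the analogous TPU change in the next file) replaces a boolean AND with a rank gate on the refresh rate, so only the global rank-0 process draws a bar. A tiny sketch of that gating logic (the function name is ours, not Lightning API):

```python
def gated_refresh_rate(refresh_rate: int, node_rank: int, local_rank: int) -> int:
    """Keep the configured rate on the global rank-0 process; silence the rest."""
    return refresh_rate if node_rank == 0 and local_rank == 0 else 0


# e.g. on a 2-node x 4-GPU DDP job, only (node 0, gpu 0) keeps its bar
assert gated_refresh_rate(50, node_rank=0, local_rank=0) == 50
assert gated_refresh_rate(50, node_rank=0, local_rank=1) == 0
assert gated_refresh_rate(50, node_rank=1, local_rank=0) == 0
```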
2 changes: 1 addition & 1 deletion pytorch_lightning/trainer/distrib_parts.py
@@ -480,7 +480,7 @@ def tpu_train(self, tpu_core_idx, model):
self.tpu_global_core_rank = xm.get_ordinal()

# avoid duplicating progress bar
self.show_progress_bar = self.show_progress_bar and self.tpu_global_core_rank == 0
self.progress_bar_refresh_rate = self.progress_bar_refresh_rate if self.tpu_global_core_rank == 0 else 0

# track current tpu
self.current_tpu_idx = tpu_core_idx
5 changes: 2 additions & 3 deletions pytorch_lightning/trainer/evaluation_loop.py
@@ -163,7 +163,6 @@ class TrainerEvaluationLoopMixin(ABC):
num_val_batches: int
fast_dev_run: ...
process_position: ...
show_progress_bar: ...
process_output: ...
training_tqdm_dict: ...
proc_rank: int
@@ -278,7 +277,7 @@ def _evaluate(self, model: LightningModule, dataloaders, max_batches: int, test_
dl_outputs.append(output)

# batch done
if batch_idx % self.progress_bar_refresh_rate == 0:
if self.progress_bar_refresh_rate >= 1 and batch_idx % self.progress_bar_refresh_rate == 0:
if test_mode:
self.test_progress_bar.update(self.progress_bar_refresh_rate)
else:
@@ -361,7 +360,7 @@ def run_evaluation(self, test_mode: bool = False):
desc = 'Testing' if test_mode else 'Validating'
total = max_batches if max_batches != float('inf') else None
pbar = tqdm(desc=desc, total=total, leave=test_mode, position=position,
disable=not self.show_progress_bar, dynamic_ncols=True, file=sys.stdout)
disable=not self.progress_bar_refresh_rate, dynamic_ncols=True, file=sys.stdout)
setattr(self, f'{"test" if test_mode else "val"}_progress_bar', pbar)

# run evaluation
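Two details of these hunks are easy to miss: `disable=not self.progress_bar_refresh_rate` exploits the fact that `0` is falsy, and the added `>= 1` guard prevents `batch_idx % 0` from raising `ZeroDivisionError` when the bar is off. The training loop below applies the same guard. A standalone sketch of the pattern (ours, for illustration):

```python
import sys

from tqdm import tqdm


def evaluate_batches(num_batches: int, refresh_rate: int) -> None:
    # a rate of 0 is falsy, so the bar is constructed disabled rather than
    # omitted; downstream code can still call update()/close() unconditionally
    pbar = tqdm(desc='Validating', total=num_batches,
                disable=not refresh_rate, dynamic_ncols=True, file=sys.stdout)
    for batch_idx in range(num_batches):
        ...  # run one evaluation batch here
        # check the rate first: `batch_idx % 0` would raise ZeroDivisionError
        if refresh_rate >= 1 and batch_idx % refresh_rate == 0:
            pbar.update(refresh_rate)
    pbar.close()


evaluate_batches(num_batches=10, refresh_rate=0)  # no bar, and no crash
```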
21 changes: 14 additions & 7 deletions pytorch_lightning/trainer/trainer.py
@@ -21,7 +21,8 @@
from pytorch_lightning.trainer.callback_config import TrainerCallbackConfigMixin
from pytorch_lightning.trainer.callback_hook import TrainerCallbackHookMixin
from pytorch_lightning.trainer.data_loading import TrainerDataLoadingMixin
from pytorch_lightning.trainer.deprecated_api import TrainerDeprecatedAPITillVer0_8
from pytorch_lightning.trainer.deprecated_api import (TrainerDeprecatedAPITillVer0_8,
TrainerDeprecatedAPITillVer0_9)
from pytorch_lightning.trainer.distrib_data_parallel import TrainerDDPMixin
from pytorch_lightning.trainer.distrib_parts import TrainerDPMixin, parse_gpu_ids, determine_root_gpu_device
from pytorch_lightning.trainer.evaluation_loop import TrainerEvaluationLoopMixin
@@ -66,12 +67,13 @@ class Trainer(
TrainerCallbackConfigMixin,
TrainerCallbackHookMixin,
TrainerDeprecatedAPITillVer0_8,
TrainerDeprecatedAPITillVer0_9,
):
DEPRECATED_IN_0_8 = (
'gradient_clip', 'nb_gpu_nodes', 'max_nb_epochs', 'min_nb_epochs',
'add_row_log_interval', 'nb_sanity_val_steps'
)
DEPRECATED_IN_0_9 = ('use_amp',)
DEPRECATED_IN_0_9 = ('use_amp', 'show_progress_bar')

def __init__(
self,
@@ -86,7 +88,7 @@ def __init__(
gpus: Optional[Union[List[int], str, int]] = None,
num_tpu_cores: Optional[int] = None,
log_gpu_memory: Optional[str] = None,
show_progress_bar: bool = True,
show_progress_bar=None, # backward compatible, todo: remove in v0.9.0
progress_bar_refresh_rate: int = 1,
overfit_pct: float = 0.0,
track_grad_norm: int = -1,
@@ -161,9 +163,12 @@ def __init__(
log_gpu_memory: None, 'min_max', 'all'. Might slow performance
show_progress_bar: If true shows tqdm progress bar
show_progress_bar:
.. warning:: .. deprecated:: 0.7.2
Set `progress_bar_refresh_rate` to a positive integer to enable. Will be removed in v0.9.0.
progress_bar_refresh_rate: How often to refresh progress bar (in steps)
progress_bar_refresh_rate: How often to refresh progress bar (in steps). Value ``0`` disables progress bar.
overfit_pct: How much of training-, validation-, and test dataset to check.
@@ -414,7 +419,9 @@ def __init__(

# can't init progress bar here because starting a new process
# means the progress_bar won't survive pickling
self.show_progress_bar = show_progress_bar
# backward compatibility
if show_progress_bar is not None:
self.show_progress_bar = show_progress_bar

# logging
self.log_save_interval = log_save_interval
@@ -821,7 +828,7 @@ def run_pretrain_routine(self, model: LightningModule):
pbar = tqdm(desc='Validation sanity check',
total=self.num_sanity_val_steps * len(self.val_dataloaders),
leave=False, position=2 * self.process_position,
disable=not self.show_progress_bar, dynamic_ncols=True)
disable=not self.progress_bar_refresh_rate, dynamic_ncols=True)
self.main_progress_bar = pbar
# dummy validation progress bar
self.val_progress_bar = tqdm(disable=True)
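Because the parameter default changed from `True` to `None`, the constructor can tell "not passed" apart from an explicit value and only then routes through the deprecated setter, which warns. A hedged test sketch of that contract, in the spirit of the "test deprecated" commit above (the test name and exact assertions are ours, not necessarily the repo's):

```python
import pytest

from pytorch_lightning import Trainer


def test_show_progress_bar_is_deprecated():
    # an explicit value is routed through the deprecated setter and must warn
    with pytest.deprecated_call():
        Trainer(show_progress_bar=True)

    # the None default takes the silent path and leaves the new flag alone
    trainer = Trainer()
    assert trainer.progress_bar_refresh_rate == 1
```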
2 changes: 1 addition & 1 deletion pytorch_lightning/trainer/training_loop.py
@@ -623,7 +623,7 @@ def optimizer_closure():
self.get_model().on_batch_end()

# update progress bar
if batch_idx % self.progress_bar_refresh_rate == 0:
if self.progress_bar_refresh_rate >= 1 and batch_idx % self.progress_bar_refresh_rate == 0:
self.main_progress_bar.update(self.progress_bar_refresh_rate)
self.main_progress_bar.set_postfix(**self.training_tqdm_dict)

6 changes: 1 addition & 5 deletions tests/models/test_amp.py
@@ -21,7 +21,6 @@ def test_amp_single_gpu(tmpdir):

trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=True,
max_epochs=1,
gpus=1,
distributed_backend='ddp',
@@ -42,7 +41,6 @@ def test_no_amp_single_gpu(tmpdir):

trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=True,
max_epochs=1,
gpus=1,
distributed_backend='dp',
@@ -66,7 +64,6 @@ def test_amp_gpu_ddp(tmpdir):

trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=True,
max_epochs=1,
gpus=2,
distributed_backend='ddp',
@@ -90,7 +87,6 @@ def test_amp_gpu_ddp_slurm_managed(tmpdir):
model = LightningTestModel(hparams)

trainer_options = dict(
show_progress_bar=True,
max_epochs=1,
gpus=[0],
distributed_backend='ddp',
@@ -128,7 +124,7 @@ def test_cpu_model_with_amp(tmpdir):

trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=False,
progress_bar_refresh_rate=0,
logger=tutils.get_default_testtube_logger(tmpdir),
max_epochs=1,
train_percent_check=0.4,
15 changes: 7 additions & 8 deletions tests/models/test_cpu.py
@@ -27,7 +27,6 @@ def test_early_stopping_cpu_model(tmpdir):
gradient_clip_val=1.0,
overfit_pct=0.20,
track_grad_norm=2,
show_progress_bar=True,
logger=tutils.get_default_testtube_logger(tmpdir),
train_percent_check=0.1,
val_percent_check=0.1,
@@ -48,7 +47,7 @@ def test_lbfgs_cpu_model(tmpdir):
trainer_options = dict(
default_save_path=tmpdir,
max_epochs=2,
show_progress_bar=False,
progress_bar_refresh_rate=0,
weights_summary='top',
train_percent_check=1.0,
val_percent_check=0.2,
@@ -67,7 +66,7 @@ def test_default_logger_callbacks_cpu_model(tmpdir):
max_epochs=1,
gradient_clip_val=1.0,
overfit_pct=0.20,
show_progress_bar=False,
progress_bar_refresh_rate=0,
train_percent_check=0.01,
val_percent_check=0.01,
)
@@ -95,7 +94,7 @@ def test_running_test_after_fitting(tmpdir):

trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=False,
progress_bar_refresh_rate=0,
max_epochs=8,
train_percent_check=0.4,
val_percent_check=0.2,
@@ -133,7 +132,7 @@ class CurrentTestModel(LightTrainDataloader, LightTestMixin, TestModelBase):
checkpoint = tutils.init_checkpoint_callback(logger)

trainer_options = dict(
show_progress_bar=False,
progress_bar_refresh_rate=0,
max_epochs=1,
train_percent_check=0.4,
val_percent_check=0.2,
@@ -226,7 +225,7 @@ def test_cpu_model(tmpdir):

trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=False,
progress_bar_refresh_rate=0,
logger=tutils.get_default_testtube_logger(tmpdir),
max_epochs=1,
train_percent_check=0.4,
@@ -247,7 +246,7 @@ def test_all_features_cpu_model(tmpdir):
gradient_clip_val=1.0,
overfit_pct=0.20,
track_grad_norm=2,
show_progress_bar=False,
progress_bar_refresh_rate=0,
logger=tutils.get_default_testtube_logger(tmpdir),
accumulate_grad_batches=2,
max_epochs=1,
@@ -344,7 +343,7 @@ def test_single_gpu_model(tmpdir):

trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=False,
progress_bar_refresh_rate=0,
max_epochs=1,
train_percent_check=0.1,
val_percent_check=0.1,
9 changes: 4 additions & 5 deletions tests/models/test_gpu.py
@@ -27,7 +27,6 @@ def test_multi_gpu_model_ddp2(tmpdir):
model, hparams = tutils.get_default_model()
trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=True,
max_epochs=1,
train_percent_check=0.4,
val_percent_check=0.2,
@@ -49,7 +48,7 @@ def test_multi_gpu_model_ddp(tmpdir):
model, hparams = tutils.get_default_model()
trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=False,
progress_bar_refresh_rate=0,
max_epochs=1,
train_percent_check=0.4,
val_percent_check=0.2,
@@ -69,7 +68,7 @@ def test_ddp_all_dataloaders_passed_to_fit(tmpdir):

model, hparams = tutils.get_default_model()
trainer_options = dict(default_save_path=tmpdir,
show_progress_bar=False,
progress_bar_refresh_rate=0,
max_epochs=1,
train_percent_check=0.4,
val_percent_check=0.2,
@@ -165,7 +164,7 @@ def test_multi_gpu_none_backend(tmpdir):
model, hparams = tutils.get_default_model()
trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=False,
progress_bar_refresh_rate=0,
max_epochs=1,
train_percent_check=0.1,
val_percent_check=0.1,
@@ -184,7 +183,7 @@ def test_multi_gpu_model_dp(tmpdir):
model, hparams = tutils.get_default_model()
trainer_options = dict(
default_save_path=tmpdir,
show_progress_bar=False,
progress_bar_refresh_rate=0,
distributed_backend='dp',
max_epochs=1,
train_percent_check=0.1,