ModelCheckpoint save_function() not set? #4079

celsofranssa · 2020-10-11T15:29:35Z

I am training a PL model using the following code snippet:

    # logger
    tb_logger = pl_loggers.TensorBoardLogger(cfg.logs.path, name='rnn_exp')

    # checkpoint callback
    checkpoint_callback = ModelCheckpoint(
        filepath=cfg.checkpoint.path + "encoder_rnn{epoch:02d}",
        save_top_k=1,
        mode="min" # monitor is defined in val_step: EvalResult(checkpoint_on=val_loss)
    )

    # early stopping callback
    early_stopping_callback = EarlyStopping(
        monitor="val_loss",
        patience=cfg.val.patience,
        mode="min"
    )

    tokenizer = ...
    dm = MyDataModule(cfg, tokenizer)

    model = RNNEncoder(cfg)

    trainer = Trainer(
        fast_dev_run=False,
        max_epochs=cfg.train.max_epochs,
        gpus=1,
        logger=tb_logger,
        callbacks=[checkpoint_callback, early_stopping_callback]
    )

    # training
    dm.setup('fit')
    trainer.fit(model, datamodule=dm)

However, after the first epoch, the model presents the following error, probably when calling the model checkpoint callback:

    trainer.fit(model, datamodule=dm)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
    result = fn(self, *args, **kwargs)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1073, in fit
    results = self.accelerator_backend.train(model)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_backend.py", line 51, in train
    results = self.trainer.run_pretrain_routine(model)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1239, in run_pretrain_routine
    self.train()
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 394, in train
    self.run_training_epoch()
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 516, in run_training_epoch
    self.run_evaluation(test_mode=False)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 603, in run_evaluation
    self.on_validation_end()
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 176, in on_validation_end
    callback.on_validation_end(self, self.get_model())
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py", line 27, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 380, in on_validation_end
    self._do_check_save(filepath, current, epoch, trainer, pl_module)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 421, in _do_check_save
    self._save_model(filepath, trainer, pl_module)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 212, in _save_model
    raise ValueError(".save_function() not set")
ValueError: .save_function() not set

Could you tell me if I forgot to configure something?

awaelchli · 2020-10-11T15:33:26Z

currently you need to set the ModelCheckpoint via Trainer(checkpoint_callback=...)
#3990 will enable passing it to callbacks

celsofranssa · 2020-10-11T15:37:26Z

Thanks, @awaelchli,

I've just thought that. Thanks a lot for the help.

celsofranssa added bug Something isn't working help wanted Open to be worked on labels Oct 11, 2020

celsofranssa closed this as completed Oct 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ModelCheckpoint save_function() not set? #4079

ModelCheckpoint save_function() not set? #4079

celsofranssa commented Oct 11, 2020

awaelchli commented Oct 11, 2020 •

edited

Loading

celsofranssa commented Oct 11, 2020

ModelCheckpoint save_function() not set? #4079

ModelCheckpoint save_function() not set? #4079

Comments

celsofranssa commented Oct 11, 2020

awaelchli commented Oct 11, 2020 • edited Loading

celsofranssa commented Oct 11, 2020

awaelchli commented Oct 11, 2020 •

edited

Loading