
Fix incorrect handling of on_batch_end edge cases in run_training_batch #509

Merged
3 commits merged into Lightning-AI:master from jeffling:patch-2 on Nov 19, 2019

Conversation

@jeffling (Contributor) commented Nov 15, 2019

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

This fixes a bug:

`ValueError: not enough values to unpack (expected 3, got 2)`

The error was raised when returning -1 from `on_batch_start` to exit early, and also when the batch was empty.
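As a minimal sketch of the failure mode and the fix (hypothetical standalone functions, not the actual Lightning source; the real code is in `run_training_batch`, linked below):

def run_training_batch_buggy(batch, early_stop_signal):
    """Sketch of the old behavior: the early-exit path returns only two values."""
    all_log_metrics = []
    if batch is None or early_stop_signal == -1:
        # BUG: callers unpack three values, so this raises
        # "ValueError: not enough values to unpack (expected 3, got 2)"
        return 0, all_log_metrics
    grad_norm_dic = {}
    # ... run the actual optimizer step, gather metrics, etc. ...
    return 0, grad_norm_dic, all_log_metrics

def run_training_batch_fixed(batch, early_stop_signal):
    """Sketch of the fix: every return path yields the same 3-tuple shape."""
    grad_norm_dic = {}
    all_log_metrics = []
    if batch is None or early_stop_signal == -1:
        return 0, grad_norm_dic, all_log_metrics
    # ... run the actual optimizer step, gather metrics, etc. ...
    return 0, grad_norm_dic, all_log_metrics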

For future work, we should have a test for this regression.
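Such a regression test could look roughly like the following (hypothetical, written against the sketch above rather than the real Trainer API):

def test_early_exit_returns_three_values():
    # Early exits must produce the same 3-tuple shape as the happy path,
    # so the caller's unpacking never fails.
    result = run_training_batch_fixed(batch=None, early_stop_signal=-1)
    assert len(result) == 3
    batch_result, grad_norm_dic, all_log_metrics = result
    assert batch_result == 0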

https://github.com/williamFalcon/pytorch-lightning/blob/8f8cea1c5759524c6fc2eb33b972bba64e5ce0f4/pytorch_lightning/trainer/train_loop_mixin.py#L100

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

Commit: "This fixes a bug: `ValueError: not enough values to unpack (expected 3, got 2)`"

@jeffling jeffling changed the title from "Fix returning only 2 values (3 needed) on an early exit." to "Fix incorrect handling of on_batch_end edge cases in run_training_batch" on Nov 15, 2019

Commit: "The return value was actually a dict even though that variable is initialized as a list."
@jeffling (Contributor, Author) commented Nov 15, 2019

It's also very unfortunate that we initialize `all_log_metrics` as a list and turn it into a dict somewhere. Ideally, a variable keeps more or less the same type throughout. However, I don't have time to clean it up now; just noting this.
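To illustrate the type drift being described (a hypothetical reduction, not the actual code):

all_log_metrics = []                  # starts as a list of per-step metric dicts
all_log_metrics.append({'loss': 0.5})
all_log_metrics.append({'acc': 0.9})

# ... later the same name is rebound to a single merged dict ...
all_log_metrics = {k: v for d in all_log_metrics for k, v in d.items()}
# downstream code can no longer rely on one consistent type for this variable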

@Borda (Member) left a comment


I would even change its call from

output = self.run_training_batch(batch, batch_nb)
batch_result, grad_norm_dic, batch_step_metrics = output

to

batch_result, grad_norm_dic, batch_step_metrics = \
    self.run_training_batch(batch, batch_nb)

@jeffling (Contributor, Author) commented

I think we can do that as a follow-up; I'd like to fix the nit I pointed out as well. Right now this is breaking documented functionality, so we should merge this in as a hotfix and do a neatness refactor later.

@williamFalcon williamFalcon merged commit 619143a into Lightning-AI:master Nov 19, 2019
@williamFalcon (Contributor) commented

@jeffling thanks! please follow up on the issue you mentioned :)

@jeffling jeffling deleted the patch-2 branch November 22, 2019 17:16