Fix global step increment on training_epoch_end #3673

Merged
merged 7 commits into master from bugfix/epochend-globalstep
Sep 28, 2020

Conversation

awaelchli
Contributor

@awaelchli awaelchli commented Sep 27, 2020

What does this PR do?

Currently, global_step gets an extra increment when training_epoch_end is implemented.
This increment shouldn't be necessary, and it leads to misaligned step values in the logs.
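
To illustrate the misalignment described above, here is a minimal sketch using a toy loop. It is not Lightning's actual training loop: `run_toy_loop` and its `bump_on_epoch_end` flag are hypothetical names, and the single extra increment per epoch is an assumption made purely for illustration.

```python
def run_toy_loop(num_epochs, batches_per_epoch, bump_on_epoch_end):
    """Toy stand-in for a training loop (hypothetical, not Lightning's code)."""
    global_step = 0
    logged_steps = []
    for _ in range(num_epochs):
        for _ in range(batches_per_epoch):
            # Record the step index under which this batch's metrics would be logged.
            logged_steps.append(global_step)
            global_step += 1
        if bump_on_epoch_end:
            # Assumed extra increment at epoch end, mimicking the reported behavior.
            global_step += 1
    return logged_steps, global_step


buggy_steps, buggy_final = run_toy_loop(2, 3, bump_on_epoch_end=True)
fixed_steps, fixed_final = run_toy_loop(2, 3, bump_on_epoch_end=False)

print(buggy_steps, buggy_final)  # [0, 1, 2, 4, 5, 6] 8
print(fixed_steps, fixed_final)  # [0, 1, 2, 3, 4, 5] 6
```

With the extra increment, the logged steps skip an index at every epoch boundary, so the final step count no longer equals the number of batches processed.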

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typo fixes and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@awaelchli awaelchli added the bug Something isn't working label Sep 27, 2020
@rohitgr7
Contributor

Looks like all tests are passing. Maybe add a test too, to make sure it's all good in the future 🙂

@awaelchli
Contributor Author

That's the plan :)

@awaelchli
Contributor Author

William will solve this in his branch

@awaelchli awaelchli closed this Sep 27, 2020
@awaelchli awaelchli reopened this Sep 27, 2020
@williamFalcon williamFalcon marked this pull request as ready for review September 27, 2020 23:09
@mergify mergify bot requested a review from a team September 27, 2020 23:09
@williamFalcon
Contributor

The Horovod test is failing randomly... I also verified this on my local machine.

@tgaddair mind taking a look in a follow-up PR?
Merging this one to unblock releases.

@pep8speaks

pep8speaks commented Sep 27, 2020

Hello @awaelchli! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-27 23:50:23 UTC


# Called every 3 steps, meaning for 1 epoch of 11 batches, it is called 3 times with gamma=0.1
assert pytest.approx(init_lr * 0.1) == adjusted_lr2
# @pytest.mark.skipif(platform.system() == "Windows", reason="Horovod is not supported on Windows")
Contributor

@tgaddair weird error... it happens only on some machines, sometimes haha

@mergify mergify bot requested a review from a team September 27, 2020 23:51
@williamFalcon williamFalcon merged commit f37e9e8 into master Sep 28, 2020
@awaelchli awaelchli deleted the bugfix/epochend-globalstep branch September 28, 2020 03:53
@Borda Borda added this to the 0.9.x milestone Sep 28, 2020