-
Notifications
You must be signed in to change notification settings - Fork 527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(pt): make PT training step idx consistent with TF #4221
fix(pt): make PT training step idx consistent with TF #4221
Conversation
Fix deepmodeling#4206. Currently, the training step index displayed in TF and PT has different meanings: - In TF, step 0 means no training; step 1 means a training step has been performed. The maximum training step is equal to the number of steps. - In PT, step 0 means a training step has been performed. The maximum training step is the number of steps minus 1. This PR corrects the defination of the step index in PT and makes them consistent. There are still a difference: TF shows step 0 but PT shows step 1. Showing step 0 in PT needs heavy refactor and thus is not included in this PR. Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
📝 Walkthrough📝 WalkthroughWalkthroughThe changes in this pull request primarily modify the Changes
Assessment against linked issues
Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- deepmd/pt/train/training.py (8 hunks)
🧰 Additional context used
🔇 Additional comments (5)
deepmd/pt/train/training.py (5)
772-775
: Improved step display logicThe introduction of
display_step_id
and the modified condition for displaying training progress are good improvements. These changes make the step count more intuitive (starting from 1) and ensure that the first step is always displayed, aligning with the PR objective of making the PT training step index consistent with TF.
Line range hint
827-840
: Consistent use of improved step displayThe use of
display_step_id
in logging training messages maintains consistency with the earlier changes. This ensures that the more intuitive step counting (starting from 1) is used throughout the logging process.
Line range hint
867-880
: Consistent step display in validation loggingThe use of
display_step_id
in validation message logging maintains consistency with the training message logging. This ensures a uniform and intuitive step counting across all types of logging in the training process.
Line range hint
889-905
: Comprehensive application of improved step countingThe use of
display_step_id
in formatting training messages and printing training information ensures that the improved step counting is consistently applied throughout all aspects of output and logging. This change maintains coherence with earlier modifications and provides a uniform user experience.
Line range hint
772-933
: Summary of changes: Improved and consistent step countingThe changes in this file effectively implement an improved step counting mechanism, making the PT training step index consistent with TF as per the PR objective. The new
display_step_id
is used consistently across various logging and display functions, including training progress, validation logging, and TensorBoard integration. This enhances the user experience by providing a more intuitive step count that starts from 1.The changes are well-implemented and maintain consistency throughout the code. However, do consider the suggested modification to the TensorBoard logging condition to ensure the first step is always logged, regardless of the
tensorboard_freq
setting.Overall, these changes represent a solid improvement to the training process visualization and logging.
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
for more information, see https://pre-commit.ci
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## devel #4221 +/- ##
=======================================
Coverage 83.50% 83.51%
=======================================
Files 541 541
Lines 52486 52487 +1
Branches 3043 3047 +4
=======================================
+ Hits 43830 43832 +2
Misses 7708 7708
+ Partials 948 947 -1 ☔ View full report in Codecov by Sentry. |
Fix #4206.
Currently, the training step index displayed in TF and PT has different meanings:
This PR corrects the definition of the step-index in PT and makes them consistent.
There is still a difference after this PR: TF shows step 0, but PT shows step 1. Showing the loss of step 0 in PT needs heavy refactoring and is thus not included in this PR.
Summary by CodeRabbit
New Features
Bug Fixes