perf: optimize training loop (#4426)
Improvements to the training process:

* [`deepmd/pt/train/training.py`](diffhunk://#diff-a90c90dc0e6a17fbe2e930f91182805b83260484c9dc1cfac3331378ffa34935R659): Added a check that skips setting the model to training mode when it is already in that mode. Profiling showed that recursively setting the mode on all submodules takes measurable time on every step. (See the first sketch below.)
* [`deepmd/pt/train/training.py`](diffhunk://#diff-a90c90dc0e6a17fbe2e930f91182805b83260484c9dc1cfac3331378ffa34935L686-L690): Modified the gradient clipping call to pass the `error_if_nonfinite` parameter and removed the manual check for non-finite gradients and the associated exception raising. (See the second sketch below.)

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **New Features**
  - Improved training loop with enhanced error handling and control flow.
  - Updated gradient clipping logic for better error detection.
  - Refined logging functionality for training and validation results.
- **Bug Fixes**
  - Prevented redundant training calls by adding conditional checks.
- **Documentation**
  - Clarified method logic in the `Trainer` class without changing method signatures.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
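The first change can be illustrated with a minimal sketch. The helper name `set_train_mode` is hypothetical and not the actual DeePMD-kit code; it only shows the idea that `Module.train()` recursively flips the `training` flag on every submodule, so checking the flag first avoids the redundant traversal.

```python
import torch


def set_train_mode(model: torch.nn.Module) -> None:
    # Hypothetical helper: `train()` recursively sets the `training` flag on
    # every submodule, which profiling found to be non-trivial when done on
    # each training step. Skip the call if the model is already in train mode.
    if not model.training:
        model.train()
```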
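The second change relies on the `error_if_nonfinite` argument of `torch.nn.utils.clip_grad_norm_`, which raises a `RuntimeError` when the total gradient norm is NaN or Inf, so a hand-written finiteness check is no longer needed. The wrapper below is a sketch under that assumption, not the repository's exact code.

```python
import torch


def clip_gradients(model: torch.nn.Module, max_norm: float) -> torch.Tensor:
    # Hypothetical wrapper: instead of computing the norm, testing it with
    # torch.isfinite, and raising manually, let the built-in utility detect
    # non-finite gradients and raise on its own.
    return torch.nn.utils.clip_grad_norm_(
        model.parameters(), max_norm, error_if_nonfinite=True
    )
```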