
[Feature Request] support CPU parallel training with PT #4132

Closed
njzjz opened this issue Sep 16, 2024 · 0 comments · Fixed by #4224
njzjz commented Sep 16, 2024

Summary

Support CPU parallel training in the PyTorch backend.

Detailed Description

PyTorch supports the gloo backend for distributed training on CPU, but the following lines appear to restrict the backend to nccl:

assert dist.is_nccl_available()
dist.init_process_group(backend="nccl")
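For illustration, a minimal sketch of the requested behavior (an assumption, not the actual patch later merged in #4224) would choose the backend based on availability instead of hard-coding nccl. When launched with torchrun, the environment variables init_process_group needs are set automatically:

# Hypothetical sketch: fall back to gloo when NCCL is unavailable,
# enabling CPU parallel training.
import torch
import torch.distributed as dist

if torch.cuda.is_available() and dist.is_nccl_available():
    backend = "nccl"  # GPU training via NCCL
else:
    assert dist.is_gloo_available()
    backend = "gloo"  # CPU parallel training via Gloo
dist.init_process_group(backend=backend)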

Further Information, Files, and Links

No response

@iProzd iProzd self-assigned this Sep 26, 2024
@iProzd iProzd moved this to mustfix in DeePMD-kit V3.0.0 RC Sep 26, 2024
@njzjz njzjz added this to the v3.0.0 milestone Sep 26, 2024
@iProzd iProzd linked a pull request Oct 16, 2024 that will close this issue
github-merge-queue bot pushed a commit that referenced this issue Oct 23, 2024
Fix #4132.

## Summary by CodeRabbit

- **New Features**
  - Enhanced backend selection for distributed training, allowing for flexible use of NCCL or Gloo based on availability.

- **Bug Fixes**
  - Corrected indentation for improved code clarity.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@njzjz njzjz closed this as completed Oct 23, 2024
@github-project-automation github-project-automation bot moved this from mustfix to Done in DeePMD-kit V3.0.0 RC Oct 23, 2024