Correct the multinode training doc #5747
Conversation
@@ -73,7 +76,7 @@ def _train_update(device, step, loss, tracker, epoch, writer):


 def train_mnist(flags, **kwargs):
-  if flags.ddp:
+  if flags.ddp or flags.pjrt_distributed:
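For context, the combined check gates the distributed process-group setup so that torchrun-launched (PJRT) runs take the same path as `--ddp` runs. A minimal sketch of the pattern, with a hypothetical helper name and assuming the `xla://` init method used by the PJRT runtime:

```python
import torch.distributed as dist
import torch_xla.distributed.xla_backend  # side-effect import: registers the 'xla' backend


def maybe_init_process_group(ddp: bool, pjrt_distributed: bool) -> None:
    # Initialize the XLA process group when either flag is set, so both
    # xmp.spawn-launched DDP runs and torchrun-launched multinode runs
    # rendezvous through the same code path.
    if ddp or pjrt_distributed:
        dist.init_process_group("xla", init_method="xla://")
```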
Ah interesting... Do we need a dedicated flag still, or can we just also check for torchrun some other way? I saw dist.is_torchelastic_launched elsewhere in the codebase: https://github.com/pytorch/xla/blob/83778f0/torch_xla/_internal/rendezvous.py#L20
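A sketch of that alternative, assuming `dist.is_torchelastic_launched()` (which checks the `TORCHELASTIC_RUN_ID` environment variable set by torchrun); the helper name is hypothetical:

```python
import torch.distributed as dist


def should_init_xla_process_group(ddp_flag: bool) -> bool:
    # Hypothetical alternative to a dedicated --pjrt_distributed flag:
    # detect a torchrun/torchelastic launch directly instead of relying on
    # the user to pass an extra flag.
    return ddp_flag or dist.is_torchelastic_launched()
```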
LGTM, nothing blocking. Thanks Xiongfei!
Thanks for the review!
* fix Jon's comment
* add pjrt_distributed flag back.
* updated the doc
* fix typo
* fix typo
This PR adds the `--pjrt_distributed` flag back due to Merge `--pjrt_distributed` flag with `--ddp` flag. #5732 (comment).