
Correct the multinode training doc #5747

Merged: 5 commits, Oct 31, 2023
Conversation

vanbasten23 (Collaborator):

This PR

@vanbasten23 vanbasten23 marked this pull request as ready for review October 30, 2023 18:08
docs/pjrt.md
```diff
@@ -73,7 +76,7 @@ def _train_update(device, step, loss, tracker, epoch, writer):


 def train_mnist(flags, **kwargs):
-  if flags.ddp:
+  if flags.ddp or flags.pjrt_distributed:
```
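For context, a minimal stdlib-only sketch of the flag handling implied by this diff. The flag names come from the diff itself; everything else (the parser setup, the variable names) is a hypothetical illustration, not the actual `train_mnist` code:

```python
import argparse

# Hypothetical sketch: the diff widens the condition so that distributed
# setup runs when EITHER --ddp or --pjrt_distributed is passed.
parser = argparse.ArgumentParser()
parser.add_argument("--ddp", action="store_true",
                    help="use DistributedDataParallel")
parser.add_argument("--pjrt_distributed", action="store_true",
                    help="initialize torch.distributed under PJRT without DDP")
flags = parser.parse_args(["--pjrt_distributed"])

# After this change, distributed initialization is gated on either flag:
needs_dist_init = flags.ddp or flags.pjrt_distributed  # True here
```

This shows why a run launched with only `--pjrt_distributed` now takes the distributed-initialization path that was previously reserved for `--ddp`.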
Collaborator:
Ah interesting... Do we need a dedicated flag still, or can we just also check for torchrun some other way? I saw dist.is_torchelastic_launched elsewhere in the codebase: https://github.com/pytorch/xla/blob/83778f0/torch_xla/_internal/rendezvous.py#L20
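For reference, `torch.distributed.is_torchelastic_launched` detects a torchrun/torchelastic launch by checking for the `TORCHELASTIC_RUN_ID` environment variable that torchrun exports to every worker. A stdlib-only sketch of that check (the helper name here is hypothetical, not part of any library):

```python
import os

def launched_by_torchrun() -> bool:
    # torchrun (torchelastic) sets TORCHELASTIC_RUN_ID in each worker's
    # environment; its presence signals an elastic launch, which is what
    # torch.distributed.is_torchelastic_launched checks under the hood.
    return os.environ.get("TORCHELASTIC_RUN_ID") is not None

# Simulate both launch modes in one process:
os.environ.pop("TORCHELASTIC_RUN_ID", None)
plain = launched_by_torchrun()            # plain `python script.py` run
os.environ["TORCHELASTIC_RUN_ID"] = "demo"
elastic = launched_by_torchrun()          # as if launched by torchrun
```

A check like this would let the training script detect torchrun automatically instead of requiring a dedicated flag, which is the alternative the comment raises.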

jonb377 (Collaborator) left a comment:

LGTM, nothing blocking. Thanks Xiongfei!

vanbasten23 (Collaborator, Author):

Thanks for the review!

@vanbasten23 vanbasten23 merged commit 4038f8e into master Oct 31, 2023
18 checks passed
mbzomowski pushed a commit to mbzomowski-test-org/xla that referenced this pull request Nov 16, 2023
* fix Jon's comment

* add pjrt_distributed flag back.

* updated the doc

* fix typo

* fix typo
ManfeiBai pushed a commit that referenced this pull request Nov 29, 2023
ManfeiBai pushed a commit that referenced this pull request Nov 29, 2023
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
golechwierowicz pushed a commit that referenced this pull request Jan 12, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024
3 participants