[SPMD] support DTensor API integration #5776
Conversation
Force-pushed from 6cef706 to ccb2c4c (compare)
Tested locally, @JackCaoG @jonb377 @alanwaketan I am moving/grouping the SPMD API under
Force-pushed from e60ce71 to 19fd03a (compare)
Had a chat with @JackCaoG; we will move this out of experimental and into distributed, starting from the beta.
torch_xla/distributed/spmd/api.py (Outdated)

log = logging.getLogger(__name__)

TORCH_XLA_INITIALIZED = False
I don't quite understand this TORCH_XLA_INITIALIZED flag.
Ah, we don't need this anymore, since this code now lives in torch_xla.
torch_xla/distributed/spmd/api.py (Outdated)

) -> None:
  if TORCH_XLA_INITIALIZED:
    # TODO(yeounoh) replace this with xr.use_spmd() when we deprecate the flag.
    os.environ["XLA_USE_SPMD"] = "1"
why not just call xr.use_spmd() now?
There could be a case where the caller has already initialized some tensors before entering the API. We may want to support both, at least in the implementation. I will leave it as a TODO here.
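For illustration, a minimal sketch of the two ways to enable SPMD mode discussed here: the environment-variable flag used in the diff above, and the xr.use_spmd() runtime helper mentioned in the TODO. The point of the discussion is ordering relative to tensor initialization; this snippet only shows the two calls side by side.

```python
import os

import torch_xla.runtime as xr

# Legacy path from the diff above: enable SPMD via the environment flag,
# set before XLA tensors are initialized.
os.environ["XLA_USE_SPMD"] = "1"

# Newer path mentioned in the TODO: ask the runtime for SPMD mode directly.
xr.use_spmd()
```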
I didn't read the whole thing too carefully since I felt like most of it is just moving code around. Is that true?
LGTM, thanks Yeounoh!
docs/spmd.md (Outdated)

@@ -46,8 +46,8 @@ import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.experimental.xla_sharding as xs
from torch_xla.experimental.xla_sharding import Mesh
import torch_xla.distributed.spmd.xla_sharding as xs
What do you think about making the import path just import torch_xla.distributed.spmd as xs? Not sure if xs is still the right abbreviation in that case (maybe we can think of it as xla spmd), but it would make the spmd package the single entrypoint in userspace.
Good point, let's do that.
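For context, a minimal sketch of what user code looks like with that single entrypoint, loosely following the docs/spmd.md example touched above; the mesh shape and tensor sizes here are illustrative, not taken from the diff.

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs
from torch_xla.distributed.spmd import Mesh

xr.use_spmd()

# Build a 2D logical mesh over all addressable devices.
num_devices = xr.global_runtime_device_count()
device_ids = np.arange(num_devices)
mesh = Mesh(device_ids, (num_devices, 1), ('data', 'model'))

# Annotate a tensor so its first dim is sharded across the 'data' axis.
t = torch.randn(8, 4).to(xm.xla_device())
xs.mark_sharding(t, mesh, (0, 1))
```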
Force-pushed from be48d90 to c0cdd77 (compare)
…adow xla DTensor API.
Force-pushed from c7b779a to 794cbc7 (compare)
Moving forward, should we use dtensor directly even within our codebase?
@jonb377 do you want to update HF to reflect this change? We can maybe wait a couple of days in case this gets reverted :).
@alanwaketan for HF my major concern is backward compatibility. I think we should at least wait until the 2.2 release is out before updating.
Sounds good to me.
* [SPMD] move SPMD package to torch_xla/experimental/spmd, introduce shadow xla DTensor API.
* support backward compatibility of the old imports
* Move spmd out of experimental
* Update spmd.md for distributed/spmd
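A backward-compatibility layer for the old import path (second bullet above) is typically just a thin re-export module. A hypothetical sketch, not the actual file contents from this PR:

```python
# torch_xla/experimental/xla_sharding.py -- hypothetical compatibility shim
import warnings

# Re-export the public API from its new home so old imports keep working.
from torch_xla.distributed.spmd.xla_sharding import *  # noqa: F401,F403

warnings.warn(
    'torch_xla.experimental.xla_sharding has moved to torch_xla.distributed.spmd; '
    'please update your imports.', DeprecationWarning)
```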
In response to the change pytorch/xla#5776 and #92909.
Pull Request resolved: #113214
Approved by: https://github.com/wanchaol
This is a follow-up to pytorch/pytorch#92909, DTensor & XLA integration. In this PR, we move the SPMD package and the shadow xla DTensor API to torch_xla/distributed/spmd so the DTensor implementation can access it.
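For illustration, a hedged sketch of the DTensor-side usage this integration enables, based on the upstream distribute_tensor API from pytorch/pytorch#92909; the mesh construction and tensor sizes are assumptions for the example, not taken from this PR.

```python
import torch
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

import torch_xla.runtime as xr

# An "xla" device mesh spanning all addressable devices.
num_devices = xr.global_runtime_device_count()
mesh = DeviceMesh("xla", list(range(num_devices)))

# distribute_tensor dispatches to the XLA sharding implementation
# (now under torch_xla/distributed/spmd) when the mesh device type is "xla".
big_tensor = torch.randn(100_000, 88)
sharded = distribute_tensor(big_tensor, mesh, [Shard(0)])
```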