[transformer] Allow for skipping stream synch #1505
only when pytorch/pytorch#82450 is included in pytorch. Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Sorry, one of my latest PRs created confusion in test naming. I prefer using `test_learning` and `test_inference` because they are (subjectively) more convenient when running the tests, so I made several suggestions for them. Everything else looks good to me 👍
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com> Co-authored-by: Aidyn-A <Aidyn-A@users.noreply.github.com>
on DGX A100, |
* Optionally disable stream synchronization after batched p2p communication
* Add test cases with `sync_batch_comm=False` only when pytorch/pytorch#82450 is included in pytorch.
  Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
* utilize existing test methods
  Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
* consistent naming
  Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
  Co-authored-by: Aidyn-A <Aidyn-A@users.noreply.github.com>
* silly boy, to skip the sync, set False
  Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
* cosmetic
  Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
* Test with async_pipelining w/o sync after batch_isend_irecv
  Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
* again, set sync_batch_comm to False
  Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
  Co-authored-by: Aidyn-A <Aidyn-A@users.noreply.github.com>
* Remove `torch.testing._internal.common_cuda`
  Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Aidyn-A <Aidyn-A@users.noreply.github.com>
Optionally disable stream synchronization after batched p2p communication
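For readers skimming the thread, here is a rough sketch of the control flow that the `sync_batch_comm` flag is described as toggling. It is not the code from this PR: `run_batched_p2p`, `launch`, `synchronize`, and `FakeRequest` are hypothetical names used only for illustration. In a real setup `launch` would presumably be `torch.distributed.batch_isend_irecv` and `synchronize` would be `torch.cuda.synchronize`; they are passed in as plain callables so the sketch runs without GPUs or a process group.

```python
from typing import Callable, List, Sequence


class FakeRequest:
    """Hypothetical stand-in for the request handle returned per p2p op."""

    def __init__(self) -> None:
        self.waited = False

    def wait(self) -> None:
        self.waited = True


def run_batched_p2p(
    ops: Sequence[object],
    launch: Callable[[Sequence[object]], List[FakeRequest]],
    synchronize: Callable[[], None],
    sync_batch_comm: bool = True,
) -> List[FakeRequest]:
    """Launch batched p2p ops, wait on each request, optionally synchronize.

    Assumption: `launch` plays the role of torch.distributed.batch_isend_irecv
    and `synchronize` the role of torch.cuda.synchronize.
    """
    reqs = launch(ops) if ops else []
    for req in reqs:
        req.wait()
    # Setting sync_batch_comm=False skips the synchronization after the batch.
    if sync_batch_comm:
        synchronize()
    return reqs


def demo(sync_batch_comm: bool):
    """Return (number of requests, all waited?, number of synchronize calls)."""
    sync_calls = []
    reqs = run_batched_p2p(
        ops=["send", "recv"],
        launch=lambda ops: [FakeRequest() for _ in ops],
        synchronize=lambda: sync_calls.append("sync"),
        sync_batch_comm=sync_batch_comm,
    )
    return len(reqs), all(r.waited for r in reqs), len(sync_calls)
```

Calling `demo(True)` records one synchronize call and `demo(False)` records none; the requests are waited on in both cases. Note that the commit message in this thread only enables the `sync_batch_comm=False` test cases when pytorch/pytorch#82450 is included in the PyTorch build.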
exported from nvcr.io/nvidia/pytorch:22.09-py3 container with some test cases
cc @eqy @Aidyn-A @ptrblck @Fuzzkatt