[dtensor] tensor ops to use strategy based sharding prop #100607
Conversation
This is the first in a series of PRs that adapts operator impls to use a strategy-based approach: each op utilizes OpStrategy and PlacementStrategy to generate its own strategy. By utilizing the strategy-based approach along with the op graph, we could enable more advanced op implementations (decomp is possible) and turn the sharding prop into something more like a constraint satisfaction problem. This PR alone only adds some basic tensor op strategies, and it works directly on the op graph that was used for metadata propagation. The tensor ops added in this PR mainly follow one of the arg strategies. The next set of PRs will add more op strategies to other ops.
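To make the idea concrete, here is a minimal, self-contained sketch of what a strategy-based op rule looks like. The classes below are simplified stand-ins for illustration only; the real OpStrategy and PlacementStrategy live in DTensor's internals and their fields and signatures may differ.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PlacementStrategy:
    """One candidate placement for an op (simplified stand-in)."""
    output_spec: str  # e.g. "Shard(0)" or "Replicate()", kept as a plain string here

@dataclass
class OpStrategy:
    """All candidate placements for one op node in the op graph."""
    strategies: List[PlacementStrategy]

def follow_arg_strategy(arg_strategies: List[OpStrategy]) -> OpStrategy:
    """'Follow one of the arg strategies': the output reuses whatever
    placements the chosen input argument already supports."""
    followed = arg_strategies[0]  # this sketch always follows the first arg
    return OpStrategy([PlacementStrategy(ps.output_spec) for ps in followed.strategies])

# e.g. a unary tensor op whose input is sharded on dim 0:
inp = OpStrategy([PlacementStrategy("Shard(0)")])
out = follow_arg_strategy([inp])  # the output stays sharded on dim 0
```

Because each op exposes a set of candidate placements rather than a single fixed rule, a solver can later pick among candidates across the whole graph, which is what makes the constraint satisfaction framing possible.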
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/100607
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 97cfefb. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Sorry for the late review! Some questions and suggestions. I have a question on the high-level side: the new change utilizes the fx graph; does that mean DTensor will eventually move from eager execution to graph execution (i.e. compiler mode)?
```python
# for eager execution, inputs only have one possible sharding
node_to_strategy[node] = OpStrategy([strategy])
```
This part is a bit difficult for me to understand. "for eager execution, inputs only have one possible sharding": does that mean the original sharding of the DTensor input? Will it be different in compiler mode?
Yep, the one possible sharding is the original sharding of the DTensor inputs. In compile mode I think there might be multiple possible shardings.
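A hypothetical illustration of that difference (plain strings stand in for placements; the names are illustrative, not the actual DTensor API):

```python
# In eager mode, a placeholder node carries exactly the sharding its DTensor
# input arrived with, so its strategy holds a single candidate:
eager_input_strategy = ["Shard(0)"]  # one possible sharding, mirrors OpStrategy([strategy])

# A compiler, by contrast, could enumerate several candidate shardings per
# input and search over them when solving the sharding prop as a
# constraint satisfaction problem:
compile_input_strategy = ["Shard(0)", "Shard(1)", "Replicate()"]

node_to_strategy = {"input_0": eager_input_strategy}
```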
We are already using the small op graph for metadata propagation (i.e. output shape/stride), so I used it for the strategy-based sharding prop too. I don't really know yet whether we should use this op graph for runtime execution. I feel we should keep using eager execution and only do sharding prop on the graph; if we later find it would be good to run directly on the op graph, we should evaluate the perf and switch to using it for execution afterwards.
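A rough sketch of that split, assuming the op graph is an fx-style list of nodes and each op has a registered strategy function; Node, propagate_sharding, and the registry are illustrative names, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Node:
    """Stand-in for torch.fx.Node."""
    name: str
    op: str                      # "placeholder" or "call_function"
    target: str = ""             # op name, e.g. "aten.add"
    args: Tuple[str, ...] = ()   # names of producer nodes

def propagate_sharding(
    nodes: List[Node],
    input_shardings: Dict[str, List[str]],
    registry: Dict[str, Callable[[List[List[str]]], List[str]]],
) -> Dict[str, List[str]]:
    """Run sharding prop over the op graph only; the tensor ops themselves
    still execute eagerly with whatever placements this pass selects."""
    node_to_strategy: Dict[str, List[str]] = {}
    for node in nodes:
        if node.op == "placeholder":
            # eager execution: inputs only have one possible sharding
            node_to_strategy[node.name] = input_shardings[node.name]
        else:  # "call_function": ask the op's registered strategy function
            arg_strategies = [node_to_strategy[a] for a in node.args]
            node_to_strategy[node.name] = registry[node.target](arg_strategies)
    return node_to_strategy
```

The point of the split is that this pass only decides placements; the actual computation still runs op by op in eager mode, so switching the runtime to graph execution later would be a separate, perf-driven decision.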
lgtm! Thanks for adding strategy-based sharding propagation that may simplify implementing new tensor ops.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 jobs have failed; the first few of them are: linux-binary-manywheel, trunk. Raised by workflow job.
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here.
Successfully rebased.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.