[TorchDISC] compile disc nodes with a fake cluster algorithm #173

Yancey1989 · 2022-03-17T11:24:38Z

This PR implements the compilation stage with a fake cluster algorithm.

tanyokwok

Good job!

tanyokwok · 2022-03-18T10:00:54Z

torch_disc/torch_disc/csrc/backend_impl.cpp

+const std::set<int8_t> TSBackendDeviceType::supported_device_types_ = {
+    (int8_t)at::kCPU, (int8_t)at::kCUDA};
+
+class DISCBackendImpl : public torch::lazy::BackendImplInterface {


I think we should discuss this with the PyTorch LTC team. Could we share our roadmap and our requirements for LTC with them?

Yancey1989 · 2022-03-18

Good point. Maybe we should write a detailed design document before communicating with the LTC team.

linearhit · 2022-03-21T12:03:51Z

torch_disc/torch_disc/csrc/disc_jit.cpp

+  // 2. conversion from torchscript Graph to mhlo Dialect on DISC nodes
+  // 3. register DISC engine
+  InferShapes(graph, arguments);
+  FusionDiscNodes(graph);


We'd better avoid using "fusion" here; it can easily be misunderstood.

Yancey1989 · 2022-03-22

Good point. Actually, this function only clusters nodes rather than fusing them; maybe it would be better to rename it to ClusterDiscNodes?
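To make the cluster/fusion distinction concrete, here is a minimal sketch of what a placeholder clustering step could look like. It only groups nodes into candidate subgraphs; it does not merge kernels, which is what "fusion" usually implies. Everything in it is an assumption for illustration: `ToyNode`, `Cluster`, `ClusterSupportedRuns`, and `ClusterOps` are hypothetical names, the node list stands in for a TorchScript graph in program order, and this is not the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node; NOT the TorchScript Node class.
struct ToyNode {
  std::string op;  // operator name, e.g. "aten::add"
  bool supported;  // assumed flag: can the backend compile this op?
};

// A cluster is a list of node indices, kept in program order.
using Cluster = std::vector<std::size_t>;

// Placeholder clustering: every maximal run of consecutive supported
// nodes becomes one cluster. Unsupported nodes end the current run.
std::vector<Cluster> ClusterSupportedRuns(const std::vector<ToyNode>& nodes) {
  std::vector<Cluster> clusters;
  bool in_run = false;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (!nodes[i].supported) {
      in_run = false;
      continue;
    }
    if (!in_run) {
      clusters.emplace_back();  // start a new cluster
      in_run = true;
    }
    clusters.back().push_back(i);
  }
  return clusters;
}

// Helper: list the operator names that ended up in one cluster.
std::vector<std::string> ClusterOps(const std::vector<ToyNode>& nodes,
                                    const Cluster& cluster) {
  std::vector<std::string> ops;
  for (std::size_t index : cluster) {
    assert(index < nodes.size());  // indices must refer to existing nodes
    ops.push_back(nodes[index].op);
  }
  return ops;
}
```

For a node list such as `add, mul, print, relu` where only `print` is unsupported, this yields two clusters: `{add, mul}` and `{relu}`. Each cluster would then be handed to the compiler as a unit; whether the ops inside it get fused is a separate, later decision, which is why a name like ClusterDiscNodes describes this step better.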

linearhit · 2022-03-21T12:06:21Z

torch_disc/torch_disc/csrc/disc_jit.cpp

+  // 1. cluster and group DISC nodes
+  // 2. conversion from torchscript Graph to mhlo Dialect on DISC nodes
+  // 3. register DISC engine
+  InferShapes(graph, arguments);


I guess we are using the lazy arguments for dtype and rank analysis; am I understanding that correctly?
Without an explicit "static_rank_backward_analysis" process, this should not be enough to guarantee that all the compile-time constants needed for static rank are known at compile time. To be discussed offline.

linearhit

Marked for offline discussion.

compile disc node with a fake cluster algo

b9ff1d4

Yancey1989 changed the title ~~[WIP] compile disc nodes with a fake cluster algorithm~~ [WIP] [TorchDISC] compile disc nodes with a fake cluster algorithm Mar 18, 2022

Yancey1989 added 6 commits March 18, 2022 14:42

fix mhlo conversaion failed

94c2e63

using pre-build pytorch wheel

6f4bdd8

update

6a2115c

cleanup code

30898d5

update

f1f1b0b

update

a8aac34

Yancey1989 changed the title ~~[WIP] [TorchDISC] compile disc nodes with a fake cluster algorithm~~ [TorchDISC] compile disc nodes with a fake cluster algorithm Mar 18, 2022

polish code

41260f2

Yancey1989 requested review from tanyokwok and linearhit March 18, 2022 09:01

tanyokwok approved these changes Mar 18, 2022

Yancey1989 merged commit ab2bca8 into alibaba:features/torch_disc_devel Mar 18, 2022

Yancey1989 deleted the compile_disc_nodes branch March 18, 2022 10:37

Yancey1989 mentioned this pull request Mar 21, 2022

[PoC] TorchDisc: accelerating PyTorch training via LTC + BladeDISC #156

Closed

4 tasks

linearhit reviewed Mar 21, 2022

linearhit reviewed Mar 22, 2022

Yancey1989 added a commit that referenced this pull request Apr 13, 2022

[TorchDISC] compile disc nodes with a fake cluster algorithm (#173)

d662fc3

compile Disc nodes with a fake cluster algorithm.

Yancey1989 added a commit to Yancey1989/BladeDISC that referenced this pull request Apr 18, 2022

[TorchDISC] compile disc nodes with a fake cluster algorithm (alibaba…

c0368ce

…#173) compile Disc nodes with a fake cluster algorithm.
