
[PoC] TorchDisc: accelerating PyTorch training via LTC + BladeDISC #156

Closed
4 tasks done
Yancey1989 opened this issue Mar 10, 2022 · 4 comments

Yancey1989 commented Mar 10, 2022

Background

BladeDISC is an end-to-end compiler that supports dynamic shapes, which are common in training workloads. This issue describes how to improve PyTorch training performance with DISC, based on the LazyTensorCore (LTC) mechanism.

feature branch: https://github.com/alibaba/BladeDISC/tree/features/torch_disc_devel

Design Overview

[Design overview diagram]

  1. Mark step: according to LTC, a mark-step API must be called manually at the end of each iteration to sync and execute the accumulated graph on a physical device.
  2. Lowering to TorchScript: LTC uses TorchScript as the backend engine (ref TSBackendImpl); we can use it to lower Lazy IR to TorchScript IR.
  3. Cluster DISC subgraphs: group the DISC-supported operations in the TorchScript IR into subgraphs for DISC compilation.
  4. DISC compilation stage:
    a. MHLO conversion: DISC uses MLIR::mhlo as the front end, so we must convert TorchScript IR to MHLO before compilation.
    b. Compiling to an executable program: call the DISC entry function to compile the MHLO IR into an executable file (a dynamic library).
    c. DISC execution: call the DISC RAL to execute the executable program with the input tensors.
  5. TorchScript execution: finally, call torch::jit::GraphExecutor to execute the TorchScript IR and return the result tensors.
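The lazy-recording-plus-mark-step flow behind these steps can be illustrated with a minimal, self-contained Python sketch. All class and function names here are hypothetical (not the real LTC or BladeDISC APIs): operations are only recorded into a graph, and nothing executes until the mark call forces compilation and execution of the whole graph.

```python
# Minimal sketch of the LazyTensor "mark step" mechanism.
# Hypothetical names; the real implementation lives in PyTorch LTC
# and the BladeDISC feature branch.

class LazyTensor:
    """Handle to a node in the recorded graph; no eager computation."""
    def __init__(self, graph, node_id):
        self.graph = graph
        self.node_id = node_id

class LazyGraph:
    def __init__(self):
        self.nodes = []  # (op, payload) recorded in program order

    def constant(self, value):
        self.nodes.append(("const", value))
        return LazyTensor(self, len(self.nodes) - 1)

    def add(self, a, b):
        self.nodes.append(("add", (a.node_id, b.node_id)))
        return LazyTensor(self, len(self.nodes) - 1)

    def mul(self, a, b):
        self.nodes.append(("mul", (a.node_id, b.node_id)))
        return LazyTensor(self, len(self.nodes) - 1)

    def step_mark(self, result):
        """Sync point: 'compile' and execute the recorded graph (step 1)."""
        values = []
        for op, arg in self.nodes:
            if op == "const":
                values.append(arg)
            elif op == "add":
                values.append(values[arg[0]] + values[arg[1]])
            elif op == "mul":
                values.append(values[arg[0]] * values[arg[1]])
        return values[result.node_id]

g = LazyGraph()
x = g.constant(2.0)
y = g.constant(3.0)
z = g.add(g.mul(x, y), y)   # nothing executed yet, only recorded
print(g.step_mark(z))       # graph executes at the mark point -> 9.0
```

In the real system, `step_mark` would hand the recorded graph to the lowering, clustering, and DISC compilation stages instead of interpreting it directly.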

Implementation and TODO Actions

To implement the features above, we will build a Pybind library, _torch_disc.so, that exposes a step_mark API backed by several C++ functions; the TODO actions are tracked as the task-list items on this issue.
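As a concrete illustration of the clustering stage (step 3 of the design), here is a hedged sketch that groups maximal consecutive runs of DISC-supported ops into clusters, leaving unsupported ops to the fallback TorchScript executor. It assumes a flat list of op names and a hypothetical supported-op set; the real pass operates on TorchScript IR nodes.

```python
# Illustrative sketch of DISC subgraph clustering (hypothetical op set).
DISC_SUPPORTED = {"aten::add", "aten::mul", "aten::relu"}

def cluster_disc_subgraphs(ops):
    """Return a list of clusters; each cluster is (is_disc, [ops])."""
    clusters = []
    for op in ops:
        supported = op in DISC_SUPPORTED
        if clusters and clusters[-1][0] == supported:
            clusters[-1][1].append(op)   # extend the current run
        else:
            clusters.append((supported, [op]))  # start a new cluster
    return clusters

ops = ["aten::add", "aten::mul", "aten::nonzero", "aten::relu"]
print(cluster_disc_subgraphs(ops))
# [(True, ['aten::add', 'aten::mul']), (False, ['aten::nonzero']), (True, ['aten::relu'])]
```

Each `True` cluster would then be converted to MHLO and compiled by DISC (step 4), while `False` clusters stay in TorchScript.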

Reference

  1. PyTorch LazyTensor branch: https://github.com/pytorch/pytorch/tree/lazy_tensor_staging/lazy_tensor_core
  2. PyTorch/XLA backend example: https://github.com/pytorch/xla/tree/asuhan/xla_ltc_plugin

tanyokwok commented Mar 10, 2022

library _torch_disc.so to expose step_mark API

What's the difference between our disc.step_mark() and PyTorch LTC's lazy_tensor_core.step_mark()? I think we can eventually reuse the API from PyTorch LTC.


tanyokwok commented Mar 10, 2022

Perhaps we could add MNIST and BERT as driving cases for our tasks?

Yancey1989 (Collaborator, Author) commented

> What's the difference between our disc.step_mark() and PyTorch LTC's lazy_tensor_core.step_mark()? I think we can eventually reuse the API from PyTorch LTC.

Not exactly the same: disc.step_mark() includes the additional clustering and MHLO conversion stages.

> Perhaps we could add MNIST and BERT as driving cases for our tasks?

That's a great idea; I will add a task to the TODO actions.
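The difference discussed above can be sketched as two pipelines. The stage functions below are trivial stubs that only tag an IR string, purely to make the pipeline shapes concrete; none of these names exist in the real code bases.

```python
# Hypothetical sketch contrasting the plain LTC mark-step pipeline
# with the TorchDisc one. Stage names mirror the design overview.

def lower_to_torchscript(lazy_ir):
    return lazy_ir + " -> ts_ir"

def cluster_disc_subgraphs(ts_ir):
    return ts_ir + " -> clustered"

def convert_to_mhlo(ir):
    return ir + " -> mhlo"

def disc_compile(mhlo):
    return mhlo + " -> binary"

def ltc_step_mark(lazy_ir):
    # Plain LTC: lower Lazy IR to TorchScript and execute it directly.
    return lower_to_torchscript(lazy_ir) + " -> executed"

def disc_step_mark(lazy_ir):
    # TorchDisc: insert clustering, MHLO conversion, and DISC compilation
    # between lowering and execution.
    ts_ir = lower_to_torchscript(lazy_ir)
    binary = disc_compile(convert_to_mhlo(cluster_disc_subgraphs(ts_ir)))
    return binary + " -> executed"

print(ltc_step_mark("lazy_ir"))
print(disc_step_mark("lazy_ir"))
```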

@Yancey1989 Yancey1989 changed the title [PoC] TorchDISC to improve training workload [PoC] TorchDISC to improve PyTorch training workload Mar 10, 2022
@Yancey1989 Yancey1989 changed the title [PoC] TorchDISC to improve PyTorch training workload [PoC] TorchDisc: accelerating PyTorch training via LTC + BladeDISC Apr 6, 2022

Yancey1989 commented Apr 27, 2022

I will close this issue; all the PoC tasks have been done and we merged the feature branch into master. We will keep tracking this feature with the project.

Please feel free to check the design doc about this PoC: #236 .
