Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Relay] Switch the VM to use the LowerTE pass instead of TECompiler::{Lower,LowerShapeFunc}. #9483

Merged
merged 1 commit into from
Dec 1, 2021

Conversation

mbs-octoml
Copy link
Contributor

@mbs-octoml mbs-octoml commented Nov 10, 2021

We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's
compiler.cc with LowerTEPass. This clears the way for performing post-lowering
IRModule->IRModule transformations which combine Relay and TIR analysis. In particular,
it will allow us to use the PlanDevices pass to propagate memory scope constraints
across PrimFuncs.

We run LowerTEPass fairly early in the pipeline, which required quite a few passes
to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after
rather than before lowering, and so must now work in a mixed Function/PrimFunc world.

The "vm.shape_func" operator has been removed since a) lowering has already generated
the necessary dynamic shape function, and b) the call to that function can be
represented by an 'ordinary' vm.invoke_tvm_op call.

We worked our way through the following glitches:

  • Lowering was choosing definitional GlobalVars which were not pointer-equal to the
    referential GlobalVars left behind in the rewritten Calls. We fixed that in
    te_compiler.cc, though better would be to push GlobalVars deeper into the
    lowering machinery.
  • device_copy was rewritten to a call to @__copy without any definition. We retain
    it as if it were an 'external'.
  • Calls to already-compiled BYOC functions were indistinguishable from calls
    to (non-primitive) Relay functions. We move them into the call_lowered calling
    convention, and leave behind a Function tagged with "ExternalSymbol". Better would
    be a first-class representation for externals in the IRModule but one step at a time.
  • Functions with dynamic shapes tagged for BYOC compilation were not tracking their
    connection to their dynamic shape function. We now use exactly the same attributes
    as for non-BYOC primitives.

In addition to units tests I've confirmed identical VM bytecode for a small test suite
of ONNX models.

@mbs-octoml mbs-octoml force-pushed the mbs-vm-lowerte branch 2 times, most recently from bbf4002 to 6b5534f Compare November 13, 2021 01:49
@mbs-octoml mbs-octoml force-pushed the mbs-vm-lowerte branch 2 times, most recently from eb403cc to 47cae68 Compare November 19, 2021 18:46
@mbs-octoml mbs-octoml changed the title [DRAFT] Scratch pad for switching VM to use the LowerTE pass. [DRAFT][Relay] Switch the VM to use the LowerTE pass instead of TECompiler::{Lower,LowerShapeFunc}. Nov 20, 2021
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 20, 2021
As part of apache#9483 we need to prepare some critical Relay passes for running after
lowering and conversion to DPS. For DCE we need to make sure we never remove
side-effecting let-bound expressions, such as for allocation or evaluation of
an external function with unknown effectfulness.

Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting
for functions by tracking both 'eval' and 'call' purity, but must fallback to
assuming call-impurity in more difficult cases (eg calling a function passed as
a parameter, calling a function projected from a tuple, etc). However it seems
plenty good enough.

Purity must also be accounted for when determining the usage count of let-bound
variables, so reworked that. Collapsed the let-bound value accumulation pass into
the usage counting pass to make up for inserting the new purity analysis pass.
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 22, 2021
As part of apache#9483 we need to prepare some critical Relay passes for running after
lowering and conversion to DPS. For DCE we need to make sure we never remove
side-effecting let-bound expressions, such as for allocation or evaluation of
an external function with unknown effectfulness.

Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting
for functions by tracking both 'eval' and 'call' purity, but must fallback to
assuming call-impurity in more difficult cases (eg calling a function passed as
a parameter, calling a function projected from a tuple, etc). However it seems
plenty good enough.

Purity must also be accounted for when determining the usage count of let-bound
variables, so reworked that. Collapsed the let-bound value accumulation pass into
the usage counting pass to make up for inserting the new purity analysis pass.
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 22, 2021
This is a grab bag of fallout changes from switching the VM to use LoweTEPass
which can be ealy split out of the main apache#9483 PR.

- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help
  me with debugging since spans are universally dropped in most passes).
- Can get a human readable dump of the VM's PackedFunc names and indexes for
  debugging.
- If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had
  a lot of difficulty tracking down where duplicate GlobalVars for the same
  name_hint were getting created and propagated.
- GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice
  where will return 'null' properties if call/expr is not of call_lowered
  form. Mildly more convenient, though switching all the above to ICHECK
  and push 'if (op == the relevant op)' into all use sites would also be just
  fine.
- Misc VLOG improvements made while tracking down issues in apache#9483.
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 23, 2021
As part of apache#9483 we need to prepare some critical Relay passes for running after
lowering and conversion to DPS. For DCE we need to make sure we never remove
side-effecting let-bound expressions, such as for allocation or evaluation of
an external function with unknown effectfulness.

Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting
for functions by tracking both 'eval' and 'call' purity, but must fallback to
assuming call-impurity in more difficult cases (eg calling a function passed as
a parameter, calling a function projected from a tuple, etc). However it seems
plenty good enough.

Purity must also be accounted for when determining the usage count of let-bound
variables, so reworked that. Collapsed the let-bound value accumulation pass into
the usage counting pass to make up for inserting the new purity analysis pass.

A few tests assume DCE eliminates dead reference writes. The previous
implementation certainly did that, but by eliminating *all* writes.

Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded
analysis would probably do just fine and we don't need to get hung up on alias
analysis). In the meantime, added an 'ignore_impurity' flag (default False)
and set to true just in the few unit tests which rely on the unsound impl.
tmoreau89 pushed a commit that referenced this pull request Nov 23, 2021
…festAlloc. (#9542)

* Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc.

As part of #9483 we need to prepare some critical Relay passes for running after
lowering and conversion to DPS. For DCE we need to make sure we never remove
side-effecting let-bound expressions, such as for allocation or evaluation of
an external function with unknown effectfulness.

Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting
for functions by tracking both 'eval' and 'call' purity, but must fallback to
assuming call-impurity in more difficult cases (eg calling a function passed as
a parameter, calling a function projected from a tuple, etc). However it seems
plenty good enough.

Purity must also be accounted for when determining the usage count of let-bound
variables, so reworked that. Collapsed the let-bound value accumulation pass into
the usage counting pass to make up for inserting the new purity analysis pass.

A few tests assume DCE eliminates dead reference writes. The previous
implementation certainly did that, but by eliminating *all* writes.

Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded
analysis would probably do just fine and we don't need to get hung up on alias
analysis). In the meantime, added an 'ignore_impurity' flag (default False)
and set to true just in the few unit tests which rely on the unsound impl.

* [checkpoint] Merge Lily's suggestions.
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 23, 2021
This is a grab bag of fallout changes from switching the VM to use LoweTEPass
which can be ealy split out of the main apache#9483 PR.

- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help
  me with debugging since spans are universally dropped in most passes).
- Can get a human readable dump of the VM's PackedFunc names and indexes for
  debugging.
- If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had
  a lot of difficulty tracking down where duplicate GlobalVars for the same
  name_hint were getting created and propagated.
- GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice
  where will return 'null' properties if call/expr is not of call_lowered
  form. Mildly more convenient, though switching all the above to ICHECK
  and push 'if (op == the relevant op)' into all use sites would also be just
  fine.
- Misc VLOG improvements made while tracking down issues in apache#9483.
- Don't attach host targets to the CompilationConfig's 'primitive_targets' array.
  Since Targets and SEScopes are compared by pointer equality, and the same Target
  with and without a host are distinct objects, this was causing unnecessary copies
  in code which is already dealing with the explicit host_target or host_se_scope
  anyway. I've left the hosts in the legacy_target_map. (The sooner we sort out
  multi-target compilation and hosts the better!)
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 23, 2021
This is a grab bag of fallout changes from switching the VM to use LoweTEPass
which can be ealy split out of the main apache#9483 PR.

- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help
  me with debugging since spans are universally dropped in most passes).
- Can get a human readable dump of the VM's PackedFunc names and indexes for
  debugging.
- If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had
  a lot of difficulty tracking down where duplicate GlobalVars for the same
  name_hint were getting created and propagated.
- GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice
  where will return 'null' properties if call/expr is not of call_lowered
  form. Mildly more convenient, though switching all the above to ICHECK
  and push 'if (op == the relevant op)' into all use sites would also be just
  fine.
- Misc VLOG improvements made while tracking down issues in apache#9483.
Mousius pushed a commit that referenced this pull request Nov 24, 2021
This is a grab bag of fallout changes from switching the VM to use LoweTEPass
which can be easily split out of the main #9483 PR.

- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help
  me with debugging since spans are universally dropped in most passes).
- Can get a human readable dump of the VM's PackedFunc names and indexes for
  debugging.
- If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had
  a lot of difficulty tracking down where duplicate GlobalVars for the same
  name_hint were getting created and propagated.
- GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice
  where will return 'null' properties if call/expr is not of call_lowered
  form. Mildly more convenient, though switching all the above to ICHECK
  and push 'if (op == the relevant op)' into all use sites would also be just
  fine.
- Misc VLOG improvements made while tracking down issues in #9483.
@mbs-octoml mbs-octoml changed the title [DRAFT][Relay] Switch the VM to use the LowerTE pass instead of TECompiler::{Lower,LowerShapeFunc}. [Relay] Switch the VM to use the LowerTE pass instead of TECompiler::{Lower,LowerShapeFunc}. Nov 24, 2021
@mbs-octoml mbs-octoml marked this pull request as ready for review November 24, 2021 23:07
Copy link
Member

@jroesch jroesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments but otherwise LGTM, can see the final bits and bobs coming together into a mostly unified compilation flow.

Copy link
Contributor

@electriclilies electriclilies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far, but I'm less than half way through! Overall a few clarifying questions and nits.
I will continue tomorrow.

include/tvm/relay/function.h Outdated Show resolved Hide resolved
src/relay/analysis/call_graph.cc Show resolved Hide resolved
src/relay/backend/aot_executor_codegen.cc Show resolved Hide resolved
src/relay/backend/aot_executor_codegen.cc Outdated Show resolved Hide resolved
src/relay/backend/graph_executor_codegen.cc Show resolved Hide resolved
src/relay/backend/te_compiler.cc Show resolved Hide resolved
src/relay/backend/te_compiler.cc Show resolved Hide resolved
src/relay/backend/te_compiler.cc Show resolved Hide resolved
src/relay/backend/te_compiler.cc Show resolved Hide resolved
@mbs-octoml
Copy link
Contributor Author

Thanks @electriclilies appreciate your review. Is there any chance of finishing today so I can kick off a ci run overnight? Keen to get this off my plate given the # changes,

Copy link
Contributor

@electriclilies electriclilies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I found some typos and left some clarifying questions. I'm glad to see some of the shape func machinery made more explicit! It would be fine to address these in a follow up PR since they are pretty minor

src/relay/backend/te_compiler.cc Outdated Show resolved Hide resolved
src/relay/backend/te_compiler.cc Outdated Show resolved Hide resolved
src/relay/backend/te_compiler.cc Outdated Show resolved Hide resolved
src/relay/backend/te_compiler.cc Outdated Show resolved Hide resolved
src/relay/backend/te_compiler_cache.cc Outdated Show resolved Hide resolved
src/relay/backend/vm/compiler.cc Outdated Show resolved Hide resolved
src/relay/backend/vm/compiler.cc Show resolved Hide resolved
src/relay/op/memory/device_copy.cc Show resolved Hide resolved
src/relay/transforms/device_planner.cc Show resolved Hide resolved
src/relay/transforms/type_infer.cc Outdated Show resolved Hide resolved
@mbs-octoml
Copy link
Contributor Author

Thanks for slogging through @jroesch and @electriclilies. The next few should be more manageable. Or at least that's what I tell myself.

We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's
compiler.cc with LowerTEPass. This clears the way for performing post-lowering
IRModule->IRModule transformations which combine Relay and TIR analysis. In particular,
it will allow us to use the PlanDevices pass to propagate memory scope constraints
across PrimFuncs.

We run LowerTEPass fairly early in the pipeline, which required quite a few passes
to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after
rather than before lowering, and so must now work in a mixed Function/PrimFunc world.

The "vm.shape_func" operator has been removed since a) lowering has already generated
the necessary dynamic shape function, and b) the call to that function can be
represented by an 'ordinary' vm.invoke_tvm_op call.

We worked our way through the following glitches:
 - Dynamic shape functions are now given their true type (rather than the type of
   the primitive function they are paired with).
 - Lowering was choosing definitional GlobalVars which were not pointer-equal to the
   referential GlobalVars left behind in the rewritten Calls. We fixed that in
   te_compiler.cc, though better would be to push GlobalVars deeper into the
   lowering machinery.
 - device_copy was rewritten to a call to @__copy without any definition. Though we
   tried adding it as a global this (obviously in retrospect...) won't typecheck if
   there are multiple device_copies in the program. Instead leave device_copy unchanged
   during lowering and update each executor codegen to look for them specially.
 - Calls to already-compiled BYOC functions were indistinguishable from calls
   to (non-primitive) Relay functions. We move them into the call_lowered calling
   convention, and leave behind a Function tagged with "ExternalSymbol". Better would
   be a first-class representatn for externals in the IRModule but one step at a time.
 - Functions with dynamic shapes tagged for BYOC compilation were not tracking their
   connection to their dynamic shape function. We now use exactly the same attributes
   as for non-BYOC primitives.
 - VerilatorRuntime can legitimately be deleted before initialized.
 - IRModule attributes must be preserved. In particular, since LowerTEPass can
   be invoked more than once we need to be careful to preserve any existing external
   modules and other attributes gatherd from an earlier LowerTEPass.
 - GetUniqueName accounts for existing definitions in the module, but is not used
   for external functions since their intended names are communicated to the codegen
   toolchain via the already fixed "global_symbol" attribute.
@jroesch jroesch merged commit 3047709 into apache:main Dec 1, 2021
masahi pushed a commit to masahi/tvm that referenced this pull request Dec 1, 2021
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's
compiler.cc with LowerTEPass. This clears the way for performing post-lowering
IRModule->IRModule transformations which combine Relay and TIR analysis. In particular,
it will allow us to use the PlanDevices pass to propagate memory scope constraints
across PrimFuncs.

We run LowerTEPass fairly early in the pipeline, which required quite a few passes
to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after
rather than before lowering, and so must now work in a mixed Function/PrimFunc world.

The "vm.shape_func" operator has been removed since a) lowering has already generated
the necessary dynamic shape function, and b) the call to that function can be
represented by an 'ordinary' vm.invoke_tvm_op call.

We worked our way through the following glitches:
 - Dynamic shape functions are now given their true type (rather than the type of
   the primitive function they are paired with).
 - Lowering was choosing definitional GlobalVars which were not pointer-equal to the
   referential GlobalVars left behind in the rewritten Calls. We fixed that in
   te_compiler.cc, though better would be to push GlobalVars deeper into the
   lowering machinery.
 - device_copy was rewritten to a call to @__copy without any definition. Though we
   tried adding it as a global this (obviously in retrospect...) won't typecheck if
   there are multiple device_copies in the program. Instead leave device_copy unchanged
   during lowering and update each executor codegen to look for them specially.
 - Calls to already-compiled BYOC functions were indistinguishable from calls
   to (non-primitive) Relay functions. We move them into the call_lowered calling
   convention, and leave behind a Function tagged with "ExternalSymbol". Better would
   be a first-class representatn for externals in the IRModule but one step at a time.
 - Functions with dynamic shapes tagged for BYOC compilation were not tracking their
   connection to their dynamic shape function. We now use exactly the same attributes
   as for non-BYOC primitives.
 - VerilatorRuntime can legitimately be deleted before initialized.
 - IRModule attributes must be preserved. In particular, since LowerTEPass can
   be invoked more than once we need to be careful to preserve any existing external
   modules and other attributes gatherd from an earlier LowerTEPass.
 - GetUniqueName accounts for existing definitions in the module, but is not used
   for external functions since their intended names are communicated to the codegen
   toolchain via the already fixed "global_symbol" attribute.
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
…festAlloc. (apache#9542)

* Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc.

As part of apache#9483 we need to prepare some critical Relay passes for running after
lowering and conversion to DPS. For DCE we need to make sure we never remove
side-effecting let-bound expressions, such as for allocation or evaluation of
an external function with unknown effectfulness.

Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting
for functions by tracking both 'eval' and 'call' purity, but must fallback to
assuming call-impurity in more difficult cases (eg calling a function passed as
a parameter, calling a function projected from a tuple, etc). However it seems
plenty good enough.

Purity must also be accounted for when determining the usage count of let-bound
variables, so reworked that. Collapsed the let-bound value accumulation pass into
the usage counting pass to make up for inserting the new purity analysis pass.

A few tests assume DCE eliminates dead reference writes. The previous
implementation certainly did that, but by eliminating *all* writes.

Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded
analysis would probably do just fine and we don't need to get hung up on alias
analysis). In the meantime, added an 'ignore_impurity' flag (default False)
and set to true just in the few unit tests which rely on the unsound impl.

* [checkpoint] Merge Lily's suggestions.
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
This is a grab bag of fallout changes from switching the VM to use LoweTEPass
which can be easily split out of the main apache#9483 PR.

- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help
  me with debugging since spans are universally dropped in most passes).
- Can get a human readable dump of the VM's PackedFunc names and indexes for
  debugging.
- If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had
  a lot of difficulty tracking down where duplicate GlobalVars for the same
  name_hint were getting created and propagated.
- GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice
  where will return 'null' properties if call/expr is not of call_lowered
  form. Mildly more convenient, though switching all the above to ICHECK
  and push 'if (op == the relevant op)' into all use sites would also be just
  fine.
- Misc VLOG improvements made while tracking down issues in apache#9483.
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
…festAlloc. (apache#9542)

* Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc.

As part of apache#9483 we need to prepare some critical Relay passes for running after
lowering and conversion to DPS. For DCE we need to make sure we never remove
side-effecting let-bound expressions, such as for allocation or evaluation of
an external function with unknown effectfulness.

Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting
for functions by tracking both 'eval' and 'call' purity, but must fallback to
assuming call-impurity in more difficult cases (eg calling a function passed as
a parameter, calling a function projected from a tuple, etc). However it seems
plenty good enough.

Purity must also be accounted for when determining the usage count of let-bound
variables, so reworked that. Collapsed the let-bound value accumulation pass into
the usage counting pass to make up for inserting the new purity analysis pass.

A few tests assume DCE eliminates dead reference writes. The previous
implementation certainly did that, but by eliminating *all* writes.

Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded
analysis would probably do just fine and we don't need to get hung up on alias
analysis). In the meantime, added an 'ignore_impurity' flag (default False)
and set to true just in the few unit tests which rely on the unsound impl.

* [checkpoint] Merge Lily's suggestions.
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
This is a grab bag of fallout changes from switching the VM to use LoweTEPass
which can be easily split out of the main apache#9483 PR.

- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help
  me with debugging since spans are universally dropped in most passes).
- Can get a human readable dump of the VM's PackedFunc names and indexes for
  debugging.
- If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had
  a lot of difficulty tracking down where duplicate GlobalVars for the same
  name_hint were getting created and propagated.
- GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice
  where will return 'null' properties if call/expr is not of call_lowered
  form. Mildly more convenient, though switching all the above to ICHECK
  and push 'if (op == the relevant op)' into all use sites would also be just
  fine.
- Misc VLOG improvements made while tracking down issues in apache#9483.
@mbs-octoml mbs-octoml deleted the mbs-vm-lowerte branch December 1, 2021 17:13
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
…festAlloc. (apache#9542)

* Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc.

As part of apache#9483 we need to prepare some critical Relay passes for running after
lowering and conversion to DPS. For DCE we need to make sure we never remove
side-effecting let-bound expressions, such as for allocation or evaluation of
an external function with unknown effectfulness.

Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting
for functions by tracking both 'eval' and 'call' purity, but must fallback to
assuming call-impurity in more difficult cases (eg calling a function passed as
a parameter, calling a function projected from a tuple, etc). However it seems
plenty good enough.

Purity must also be accounted for when determining the usage count of let-bound
variables, so reworked that. Collapsed the let-bound value accumulation pass into
the usage counting pass to make up for inserting the new purity analysis pass.

A few tests assume DCE eliminates dead reference writes. The previous
implementation certainly did that, but by eliminating *all* writes.

Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded
analysis would probably do just fine and we don't need to get hung up on alias
analysis). In the meantime, added an 'ignore_impurity' flag (default False)
and set to true just in the few unit tests which rely on the unsound impl.

* [checkpoint] Merge Lily's suggestions.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
This is a grab bag of fallout changes from switching the VM to use LoweTEPass
which can be easily split out of the main apache#9483 PR.

- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help
  me with debugging since spans are universally dropped in most passes).
- Can get a human readable dump of the VM's PackedFunc names and indexes for
  debugging.
- If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had
  a lot of difficulty tracking down where duplicate GlobalVars for the same
  name_hint were getting created and propagated.
- GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice
  where will return 'null' properties if call/expr is not of call_lowered
  form. Mildly more convenient, though switching all the above to ICHECK
  and push 'if (op == the relevant op)' into all use sites would also be just
  fine.
- Misc VLOG improvements made while tracking down issues in apache#9483.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's
compiler.cc with LowerTEPass. This clears the way for performing post-lowering
IRModule->IRModule transformations which combine Relay and TIR analysis. In particular,
it will allow us to use the PlanDevices pass to propagate memory scope constraints
across PrimFuncs.

We run LowerTEPass fairly early in the pipeline, which required quite a few passes
to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after
rather than before lowering, and so must now work in a mixed Function/PrimFunc world.

The "vm.shape_func" operator has been removed since a) lowering has already generated
the necessary dynamic shape function, and b) the call to that function can be
represented by an 'ordinary' vm.invoke_tvm_op call.

We worked our way through the following glitches:
 - Dynamic shape functions are now given their true type (rather than the type of
   the primitive function they are paired with).
 - Lowering was choosing definitional GlobalVars which were not pointer-equal to the
   referential GlobalVars left behind in the rewritten Calls. We fixed that in
   te_compiler.cc, though better would be to push GlobalVars deeper into the
   lowering machinery.
 - device_copy was rewritten to a call to @__copy without any definition. Though we
   tried adding it as a global this (obviously in retrospect...) won't typecheck if
   there are multiple device_copies in the program. Instead leave device_copy unchanged
   during lowering and update each executor codegen to look for them specially.
 - Calls to already-compiled BYOC functions were indistinguishable from calls
   to (non-primitive) Relay functions. We move them into the call_lowered calling
   convention, and leave behind a Function tagged with "ExternalSymbol". Better would
   be a first-class representatn for externals in the IRModule but one step at a time.
 - Functions with dynamic shapes tagged for BYOC compilation were not tracking their
   connection to their dynamic shape function. We now use exactly the same attributes
   as for non-BYOC primitives.
 - VerilatorRuntime can legitimately be deleted before initialized.
 - IRModule attributes must be preserved. In particular, since LowerTEPass can
   be invoked more than once we need to be careful to preserve any existing external
   modules and other attributes gatherd from an earlier LowerTEPass.
 - GetUniqueName accounts for existing definitions in the module, but is not used
   for external functions since their intended names are communicated to the codegen
   toolchain via the already fixed "global_symbol" attribute.
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
…festAlloc. (apache#9542)

* Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc.

As part of apache#9483 we need to prepare some critical Relay passes for running after
lowering and conversion to DPS. For DCE we need to make sure we never remove
side-effecting let-bound expressions, such as for allocation or evaluation of
an external function with unknown effectfulness.

Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting
for functions by tracking both 'eval' and 'call' purity, but must fallback to
assuming call-impurity in more difficult cases (eg calling a function passed as
a parameter, calling a function projected from a tuple, etc). However it seems
plenty good enough.

Purity must also be accounted for when determining the usage count of let-bound
variables, so reworked that. Collapsed the let-bound value accumulation pass into
the usage counting pass to make up for inserting the new purity analysis pass.

A few tests assume DCE eliminates dead reference writes. The previous
implementation certainly did that, but by eliminating *all* writes.

Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded
analysis would probably do just fine and we don't need to get hung up on alias
analysis). In the meantime, added an 'ignore_impurity' flag (default False)
and set to true just in the few unit tests which rely on the unsound impl.

* [checkpoint] Merge Lily's suggestions.
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
This is a grab bag of fallout changes from switching the VM to use LoweTEPass
which can be easily split out of the main apache#9483 PR.

- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help
  me with debugging since spans are universally dropped in most passes).
- Can get a human readable dump of the VM's PackedFunc names and indexes for
  debugging.
- If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had
  a lot of difficulty tracking down where duplicate GlobalVars for the same
  name_hint were getting created and propagated.
- GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice
  where will return 'null' properties if call/expr is not of call_lowered
  form. Mildly more convenient, though switching all the above to ICHECK
  and push 'if (op == the relevant op)' into all use sites would also be just
  fine.
- Misc VLOG improvements made while tracking down issues in apache#9483.
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's
compiler.cc with LowerTEPass. This clears the way for performing post-lowering
IRModule->IRModule transformations which combine Relay and TIR analysis. In particular,
it will allow us to use the PlanDevices pass to propagate memory scope constraints
across PrimFuncs.

We run LowerTEPass fairly early in the pipeline, which required quite a few passes
to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after
rather than before lowering, and so must now work in a mixed Function/PrimFunc world.

The "vm.shape_func" operator has been removed since a) lowering has already generated
the necessary dynamic shape function, and b) the call to that function can be
represented by an 'ordinary' vm.invoke_tvm_op call.

We worked our way through the following glitches:
 - Dynamic shape functions are now given their true type (rather than the type of
   the primitive function they are paired with).
 - Lowering was choosing definitional GlobalVars which were not pointer-equal to the
   referential GlobalVars left behind in the rewritten Calls. We fixed that in
   te_compiler.cc, though better would be to push GlobalVars deeper into the
   lowering machinery.
 - device_copy was rewritten to a call to @__copy without any definition. Though we
   tried adding it as a global this (obviously in retrospect...) won't typecheck if
   there are multiple device_copies in the program. Instead leave device_copy unchanged
   during lowering and update each executor codegen to look for them specially.
 - Calls to already-compiled BYOC functions were indistinguishable from calls
   to (non-primitive) Relay functions. We move them into the call_lowered calling
   convention, and leave behind a Function tagged with "ExternalSymbol". Better would
   be a first-class representatn for externals in the IRModule but one step at a time.
 - Functions with dynamic shapes tagged for BYOC compilation were not tracking their
   connection to their dynamic shape function. We now use exactly the same attributes
   as for non-BYOC primitives.
 - VerilatorRuntime can legitimately be deleted before initialized.
 - IRModule attributes must be preserved. In particular, since LowerTEPass can
   be invoked more than once we need to be careful to preserve any existing external
   modules and other attributes gatherd from an earlier LowerTEPass.
 - GetUniqueName accounts for existing definitions in the module, but is not used
   for external functions since their intended names are communicated to the codegen
   toolchain via the already fixed "global_symbol" attribute.
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 12, 2022
This is a grab bag of fallout changes from switching the VM to use LoweTEPass
which can be easily split out of the main apache#9483 PR.

- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help
  me with debugging since spans are universally dropped in most passes).
- Can get a human readable dump of the VM's PackedFunc names and indexes for
  debugging.
- If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had
  a lot of difficulty tracking down where duplicate GlobalVars for the same
  name_hint were getting created and propagated.
- GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice
  where will return 'null' properties if call/expr is not of call_lowered
  form. Mildly more convenient, though switching all the above to ICHECK
  and push 'if (op == the relevant op)' into all use sites would also be just
  fine.
- Misc VLOG improvements made while tracking down issues in apache#9483.
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 12, 2022
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's
compiler.cc with LowerTEPass. This clears the way for performing post-lowering
IRModule->IRModule transformations which combine Relay and TIR analysis. In particular,
it will allow us to use the PlanDevices pass to propagate memory scope constraints
across PrimFuncs.

We run LowerTEPass fairly early in the pipeline, which required quite a few passes
to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after
rather than before lowering, and so must now work in a mixed Function/PrimFunc world.

The "vm.shape_func" operator has been removed since a) lowering has already generated
the necessary dynamic shape function, and b) the call to that function can be
represented by an 'ordinary' vm.invoke_tvm_op call.

We worked our way through the following glitches:
 - Dynamic shape functions are now given their true type (rather than the type of
   the primitive function they are paired with).
 - Lowering was choosing definitional GlobalVars which were not pointer-equal to the
   referential GlobalVars left behind in the rewritten Calls. We fixed that in
   te_compiler.cc, though better would be to push GlobalVars deeper into the
   lowering machinery.
 - device_copy was rewritten to a call to @__copy without any definition. Though we
   tried adding it as a global this (obviously in retrospect...) won't typecheck if
   there are multiple device_copies in the program. Instead leave device_copy unchanged
   during lowering and update each executor codegen to look for them specially.
 - Calls to already-compiled BYOC functions were indistinguishable from calls
   to (non-primitive) Relay functions. We move them into the call_lowered calling
   convention, and leave behind a Function tagged with "ExternalSymbol". Better would
   be a first-class representatn for externals in the IRModule but one step at a time.
 - Functions with dynamic shapes tagged for BYOC compilation were not tracking their
   connection to their dynamic shape function. We now use exactly the same attributes
   as for non-BYOC primitives.
 - VerilatorRuntime can legitimately be deleted before initialized.
 - IRModule attributes must be preserved. In particular, since LowerTEPass can
   be invoked more than once we need to be careful to preserve any existing external
   modules and other attributes gatherd from an earlier LowerTEPass.
 - GetUniqueName accounts for existing definitions in the module, but is not used
   for external functions since their intended names are communicated to the codegen
   toolchain via the already fixed "global_symbol" attribute.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…festAlloc. (apache#9542)

* Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc.

As part of apache#9483 we need to prepare some critical Relay passes for running after
lowering and conversion to DPS. For DCE we need to make sure we never remove
side-effecting let-bound expressions, such as for allocation or evaluation of
an external function with unknown effectfulness.

Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting
for functions by tracking both 'eval' and 'call' purity, but must fallback to
assuming call-impurity in more difficult cases (eg calling a function passed as
a parameter, calling a function projected from a tuple, etc). However it seems
plenty good enough.

Purity must also be accounted for when determining the usage count of let-bound
variables, so reworked that. Collapsed the let-bound value accumulation pass into
the usage counting pass to make up for inserting the new purity analysis pass.

A few tests assume DCE eliminates dead reference writes. The previous
implementation certainly did that, but by eliminating *all* writes.

Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded
analysis would probably do just fine and we don't need to get hung up on alias
analysis). In the meantime, added an 'ignore_impurity' flag (default False)
and set to true just in the few unit tests which rely on the unsound impl.

* [checkpoint] Merge Lily's suggestions.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
This is a grab bag of fallout changes from switching the VM to use LoweTEPass
which can be easily split out of the main apache#9483 PR.

- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help
  me with debugging since spans are universally dropped in most passes).
- Can get a human readable dump of the VM's PackedFunc names and indexes for
  debugging.
- If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had
  a lot of difficulty tracking down where duplicate GlobalVars for the same
  name_hint were getting created and propagated.
- GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice
  where will return 'null' properties if call/expr is not of call_lowered
  form. Mildly more convenient, though switching all the above to ICHECK
  and push 'if (op == the relevant op)' into all use sites would also be just
  fine.
- Misc VLOG improvements made while tracking down issues in apache#9483.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's
compiler.cc with LowerTEPass. This clears the way for performing post-lowering
IRModule->IRModule transformations which combine Relay and TIR analysis. In particular,
it will allow us to use the PlanDevices pass to propagate memory scope constraints
across PrimFuncs.

We run LowerTEPass fairly early in the pipeline, which required quite a few passes
to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after
rather than before lowering, and so must now work in a mixed Function/PrimFunc world.

The "vm.shape_func" operator has been removed since a) lowering has already generated
the necessary dynamic shape function, and b) the call to that function can be
represented by an 'ordinary' vm.invoke_tvm_op call.

We worked our way through the following glitches:
 - Dynamic shape functions are now given their true type (rather than the type of
   the primitive function they are paired with).
 - Lowering was choosing definitional GlobalVars which were not pointer-equal to the
   referential GlobalVars left behind in the rewritten Calls. We fixed that in
   te_compiler.cc, though better would be to push GlobalVars deeper into the
   lowering machinery.
 - device_copy was rewritten to a call to @__copy without any definition. Though we
   tried adding it as a global this (obviously in retrospect...) won't typecheck if
   there are multiple device_copies in the program. Instead leave device_copy unchanged
   during lowering and update each executor codegen to look for them specially.
 - Calls to already-compiled BYOC functions were indistinguishable from calls
   to (non-primitive) Relay functions. We move them into the call_lowered calling
   convention, and leave behind a Function tagged with "ExternalSymbol". Better would
   be a first-class representatn for externals in the IRModule but one step at a time.
 - Functions with dynamic shapes tagged for BYOC compilation were not tracking their
   connection to their dynamic shape function. We now use exactly the same attributes
   as for non-BYOC primitives.
 - VerilatorRuntime can legitimately be deleted before initialized.
 - IRModule attributes must be preserved. In particular, since LowerTEPass can
   be invoked more than once we need to be careful to preserve any existing external
   modules and other attributes gatherd from an earlier LowerTEPass.
 - GetUniqueName accounts for existing definitions in the module, but is not used
   for external functions since their intended names are communicated to the codegen
   toolchain via the already fixed "global_symbol" attribute.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants