[Relay] Switch the VM to use the LowerTE pass instead of TECompiler::{Lower,LowerShapeFunc}. #9483
…festAlloc. (#9542)

* Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc.

As part of #9483 we need to prepare some critical Relay passes for running after lowering and conversion to DPS. For DCE we need to make sure we never remove side-effecting let-bound expressions, such as an allocation or the evaluation of an external function with unknown effectfulness.

Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting for functions by tracking both 'eval' and 'call' purity, but must fall back to assuming call-impurity in more difficult cases (e.g. calling a function passed as a parameter, calling a function projected from a tuple, etc.). However, it seems plenty good enough.

Purity must also be accounted for when determining the usage count of let-bound variables, so that was reworked. The let-bound value accumulation pass was collapsed into the usage counting pass to make up for inserting the new purity analysis pass.

A few tests assume DCE eliminates dead reference writes. The previous implementation certainly did that, but by eliminating *all* writes. Filed CORE-118 to extend DCE to soundly eliminate dead writes (a very simple-minded analysis would probably do just fine, and we don't need to get hung up on alias analysis). In the meantime, added an 'ignore_impurity' flag (default False), set to true only in the few unit tests which rely on the unsound implementation.

* [checkpoint] Merge Lily's suggestions.
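To make the purity rule concrete, here is a minimal sketch of purity-aware dead-binding elimination over a flat list of let-bindings. It is a toy model and not the pass from this PR: `Binding`, `eliminate_dead_bindings`, and the `program` example are hypothetical names, and purity is supplied as a flag on each binding rather than computed by a pre-pass.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Binding:
    """One `let var = value` step (hypothetical toy IR, not Relay)."""
    var: str                 # let-bound variable name
    uses: Tuple[str, ...]    # variables read by the bound value
    pure: bool               # False for allocation, ref writes, unknown externals


def eliminate_dead_bindings(bindings: Sequence[Binding],
                            result_vars: Sequence[str],
                            ignore_impurity: bool = False) -> List[Binding]:
    """Drop bindings that are unused, unless dropping them would lose an effect.

    Walking backwards means uses inside a removed binding never count
    towards the liveness of earlier bindings.
    """
    live = set(result_vars)
    kept = []
    for b in reversed(bindings):
        has_effect = not b.pure and not ignore_impurity
        if b.var in live or has_effect:
            kept.append(b)
            live.update(b.uses)
    kept.reverse()
    return kept


program = [
    Binding("a", (), True),       # only read by the dead binding "b"
    Binding("b", ("a",), True),   # unused and pure: removable
    Binding("c", (), False),      # unused but impure (think: allocation)
    Binding("d", (), True),       # the result
]

sound = [b.var for b in eliminate_dead_bindings(program, ["d"])]
unsound = [b.var for b in eliminate_dead_bindings(program, ["d"], ignore_impurity=True)]
```

With the default setting the impure binding `c` survives even though nothing reads it; with `ignore_impurity=True` the sketch reproduces the old, unsound behaviour of dropping it.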
This is a grab bag of fallout changes from switching the VM to use LowerTEPass which can be easily split out of the main apache#9483 PR.
- AnnotateSpans can be used from C++ (though, unfortunately, it didn't help me with debugging since spans are universally dropped in most passes).
- Can get a human-readable dump of the VM's PackedFunc names and indexes for debugging.
- If TVM_LOG_DEBUG is defined then include the types and ids of GlobalVars. I had a lot of difficulty tracking down where duplicate GlobalVars for the same name_hint were getting created and propagated.
- GetCallLoweredProps follows the same API as GetDeviceCopy and GetOnDevice: it returns 'null' properties if the call/expr is not of call_lowered form. Mildly more convenient, though switching all of the above to ICHECK and pushing 'if (op == the relevant op)' into all use sites would also be just fine.
- Misc VLOG improvements made while tracking down issues in apache#9483.
- Don't attach host targets to the CompilationConfig's 'primitive_targets' array. Since Targets and SEScopes are compared by pointer equality, and the same Target with and without a host are distinct objects, this was causing unnecessary copies in code which is already dealing with the explicit host_target or host_se_scope anyway. I've left the hosts in the legacy_target_map. (The sooner we sort out multi-target compilation and hosts the better!)
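The 'null properties' convention described for GetCallLoweredProps can be sketched as follows. This is a self-contained toy, not the TVM C++ API: the `Call` and `CallLoweredProps` classes and their fields are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Call:
    """Toy call node: an operator name plus argument names (hypothetical)."""
    op: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class CallLoweredProps:
    """Toy result record; a default-constructed value plays the 'null' role."""
    lowered_func: Optional[str] = None
    arguments: Tuple[str, ...] = ()


def get_call_lowered_props(call: Call) -> CallLoweredProps:
    """Return populated props for call_lowered calls, 'null' props otherwise."""
    if call.op != "call_lowered" or not call.args:
        return CallLoweredProps()
    return CallLoweredProps(lowered_func=call.args[0], arguments=call.args[1:])


# Callers test the result instead of checking the operator first.
props = get_call_lowered_props(Call("call_lowered", ("@fused_add", "x", "y")))
other = get_call_lowered_props(Call("add", ("x", "y")))
```

The alternative mentioned in the bullet would make the helper assert on the operator and move the `if call.op == ...` test to every use site; the sketch only shows the more lenient shape.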
A few comments but otherwise LGTM, can see the final bits and bobs coming together into a mostly unified compilation flow.
Looks good so far, but I'm less than half way through! Overall a few clarifying questions and nits.
I will continue tomorrow.
Thanks @electriclilies, I appreciate your review. Is there any chance of finishing today so I can kick off a CI run overnight? Keen to get this off my plate given the number of changes.
LGTM! I found some typos and left some clarifying questions. I'm glad to see some of the shape func machinery made more explicit! It would be fine to address these in a follow-up PR since they are pretty minor.
Thanks for slogging through @jroesch and @electriclilies. The next few should be more manageable. Or at least that's what I tell myself.
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's compiler.cc with LowerTEPass. This clears the way for performing post-lowering IRModule->IRModule transformations which combine Relay and TIR analysis. In particular, it will allow us to use the PlanDevices pass to propagate memory scope constraints across PrimFuncs.

We run LowerTEPass fairly early in the pipeline, which required quite a few passes to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after rather than before lowering, and so must now work in a mixed Function/PrimFunc world.

The "vm.shape_func" operator has been removed since a) lowering has already generated the necessary dynamic shape function, and b) the call to that function can be represented by an 'ordinary' vm.invoke_tvm_op call.

We worked our way through the following glitches:
- Dynamic shape functions are now given their true type (rather than the type of the primitive function they are paired with).
- Lowering was choosing definitional GlobalVars which were not pointer-equal to the referential GlobalVars left behind in the rewritten Calls. We fixed that in te_compiler.cc, though better would be to push GlobalVars deeper into the lowering machinery.
- device_copy was rewritten to a call to @__copy without any definition. We tried adding it as a global, but (obviously in retrospect...) that won't typecheck if there are multiple device_copies in the program. Instead, leave device_copy unchanged during lowering and update each executor codegen to look for them specially.
- Calls to already-compiled BYOC functions were indistinguishable from calls to (non-primitive) Relay functions. We move them into the call_lowered calling convention, and leave behind a Function tagged with "ExternalSymbol". Better would be a first-class representation for externals in the IRModule, but one step at a time.
- Functions with dynamic shapes tagged for BYOC compilation were not tracking their connection to their dynamic shape function. We now use exactly the same attributes as for non-BYOC primitives.
- VerilatorRuntime can legitimately be deleted before being initialized.
- IRModule attributes must be preserved. In particular, since LowerTEPass can be invoked more than once, we need to be careful to preserve any existing external modules and other attributes gathered from an earlier LowerTEPass.
- GetUniqueName accounts for existing definitions in the module, but is not used for external functions since their intended names are communicated to the codegen toolchain via the already fixed "global_symbol" attribute.
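The pointer-equality glitch with GlobalVars listed above can be illustrated with a small, self-contained model. Everything here is hypothetical (`GlobalVar`, `GlobalVarCache`, the `module` dict); it only mimics identity-based comparison and is not how te_compiler.cc is written.

```python
class GlobalVar:
    """Toy global variable. No __eq__/__hash__ override, so two instances
    compare by identity, which models pointer equality."""

    def __init__(self, name_hint: str) -> None:
        self.name_hint = name_hint


class GlobalVarCache:
    """Hands out exactly one GlobalVar per name_hint (assumed fix strategy)."""

    def __init__(self) -> None:
        self._by_name = {}

    def get(self, name_hint: str) -> GlobalVar:
        if name_hint not in self._by_name:
            self._by_name[name_hint] = GlobalVar(name_hint)
        return self._by_name[name_hint]


# The glitch: the definition and the call site each mint their own variable.
definitional = GlobalVar("fused_add")
referential = GlobalVar("fused_add")
module = {definitional: "<lowered PrimFunc>"}
lookup_fails = referential not in module   # same name_hint, different object

# Sharing one object per name makes definition and reference agree.
cache = GlobalVarCache()
shared_def = cache.get("fused_add")
shared_ref = cache.get("fused_add")
fixed_module = {shared_def: "<lowered PrimFunc>"}
lookup_works = shared_ref in fixed_module
```

The same reasoning applies to any object compared by identity, which is why logging the ids of GlobalVars is useful when hunting for duplicates that share a name_hint.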
9484ec5
to
4fabfa7
Compare
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's compiler.cc with LowerTEPass. This clears the way for performing post-lowering IRModule->IRModule transformations which combine Relay and TIR analysis. In particular, it will allow us to use the PlanDevices pass to propagate memory scope constraints across PrimFuncs. We run LowerTEPass fairly early in the pipeline, which required quite a few passes to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after rather than before lowering, and so must now work in a mixed Function/PrimFunc world. The "vm.shape_func" operator has been removed since a) lowering has already generated the necessary dynamic shape function, and b) the call to that function can be represented by an 'ordinary' vm.invoke_tvm_op call. We worked our way through the following glitches: - Dynamic shape functions are now given their true type (rather than the type of the primitive function they are paired with). - Lowering was choosing definitional GlobalVars which were not pointer-equal to the referential GlobalVars left behind in the rewritten Calls. We fixed that in te_compiler.cc, though better would be to push GlobalVars deeper into the lowering machinery. - device_copy was rewritten to a call to @__copy without any definition. Though we tried adding it as a global this (obviously in retrospect...) won't typecheck if there are multiple device_copies in the program. Instead leave device_copy unchanged during lowering and update each executor codegen to look for them specially. - Calls to already-compiled BYOC functions were indistinguishable from calls to (non-primitive) Relay functions. We move them into the call_lowered calling convention, and leave behind a Function tagged with "ExternalSymbol". Better would be a first-class representatn for externals in the IRModule but one step at a time. 
- Functions with dynamic shapes tagged for BYOC compilation were not tracking their connection to their dynamic shape function. We now use exactly the same attributes as for non-BYOC primitives. - VerilatorRuntime can legitimately be deleted before initialized. - IRModule attributes must be preserved. In particular, since LowerTEPass can be invoked more than once we need to be careful to preserve any existing external modules and other attributes gatherd from an earlier LowerTEPass. - GetUniqueName accounts for existing definitions in the module, but is not used for external functions since their intended names are communicated to the codegen toolchain via the already fixed "global_symbol" attribute.
…festAlloc. (apache#9542) * Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc. As part of apache#9483 we need to prepare some critical Relay passes for running after lowering and conversion to DPS. For DCE we need to make sure we never remove side-effecting let-bound expressions, such as for allocation or evaluation of an external function with unknown effectfulness. Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting for functions by tracking both 'eval' and 'call' purity, but must fallback to assuming call-impurity in more difficult cases (eg calling a function passed as a parameter, calling a function projected from a tuple, etc). However it seems plenty good enough. Purity must also be accounted for when determining the usage count of let-bound variables, so reworked that. Collapsed the let-bound value accumulation pass into the usage counting pass to make up for inserting the new purity analysis pass. A few tests assume DCE eliminates dead reference writes. The previous implementation certainly did that, but by eliminating *all* writes. Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded analysis would probably do just fine and we don't need to get hung up on alias analysis). In the meantime, added an 'ignore_impurity' flag (default False) and set to true just in the few unit tests which rely on the unsound impl. * [checkpoint] Merge Lily's suggestions.
This is a grab bag of fallout changes from switching the VM to use LoweTEPass which can be easily split out of the main apache#9483 PR. - AnnotateSpans can be used from C++ (though, unfortunately, it didn't help me with debugging since spans are universally dropped in most passes). - Can get a human readable dump of the VM's PackedFunc names and indexes for debugging. - If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had a lot of difficulty tracking down where duplicate GlobalVars for the same name_hint were getting created and propagated. - GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice where will return 'null' properties if call/expr is not of call_lowered form. Mildly more convenient, though switching all the above to ICHECK and push 'if (op == the relevant op)' into all use sites would also be just fine. - Misc VLOG improvements made while tracking down issues in apache#9483.
…festAlloc. (apache#9542) * Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc. As part of apache#9483 we need to prepare some critical Relay passes for running after lowering and conversion to DPS. For DCE we need to make sure we never remove side-effecting let-bound expressions, such as for allocation or evaluation of an external function with unknown effectfulness. Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting for functions by tracking both 'eval' and 'call' purity, but must fallback to assuming call-impurity in more difficult cases (eg calling a function passed as a parameter, calling a function projected from a tuple, etc). However it seems plenty good enough. Purity must also be accounted for when determining the usage count of let-bound variables, so reworked that. Collapsed the let-bound value accumulation pass into the usage counting pass to make up for inserting the new purity analysis pass. A few tests assume DCE eliminates dead reference writes. The previous implementation certainly did that, but by eliminating *all* writes. Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded analysis would probably do just fine and we don't need to get hung up on alias analysis). In the meantime, added an 'ignore_impurity' flag (default False) and set to true just in the few unit tests which rely on the unsound impl. * [checkpoint] Merge Lily's suggestions.
This is a grab bag of fallout changes from switching the VM to use LoweTEPass which can be easily split out of the main apache#9483 PR. - AnnotateSpans can be used from C++ (though, unfortunately, it didn't help me with debugging since spans are universally dropped in most passes). - Can get a human readable dump of the VM's PackedFunc names and indexes for debugging. - If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had a lot of difficulty tracking down where duplicate GlobalVars for the same name_hint were getting created and propagated. - GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice where will return 'null' properties if call/expr is not of call_lowered form. Mildly more convenient, though switching all the above to ICHECK and push 'if (op == the relevant op)' into all use sites would also be just fine. - Misc VLOG improvements made while tracking down issues in apache#9483.
…festAlloc. (apache#9542) * Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc. As part of apache#9483 we need to prepare some critical Relay passes for running after lowering and conversion to DPS. For DCE we need to make sure we never remove side-effecting let-bound expressions, such as for allocation or evaluation of an external function with unknown effectfulness. Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting for functions by tracking both 'eval' and 'call' purity, but must fallback to assuming call-impurity in more difficult cases (eg calling a function passed as a parameter, calling a function projected from a tuple, etc). However it seems plenty good enough. Purity must also be accounted for when determining the usage count of let-bound variables, so reworked that. Collapsed the let-bound value accumulation pass into the usage counting pass to make up for inserting the new purity analysis pass. A few tests assume DCE eliminates dead reference writes. The previous implementation certainly did that, but by eliminating *all* writes. Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded analysis would probably do just fine and we don't need to get hung up on alias analysis). In the meantime, added an 'ignore_impurity' flag (default False) and set to true just in the few unit tests which rely on the unsound impl. * [checkpoint] Merge Lily's suggestions.
This is a grab bag of fallout changes from switching the VM to use LoweTEPass which can be easily split out of the main apache#9483 PR. - AnnotateSpans can be used from C++ (though, unfortunately, it didn't help me with debugging since spans are universally dropped in most passes). - Can get a human readable dump of the VM's PackedFunc names and indexes for debugging. - If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had a lot of difficulty tracking down where duplicate GlobalVars for the same name_hint were getting created and propagated. - GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice where will return 'null' properties if call/expr is not of call_lowered form. Mildly more convenient, though switching all the above to ICHECK and push 'if (op == the relevant op)' into all use sites would also be just fine. - Misc VLOG improvements made while tracking down issues in apache#9483.
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's compiler.cc with LowerTEPass. This clears the way for performing post-lowering IRModule->IRModule transformations which combine Relay and TIR analysis. In particular, it will allow us to use the PlanDevices pass to propagate memory scope constraints across PrimFuncs. We run LowerTEPass fairly early in the pipeline, which required quite a few passes to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after rather than before lowering, and so must now work in a mixed Function/PrimFunc world. The "vm.shape_func" operator has been removed since a) lowering has already generated the necessary dynamic shape function, and b) the call to that function can be represented by an 'ordinary' vm.invoke_tvm_op call. We worked our way through the following glitches: - Dynamic shape functions are now given their true type (rather than the type of the primitive function they are paired with). - Lowering was choosing definitional GlobalVars which were not pointer-equal to the referential GlobalVars left behind in the rewritten Calls. We fixed that in te_compiler.cc, though better would be to push GlobalVars deeper into the lowering machinery. - device_copy was rewritten to a call to @__copy without any definition. Though we tried adding it as a global this (obviously in retrospect...) won't typecheck if there are multiple device_copies in the program. Instead leave device_copy unchanged during lowering and update each executor codegen to look for them specially. - Calls to already-compiled BYOC functions were indistinguishable from calls to (non-primitive) Relay functions. We move them into the call_lowered calling convention, and leave behind a Function tagged with "ExternalSymbol". Better would be a first-class representatn for externals in the IRModule but one step at a time. 
- Functions with dynamic shapes tagged for BYOC compilation were not tracking their connection to their dynamic shape function. We now use exactly the same attributes as for non-BYOC primitives. - VerilatorRuntime can legitimately be deleted before initialized. - IRModule attributes must be preserved. In particular, since LowerTEPass can be invoked more than once we need to be careful to preserve any existing external modules and other attributes gatherd from an earlier LowerTEPass. - GetUniqueName accounts for existing definitions in the module, but is not used for external functions since their intended names are communicated to the codegen toolchain via the already fixed "global_symbol" attribute.
…festAlloc. (apache#9542) * Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc. As part of apache#9483 we need to prepare some critical Relay passes for running after lowering and conversion to DPS. For DCE we need to make sure we never remove side-effecting let-bound expressions, such as for allocation or evaluation of an external function with unknown effectfulness. Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting for functions by tracking both 'eval' and 'call' purity, but must fallback to assuming call-impurity in more difficult cases (eg calling a function passed as a parameter, calling a function projected from a tuple, etc). However it seems plenty good enough. Purity must also be accounted for when determining the usage count of let-bound variables, so reworked that. Collapsed the let-bound value accumulation pass into the usage counting pass to make up for inserting the new purity analysis pass. A few tests assume DCE eliminates dead reference writes. The previous implementation certainly did that, but by eliminating *all* writes. Filed CORE-118 to extend DCE to soundly elim dead writes (a very simple-minded analysis would probably do just fine and we don't need to get hung up on alias analysis). In the meantime, added an 'ignore_impurity' flag (default False) and set to true just in the few unit tests which rely on the unsound impl. * [checkpoint] Merge Lily's suggestions.
This is a grab bag of fallout changes from switching the VM to use LoweTEPass which can be easily split out of the main apache#9483 PR. - AnnotateSpans can be used from C++ (though, unfortunately, it didn't help me with debugging since spans are universally dropped in most passes). - Can get a human readable dump of the VM's PackedFunc names and indexes for debugging. - If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had a lot of difficulty tracking down where duplicate GlobalVars for the same name_hint were getting created and propagated. - GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice where will return 'null' properties if call/expr is not of call_lowered form. Mildly more convenient, though switching all the above to ICHECK and push 'if (op == the relevant op)' into all use sites would also be just fine. - Misc VLOG improvements made while tracking down issues in apache#9483.
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's compiler.cc with LowerTEPass. This clears the way for performing post-lowering IRModule->IRModule transformations which combine Relay and TIR analysis. In particular, it will allow us to use the PlanDevices pass to propagate memory scope constraints across PrimFuncs. We run LowerTEPass fairly early in the pipeline, which required quite a few passes to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after rather than before lowering, and so must now work in a mixed Function/PrimFunc world. The "vm.shape_func" operator has been removed since a) lowering has already generated the necessary dynamic shape function, and b) the call to that function can be represented by an 'ordinary' vm.invoke_tvm_op call. We worked our way through the following glitches: - Dynamic shape functions are now given their true type (rather than the type of the primitive function they are paired with). - Lowering was choosing definitional GlobalVars which were not pointer-equal to the referential GlobalVars left behind in the rewritten Calls. We fixed that in te_compiler.cc, though better would be to push GlobalVars deeper into the lowering machinery. - device_copy was rewritten to a call to @__copy without any definition. Though we tried adding it as a global this (obviously in retrospect...) won't typecheck if there are multiple device_copies in the program. Instead leave device_copy unchanged during lowering and update each executor codegen to look for them specially. - Calls to already-compiled BYOC functions were indistinguishable from calls to (non-primitive) Relay functions. We move them into the call_lowered calling convention, and leave behind a Function tagged with "ExternalSymbol". Better would be a first-class representatn for externals in the IRModule but one step at a time. 
- Functions with dynamic shapes tagged for BYOC compilation were not tracking their connection to their dynamic shape function. We now use exactly the same attributes as for non-BYOC primitives. - VerilatorRuntime can legitimately be deleted before initialized. - IRModule attributes must be preserved. In particular, since LowerTEPass can be invoked more than once we need to be careful to preserve any existing external modules and other attributes gatherd from an earlier LowerTEPass. - GetUniqueName accounts for existing definitions in the module, but is not used for external functions since their intended names are communicated to the codegen toolchain via the already fixed "global_symbol" attribute.
This is a grab bag of fallout changes from switching the VM to use LoweTEPass which can be easily split out of the main apache#9483 PR. - AnnotateSpans can be used from C++ (though, unfortunately, it didn't help me with debugging since spans are universally dropped in most passes). - Can get a human readable dump of the VM's PackedFunc names and indexes for debugging. - If TVM_LOG_DEBUG defined then include types and ids of GlobalVars. I had a lot of difficulty tracking down where duplicate GlobalVars for the same name_hint were getting created and propagated. - GetCallLoweredProps follows same API as GetDeviceCopy and GetOnDevice where will return 'null' properties if call/expr is not of call_lowered form. Mildly more convenient, though switching all the above to ICHECK and push 'if (op == the relevant op)' into all use sites would also be just fine. - Misc VLOG improvements made while tracking down issues in apache#9483.
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's compiler.cc with LowerTEPass. This clears the way for performing post-lowering IRModule->IRModule transformations which combine Relay and TIR analysis. In particular, it will allow us to use the PlanDevices pass to propagate memory scope constraints across PrimFuncs.
We run LowerTEPass fairly early in the pipeline, which required quite a few passes to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after rather than before lowering, and so must now work in a mixed Function/PrimFunc world. The "vm.shape_func" operator has been removed since a) lowering has already generated the necessary dynamic shape function, and b) the call to that function can be represented by an 'ordinary' vm.invoke_tvm_op call.
We worked our way through the following glitches:
- Dynamic shape functions are now given their true type (rather than the type of the primitive function they are paired with).
- Lowering was choosing definitional GlobalVars which were not pointer-equal to the referential GlobalVars left behind in the rewritten Calls. We fixed that in te_compiler.cc, though better would be to push GlobalVars deeper into the lowering machinery.
- device_copy was rewritten to a call to @__copy without any definition. We tried adding it as a global, but (obviously in retrospect...) that won't typecheck if there are multiple device_copies in the program. Instead, leave device_copy unchanged during lowering and update each executor codegen to look for them specially.
- Calls to already-compiled BYOC functions were indistinguishable from calls to (non-primitive) Relay functions. We move them into the call_lowered calling convention, and leave behind a Function tagged with "ExternalSymbol". Better would be a first-class representation for externals in the IRModule but one step at a time.
- Functions with dynamic shapes tagged for BYOC compilation were not tracking their connection to their dynamic shape function. We now use exactly the same attributes as for non-BYOC primitives.
- VerilatorRuntime can legitimately be deleted before being initialized.
- IRModule attributes must be preserved. In particular, since LowerTEPass can be invoked more than once we need to be careful to preserve any existing external modules and other attributes gathered from an earlier LowerTEPass.
- GetUniqueName accounts for existing definitions in the module, but is not used for external functions since their intended names are communicated to the codegen toolchain via the already fixed "global_symbol" attribute.
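The last point about GetUniqueName can be illustrated with a small toy sketch. This is not TVM's implementation: the helper names `get_unique_name` and `name_for`, the `taken` set, and the `_<n>` suffix scheme are all assumptions made up for illustration. It only shows the idea that fresh names are checked against existing definitions, while externals keep the name already fixed by their "global_symbol".

```python
def get_unique_name(name_hint, taken):
    """Return a name based on name_hint that is not yet in `taken`.

    Hypothetical scheme: append _1, _2, ... until the name is free.
    """
    candidate = name_hint
    suffix = 0
    while candidate in taken:
        suffix += 1
        candidate = f"{name_hint}_{suffix}"
    taken.add(candidate)
    return candidate


def name_for(name_hint, taken, global_symbol=None):
    """Pick the name for a lowered function (toy model).

    External functions already carry a fixed global_symbol which the
    codegen toolchain relies on, so it is returned unchanged.
    """
    if global_symbol is not None:
        return global_symbol
    return get_unique_name(name_hint, taken)


# Usage: names already defined in the (toy) module.
taken = {"main", "fused_add"}
first = name_for("fused_add", taken)
second = name_for("fused_add", taken)
external = name_for("fused_add", taken, global_symbol="my_codegen_0")
```

Here `first` and `second` avoid the existing `fused_add` definition, while `external` is passed through untouched.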
…festAlloc. (apache#9542)
* Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc.
As part of apache#9483 we need to prepare some critical Relay passes for running after lowering and conversion to DPS. For DCE we need to make sure we never remove side-effecting let-bound expressions, such as for allocation or evaluation of an external function with unknown effectfulness.
Introduce a new purity pre-pass. It makes a half-hearted attempt at accounting for functions by tracking both 'eval' and 'call' purity, but must fall back to assuming call-impurity in more difficult cases (e.g. calling a function passed as a parameter, calling a function projected from a tuple, etc). However it seems plenty good enough.
Purity must also be accounted for when determining the usage count of let-bound variables, so reworked that. Collapsed the let-bound value accumulation pass into the usage counting pass to make up for inserting the new purity analysis pass.
A few tests assume DCE eliminates dead reference writes. The previous implementation certainly did that, but by eliminating *all* writes. Filed CORE-118 to extend DCE to soundly eliminate dead writes (a very simple-minded analysis would probably do just fine and we don't need to get hung up on alias analysis). In the meantime, added an 'ignore_impurity' flag (default False) and set it to True in just the few unit tests which rely on the unsound implementation.
* [checkpoint] Merge Lily's suggestions.
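As a rough illustration of the purity rule described above, here is a minimal sketch over a toy straight-line let-binding IR. It is not the Relay pass: the `Let` class, the `IMPURE_OPS` set, and `eliminate_dead_lets` are hypothetical names, and the sketch ignores the 'eval' versus 'call' purity distinction for functions. It only shows that an impure binding is kept even when unused, that its arguments then count as used, and what an 'ignore_impurity' style flag changes.

```python
from dataclasses import dataclass

# Hypothetical set of side-effecting operators in this toy IR.
IMPURE_OPS = {"alloc_storage", "ref_write", "call_extern"}


@dataclass(frozen=True)
class Let:
    var: str          # variable bound by this let
    op: str           # operator being applied
    args: tuple = ()  # names of the variables the operator reads


def is_pure(binding):
    return binding.op not in IMPURE_OPS


def eliminate_dead_lets(bindings, result_vars, ignore_impurity=False):
    """Drop bindings that are unused and pure.

    Walks the bindings backwards so that uses are seen before definitions.
    An impure binding is always kept (unless ignore_impurity is set), and
    the arguments of every kept binding are marked as used.
    """
    live = set(result_vars)
    kept = []
    for b in reversed(bindings):
        must_keep = b.var in live or (not ignore_impurity and not is_pure(b))
        if must_keep:
            kept.append(b)
            live.update(b.args)
    kept.reverse()
    return kept


program = [
    Let("t", "multiply", ("x", "x")),
    Let("s", "alloc_storage", ("n",)),
    Let("u", "ref_write", ("r", "t")),
    Let("d", "add", ("x", "x")),      # dead and pure: always removable
    Let("out", "add", ("x", "y")),
]

sound = eliminate_dead_lets(program, ["out"])
unsound = eliminate_dead_lets(program, ["out"], ignore_impurity=True)
```

In `sound` the write `u` and the allocation `s` survive, and `t` survives because the kept write reads it. In `unsound` everything except `out` is removed, which mirrors the behaviour the few legacy tests depend on.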
We replace use of the TECompiler::{Lower,LowerShapeFunc} methods from the VM's
compiler.cc with LowerTEPass. This clears the way for performing post-lowering
IRModule->IRModule transformations which combine Relay and TIR analysis. In particular,
it will allow us to use the PlanDevices pass to propagate memory scope constraints
across PrimFuncs.
We run LowerTEPass fairly early in the pipeline, which required quite a few passes
to become 'post-lowering friendly'. In particular, ManifestAlloc is now run after
rather than before lowering, and so must now work in a mixed Function/PrimFunc world.
The "vm.shape_func" operator has been removed since a) lowering has already generated
the necessary dynamic shape function, and b) the call to that function can be
represented by an 'ordinary' vm.invoke_tvm_op call.
We worked our way through the following glitches:
- Lowering was choosing definitional GlobalVars which were not pointer-equal to the
  referential GlobalVars left behind in the rewritten Calls. We fixed that in
  te_compiler.cc, though better would be to push GlobalVars deeper into the
  lowering machinery.
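To make the pointer-equality glitch concrete, here is a toy sketch in plain Python. The `GlobalVar` and `Module` classes below are hypothetical stand-ins, not TVM's classes, and `get_global_var` is only one possible way to share variables, not necessarily the fix made in te_compiler.cc. The point is that two variables with the same name_hint are still different keys when lookup is by object identity.

```python
class GlobalVar:
    """Toy stand-in: hashed and compared by object identity, like a pointer."""

    def __init__(self, name_hint):
        self.name_hint = name_hint


class Module:
    """Toy module mapping GlobalVars to definitions."""

    def __init__(self):
        self.functions = {}       # GlobalVar -> definition
        self._vars_by_name = {}   # name_hint -> the one shared GlobalVar

    def get_global_var(self, name_hint):
        """Return the single shared GlobalVar for name_hint, creating it once."""
        var = self._vars_by_name.get(name_hint)
        if var is None:
            var = GlobalVar(name_hint)
            self._vars_by_name[name_hint] = var
        return var


mod = Module()

# The glitch: definition and reference each construct their own variable.
definition_var = GlobalVar("fused_add")
mod.functions[definition_var] = "<lowered fused_add>"
reference_var = GlobalVar("fused_add")
dangling = reference_var not in mod.functions

# One way to avoid it: both sides ask the module for the shared variable.
mod.functions[mod.get_global_var("fused_mul")] = "<lowered fused_mul>"
resolved = mod.get_global_var("fused_mul") in mod.functions
```

`dangling` is true even though both variables print the same name, which is why such duplicates are hard to spot without also dumping ids.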
it as if it were an 'external'.
- Calls to already-compiled BYOC functions were indistinguishable from calls
  to (non-primitive) Relay functions. We move them into the call_lowered calling
  convention, and leave behind a Function tagged with "ExternalSymbol". Better would
  be a first-class representation for externals in the IRModule but one step at a time.
- Functions with dynamic shapes tagged for BYOC compilation were not tracking their
  connection to their dynamic shape function. We now use exactly the same attributes
  as for non-BYOC primitives.
In addition to unit tests I've confirmed identical VM bytecode for a small test suite
of ONNX models.