
[BYOC][Optimization] Run accelerator specific optimizations #6068

Merged
masahi merged 2 commits into apache:master from zhiics:ext_opt on Jul 16, 2020

Conversation

zhiics (Member) commented on Jul 15, 2020

This PR enables the invocation of an accelerator-specific optimization flow to make BYOC consistent with the TVM compilation flow. For example, we separate optimization from codegen: optimization is now invoked when a function is partitioned, and codegen only focuses on generating the needed runtime module. This is needed by libraries such as the Arm Compute Library in #5915.
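For reference, here is a minimal sketch of what a vendor-side registration could look like, assuming the `relay.ext.<codegen-name>.optimize` key pattern and using a hypothetical codegen named `my_accel`; the pass list is only illustrative:

```python
import tvm
from tvm import relay

# "my_accel" is a hypothetical codegen name; the key pattern
# "relay.ext.<name>.optimize" is an assumption based on this PR's design.
@tvm.register_func("relay.ext.my_accel.optimize")
def optimize(mod):
    """Accelerator-specific passes run on the partitioned IRModule."""
    seq = tvm.transform.Sequential(
        [
            relay.transform.ConvertLayout({"nn.conv2d": ["NHWC", "OHWI"]}),
            relay.transform.FoldConstant(),
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)
```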

Note that we have now explicitly registered various types of APIs for the BYOC flow and implicitly invoke them to accomplish the end-to-end flow. This requires users/vendors to register in multiple places. @jroesch also has a concern about this. @comaniac let's think of a way to centralize the APIs and write an RFC.

@masahi @lhutton1 @mbaret @trevor-m

trevor-m (Contributor) commented on Jul 15, 2020

It looks like this optimization callback is invoked after/during partitioning. What about optimizations that we need to do before annotation?

For example, if the library can only support conv2d with constant weights, we need to call bind_params_by_name and FoldConstant on the graph before annotation, so that the conv2d annotator can check if the conv2d args are constant.

Another example is ConvertLayout to convert to NCHW, and then only allowing NCHW ops in the annotation.
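A rough sketch of that pre-annotation sequence, assuming the usual `mod`/`params` pair returned by a frontend import:

```python
from tvm import relay
from tvm.relay.build_module import bind_params_by_name

# Bind weights as constants, then fold them, so the annotators can see
# that conv2d arguments are constant before annotation/partitioning runs.
mod["main"] = bind_params_by_name(mod["main"], params)
mod = relay.transform.FoldConstant()(mod)
```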

zhiics (Member, Author) commented on Jul 15, 2020

Yes, this is only for passes that need to be executed on the partitioned program. For passes that should run before annotation, we need to think about how to package the general BYOC and optimization flow. We haven't followed up much on this either. Maybe we should think about it together in the RFC mentioned above.

mbaret (Contributor) commented on Jul 15, 2020

My initial thought is that putting this into partition_graph looks a bit odd, as running optimizations seems outside the scope of that pass. What motivated this instead of just having the codegen itself run the optimizations it needs?

comaniac (Contributor) commented on Jul 15, 2020

> My initial thought is that putting this into partition_graph looks a bit odd, as running optimizations seems outside the scope of that pass. What motivated this instead of just having the codegen itself run the optimizations it needs?

The most important reason is demonstrated in #5915. Since the codegen should return a runtime module without mutating the graph (e.g., running a transpose), running optimizations inside the codegen results in an inconsistency between the module Relay/TVM processed and the module the codegen processed. In the ACL case, it cannot use the unified weight serialization mechanism from MetadataModule but has to handle serialization by itself. This is not only tedious but also increases the binary size, because MetadataModule still maintains the original weights, which will never be used by the ACL runtime.

zhiics (Member, Author) commented on Jul 15, 2020

A better implementation would invoke target-specific optimizations after partitioning. This is mainly to make #5915 clean, for the reasons @comaniac mentioned above. We should think about all of this in the RFC.

The flow would be the following:
target-required preprocessing passes -> annotation -> merge regions -> partitioning -> target-required postprocessing passes -> Relay compilation/codegen and target codegen
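Roughly, a sketch of that end-to-end flow with the hypothetical `my_accel` codegen from above; the pass names are the standard BYOC passes, and the invocation point of the post-processing hook follows the discussion in this thread:

```python
import tvm
from tvm import relay

# `mod` is assumed to be an imported Relay IRModule.
with tvm.transform.PassContext(opt_level=3):
    # Target-required preprocessing passes (layout conversion as an example).
    mod = relay.transform.ConvertLayout({"nn.conv2d": ["NHWC", "OHWI"]})(mod)
    # Annotation -> merge regions -> partitioning; partitioning then invokes
    # the registered relay.ext.my_accel.optimize hook on the result.
    mod = relay.transform.AnnotateTarget("my_accel")(mod)
    mod = relay.transform.MergeCompilerRegions()(mod)
    mod = relay.transform.PartitionGraph()(mod)
    # Relay compilation/codegen for the host plus target codegen.
    lib = relay.build(mod, target="llvm")
```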

masahi (Member) left a comment

I'm assuming this is a temporary solution until a pass that walks over extern functions and applies optimizations is ready.

comaniac (Contributor) left a comment

Agree with @zhiics @masahi that this is more like a temporary workaround until we fully integrate the BYOC flow into the TVM target and build flow.

@masahi masahi merged commit f9e2b95 into apache:master Jul 16, 2020
masahi (Member) commented on Jul 16, 2020

Thanks @zhiics @comaniac @trevor-m @mbaret

@zhiics zhiics deleted the ext_opt branch July 16, 2020 01:23
trevor-m pushed commits referencing this pull request to trevor-m/tvm (twice on Aug 26 and once on Sep 2, 2020) and to neo-ai/tvm (Sep 3, 2020), with the commit message:

* register and invoke optimization pipeline for external codegen

* add unit test