[BYOC][Optimization] Run accelerator specific optimizations #6068
It looks like this optimization callback is invoked during or after partitioning. What about optimizations that we need to run before annotation? For example, if the library can only support conv2d with constant weights, we need to call bind_params_by_name and FoldConstant on the graph before annotation, so that the conv2d annotator can check whether the conv2d args are constant. Another example is running ConvertLayout to convert to NCHW, and then only allowing NCHW ops in the annotation.
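To illustrate the ordering concern above, here is a minimal toy sketch in plain Python. It does not use TVM: `Var`, `Const`, `Call`, `bind_params`, `conv2d_supported`, and `offloaded_ops` are hypothetical stand-ins for the Relay IR, `bind_params_by_name`, and a conv2d annotator. It only shows that an annotator requiring constant weights accepts the op only after the parameters have been bound.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Call:
    op: str
    args: Tuple["Expr", ...]


Expr = Union[Var, Const, Call]


def bind_params(expr: Expr, params: Dict[str, float]) -> Expr:
    """Toy stand-in for bind_params_by_name: turn known variables into constants."""
    if isinstance(expr, Var):
        return Const(params[expr.name]) if expr.name in params else expr
    if isinstance(expr, Call):
        return Call(expr.op, tuple(bind_params(a, params) for a in expr.args))
    return expr


def conv2d_supported(call: Call) -> bool:
    """Hypothetical annotator rule: offload conv2d only if its weight is constant."""
    return call.op == "conv2d" and isinstance(call.args[1], Const)


def offloaded_ops(expr: Expr) -> List[str]:
    """Collect the ops the toy annotator would hand to the external library."""
    if not isinstance(expr, Call):
        return []
    found = [expr.op] if conv2d_supported(expr) else []
    for arg in expr.args:
        found.extend(offloaded_ops(arg))
    return found


# relu(conv2d(data, weight)), with the weight still a free variable.
graph = Call("relu", (Call("conv2d", (Var("data"), Var("weight"))),))

before = offloaded_ops(graph)                                # annotate first
after = offloaded_ops(bind_params(graph, {"weight": 0.5}))   # bind, then annotate
```

In this toy model, annotating first offloads nothing because the weight is still a variable, whereas binding parameters first lets the conv2d pass the check. The real Relay passes would be applied to an `IRModule` rather than to these toy classes.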
Yes, this is only for the passes that need to be executed on the partitioned program. For passes that should be executed before annotation, we should consider how to package the general flow of BYOC and optimization. We haven't followed up much on this yet. Maybe we should think about it together in the RFC mentioned above.
My initial thought is that putting this into partition_graph looks a bit odd, as running optimizations seems outside the scope of the pass. What motivated that, rather than just having the codegen itself run the optimizations it needs?
The most important reason has been demonstrated in #5915. Since the codegen should return a runtime module without mutating the graph (e.g., by running a transpose), running optimizations inside the codegen results in an inconsistency between the module Relay/TVM processed and the module the codegen processed. In the ACL case, it cannot use the unified weight serialization mechanism from MetadataModule and has to handle serialization by itself. This is not only tedious but also increases the binary size, because MetadataModule still maintains the original weights, which will never be used by the ACL runtime.
A better implementation should be invoking a The flow would be the following:
I'm assuming this is a temporary solution, until a pass to walk over extern functions and apply optimization is ready.
This PR enables the invocation of an accelerator-specific optimization flow to make BYOC consistent with the TVM compilation flow. Specifically, we separate optimization from codegen: optimization is now invoked when a function is partitioned, and codegen only focuses on generating the needed runtime module. This is needed by libraries such as the ARM Compute Library in #5915.
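As a rough illustration of that separation, the sketch below models it in plain Python. It is not the TVM implementation: the registry `OPTIMIZERS`, the decorator `register_optimizer`, the compiler name `toy_accel`, and the `partition` and `codegen` functions are all hypothetical names, and a function body is modeled as a simple list of op names.

```python
from typing import Callable, Dict, List

# Hypothetical registry: compiler name -> optimization run at partition time.
OPTIMIZERS: Dict[str, Callable[[List[str]], List[str]]] = {}


def register_optimizer(compiler: str):
    """Register an accelerator-specific optimization under a compiler name."""
    def decorator(fn: Callable[[List[str]], List[str]]):
        OPTIMIZERS[compiler] = fn
        return fn
    return decorator


@register_optimizer("toy_accel")
def fuse_conv_relu(ops: List[str]) -> List[str]:
    """Toy optimization: fuse a conv2d that is directly followed by relu."""
    out: List[str] = []
    for op in ops:
        if op == "relu" and out and out[-1] == "conv2d":
            out[-1] = "conv2d_relu"
        else:
            out.append(op)
    return out


def partition(ops: List[str], compiler: str) -> List[str]:
    """Produce the partitioned function, applying the registered optimization if any."""
    optimize = OPTIMIZERS.get(compiler)
    return optimize(ops) if optimize else list(ops)


def codegen(ops: List[str]) -> str:
    """Only emits code for the function it is given; it never rewrites the graph."""
    return "; ".join(f"call {op}" for op in ops)


partitioned = partition(["conv2d", "relu", "add"], "toy_accel")
module = codegen(partitioned)
```

Because the rewrite happens before `codegen` runs, the function held by the rest of the pipeline and the function seen by the codegen are the same object, which is the consistency property discussed for #5915.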
Note that we now explicitly register various types of APIs for the BYOC flow and implicitly invoke them to accomplish the end-to-end flow. This requires users/vendors to change multiple places for registration. @jroesch also has a concern about this. @comaniac Let's think of a way to centralize the APIs and make an RFC.
@masahi @lhutton1 @mbaret @trevor-m