-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] UMA Universal Modular Accelerator Interface #60
Conversation
Update code snippets
Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MichaelJKlaiber for this work!
I have done a round of review.
One thing that is missing is how UMA Partition operators (in the Reference Level explaintation). It would be great to describe what exactly being done using the registered patterns.
One thing I'd like anwsered here what sort of control it will allow on the passes run there : MergeComposite, AnnotateTarget, MergeCompilerRegions and ParititionGraph.
For e.g. some backend might not want the compiler regions merged, how would that be controlled ?
Also how would one register post-partitioning passes ?
UMA Partitioner: | ||
* Register relay passes | ||
* Register patterns - supported sub-graph operations | ||
* Order: pre-partitioning passes, Graph partitioning, post-partitioning passes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for a new user/developer, it might beneficial explaining where to position a relay pass between "post-partitioning passes" and a "_register_relay_pass" -- which might seem not obvious who dont have deeper understanding of TVM . I think it is mainly because, post-partitioning passes run before OptimizeImpl(...) sequence of passes are run in the core compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the pass registrations (_register_relay_pass
and _register_tir_pass
) we are following the idea of phases, at which the passes are registered def _register_relay_pass(self, phase: int, relay_pass: tvm.transform.Pass) -> None
. E.g. phase 0 would be pre-partitioning, phase 1 would be post-partitioning but before OptimizeImpl.
I agree, that the phases and their implications need proper documentation to help users decide where to place a pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Should we use well defined enums instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the text here (where we make a decision at the end). I think we should enumerate the following options, and highlight the reasoning of choice :
P1. Int based : _register_relay_pass(self, phase: int, relay_pass: tvm.transform.Pass)
P2. Enum based : _register_relay_pass(self, phase: tvm.transform.uma.Phase, relay_pass: tvm.transform.Pass)
P3. Seperate registerations :
_register_pre_partition_relay_pass(self, relay_pass: tvm.transform.Pass)
_register_post_partition_relay_pass(self, relay_pass: tvm.transform.Pass)
_register_post_optimization_relay_pass(self, relay_pass: tvm.transform.Pass)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use well defined enums instead ?
Yes, we were talking about this internally as well. Enums are probably the preferred solution. We will update the section with the options and add our reasoning of choice. Thanks for the great input!
mod, params = relay.frontend.from_pytorch(scripted_model, [("input_data", input_shape)]) | ||
|
||
# Register a UMA backend | ||
UltraTrailBackend().register() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the only thing the user need to do to register the backend ?
Will it do something to the effect :
TVM_REGISTER_TARGET_KIND("accelerator_B", kDLCPU)
.set_attr<FTVMRelayToTIR>("RelayToTIR", relay::contrib::generic::RelayToTIR("accelerator_B"));
.set_attr<FTVMTIRToRuntime>("TIRToRuntime", relay::contrib::generic::accelerator_B::TIRToRuntime);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the only thing the user need to do to register the backend ? Will it do something to the effect :
Yes.
TVM_REGISTER_TARGET_KIND("accelerator_B", kDLCPU) .set_attr<FTVMRelayToTIR>("RelayToTIR", relay::contrib::generic::RelayToTIR("accelerator_B")); .set_attr<FTVMTIRToRuntime>("TIRToRuntime", relay::contrib::generic::accelerator_B::TIRToRuntime);
backend.register
interacts with TVM core in 2 ways.
- It registers the target hooks as you said.
- It registers the target pattern tables using
tvm.relay.op.contrib.register.register_pattern_table
and ensures that both use the same compiler/target_name .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmmm.. TVM_REGISTER_TARGET_KIND is a C/C++ macro. Do we envision to have some sort of python decorator (as part of this work) to handle the registration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@manupa-arm: This part was removed. The C++ code you mention is no longer needed for target registration.
CC: @cgerum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a short addition regarding the implementation. We are using a global function RegisterTarget
in C++, that takes the target name and registers the target together with the target hooks. RegisterTarget
is called during the backend registration UltraTrailBackend().register()
. To hide this process from the user we are not using a decorator, but I think it's a similar approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That make sense to me! It might be worth adding this to the Reference explanation :)
How/Do we deal with target specific options ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it to the Reference explanation (link).
Currently it is not possible to set target specific options. Since the UMA targets are essentially also "c" targets we did not see the need to deal with target specific options. Do you have a use-case in mind, for which this would be necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we currently use such options to define accelerator variants that share the same lowering pipeline.
In the absense of that, we would need to resort to use the PassConfig, however, generally PassConfig is generally better suited to set a configuration to a specific Pass. In the above case, it would mean we need to set multiple Passes in the absense.
I would see the newly registered targets as extensions of the "c" target and Im a bit keen on not ending up having to dump a union of UMA target options to "c" target.
Following your proposal, is there a reason why we wont be able to use RegisterTarget ? We could consider including AttrDict to that effect.
Adding to this, there are two variants of these :
- relay.ext.<backend>.options
Which define the options for the lowering. This is inherited by the original BYOC design and we still use it with Target Hooks. This is partly due to the seperate existence kCompiler strings and actual targets.
- target_kind options
This is what I alluded to in the previous comment.
Ideally, since UMA is wrapping Target Hooks, I suppose if we want to add this, we would want to proceed with the second option -- hence the suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding to this, there are two variants of these :
- relay.ext..options
Which define the options for the lowering. This is inherited by the original BYOC design and we still use it with Target Hooks. This is partly due to the seperate existence kCompiler strings and actual targets.
Yes this not exactly user friendly.
- target_kind options
This is what I alluded to in the previous comment.
Ideally, since UMA is wrapping Target Hooks, I suppose if we want to add this, we would want to proceed with the second option -- hence the suggestion.
Adding them to the TargetKind would obviously be the preferred solution. As would potentially be a preferred way of doing things. It would allow for a clean target = Target("ethos-u -version=ethosu-256", host=Target("c"))
but this makes the current partitioning Flow somewhat clumsy as we still need to run .partition
for the target hooks. On the other hand it could serve as a starting point for deprecating partition_for
and providing a unified API for Collage and non-collage flows. @mbs-octoml @areusch would that make sense?
So far we had planned to standardize on MergeComposite, AnnotateTarget, MergeCompilerRegions and ParititionGraph. To get a better overview I extracted the partitioning flows of existing BYOC targets:
Looking at the existing backends it might make sense to make MergeCompilerRegions optional. We probably do not want to support custom compiler annotations as used in |
@cgerum thanks for detailed analysis! Im wondering whether should we provide an optional partitioning hook as well -- so then it can be anything (i.e. any Sequential) and let the default be a Sequential of MergeComposite, AnnotateTarget, MergeCompilerRegions, ParititionGraph. WDYT ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great proposal! I'm interested in how we can customize pass pipeline for diverse contexts and have a question regarding this. Look forward to learn more about UMA!
Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md
Considering how partitioning is handled in #62 I would probably prefer a more declarative way of specifying different partitioning patterns. @MichaelJKlaiber @PaulPalomeroBernardo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great proposal @MichaelJKlaiber! While reading the RFC I picked up on a couple of small things, feel free to ignore them :)
One overall question I have is whether this proposal is strictly limited to accelerators or whether it could also be used by any back-end that leverages the target hook functionality? For example, it seems possible to register kernel libraries (e.g. CMSIS-NN) using a similar interface?
@lhutton1, I agree in general. The primary focus of UMA at the moment is accelerators. It might make sense to bear library integration in mind. In general, I see that it should be possible. The main difference might be the choice of configuration parameters needed. @lhutton1, if you have suggestion or concrete examples, feel free to share them
|
uma-rfc: update to questions/comments added
Thanks @MichaelJKlaiber, that makes sense. So I was wondering if this is the case, perhaps in the future this interface is used by other backend's (not accelerators) we would need to think about renaming UMA to something more generic e.g. UMB Universal Modular Backend - I'm not the best with names. I'm wondering if this could easily be done in the future? |
@lhutton1, I think for naming there are no limits for creativity, e.g. it could be UMA: Universal Modular bAckend 😄 |
* Add descriptions for all API functions * Clarify backend registration and add target hook explanation * Remove schedules from API and corresponding descriptions
Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md
for A1, what do people think of an ordering spec e.g.
my concern is that whether int or enum, people are really expressing a dependency graph here and while we hope it is not terribly complicated, it's hard to intuit the meaning from an enum/int. for A2, i agree with @manupa-arm 's question, but i think it would be awesome to see the prototype and we could discuss from there. |
So here is my take on A2: In the accelerator-specific backend, a user would register target attribute names e.g.
They can be used during target creation similar to other sub_target strings
This is basically implemented by passing a list of attribute names to the target kind registration
The main downside I see with this, is that all attributes are treated as strings since the type is hardcoded. However, I'm not sure if we can avoid this at all. What do you think? For A1: |
This could probably be solved by adding
For v1 I would prefer to only support default values, and restrict supported dtypes to string, int and bool.
I agree with @PaulPalomeroBernardo on the user perspective. |
Thanks @cgerum and @PaulPalomeroBernardo . I agree, this totally makes sense like this. @manupa-arm @areusch is this sufficiently detailed for you? I propose to discuss outstanding topics in meeting to settle for UMA-v1. We could use the Community Meeting on May 25th. Or if these discussions are too specific for a broader audience, then we can setup a separate meeting. What are your thoughts? |
This aligns with A2.2 -- directly registering each attribute. I think this is fine for UMA-v1 and aligns with state of TVM targets today. Should we just put a note that for future considerations, to include a registration for string preprocessor (A2.1) to extract attributes ?
Again, I think phase approach is fine for v1 as we already have that in the core compiler (which is also int based) but I'd appreciate if we can put a "name" to ease the reasoning in future. Similarly, we could also note as future work to define dependencies on passes -- if and when the TVM core compiler improve its pass infrastructure we could be able to use that information. |
@manupa-arm Then just for clarification a few questions because I might have misunderstood your initial idea. For A2.1 you were thinking about registering an attribute preprocessor to the target So a user would only write |
More or less yes -- maybe we could (re)use "mattr" instead of "uma_attrs" looking at other target kinds -- but in principle that is what I meant. |
ok for A1 i'm good with named phases and we can modify as necessary. i think the A2.2 solution of directly registering target attrs makes sense to me. is that the direction we're aligned on here? we can discuss this next week at the community meeting, or if we're in alignment on these two items, i think all that remains is to update the RFC to reflect the discussion here and we can approve/merge. |
Apologies for not following the conversation in detail in real time. Here are some thoughts on how we can make sure an UMA-integrated accelerator is also a Collage-supported 'backend'.
Thanks, |
* Target registration with support for attribute options * Pass phases as enums
One more collage/uma overlap aspect: Collage distinguishes 'registered' backends (ie just TargetKinds) from 'activated' backends (ie Target objects in the provided build targets). I think though the proposal here is the act of registration is also activation? I need help understanding how this will look from the user's pov in combination with targets. |
Update target registration and add pass phases
Thanks @mbs-octoml for this detailed explanation. Being a Collage-supported backend is definitely something we want to achieve for UMA-integrated backends.
We will add this to the pattern registration.
They are registered in the global pattern table registry during backend registration but can also be accessed directly over the backend object if necessary.
Correct.
I think, this works well. After the backend registration (e.g.,
I agree.
Yes, we were planning to include a pre-partitioning pass phase. Passes within one pass phase should always be executed in order of their registration.
Currently a user needs to explicitly call
As you described this would eagerly partition the graph depending on the call order of
As it is now, they would not fall under the UMA integration API. With UMA we wanted to wrap one specific BYOC integration into an easy-to-use interface and we decided to go with the target hooks via TIR (
There are three steps required to make use of UMA as a user.
|
@manupa-arm @areusch I think, we are aligned on this. We decided to go with the enum-based approach for A1 and use A2.2 for UMA v1. I updated the RFC accordingly (Pass Phases, Target Hooks). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me (bar @mbs-octoml comments related to Collage).
Two nits to adjust the text but the design looks good to me.
Fix code snippets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I ll let @areusch and @mbs-octoml to cover Collage related concerns here ?
Ah, that's the example I was missing (sorry!). After registration I think calling backend.partition or letting CollagePartition 'do it for you' seems like a free choice, and all we have to do is make sure Collage respects all the existing pass hooks (which, since I'm moving CUTLASS over the TargetHooks it has been forced to do anyway!).
Given above I don't think either of these points is an issue: Collage will pickup both 'low level' and 'UMA-style' integrations without prejudice. There may be a temptation from users to add compiler-configuration into the backend ctor, but it sounds like we agree we'll keep that in the Target object instances, in which case everything blends nicely. So all LGTM from me, thanks for the extra explanation, and if any Collage-introduced friction shows up please just let me know and we can adjust mid-flight. Best, -m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PaulPalomeroBernardo @MichaelJKlaiber great! I think we are basically aligned here. I've asked for one more clarification given the discussion around use cases, then I think we can merge.
would you guys still like to discuss this on Wednesday? We have a few different topics, so I'm wondering if we may just need a brief 10 minutes to do an overview of the changes here? i think we can merge this RFC as soon as my one comment is addressed
def target_name(self): | ||
return "ultra_trail" | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you guys add a brief example of how to use this once you implement this backend class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an override of the abstract property defined in the base class UMABackend
@property
@abstractmethod
def target_name(self) -> str:
"""Name of the hardware target.
Returns
-------
out : str
The hardware target name.
"""
...
It's primarily used internally (e.g., target kind, target related global function names).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @areusch, 10 mins to show the changes are fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PaulPalomeroBernardo sorry i meant--can you add an example of how you might call tvm.relay.build() here, just so folks can understand it from user guide perspective?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@areusch Ahh right, so basically move this section up to the guide-level explanation? I think that makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly! i think i'm not seeing that change otherwise i'd hit merge. can you ping again when that's done? and then i think we're good here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@areusch, done :)
Final changes for UMA-RFCv1
Move backend usage to guide-level explanation
thanks @MichaelJKlaiber @PaulPalomeroBernardo @cgerum and others! the RFC is now merged. Please open a tracking issue and link it from this thread for discoverability. |
opening PR for UMA pre-RFC